Closed: danyow-cheung closed this issue 11 months ago.
There is no problem with single GPU inference. I believe it is highly likely that the CPU memory is insufficient, leading to an interrupt and forced termination. The OVIS dataset contains many videos with hundreds of frames, and DVIS processes all frames in testing before converting the results from the highly memory-consuming mask format to RLE format. Therefore, there is a high memory requirement.
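To see why the raw mask format is so memory-hungry, here is a back-of-the-envelope estimate. The frame count, resolution, and object count below are illustrative assumptions, not measurements from DVIS or OVIS:

```python
# Rough memory estimate for holding raw boolean masks for a whole video
# before they are converted to RLE. All numbers are illustrative.
n_frames = 300            # a long OVIS video
height, width = 720, 1280 # per-frame mask resolution
n_objects = 10            # tracked objects

bytes_per_object = n_frames * height * width  # 1 byte per boolean pixel
total_gb = n_objects * bytes_per_object / 1024 ** 3
print(f"{total_gb:.1f} GB of raw masks held in memory")  # roughly 2.6 GB
```

Even this modest setting needs gigabytes of CPU memory just for the masks, which is why processing all frames before RLE conversion can exhaust RAM.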
Is there any solution?
The test pipeline needs to be modified to support clip-by-clip inference. You can refer to demo_long_video.py.
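The clip-by-clip idea can be sketched as a simple loop. `model` and `frames` below are placeholders standing in for the network and the decoded frame list, not the actual demo_long_video.py API:

```python
def run_in_clips(model, frames, clip_len=30):
    # Split the frame list into fixed-length clips and run the model on each
    # clip separately, so peak memory is bounded by clip_len frames rather
    # than the full video length.
    outputs = []
    for start in range(0, len(frames), clip_len):
        outputs.append(model(frames[start:start + clip_len]))
    return outputs

# Toy stand-in for the network: just count frames in each clip.
sizes = run_in_clips(len, list(range(70)), clip_len=30)
print(sizes)  # [30, 30, 10]
```

In the real demo you would still need to merge the per-clip outputs (e.g. match object identities across clip boundaries), which is the part demo_long_video.py handles.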
I tried this solution, but it failed. In my edited code I made a change along those lines, and then I encountered this error:
Traceback (most recent call last):
File "/home/hs/AIGC/DVIS-main/demo_video/demo_long_video.py", line 133, in <module>
predictions, visualized_output = demo.run_on_video(vid_frames, keep=False)
File "/home/hs/AIGC/DVIS-main/demo_video/predictor.py", line 217, in run_on_video
vis_output = visualizer.draw_instance_predictions(predictions=ins, ids=pred_ids)
File "/home/hs/AIGC/DVIS-main/demo_video/visualizer.py", line 92, in draw_instance_predictions
masks = [GenericMask(x, self.output.height, self.output.width) for x in masks]
File "/home/hs/AIGC/DVIS-main/demo_video/visualizer.py", line 92, in <listcomp>
masks = [GenericMask(x, self.output.height, self.output.width) for x in masks]
File "/home/hs/AIGC/detectron2/detectron2/utils/visualizer.py", line 90, in __init__
assert m.shape == (
AssertionError: mask shape: (3, 2160, 3840), target dims: 2160, 3840
Could you share the details of the predictions? My PyTorch version is 1.11.
Please refer to lines 829-836 in meta_architecture.py, where predictions refers to the prediction results directly returned by the network.
predictions = {
"image_size": (output_height, output_width),
"pred_scores": out_scores, # is a list, length is n_obj, i.e., [obj1_score,... , obj_n_score]
"pred_labels": out_labels, # is a list, length is n_obj, i.e., [obj1_label,... , obj_n_label]
"pred_masks": out_masks, # is a list, length is n_obj, i.e., [torch.Tensor(n_frames, H, W),... , torch.Tensor(n_frames, H, W)]
"pred_ids": out_ids, # is a list, length is n_obj, i.e., [obj1_id,... , obj_n_id]
"task": "vis",
}
You can also refer to the function _get_objects_from_outputs (line 21) in the file predictor.py to understand the meaning of the information in the predictions.
If you only need to obtain predictions for a portion of the video, I recommend directly extracting the prediction results from demo_long_video.py and storing them locally. This way, you will not need to modify the code extensively.
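If you store the raw (T, H, W) masks locally, a run-length encoding keeps the files small. The sketch below is a simple illustration of the idea, not the COCO RLE format that DVIS/pycocotools actually use:

```python
import numpy as np

def rle_encode(mask):
    # Flatten a binary mask and record the length of each constant run.
    # Returns the first pixel value plus the run lengths.
    flat = np.asarray(mask, dtype=np.uint8).ravel()
    change = np.flatnonzero(np.diff(flat)) + 1       # run boundaries
    bounds = np.concatenate(([0], change, [flat.size]))
    return int(flat[0]), np.diff(bounds).tolist()

def rle_decode(first, runs, shape):
    # Runs alternate between `first` and its complement.
    vals = ((first + np.arange(len(runs))) % 2).astype(np.uint8)
    return np.repeat(vals, runs).reshape(shape)

mask = np.array([[0, 0, 1],
                 [1, 1, 0]], dtype=np.uint8)
first, runs = rle_encode(mask)
print(first, runs)  # 0 [2, 3, 1]
assert np.array_equal(rle_decode(first, runs, mask.shape), mask)
```

Since segmentation masks are mostly uniform background, the run-length form is far smaller than the dense array, which is also why DVIS converts to RLE after inference.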
I got the single-frame prediction information. Thanks anyway.
For an object, the entire video has only one score and one category. However, please note that the size of the mask is (T, H, W).
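This (T, H, W) layout is consistent with the AssertionError above: detectron2's GenericMask asserts a 2-D (H, W) mask, so a per-object video mask must be sliced to the current frame before it reaches the visualizer. A minimal sketch of that slicing (the helper name is hypothetical, not DVIS code):

```python
import numpy as np

def masks_for_frame(pred_masks, t):
    # pred_masks: list of per-object (T, H, W) mask arrays, as in
    # predictions["pred_masks"]. Select frame t so each mask becomes
    # the 2-D (H, W) array GenericMask expects.
    return [np.asarray(m)[t] for m in pred_masks]

video_masks = [np.zeros((3, 2160, 3840), dtype=bool)]  # one toy object
frame_masks = masks_for_frame(video_masks, t=0)
print(frame_masks[0].shape)  # (2160, 3840)
```

With this slicing, the shape passed to the visualizer matches the "target dims: 2160, 3840" in the assertion rather than the full (3, 2160, 3840) tensor.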
As stated in the title, I ran the command
python train_net_video.py --num-gpus 1 --config-file configs/ovis/DVIS_Offline_R50.yaml --eval-only MODEL.WEIGHTS checkpoints/DVIS_offline_ovis_r50.pth
and it returned