zhang-tao-whu / DVIS

DVIS: Decoupled Video Instance Segmentation Framework
MIT License

Is inference not supported on a single GPU? #13

Closed danyow-cheung closed 11 months ago

danyow-cheung commented 1 year ago

As the title says, running the command python train_net_video.py --num-gpus 1 --config-file configs/ovis/DVIS_Offline_R50.yaml --eval-only MODEL.WEIGHTS checkpoints/DVIS_offline_ovis_r50.pth returns:

[09/04 15:40:30 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from checkpoints/DVIS_offline_ovis_r50.pth ...
[09/04 15:40:30 fvcore.common.checkpoint]: [Checkpointer] Loading from checkpoints/DVIS_offline_ovis_r50.pth ...
[09/04 15:40:32 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/04 15:40:32 d2.data.common]: Serializing 140 elements to byte tensors and concatenating them all ...
[09/04 15:40:32 d2.data.common]: Serialized dataset takes 0.42 MiB
COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[09/04 15:40:32 d2.evaluation.evaluator]: Start inference on 140 batches
/home/hs/AIGC/DVIS_ENV/lib/python3.10/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
已杀死 ("Killed")
zhang-tao-whu commented 1 year ago

There is no problem with single-GPU inference. Most likely the CPU memory is insufficient, so the process was forcibly terminated (the "Killed" message is typical of the Linux OOM killer). The OVIS dataset contains many videos with hundreds of frames, and during testing DVIS processes all frames before converting the results from the highly memory-consuming dense mask format to RLE format. Therefore, the memory requirement is high.
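To see why the dense format dominates memory, here is a toy sketch in pure NumPy. It is illustrative only: DVIS actually uses pycocotools' compressed COCO RLE, not this uncompressed toy encoding.

```python
import numpy as np

def rle_encode(mask):
    """Toy run-length encoding of a binary mask.

    Illustrative only: DVIS converts masks with pycocotools'
    compressed COCO RLE, not this uncompressed toy format.
    """
    flat = mask.flatten(order="F").astype(np.uint8)   # column-major, COCO style
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1  # indices where the value flips
    runs = np.diff(np.concatenate(([0], change, [flat.size])))
    if flat[0] == 1:                                  # COCO counts start with a zeros run
        runs = np.concatenate(([0], runs))
    return {"size": list(mask.shape), "counts": runs.tolist()}

# Why memory blows up before conversion: one object in a 300-frame 720p video,
# kept as a dense uint8 mask stack, is already roughly 276 MB; OVIS videos can
# have hundreds of frames and many objects per video.
frames, h, w = 300, 720, 1280
dense_bytes = frames * h * w  # dense cost per object, before RLE conversion
```

The RLE form stores only run lengths, so a mostly empty mask shrinks by orders of magnitude.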

danyow-cheung commented 1 year ago

(two screenshots attached) Any solution?

zhang-tao-whu commented 1 year ago

The test pipeline needs to be modified to support inference clip by clip. You can refer to demo_long_video.py.
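The clip-by-clip idea can be sketched as below. The names `model_fn` and `clip_len` are hypothetical stand-ins, and the real pipeline in demo_long_video.py additionally has to carry tracking state across clips; this only shows the memory-bounding loop structure.

```python
def run_clip_by_clip(model_fn, frames, clip_len=30):
    """Run inference on fixed-size clips so peak memory scales with
    clip_len instead of the full video length.

    model_fn and clip_len are hypothetical stand-ins; see
    demo_long_video.py for the actual implementation in this repo.
    """
    outputs = []
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]  # at most clip_len frames in memory
        outputs.append(model_fn(clip))         # results for this clip only
    return outputs
```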

danyow-cheung commented 1 year ago

I tried that solution, but it failed.

In my edited code I changed something like this (screenshot attached),

and then I encountered an error:

Traceback (most recent call last):
  File "/home/hs/AIGC/DVIS-main/demo_video/demo_long_video.py", line 133, in <module>
    predictions, visualized_output = demo.run_on_video(vid_frames, keep=False)
  File "/home/hs/AIGC/DVIS-main/demo_video/predictor.py", line 217, in run_on_video
    vis_output = visualizer.draw_instance_predictions(predictions=ins, ids=pred_ids)
  File "/home/hs/AIGC/DVIS-main/demo_video/visualizer.py", line 92, in draw_instance_predictions
    masks = [GenericMask(x, self.output.height, self.output.width) for x in masks]
  File "/home/hs/AIGC/DVIS-main/demo_video/visualizer.py", line 92, in <listcomp>
    masks = [GenericMask(x, self.output.height, self.output.width) for x in masks]
  File "/home/hs/AIGC/detectron2/detectron2/utils/visualizer.py", line 90, in __init__
    assert m.shape == (
AssertionError: mask shape: (3, 2160, 3840), target dims: 2160, 3840

Could you share details about the predictions object?

danyow-cheung commented 1 year ago

My PyTorch version is 1.11.

zhang-tao-whu commented 1 year ago

Please refer to lines 829-836 in meta_architecture.py, where predictions refers to the prediction results directly returned by the network.

predictions = {
    "image_size": (output_height, output_width),
    "pred_scores": out_scores,  # list of length n_obj: [obj1_score, ..., objn_score]
    "pred_labels": out_labels,  # list of length n_obj: [obj1_label, ..., objn_label]
    "pred_masks": out_masks,    # list of length n_obj: [torch.Tensor(n_frames, H, W), ..., torch.Tensor(n_frames, H, W)]
    "pred_ids": out_ids,        # list of length n_obj: [obj1_id, ..., objn_id]
    "task": "vis",
}

You can also refer to the function _get_objects_from_outputs (line 21) in the file predictor.py to understand the meaning of information in the predictions.
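Since each entry of pred_masks is shaped (n_frames, H, W) while detectron2's GenericMask asserts a 2-D (H, W) mask, the AssertionError above (mask shape (3, 2160, 3840)) suggests a whole per-object mask stack was passed where a single frame's mask was expected. A minimal sketch of the per-frame indexing, assuming the predictions structure shown above (`masks_for_frame` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def masks_for_frame(pred_masks, frame_idx):
    """Select frame `frame_idx` from each per-object mask stack.

    pred_masks: list of arrays shaped (n_frames, H, W), one per object,
    matching the predictions dict above. Returns a list of (H, W) masks,
    which is the 2-D shape detectron2's GenericMask asserts.
    """
    return [m[frame_idx] for m in pred_masks]

# e.g. 2 objects in a 3-frame video (tiny 4x5 masks for illustration)
pred_masks = [np.zeros((3, 4, 5), dtype=np.uint8) for _ in range(2)]
frame0 = masks_for_frame(pred_masks, 0)  # two (4, 5) masks for frame 0
```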

zhang-tao-whu commented 1 year ago

If you only need to obtain predictions for a portion of the video, I recommend directly extracting the prediction results from demo_long_video.py and storing them locally. This way, you will not need to modify the code extensively.
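One way to store them locally is a plain pickle round-trip. This is a sketch under the assumption that predictions is the dict shown earlier; in real code, move any GPU tensors to CPU (or convert to NumPy) before dumping, so the file does not capture CUDA state.

```python
import pickle

def save_predictions(predictions, path):
    """Dump the raw predictions dict to disk so post-processing and
    visualization can run offline without re-running the network.

    Note: convert GPU tensors to CPU/NumPy before calling this.
    """
    with open(path, "wb") as f:
        pickle.dump(predictions, f)

def load_predictions(path):
    """Load a predictions dict saved by save_predictions."""
    with open(path, "rb") as f:
        return pickle.load(f)
```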

danyow-cheung commented 1 year ago

Got the per-frame prediction information, thanks anyway.

zhang-tao-whu commented 1 year ago

For each object, the entire video has only one score and one category. Note, however, that the mask has shape (T, H, W).