janghyuncho / DECOLA

Code release for "Language-conditioned Detection Transformer"
https://arxiv.org/abs/2311.17902

IndexError: list index out of range #10

Open hilaryliang opened 3 months ago

hilaryliang commented 3 months ago

Thank you for your great work!

I want to predict a video or a list of images using demo.py.

command: python demo.py --config-file configs/DECOLA_PHASE1_L_CLIP_SwinB_4x.yaml --video-input ./test_videos/output2.mp4 --output ./test_videos/output/result.mkv --vocabulary custom --custom_vocabulary sea\ urchin --confidence-threshold 0.3 --language-condition --opts MODEL.WEIGHTS weights/DECOLA_PHASE1_L_CLIP_SwinB_4x.pth

output:

[04/23 19:55:00 detectron2]: Arguments: Namespace(c2=False, confidence_threshold=0.3, config_file='configs/DECOLA_PHASE1_L_CLIP_SwinB_4x.yaml', cpu=False, custom_vocabulary='sea urchin', input=None, language_condition=True, opts=['MODEL.WEIGHTS', 'weights/DECOLA_PHASE1_L_CLIP_SwinB_4x.pth'], output='./test_videos/output/result.mkv', pred_all_class=False, sam_checkpoint='weights/sam/sam_vit_h_4b8939.pth', use_sam=False, video_input='./test_videos/output2.mp4', vocabulary='custom', webcam=None)
Loading pretrained CLIP
/homes/hilary/anaconda3/envs/decola2/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[04/23 19:55:13 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from weights/DECOLA_PHASE1_L_CLIP_SwinB_4x.pth ...
[04/23 19:55:13 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/DECOLA_PHASE1_L_CLIP_SwinB_4x.pth ...
custom weight normalized. (shape: torch.Size([2, 512]))
[ERROR:0@17.975] global cap_ffmpeg_impl.hpp:3130 open Could not find encoder for codec_id=27, error: Encoder not found
[ERROR:0@17.975] global cap_ffmpeg_impl.hpp:3208 open VIDEOIO/FFMPEG: Failed to initialize VideoWriter
[ERROR:0@17.976] global cap.cpp:643 open VIDEOIO(CV_IMAGES): raised OpenCV exception:

OpenCV(4.9.0) /io/opencv/modules/videoio/src/cap_images.cpp:430: error: (-215:Assertion failed) !filename_pattern.empty() in function 'open'

0%| | 0/221 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "demo.py", line 240, in <module>
    for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
  File "/homes/hilary/anaconda3/envs/decola2/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/homes/hilary/marinedet/sota_ovd/DECOLA/decola/predictor.py", line 190, in run_on_video
    yield process_predictions(frame, self.predictor(frame))
  File "/homes/hilary/anaconda3/envs/decola2/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 319, in __call__
    predictions = self.model([inputs])[0]
IndexError: list index out of range

I installed the environments for Detic and DECOLA the same way and used similar commands. Detic works, but DECOLA does not. Is there anything I have missed?

Looking forward to your help!

janghyuncho commented 2 months ago

Hello, unfortunately, we don't support video. The video support that we have is from the Detic codebase (the DECOLA demo code is based on the one from Detic). It won't be hard to add support, so I will take a look in the future, but I cannot promise it will be anytime soon.

hilaryliang commented 2 weeks ago

Hi, I found that the same error occurs not only for video but also for a single image. So I believe there is a potential bug somewhere, and I hope you can take a look.

When the custom_vocabulary includes an object that can be detected in the image, it works:
python demo.py --config-file configs/DECOLA_PHASE2_LI_CLIP_SwinB_4x_ft4x.yaml --input trash_test/000732.jpg --output trash_test/outputs_custom/1.jpg --vocabulary custom --custom_vocabulary water_bottle,wallet,webcam,mug,headphone,drawer,keyboard,laptop,plastic_bag --confidence-threshold 0.2 --opts MODEL.WEIGHTS weights/DECOLA_PHASE2_LI_CLIP_SwinB_4x_ft4x.pth

But when the image contains no object from the custom_vocabulary, the error occurs:
python demo.py --config-file configs/DECOLA_PHASE2_LI_CLIP_SwinB_4x_ft4x.yaml --input trash_test/000732.jpg --output trash_test/outputs_custom/1.jpg --vocabulary custom --custom_vocabulary water_bottle,wallet,webcam,mug,headphone,drawer,keyboard,laptop --confidence-threshold 0.2 --opts MODEL.WEIGHTS weights/DECOLA_PHASE2_LI_CLIP_SwinB_4x_ft4x.pth
The only difference between these two commands is that the second one omits plastic_bag from custom_vocabulary.

output: 7

ERROR:
Traceback (most recent call last):
  File "demo.py", line 173, in <module>
    predictions, visualized_output = demo.run_on_image(img)
  File "/homes/hilary/marinedet/sota_ovd/DECOLA/decola/predictor.py", line 95, in run_on_image
    predictions = self.predictor(image)
  File "/homes/hilary/marinedet/sota_ovd/detectron2/detectron2/engine/defaults.py", line 319, in __call__
    predictions = self.model([inputs])[0]
IndexError: list index out of range
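
Since the traceback ends at `self.model([inputs])[0]`, the model call appears to return an empty list when nothing from the vocabulary is found. Until the cause is fixed, a caller-side guard might look like the sketch below. It is only an illustration, not DECOLA's or detectron2's code: `first_prediction`, `fake_model`, and `empty_result` are hypothetical names, and `fake_model` merely imitates a model that returns no entry when there are no hits.

```python
from typing import Any, Callable, List, Sequence


def first_prediction(outputs: Sequence[Any], make_empty: Callable[[], Any]) -> Any:
    """Return outputs[0], or a placeholder when the model returned nothing."""
    if len(outputs) == 0:
        # Avoids "IndexError: list index out of range" on an empty result.
        return make_empty()
    return outputs[0]


def fake_model(batch: List[dict]) -> List[dict]:
    # Hypothetical stand-in: drops inputs that have no detections,
    # which is the behavior assumed from the traceback.
    return [{"instances": item["hits"]} for item in batch if item["hits"]]


def empty_result() -> dict:
    # Hypothetical placeholder for "no detections".
    return {"instances": []}


with_hits = first_prediction(fake_model([{"hits": ["mug"]}]), empty_result)
no_hits = first_prediction(fake_model([{"hits": []}]), empty_result)
print(with_hits)
print(no_hits)
```

In a real setup the placeholder would have to match whatever the visualizer expects (for detectron2, presumably an empty set of instances sized to the image), so this only shows where the guard would go.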