yifanlu0227 / ChatSim

[CVPR2024 Highlight] Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
https://yifanlu0227.github.io/ChatSim
RuntimeError: The size of tensor a (50) must match the size of tensor b (10) at non-singleton dimension 1 #25 #26

Open nevergone123 opened 1 month ago

nevergone123 commented 1 month ago

I changed the number of frames from 50 to 10 and ran the following command: python main.py -y config/waymo-1006.yaml -p 'Remove all cars.Viewpoints ahead slowly and A chevrolet driving away from me fast.' -s demo

I got the following output:

/root/AImodel/wenke/ChatSim/chatsim/background/inpainting/Inpaint-Anything/segment_anything/segment_anything/modeling/tiny_vit_sam.py:657: UserWarning: Overwriting tiny_vit_21m_512 in registry with segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
sttn
Traceback (most recent call last):
  File "remove_anything_video_npy.py", line 288, in <module>
    all_frame_rm_w_mask = model.forward_inpainter(frames, masks)
  File "remove_anything_video_npy.py", line 132, in forward_inpainter
    frames = inpaint_video_with_builded_sttn(
  File "/root/miniconda3/envs/chatsim/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/AImodel/wenke/ChatSim/chatsim/background/inpainting/Inpaint-Anything/sttn_video_inpaint.py", line 91, in inpaint_video_with_builded_sttn
    feats = (feats * (1 - _masks).float()).view(video_length, 3, h, w)
RuntimeError: The size of tensor a (50) must match the size of tensor b (10) at non-singleton dimension 1

This doesn't happen if I set the number of frames to 50. I think a hardcoded value remains somewhere, but I failed to find it. Do you know how to generate a video with a different number of frames? Many thanks!

yifanlu0227 commented 1 month ago

Sorry, that shouldn't happen. We will check it immediately.

yifanlu0227 commented 1 month ago

Very sorry for the late reply; we were rushing to meet another deadline over the past few days.

The cause of this problem is that we did not empty the folder of rendered images, and scene.current_images is read from that same folder. When you first render 50 frames and then switch to 10 frames, the output folder still contains 40 images from the previous run.

(chatsim) yfl@fsh-System-Product-Name:~/workspace/ChatSim$ ls chatsim/background/mcnerf/exp/segment-10061305430875486848_1080_000_1100_000_with_camera_labels/exp_coeff_0.15/wide_angle_novel_images
50000_000.png  50000_004.png  50000_008.png  50000_012.png  50000_016.png  50000_020.png  50000_024.png  50000_028.png  50000_032.png  50000_036.png  50000_040.png  50000_044.png  50000_048.png
50000_001.png  50000_005.png  50000_009.png  50000_013.png  50000_017.png  50000_021.png  50000_025.png  50000_029.png  50000_033.png  50000_037.png  50000_041.png  50000_045.png  50000_049.png
50000_002.png  50000_006.png  50000_010.png  50000_014.png  50000_018.png  50000_022.png  50000_026.png  50000_030.png  50000_034.png  50000_038.png  50000_042.png  50000_046.png
50000_003.png  50000_007.png  50000_011.png  50000_015.png  50000_019.png  50000_023.png  50000_027.png  50000_031.png  50000_035.png  50000_039.png  50000_043.png  50000_047.png

We fixed this problem in these lines by clearing the rendered folder. Sorry again for the delay.
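
For readers hitting the same error on an older checkout, the idea of the fix can be sketched as follows. This is only a minimal illustration, not the actual ChatSim code: `reset_render_dir` and `load_frame_paths` are hypothetical helper names, and it assumes that everything in the render folder can be safely deleted before each run.

```python
import shutil
from pathlib import Path


def reset_render_dir(render_dir):
    """Delete render_dir together with any stale frames, then recreate it empty.

    Hypothetical helper: call it before rendering so that frames left over
    from a previous run with more frames cannot be picked up again.
    """
    path = Path(render_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


def load_frame_paths(render_dir):
    """Return the PNG frames found in render_dir, sorted by file name."""
    return sorted(Path(render_dir).glob("*.png"))
```

With the folder reset before each render, the number of images read back matches the number of frames requested, so the frames and masks passed to the inpainter should have the same length.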

nevergone123 commented 1 month ago

Cool, I successfully generated the video. I really appreciate your reply! By the way, are you considering alternative LLMs as the agents? The GPT-4 API is a little expensive in such a multi-agent collaboration system.

yifanlu0227 commented 1 month ago

Great suggestion! We'll add it to the to-do list :)

yifanlu0227 commented 1 month ago

@nevergone123 Hi, you can use one of the many free LLM inference services provided by NVIDIA.