showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
https://arxiv.org/abs/2307.16715
MIT License

RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size[1, 298, 514] #12

Closed: zhangtao22 closed this issue 11 months ago.

zhangtao22 commented 11 months ago

Whether I run the Slowfast R50 + CLIP-B/16 checkpoint or the Slowfast R50 + CLIP-B/16 (QVHL + Charades + NLQ + TACoS + ActivityNet + DiDeMo) checkpoint, I get this error:

Total number of frames: 298
Traceback (most recent call last):
  File "/opt/disk1/UniVTG/main_gradio.py", line 180, in <module>
    forward(vtg_model, "./examples/", 'A man takes a photo on the bottom of the sea and sees a lot of fish.')
  File "/opt/disk1/UniVTG/main_gradio.py", line 91, in forward
    output = model(src_vid=src_vid, src_txt=src_txt, src_vid_mask=src_vid_mask, src_txt_mask=src_txt_mask)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/disk1/UniVTG/model/univtg.py", line 107, in forward
    src_vid = self.input_vid_proj(src_vid)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/disk1/UniVTG/model/univtg.py", line 402, in forward
    x = self.LayerNorm(x)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/root/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size[1, 298, 514]

The video I used is youtube.mp4.
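The error comes from the input projection's LayerNorm: PyTorch requires the trailing dimension(s) of the input to equal `normalized_shape`, so a layer built for 2818-d features rejects 514-d ones. The sketch below mimics that shape check in plain Python. The decomposition of the two widths is an assumption, not confirmed in this thread: it follows the Moment-DETR-style layout of SlowFast (2304) + CLIP (512) + a 2-d temporal endpoint feature (TEF), which makes 2818 and 514 line up. The helper `layer_norm_shape_ok` is hypothetical and only illustrates the rule.

```python
# Minimal sketch of the shape rule torch.nn.LayerNorm enforces:
# the trailing dimensions of the input must equal normalized_shape.
from typing import Sequence


def layer_norm_shape_ok(normalized_shape: Sequence[int], input_shape: Sequence[int]) -> bool:
    """Return True if input_shape ends with exactly normalized_shape."""
    k = len(normalized_shape)
    if k > len(input_shape):
        return False
    return tuple(input_shape[len(input_shape) - k:]) == tuple(normalized_shape)


# ASSUMED feature sizes (Moment-DETR-style layout, not confirmed in this thread).
SLOWFAST_DIM = 2304  # SlowFast R50 per-clip feature
CLIP_DIM = 512       # CLIP-B/16 per-clip feature
TEF_DIM = 2          # temporal endpoint feature: [start, end]

slowfast_clip_dim = SLOWFAST_DIM + CLIP_DIM + TEF_DIM  # 2818: what the checkpoint expects
clip_only_dim = CLIP_DIM + TEF_DIM                     # 514: what the CLIP-only demo produced

# Shapes from the traceback: batch size 1, 298 clips.
print(layer_norm_shape_ok([slowfast_clip_dim], [1, 298, clip_only_dim]))  # False
print(layer_norm_shape_ok([clip_only_dim], [1, 298, clip_only_dim]))      # True
```

Read this way, the mismatch points to a feature pipeline that supplies CLIP features only, while the checkpoint was built for SlowFast + CLIP input.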

QinghongLin commented 11 months ago

Hi, this is because main_gradio.py currently supports only the CLIP checkpoint, while the CLIP + SlowFast checkpoint requires an additional SlowFast feature extractor, which is not included yet. I will update this part soon.
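To illustrate why a CLIP + SlowFast checkpoint needs the extra extractor, here is a hypothetical sketch of how such a video input could be assembled: each clip's SlowFast vector and CLIP vector are concatenated, followed by a normalized [start, end] temporal endpoint feature. The dimensions (2304, 512, 2), the concatenation order, and the helper names `temporal_endpoint_feature` and `build_video_input` are assumptions modeled on Moment-DETR-style code, not UniVTG's actual implementation. Plain lists stand in for tensors (a real pipeline would use something like `torch.cat` along the last dimension), and any per-feature normalization is omitted.

```python
# Hypothetical sketch: build a per-clip video input of width SlowFast + CLIP + TEF.
# Dimensions, ordering, and helper names are ASSUMPTIONS for illustration only.
from typing import List, Sequence

Vector = List[float]


def temporal_endpoint_feature(num_clips: int) -> List[Vector]:
    """Normalized [start, end] position of each clip within the video."""
    return [[i / num_clips, (i + 1) / num_clips] for i in range(num_clips)]


def build_video_input(
    slowfast_feats: Sequence[Sequence[float]],
    clip_feats: Sequence[Sequence[float]],
) -> List[Vector]:
    """Concatenate SlowFast, CLIP, and TEF features clip by clip."""
    if len(slowfast_feats) != len(clip_feats):
        raise ValueError(
            f"clip count mismatch: {len(slowfast_feats)} SlowFast vs {len(clip_feats)} CLIP"
        )
    tef = temporal_endpoint_feature(len(clip_feats))
    return [
        list(sf) + list(cl) + tef[i]
        for i, (sf, cl) in enumerate(zip(slowfast_feats, clip_feats))
    ]


num_clips = 298
slowfast = [[0.0] * 2304 for _ in range(num_clips)]  # placeholder SlowFast R50 features
clip = [[0.0] * 512 for _ in range(num_clips)]       # placeholder CLIP-B/16 features
video_input = build_video_input(slowfast, clip)
print(len(video_input), len(video_input[0]))  # 298 2818
```

Without a SlowFast extractor, only the CLIP part and the TEF are available, which yields 514-d rows; that matches the width reported in the error.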

zhangtao22 commented 11 months ago

Thanks! Looking forward to the update.

Coronal-Halo commented 4 months ago

Hi, have you included the additional feature extractor?