Closed by zhangtao22 11 months ago
Hi, this is because main_gradio.py currently supports only the CLIP checkpoint (as follows),
while the CLIP + SlowFast checkpoint requires an additional SlowFast feature extractor, which is not included yet. I will update this part soon.
Thanks. Looking forward to the update!
Hi, have you included the additional feature extractor?
Regardless of whether I run Slowfast R50 + CLIP-B/16 or Slowfast R50 + CLIP-B/16 QVHL + Charades + NLQ + TACoS + ActivityNet + DiDeMo, I get this error:

Total number of frames: 298
Traceback (most recent call last):
  File "/opt/disk1/UniVTG/main_gradio.py", line 180, in <module>
    forward(vtg_model, "./examples/", 'A man takes a photo on the bottom of the sea and sees a lot of fish.')
  File "/opt/disk1/UniVTG/main_gradio.py", line 91, in forward
    output = model(src_vid=src_vid, src_txt=src_txt, src_vid_mask=src_vid_mask, src_txt_mask=src_txt_mask)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/disk1/UniVTG/model/univtg.py", line 107, in forward
    src_vid = self.input_vid_proj(src_vid)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/disk1/UniVTG/model/univtg.py", line 402, in forward
    x = self.LayerNorm(x)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/root/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size[1, 298, 514]
The video I used is youtube.mp4.
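One way to read the two numbers in the RuntimeError above: the checkpoint's input LayerNorm expects 2818 values per clip, while the demo script produces 514. The sketch below is only an illustration, not code from the repository. The per-feature sizes (2304 for SlowFast R50, 512 for CLIP-B/16, 2 for temporal endpoint values) are assumptions chosen because they add up to the numbers in the error message, and both helper names are hypothetical.

```python
# Assumed per-clip feature sizes. They are not stated in this thread; they are
# chosen because their sums match the two numbers in the error message.
SLOWFAST_DIM = 2304  # assumption: SlowFast R50 feature size
CLIP_DIM = 512       # assumption: CLIP-B/16 feature size
TEF_DIM = 2          # assumption: two temporal endpoint values per clip


def expected_vid_dim(use_slowfast: bool, use_clip: bool = True, use_tef: bool = True) -> int:
    """Return the per-clip video feature size for a given feature combination."""
    dim = 0
    if use_slowfast:
        dim += SLOWFAST_DIM
    if use_clip:
        dim += CLIP_DIM
    if use_tef:
        dim += TEF_DIM
    return dim


def check_video_features(feature_shape, checkpoint_dim: int) -> None:
    """Hypothetical guard: fail with a readable message before LayerNorm does."""
    actual = feature_shape[-1]
    if actual != checkpoint_dim:
        raise ValueError(
            f"Video features have {actual} dims per clip, but the checkpoint "
            f"expects {checkpoint_dim}. The extracted features and the "
            f"checkpoint were probably built from different feature sets."
        )


if __name__ == "__main__":
    print(expected_vid_dim(use_slowfast=False))  # CLIP only
    print(expected_vid_dim(use_slowfast=True))   # SlowFast + CLIP
    try:
        # Shapes taken from the error message: input [1, 298, 514], LayerNorm 2818.
        check_video_features((1, 298, 514), checkpoint_dim=2818)
    except ValueError as err:
        print(err)
```

Under these assumptions, the 514-dimensional input matches a CLIP-only extractor and the 2818-dimensional LayerNorm matches a SlowFast + CLIP checkpoint, which is consistent with the earlier reply that main_gradio.py does not yet include the SlowFast feature extractor.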