Sorry for the inconvenience. I think that the pretrained weight is from Moment-DETR not from our GitHub repository.
Can you try again with the weights provided in our repository?
Video-only weights: https://www.dropbox.com/s/yygwyljw8514d9r/videoonly.ckpt?dl=0
V + A weights: https://www.dropbox.com/s/hsc7jk21ppqasjt/videoaudio.ckpt?dl=0
Thank you for your reply. I used videoaudio.ckpt and got this error:
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for QDDETR:
size mismatch for input_vid_proj.0.LayerNorm.weight: copying a param with shape torch.Size([4868]) from checkpoint, the shape in current model is torch.Size([2818]).
size mismatch for input_vid_proj.0.LayerNorm.bias: copying a param with shape torch.Size([4868]) from checkpoint, the shape in current model is torch.Size([2818]).
size mismatch for input_vid_proj.0.net.1.weight: copying a param with shape torch.Size([256, 4868]) from checkpoint, the shape in current model is torch.Size([256, 2818]).
Can you try with the checkpoint trained only with video? To use the video+audio checkpoint, you may have to change some code and your dataset to have extracted audio features.
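If it helps to see exactly which parameters disagree before retrying, a quick check like this works. This is only a minimal sketch: it assumes the checkpoint stores its weights under a "model" key, as Moment-DETR-style checkpoints usually do (adjust the key if it differs), and that `model` is the QDDETR instance you are loading into.

```python
import torch

ckpt = torch.load("videoaudio.ckpt", map_location="cpu")
ckpt_state = ckpt["model"] if "model" in ckpt else ckpt  # adjust the key if your checkpoint differs

model_state = model.state_dict()  # `model` is the QDDETR instance being loaded into
for name, param in ckpt_state.items():
    if name in model_state and model_state[name].shape != param.shape:
        print(name, tuple(param.shape), "(checkpoint) vs", tuple(model_state[name].shape), "(model)")
```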
I have tried the checkpoint trained only with video (videoonly.ckpt), but the error still happens. The shapes of the model and the weights do not match.
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/normalization.py", line 190, in forward
input, self.normalized_shape, self.weight, self.bias, self.eps)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/functional.py", line 2347, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size[1, 75, 514]
If you look at the provided training script, the feature dimension should be 2304 (SlowFast) + 512 (CLIP). It looks like you only have CLIP features.
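For reference, here is a rough sketch of the feature layout the released checkpoint expects: 2304-d SlowFast plus 512-d CLIP features per clip, with a 2-d temporal endpoint feature appended at run time. The 514 in the error above (512 + 2) is consistent with that, but the tef part follows the Moment-DETR-style loaders and is an assumption here; the random tensors below just stand in for the real extracted features.

```python
import torch

n_clips = 75  # number of clips in the example video (from the error message above)

slowfast_feats = torch.randn(n_clips, 2304)  # placeholder for extracted SlowFast features
clip_feats = torch.randn(n_clips, 512)       # placeholder for extracted CLIP features

video_feats = torch.cat([slowfast_feats, clip_feats], dim=-1)  # (75, 2816)

# 2-d temporal endpoint feature per clip (Moment-DETR-style; assumption)
tef_st = torch.arange(0, n_clips, 1.0) / n_clips
tef_ed = tef_st + 1.0 / n_clips
tef = torch.stack([tef_st, tef_ed], dim=1)                     # (75, 2)

src_vid = torch.cat([video_feats, tef], dim=-1).unsqueeze(0)   # (1, 75, 2818), matching normalized_shape=[2818]
```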
I also have an error when running run_on_video/run.py. I have used both videoonly.ckpt (https://www.dropbox.com/s/yygwyljw8514d9r/videoonly.ckpt?dl=0) and video_model_best.ckpt (run_on_video/qd_detr_ckpt/)
Error logs are below:
File "run_on_video/run.py", line 126, in
run_example() File "run_on_video/run.py", line 109, in run_example predictions = qd_detr_predictor.localize_moment( File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(args, kwargs) File "run_on_video/run.py", line 57, in localize_moment outputs = self.model(model_inputs) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(input, kwargs) File "/home/ubuntu/projects/moment-retrieval/QD-DETR/qd_detr/model.py", line 110, in forward src_vid = self.input_vid_proj(src_vid) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, *kwargs) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward input = module(input) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(input, kwargs) File "/home/ubuntu/projects/moment-retrieval/QD-DETR/qd_detr/model.py", line 505, in forward x = self.LayerNorm(x) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, *kwargs) File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward return F.layer_norm( File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/functional.py", line 2503, in layer_norm return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled) RuntimeError: Given normalized_shape=[2818], expected input with shape [, 2818], but got input of size[1, 75, 514]
It seems that your feature size is also 512, so you also need to extract SlowFast features.
I have the same issue. I believe the script provided in the repo should not produce this error when used as is.
Hello. For all of you in this thread, thank you for your interest, and sorry for the inconvenience. I'll let you know through this thread when the model checkpoint trained only with CLIP features is ready.
Thanks.
We've uploaded a pretrained model trained only with CLIP features to support run_on_video. You may try the example with it! Thank you.
Which one is it?
model_best.ckpt is the model trained with only CLIP features.
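If it helps, pointing the example runner at it looks roughly like this. This is only a sketch: the class and argument names follow the Moment-DETR-style predictor that run_on_video/run.py is based on (the traceback above shows a qd_detr_predictor.localize_moment call), so check run_example() for the exact signature; the video path and query below are placeholders.

```python
# sketch of run_example() in run_on_video/run.py pointed at the CLIP-only checkpoint
qd_detr_predictor = QDDETRPredictor(                        # class name assumed from run.py
    ckpt_path="run_on_video/qd_detr_ckpt/model_best.ckpt",  # CLIP-only weights
    clip_model_name_or_path="ViT-B/32",                     # CLIP backbone used to extract the 512-d features
    device="cuda",
)
predictions = qd_detr_predictor.localize_moment(
    video_path="path/to/your_video.mp4",                    # placeholder
    query_list=["a sentence describing the moment"],        # placeholder
)
```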
It works now, thanks. I suggest changing the default model used on master.
Thank you for the suggestion. Do you mean to change the default loaded model in run_on_video/run.py?
In the file run_on_video/model_utils.py, the import statement for `MomentDETR` is incorrect. The import is `from qd_detr.model import build_transformer, build_position_encoding, MomentDETR`, but `MomentDETR` is not present in `qd_detr.model`. Instead, `QDDETR` is present in `qd_detr.model` and should be used in place of `MomentDETR`. But when I use `QDDETR` instead of `MomentDETR`, i.e. `from qd_detr.model import build_transformer, build_position_encoding, QDDETR as MomentDETR`, and then run `run.py`, an error occurs. How can I fix it?
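For reference, the change described above is just the one import line in run_on_video/model_utils.py (sketch; the rest of the file is left untouched):

```python
# run_on_video/model_utils.py
# The original import references MomentDETR, which does not exist in qd_detr.model:
#   from qd_detr.model import build_transformer, build_position_encoding, MomentDETR
# Aliasing QDDETR to MomentDETR avoids touching the rest of the file:
from qd_detr.model import build_transformer, build_position_encoding, QDDETR as MomentDETR
```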