microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License

How can I create my video feature pickle #38

Closed tingchihc closed 2 years ago

tingchihc commented 2 years ago

In the caption task, I see that you use youcookii_videos_features.pickle to store the video features. I now want to test this model on my own video dataset. How can I build this file? I followed this repository (https://github.com/ArrowLuo/VideoFeatureExtractor) to extract the features and build the pickle, but I get the error below. It looks like a tensor size problem. Could you help me fix it?

Traceback (most recent call last):
  File "main_task_caption.py", line 689, in <module>
    main()
  File "main_task_caption.py", line 667, in main
    scheduler, global_step, nlgEvalObj=nlgEvalObj, local_rank=args.local_rank)
  File "main_task_caption.py", line 361, in train_epoch
    output_caption_ids=pairs_output_caption_ids)
  File "/home/tingchih/anaconda3/envs/py_univl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tingchih/anaconda3/envs/py_univl/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/tingchih/anaconda3/envs/py_univl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tingchih/github_clone/UniVL/modules/modeling.py", line 196, in forward
    video = self.normalize_video(video)
  File "/home/tingchih/anaconda3/envs/py_univl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tingchih/github_clone/UniVL/modules/modeling.py", line 91, in forward
    video = self.visual_norm2d(video)
  File "/home/tingchih/anaconda3/envs/py_univl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tingchih/github_clone/UniVL/modules/until_module.py", line 53, in forward
    return self.weight * x + self.bias
RuntimeError: The size of tensor a (1024) must match the size of tensor b (2048) at non-singleton dimension 2

thanks,
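
For readers with the same question, here is a rough sketch of how such a pickle could be assembled. It rests on assumptions that are not confirmed in this thread: that the extractor wrote one .npy file per video, that each file holds a 2-D array of shape (num_segments, feature_dim), and that the caption dataloader expects a dict mapping a video id to that array. The function name build_feature_pickle and the use of the file stem as the video id are hypothetical; check the dataloader in the repository before relying on this layout.

```python
import pickle
from pathlib import Path

import numpy as np


def build_feature_pickle(feature_dir, output_path):
    """Collect per-video .npy files into one {video_id: array} pickle.

    Assumed layout (not confirmed by the UniVL authors in this thread):
    each .npy file is a 2-D array of shape (num_segments, feature_dim),
    and the file stem is used as the video id.
    """
    features = {}
    for npy_path in sorted(Path(feature_dir).glob("*.npy")):
        array = np.load(npy_path)
        if array.ndim != 2:
            raise ValueError(
                f"{npy_path.name}: expected 2-D features, got shape {array.shape}"
            )
        features[npy_path.stem] = array.astype(np.float32)

    with open(output_path, "wb") as handle:
        pickle.dump(features, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return features
```

The video ids used as keys would also need to match the ids listed in the dataset's annotation files, otherwise the lookup fails at load time.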

ArrowLuo commented 2 years ago

Hi @ting-chih, sorry for the delayed reply. The error is indeed caused by a tensor size mismatch. Is the feature dim 1024? If so, the hyperparameter --video_dim should be set to 1024.
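
A quick way to see which value --video_dim should take is to read the last dimension of the stored features. The sketch below assumes the pickle is a dict of video id to 2-D array, which is not confirmed in this thread; feature_dims is a hypothetical helper and the file name in the usage lines is only a placeholder.

```python
import pickle

import numpy as np


def feature_dims(pickle_path):
    """Return the set of last-axis sizes found across all videos in the pickle.

    Assumes the pickle holds a dict of {video_id: array-like}; this layout
    is an assumption, not something stated by the UniVL authors here.
    """
    with open(pickle_path, "rb") as handle:
        features = pickle.load(handle)
    return {np.asarray(array).shape[-1] for array in features.values()}


if __name__ == "__main__":
    # Placeholder path; replace with your own feature file.
    dims = feature_dims("my_videos_features.pickle")
    print("feature dims found:", dims)
```

If the set contains a single value, passing that value as --video_dim should make the normalization weights and the features agree; more than one value suggests the videos were extracted with different settings.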

tingchihc commented 2 years ago

Thanks for your help. I can now run the caption task on my own video dataset. One more question: how can I check the output? I would like to see the results.

ArrowLuo commented 2 years ago

Hi @ting-chih, this line saves the results; by default they are written under --output_dir.

tingchihc commented 2 years ago

thanks