youngfly11 closed this issue 4 years ago
Hi, linjieli;
Thanks for your reply! I have some questions:
Please find the answers to your questions below:
As mentioned in Appendix A.5 of our paper, we extract video features at a fixed frame rate (TV: 2/3 frame per second, HowTo100M: 1/2 frame per second). For downstream tasks, you can check `vfeat_interval` in each config to get the corresponding frame rate (`frame_rate = 1/vfeat_interval`). For example:
https://github.com/linjieli222/HERO/blob/bc4aec5af1d8eafeb468e78a033f56cd37210097/config/train-didemo_video_only-4gpu.json#L26
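To make the relation concrete, here is a minimal sketch of reading `vfeat_interval` from a config and turning it into a frame rate. The helper name `frame_rate_from_config` and the value 1.5 are assumptions for illustration (1.5 seconds matches the 2/3 frame per second quoted for TV); check the linked config for the value actually used by each task.

```python
import json


def frame_rate_from_config(config: dict) -> float:
    """Return features per second, using frame_rate = 1 / vfeat_interval."""
    interval = float(config["vfeat_interval"])
    if interval <= 0:
        raise ValueError("vfeat_interval must be positive")
    return 1.0 / interval


# Hypothetical config fragment; the real configs contain many more keys.
example_config = json.loads('{"vfeat_interval": 1.5}')
frame_rate = frame_rate_from_config(example_config)
print(f"{frame_rate:.4f} features per second")
```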
As mentioned in Section 4.1 of our paper, we only cut the HowTo videos into 60-second clips. All other videos are kept at their original length. For example, if a TV video is 90 seconds long, then you will get a 3D/2D video feature of length 60 (90 seconds at 2/3 frame per second).
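The arithmetic behind these lengths can be sketched as follows. This is only an illustration: the function names are made up, and both the rounding down of partial intervals and the handling of a final clip shorter than 60 seconds are assumptions, not confirmed behavior of the HERO pipeline.

```python
import math


def num_features(duration_s: float, vfeat_interval: float) -> int:
    """Number of feature vectors for a video, one per vfeat_interval seconds.

    Assumption: a trailing partial interval is dropped (floor).
    """
    return math.floor(duration_s / vfeat_interval)


def split_into_clips(duration_s: float, clip_len_s: float = 60.0):
    """Cut a video into consecutive clips of at most clip_len_s seconds.

    Assumption: the last, shorter clip is kept as is.
    """
    clips = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)
        clips.append((start, end))
        start = end
    return clips


# TV: kept at original length, one feature every 1.5 s (2/3 frame per second).
tv_features = num_features(90.0, 1.5)

# HowTo100M: cut into 60-second clips, one feature every 2 s (1/2 frame per second).
howto_clips = split_into_clips(150.0)
howto_features = [num_features(end - start, 2.0) for start, end in howto_clips]
```

With these assumptions, the 90-second TV video yields 60 features, and a hypothetical 150-second HowTo video yields three clips with 30, 30, and 15 features.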
We use the original fps in SlowFast to get the 3D video feature. Note that the frame rate mentioned above (for example, 2/3 frame per second) means that we get one frame feature every 1.5 seconds. At a high level, a 1.5-second video clip is fed into SlowFast to get a feature vector, and we repeat this process to get the features for the whole video.
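A rough sketch of that windowing is below. It only computes the time span of each window and how many raw frames fall into it at the original fps; the SlowFast forward pass itself is not shown. The function names, the non-overlapping windows, and the 30 fps example are assumptions for illustration.

```python
def feature_windows(duration_s: float, interval_s: float):
    """Return (start, end) times in seconds, one window per output feature.

    Assumption: windows are consecutive and non-overlapping, and a trailing
    partial window is dropped.
    """
    n_windows = int(duration_s // interval_s)
    return [(i * interval_s, (i + 1) * interval_s) for i in range(n_windows)]


def frames_per_window(original_fps: float, interval_s: float) -> int:
    """Raw frames available to the 3D model inside one window."""
    return round(original_fps * interval_s)


# A 6-second video with one feature every 1.5 seconds gives 4 windows.
windows = feature_windows(6.0, 1.5)

# At a hypothetical original rate of 30 fps, each window covers 45 raw frames.
raw_frames = frames_per_window(30.0, 1.5)
```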
Thanks, Linjie
Thanks for your interest. We plan to release feature extraction code but cannot guarantee a timeline.
If you urgently need to extract video features in the same format as HERO, you can use the following repos to build your own feature extraction pipeline:
Thanks, Linjie