simon-ging / coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Apache License 2.0

Have you tried pre-trained features on ActivityNet? #8

Closed: PKULiuHui closed this issue 3 years ago

PKULiuHui commented 3 years ago

From your experiments on YouCookII, Howto100m pre-trained features help a lot more than 2D3D features. Have you tried using pre-trained features on ActivityNet? Is it because Howto100m pre-trained features can't generalize well to the ActivityNet dataset?

simon-ging commented 3 years ago

Hi, we didn't try this since we didn't have the time for downloading and processing all the 20k ActivityNet videos. We are planning to do this in the near future.

CreatorGhost commented 3 years ago

Hi, can you please provide your pre-trained model? It would be a great help, as I don't have enough data to download the entire dataset. If possible, please provide the pre-trained model in pickle format. Thanks

simon-ging commented 3 years ago

Hi, please read the readme, models are provided already. Also please open a new issue next time. Cheers

simon-ging commented 3 years ago

Using HowTo100m pretrained features on ActivityNet did not work well out of the box.