jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
https://arxiv.org/abs/2102.06183
MIT License
704 stars 86 forks

no pre-training on Howto100M dataset? #15

Closed PapaMadeleine2022 closed 3 years ago

PapaMadeleine2022 commented 3 years ago

hello, I have some questions.

  1. When you did the 'Text-to-Video Retrieval' experiment on the MSRVTT 1K test set, you did not pre-train the model end-to-end on the HowTo100M dataset. Is this because of the computation cost? Could you run these experiments later, when you have enough compute resources?
  2. Is it also because of the computation cost that you did not run the 'Text-to-Video Retrieval' experiment directly on a HowTo100M test set?

jayleicn commented 3 years ago

Hi @IvyGongoogle,

  1. Yes, we did not pre-train our model on HowTo100M due to the huge computation cost, and it is still not feasible for us to do so.
  2. There is no standard "Text-to-video retrieval" task defined for the HowTo100M dataset. Please let me know if I am wrong about this. Thanks!

Best, Jie

PapaMadeleine2022 commented 3 years ago

@jayleicn Regarding 2: sorry, my mistake. Thank you.