snap-research / Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
https://snap-research.github.io/Panda-70M/

Do detailed captions improve performance? #59

Open g-jing opened 3 weeks ago

g-jing commented 3 weeks ago

I have a few questions about the video captions.

  1. I noticed that the captions in the training video CSV are quite short. Will performance improve if we use detailed captions at test time?
  2. If we also use detailed captions at training time, will that improve model performance?
  3. The current captions focus more on frame description than on video dynamics. Would performance improve with captions that describe the dynamics? If so, do you have any suggestions for generating such captions?

Thanks a lot!