snap-research / Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
https://snap-research.github.io/Panda-70M/

About captioning the videos with subtitles. #35

Open mutonix opened 3 months ago

mutonix commented 3 months ago

Many thanks for your great work! I have a question about how you collected the subtitles. Did you download them directly from YouTube, or did you use an ASR model?

tsaishien-chen commented 3 months ago

Hi @mutonix, thanks for your interest in this dataset! The subtitles come directly from YouTube; we did not use a separate ASR model to get them. If you use the script in this repo to download the dataset, you will also get the YouTube subtitles.
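
For readers who only want the subtitles, here is a minimal sketch of fetching YouTube subtitle tracks with the third-party yt-dlp package. This is not the repo's download script. The helper names `build_subtitle_options` and `download_subtitles`, the `subs` output directory, and the choice of English VTT tracks are all assumptions made for illustration. The thread does not say whether the dataset uses uploader-provided or auto-generated captions, so auto-generated tracks are left off by default here.

```python
def build_subtitle_options(out_dir, langs=("en",), include_auto=False):
    """Build a yt-dlp options dict that fetches subtitles and skips the video.

    Hypothetical helper for illustration; not part of the Panda-70M repo.
    """
    return {
        "skip_download": True,              # do not download the video stream
        "writesubtitles": True,             # uploader-provided subtitle tracks
        "writeautomaticsub": include_auto,  # YouTube auto-generated captions
        "subtitleslangs": list(langs),      # assumed language selection
        "subtitlesformat": "vtt",           # assumed output format
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
    }


def download_subtitles(url, out_dir="subs"):
    """Download subtitle files for one video URL (needs network access)."""
    import yt_dlp  # third-party: pip install yt-dlp

    with yt_dlp.YoutubeDL(build_subtitle_options(out_dir)) as ydl:
        ydl.download([url])
```

Calling `download_subtitles(url)` with a video URL should write one subtitle file per available track into `subs/`, named after the video ID. If a video has no uploader-provided track, nothing is written unless `include_auto=True` is passed to `build_subtitle_options`.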