microsoft / XPretrain

Multi-modality pre-training

HD-VILA-100M dataset, where is the text corresponding to each video? #2

Status: Closed (Qiliqing closed this issue 2 years ago)

Qiliqing commented 2 years ago

Hi, @bei21 @msftdata

Thank you for your great paper.

I downloaded and decompressed the "hdvila100m.zip", but could not find the transcriptions corresponding to each video.

Did I miss something? Could you let me know how to get the transcriptions? Thank you.

bei21 commented 2 years ago

Hi @Qiliqing

Thanks very much for your interest. We will not release the text part due to copyright issues. You can download the texts (as well as the videos) yourself using the URLs we provided. We will also release the code for text processing soon. Please stay tuned for it.

Thank you.

dome272 commented 2 years ago

Hey there, great work you started! Are there any updates on how to also download the annotations? I see there is a script for downloading the videos, but what about the corresponding text?
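
For anyone landing here with the same question, below is a minimal sketch of one way to fetch only the subtitle tracks for a list of video URLs. It is not the authors' text-processing code, which had not been released at the time of this thread. It assumes the `yt-dlp` command-line tool is installed and on `PATH`, that the URLs in the dataset metadata point to a site `yt-dlp` supports, and that the videos still have captions available. The function names `build_subtitle_command` and `download_subtitles` and the `subs` output directory are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_subtitle_command(url: str, out_dir: str, lang: str = "en") -> List[str]:
    """Build a yt-dlp command that fetches subtitles only (no video)."""
    return [
        "yt-dlp",
        "--skip-download",    # do not download the video itself
        "--write-subs",       # uploader-provided subtitles, if any
        "--write-auto-subs",  # automatically generated captions, if any
        "--sub-langs", lang,
        "--sub-format", "vtt",
        # Name each output file after the video id reported by yt-dlp.
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),
        url,
    ]


def download_subtitles(urls: Iterable[str], out_dir: str = "subs",
                       lang: str = "en") -> List[str]:
    """Run yt-dlp once per URL and return the URLs that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        result = subprocess.run(build_subtitle_command(url, out_dir, lang))
        if result.returncode != 0:
            # Deleted or private videos end up here.
            failed.append(url)
    return failed
```

Calling `download_subtitles(list_of_urls)` writes one `.vtt` file per video that has captions in the requested language and returns the URLs for which `yt-dlp` exited with an error. Aligning the subtitle text with the dataset's clip timestamps is a separate step that this sketch does not attempt.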