Hi, how should I understand LF-hdvila-8m?

microsoft / XPretrain

Multi-modality pre-training



Hi, how should I understand LF-hdvila-8m? #38

Open · sunwhw opened this issue 7 months ago

sunwhw commented 7 months ago

Is each line in 'lfvila8m_clipid.jsonl' a video-clips/sentence pair? I also see a variable number of video clips per row. How were the clips in 'lfvila8m_clipid.jsonl' derived from the original 'hdvila_clip_text_100m.jsonl'? Besides selecting videos with more than 4 clips, as mentioned in the paper, are there any other details?
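To check the "variable number of clips per row" observation directly, a small script can summarize the file without assuming its schema. This is a minimal sketch, not part of XPretrain: `summarize_jsonl` is a hypothetical helper, and it only assumes that each non-blank line of `lfvila8m_clipid.jsonl` is one JSON value. It reports which top-level keys occur and, for every list-valued field, a histogram of list lengths.

```python
import json
import sys
from collections import Counter
from typing import Iterable


def summarize_jsonl(lines: Iterable[str]) -> dict:
    """Summarize JSON Lines records without assuming a schema (hypothetical helper)."""
    key_counts = Counter()  # how many records contain each top-level key
    list_lengths = {}       # field name -> Counter mapping list length -> number of records
    n_records = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            # A bare list or scalar per line is tracked under a placeholder key.
            record = {"<root>": record}
        n_records += 1
        key_counts.update(record.keys())
        for key, value in record.items():
            if isinstance(value, list):
                list_lengths.setdefault(key, Counter())[len(value)] += 1
    return {"records": n_records, "keys": key_counts, "list_lengths": list_lengths}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py lfvila8m_clipid.jsonl
    with open(sys.argv[1], encoding="utf-8") as f:
        summary = summarize_jsonl(f)
    print("records:", summary["records"])
    print("keys:", dict(summary["keys"]))
    for field, hist in summary["list_lengths"].items():
        print(field, "list lengths:", sorted(hist.items()))
```

If one field's length histogram starts at 5 (i.e. every row has more than 4 entries), that field is likely the per-row clip list, which would be consistent with the paper's filter. This is only a heuristic for reading the file, not a confirmation of how the authors built it.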

GXYM commented 6 months ago


Where can I find the annotation file containing the video captions, "hdvila_clip_text_100m.jsonl"? Thanks