microsoft / XPretrain

Multi-modality pre-training

Questions about HD-VILA #3

Closed. HenryHZY closed this issue 2 years ago.

HenryHZY commented 2 years ago

Hi @bei21 @TiankaiHang, I would like to ask a few questions about HD-VILA.

  1. How large are the HD-VILA video files and subtitle files, respectively? I guess they total at least 10 TB. Are you storing them on SSDs?
  2. The paper mentions that 64 V100 GPUs are required. How long do pre-training stage 1, pre-training stage 2, and the downstream tasks each take?
  3. Do you plan to release the code of HD-VILA? Thank you very much!
TiankaiHang commented 2 years ago

Thanks for your attention! :-) @HenryHZY

  1. The HD-VILA video files and subtitle files take up 250 TB and 27 GB, respectively. The compressed version (240p, 3 FPS) is 50 TB. Since this is well over 10 TB, we store the data in Microsoft Azure Storage.
  2. We retrained the model for the camera-ready version. For pre-training stage one, we trained on 128 V100 GPUs: 6 epochs in 6 days on the compressed version, or 1 epoch in 3 days on the original version. For pre-training stage two, training 4 epochs (compressed version) on 32 V100 GPUs takes 7 days. For downstream tasks, fine-tuning usually takes less than 1 day on 8 V100 GPUs.
  3. Of course, we will release the code. It is ready and awaiting the company's approval.

Best.

HenryHZY commented 2 years ago

@TiankaiHang Thanks for your quick reply! I believe HD-VILA will be impactful in video-language pre-training. Looking forward to your training and data processing code :)

SCZwangxiao commented 1 year ago

Quoting point 1: "we store them in Microsoft Azure Storage"

Since the dataset is already stored in Microsoft Azure Storage, could you make it available in object storage for users to download? Downloading and processing the videos from YouTube is quite difficult.

bei21 commented 1 year ago

Due to copyright issues, we cannot make the videos public. Sorry about that.