linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

How long does it take for pre-training on TV with MLM+MNCE from scratch? #36

Closed HenryHZY closed 2 years ago

HenryHZY commented 2 years ago

@linjieli222 Hi, thanks for your great project! As mentioned in your paper, the best pre-trained HERO needs to be trained on 16 V100 GPUs for about 3 weeks. Due to GPU and memory limitations, I would like to conduct pre-training on TV with MLM+MNCE first. (that is, L2 in Table 1 in your paper)

I would like to ask three questions:

  1. How long does it take for pre-training on TV with MLM+MNCE from scratch? (L2 in Table 1 in your paper)

  2. Could you please show me the commands to conduct pre-training on TV with MLM+MNCE and fine-tuning on TVR from scratch? I am a novice in pre-training projects. :)

    I think I need to conduct this experiment in 7 steps:

    1/ download TV dataset
    2/ Text & Video feature extraction from TV dataset
      or directly use the Text & Video features provided by you
    3/ pre-training on TV with MLM+MNCE
    
    4/ download TVR dataset
    5/ Text & Video feature extraction from TVR dataset
      or directly use the Text & Video features provided by you
    6/ fine-tuning & inference on TVR
    7/ submit results to TVR codalab
  3. I find that downloading with bash scripts/download_tvr.sh $PATH_TO_STORAGE is very slow, less than 1 MB/s. Do you have another download server? [Done. No need to reply to this question.]

HenryHZY commented 2 years ago

@linjieli222 For question 2, are the following commands correct? (Just copy from your README.md)

1/ download TV dataset
2/ Text & Video feature extraction from TV dataset

Here, I directly use the Text & Video features provided by you:

# outside of the container
bash scripts/download_tv_pretrain.sh $PATH_TO_STORAGE

3/ pre-training on TV with MLM+MNCE

# inside of the container
horovodrun -np 16 python pretrain.py --config config/pretrain-tv-16gpu.json --output_dir $PRETRAIN_EXP
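The command above assumes 16 GPUs (-np 16). If you run on fewer GPUs, a common way to keep results comparable (a general distributed-training trick, not something the HERO configs necessarily expose under these names) is to scale up gradient accumulation so that the effective batch size per optimizer update stays the same. A minimal sketch with illustrative numbers:

```python
# Back-of-the-envelope check for keeping the effective batch size constant
# when pre-training on fewer GPUs. All numbers here are illustrative
# assumptions, not values taken from the HERO configs.
def effective_batch(per_gpu_batch, n_gpus, grad_accum_steps):
    """Samples consumed per optimizer update across all workers."""
    return per_gpu_batch * n_gpus * grad_accum_steps

reference = effective_batch(per_gpu_batch=32, n_gpus=16, grad_accum_steps=1)
# Halving the GPU count: double gradient accumulation to compensate.
scaled = effective_batch(per_gpu_batch=32, n_gpus=8, grad_accum_steps=2)
assert scaled == reference  # 512 samples per update either way
```

Check the actual batch-size and accumulation keys in the repo's config files before relying on this.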

https://github.com/linjieli222/HERO/blob/32c1c523c7a9f547a29f14c8e33dec24ebd14156/config/pretrain-tv-16gpu.json#L11

from
"tasks": ["mlm", "mfm-nce", "fom", "vsm"]
to
"tasks": ["mlm", "mfm-nce"]

4/ download TVR dataset
5/ Text & Video feature extraction from TVR dataset

Here, I directly use the Text & Video features provided by you:

bash scripts/download_tvr.sh $PATH_TO_STORAGE

6/ fine-tuning & inference on TVR

# fine-tuning, inside the container
horovodrun -np 8 python train_vcmr.py --config config/train-tvr-8gpu.json

# inference, inside the container
horovodrun -np 8 python eval_vcmr.py --query_txt_db /txt/tvr_val.db/ --split val \
    --vfeat_db /video/tv/ --sub_txt_db /txt/tv_subtitles.db/ \
    --output_dir /storage/tvr_default/ --checkpoint 4800 --fp16 --pin_mem

7/ submit results to TVR codalab

linjieli222 commented 2 years ago

It was more than a year ago that we conducted the pre-training ablation experiments. From what I recall, it takes about 2-3 days on 8 GPUs.

Note that you will need to reduce the pre-training steps by half for MLM+MFM-NCE if you want to strictly follow our settings in the pre-training ablation table.
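Halving the pre-training steps can be sketched as a small config edit. "num_train_steps" and "warmup_steps" are assumed field names for illustration; check the actual keys in config/pretrain-tv-16gpu.json before applying this:

```python
import json
import os
import tempfile

# Sketch of halving the training schedule for the MLM+MFM-NCE ablation.
# The field names and values below are illustrative assumptions, not
# values confirmed by the HERO repo.
cfg_path = os.path.join(tempfile.mkdtemp(), "pretrain-tv.json")
with open(cfg_path, "w") as f:
    json.dump({"num_train_steps": 100000, "warmup_steps": 10000}, f)

with open(cfg_path) as f:
    cfg = json.load(f)
for key in ("num_train_steps", "warmup_steps"):
    cfg[key] //= 2  # halve the schedule to match the ablation setting
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

print(cfg)  # -> {'num_train_steps': 50000, 'warmup_steps': 5000}
```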

And remember to change the pretrained checkpoints in the config/train-tvr-8gpu.json for finetuning.

Another useful tip: if you ever find downloading slow, please use azcopy. You can refer to VALUE-Leaderboard/StarterCode/scripts/download_tvr.sh.

HenryHZY commented 2 years ago

> It was more than a year ago that we conducted the pre-training ablation experiments. From what I recall, it takes about 2-3 days on 8 GPUs.
>
> Note that you will need to reduce the pre-training steps by half for MLM+MFM-NCE if you want to strictly follow our settings in the pre-training ablation table.
>
> And remember to change the pretrained checkpoints in the config/train-tvr-8gpu.json for finetuning.
>
> Another useful tip: if you ever find downloading slow, please use azcopy. You can refer to VALUE-Leaderboard/StarterCode/scripts/download_tvr.sh.

Thanks for your quick reply! VALUE is really a great project, containing VALUE-StarterCode and VALUE-DataRelease. Maybe I could use VALUE-StarterCode as a better starting point for my adventure into video pre-training.

HenryHZY commented 2 years ago

I would like to temporarily close this issue and will reopen it if any other questions come up later. Thanks again.