mugen-org / MUGEN_baseline

Multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo contains the training, evaluation, and inference code for these baselines.

training time #6

xiaoqian-shen closed this issue 1 year ago

xiaoqian-shen commented 1 year ago

Hi, great work! Thanks for your efforts on such a fine-grained, large-scale dataset. I am wondering what the GPU consumption and training time are for the 3D VQ-VAE and the transformer.

Sy-Zhang commented 1 year ago

All models are trained on a single node with 8x V100 GPUs. The 3D VQ-VAE takes 1 week to train, and the transformer takes 2-4 days depending on the task.
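
For a rough sense of total compute, the figures above can be converted to GPU-hours. This is only a back-of-the-envelope sketch: it assumes all 8 GPUs are busy for the full wall-clock duration, which is not stated in the thread, and the helper name `gpu_hours` is made up for illustration.

```python
# Rough GPU-hour estimate from the training times quoted in this thread.
# Assumption (not from the thread): all 8 GPUs are fully used for the
# entire wall-clock duration of each run.

GPUS_PER_NODE = 8  # single node with 8x V100


def gpu_hours(days: float, gpus: int = GPUS_PER_NODE) -> float:
    """Convert wall-clock training days into GPU-hours (hypothetical helper)."""
    return days * 24 * gpus


vqvae_hours = gpu_hours(7)        # 3D VQ-VAE: 1 week
transformer_low = gpu_hours(2)    # transformer: lower end of 2-4 days
transformer_high = gpu_hours(4)   # transformer: upper end of 2-4 days

print(f"3D VQ-VAE:   {vqvae_hours} V100 GPU-hours")
print(f"Transformer: {transformer_low}-{transformer_high} V100 GPU-hours")
```

Under that assumption, the 3D VQ-VAE comes to 1344 V100 GPU-hours and each transformer to between 384 and 768, depending on the task.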

cvpr2023-MoStGAN commented 1 year ago

Thanks for your reply!