[Open] tanbuzheng opened this issue 3 months ago
Hi, please see below for my answers:
Q: Computing resources and training time?
A: My models were trained on 3 V100 GPUs with the default settings from the configuration files. Training usually takes about 5 days for the transformer and 2-3 days for the other modules under this setting.
Q: Are all your models trained on three NVIDIA V100 GPUs with a batch size of 8?
A: The "batch size" in the training parameters is the batch size per GPU, so a setting of 8 on 3 V100s gives an effective batch size of 24. If I remember correctly, our models were trained with a total batch size of 36 for the VQGAN part (and the bootstrapped encoder/decoder) and a total batch size of 24 for the transformer.
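The per-GPU vs. effective batch-size arithmetic above can be sketched as follows. This is a minimal illustration: `effective_batch_size` is a made-up helper, and the per-GPU value of 12 for the VQGAN is inferred from the stated total of 36 on 3 GPUs, not taken from the repository's configs.

```python
# Illustrative arithmetic for data-parallel training: each GPU processes
# its own mini-batch per optimizer step, so the effective (global) batch
# size is the per-GPU batch size times the number of GPUs.
def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    return per_gpu_batch * num_gpus

# Numbers from this thread:
print(effective_batch_size(8, 3))   # transformer: 8 per GPU x 3 GPUs = 24
print(effective_batch_size(12, 3))  # VQGAN: total of 36 implies 12 per GPU
```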
Dear author, thanks for sharing the code. I am very interested in your work. I am constrained by computing resources and training time, so I have a few questions and would appreciate your reply.
Were all your models trained on three NVIDIA V100 GPUs with a batch size of 8? How long did it take to train the VQGAN and the transformer on the Places2 dataset? I have also tried to train the VQGAN on Places2, but found it very time-consuming. When training the transformer, is a batch size of 8 appropriate? A larger batch size is generally used.
Looking forward to your reply!