ENJOY-Yin-jiong closed this issue 2 years ago
Thank you for kindly sharing! I am curious what computational resources are needed to train the model and how long training takes, since the paper describes a relatively large number of parameters.
For the largest model, we train on 8x V100 GPUs. Smaller models require fewer computational resources.
Thank you for your patient answer.