Open daizuozhuo opened 1 year ago
The paper states that "We train our model with 32 NVIDIA Tesla V100 GPUs in a batch size of 1024", but it doesn't say how long pretraining takes in this setting. Could you share the pretraining cost?
I have the same question. Curious how long it takes.