letaozhang closed this issue 4 months ago
hardware resources of the model in the paper
Hi, thank you for your interest in our work. In our experiments, training our model on 4 A100-80GB GPUs for less than 10 hours was enough for it to converge.
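For anyone budgeting compute, the reply above works out to an upper bound of 4 × 10 = 40 A100-80GB GPU-hours. A minimal sketch of that arithmetic (the `gpu_hours` helper is hypothetical, written only for illustration, and the 10-hour figure is the stated upper bound, not an exact runtime):

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total accelerator time consumed: number of GPUs x wall-clock hours."""
    return num_gpus * wall_clock_hours


# Upper bound from the reply: 4 A100-80GB GPUs, under 10 hours of wall-clock time.
budget = gpu_hours(4, 10)
print(f"At most {budget:.0f} A100-80GB GPU-hours")  # prints "At most 40 A100-80GB GPU-hours"
```

This only counts total GPU time; it says nothing about how the run would behave on a different number or type of GPU.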
thank you