wyhsleep closed this issue 2 weeks ago.
Dear authors, thank you for your wonderful work. May I ask how many GPUs (A100) you used for pretraining?
The most we ever used was 8; most experiments were done on A6000s or other consumer GPUs.
Similar question: how long did it take to pre-train the model on the 8 A100s?
Roughly 7-8 hours