junphine closed this issue 3 months ago.
We do not recommend training with this codebase: it is written in pure PyTorch without any systems optimization, so training will be slow, especially when the per-device batch size is small. For faster training code, or to replicate the results from our paper, please use our JAX codebase.
V100, 1B model: 0.06 it/s
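To put the reported throughput in perspective, the short sketch below converts iterations per second into seconds per iteration and into an estimated wall-clock time. It assumes the throughput stays constant for the whole run and ignores evaluation, checkpointing, and data-loading stalls. The helper names and the 10,000-step run length are hypothetical, chosen only for illustration.

```python
def seconds_per_iteration(its_per_sec: float) -> float:
    """Convert a throughput in iterations/second into seconds per iteration."""
    if its_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / its_per_sec


def hours_for_steps(its_per_sec: float, num_steps: int) -> float:
    """Estimate wall-clock hours for num_steps, assuming constant throughput."""
    return num_steps * seconds_per_iteration(its_per_sec) / 3600.0


# Reported figure from the comment: 0.06 it/s on a V100 with the 1B model.
reported = 0.06
print(f"{seconds_per_iteration(reported):.1f} s/it")  # 16.7 s/it

# Hypothetical run length of 10,000 steps, for illustration only.
print(f"{hours_for_steps(reported, 10_000):.1f} h")  # 46.3 h
```

At roughly 16.7 seconds per step, even a modest hypothetical run of 10,000 steps would take about two days on a single V100, which is consistent with the maintainers' advice to use the JAX codebase for training.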