kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.27k stars 890 forks

How to pre-train from scratch? #166

Closed kamalkraj closed 2 years ago

kingoflolz commented 2 years ago

`train.py` can be used for training from scratch. Most of the remaining setup is the same as the setup described for fine-tuning.
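For anyone landing here later, a rough sketch of preparing a config file for a from-scratch run. Treat it as an assumption-heavy illustration rather than the repo's documented workflow: the field names are modeled on the example JSON configs shipped in the repo and should be checked against them, every value (model size, bucket, dataset index name) is a placeholder, and `write_config` is a hypothetical helper, not part of the codebase.

```python
import json
from pathlib import Path

# All values are placeholders. Field names are assumed to mirror the repo's
# example configs; verify them against an actual config before use.
scratch_config = {
    "layers": 12,
    "d_model": 768,
    "n_heads": 12,
    "n_vocab": 50400,
    "seq": 2048,
    "cores_per_replica": 8,
    "per_replica_batch": 1,
    "gradient_accumulation_steps": 16,
    "warmup_steps": 3000,
    "anneal_steps": 300000,
    "lr": 1.2e-4,
    "end_lr": 1.2e-5,
    "weight_decay": 0.1,
    "total_steps": 350000,
    "bucket": "my-bucket",                 # hypothetical storage bucket
    "model_dir": "my_scratch_run",         # hypothetical checkpoint directory
    "train_set": "my_data.train.index",    # hypothetical dataset index file
    "val_set": {},
}


def write_config(cfg: dict, path: Path) -> Path:
    """Hypothetical helper: sanity-check a config dict and write it as JSON."""
    # The hidden size has to split evenly across attention heads.
    if cfg["d_model"] % cfg["n_heads"] != 0:
        raise ValueError("d_model must be divisible by n_heads")
    # The learning-rate schedule should fit inside the run.
    if cfg["warmup_steps"] + cfg["anneal_steps"] > cfg["total_steps"]:
        raise ValueError("warmup_steps + anneal_steps exceeds total_steps")
    path.write_text(json.dumps(cfg, indent=2))
    return path
```

The resulting file would then be passed to `train.py` in place of a fine-tuning config. The expected difference from the fine-tuning guide is that no pre-trained checkpoint is loaded, so the schedule values above need to be chosen for a full training run rather than a short tune.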