kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars, 892 forks

Clarify val_batches #122

Closed: nostalgebraist closed this 3 years ago

nostalgebraist commented 3 years ago

On Discord, I see a lot of people keeping the val_batches value from 6B_roto_256.json in their finetuning configs. This probably won't do what they expect.

This PR updates the finetuning guide to explain how to set val_batches.
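For anyone who wants a quick sanity check before launching a run, here is a minimal sketch that flags a val_batches value carried over unchanged from the base config. The helper name `copied_keys` and the numbers in the two configs are hypothetical placeholders, not values from 6B_roto_256.json, and this is not part of the repo.

```python
import json


def copied_keys(base_cfg: dict, finetune_cfg: dict, keys=("val_batches",)) -> list:
    """Return the keys whose value in finetune_cfg is identical to the one in base_cfg."""
    return [
        k for k in keys
        if k in base_cfg and finetune_cfg.get(k) == base_cfg[k]
    ]


# Placeholder configs for illustration only; in practice these would be
# loaded with json.load() from the base config and your finetuning config.
base = json.loads('{"val_batches": 10, "lr": 0.1}')
finetune = json.loads('{"val_batches": 10, "lr": 0.01}')

for key in copied_keys(base, finetune):
    print(f"warning: {key!r} is unchanged from the base config; "
          "check it against your own validation set")
```

A match does not prove the value is wrong, only that it was probably copied without being reconsidered.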

kingoflolz commented 3 years ago

Thanks!