kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars, 892 forks

GPT-J: perplexity for checkpoints #211

Closed: danyaljj closed this issue 2 years ago

danyaljj commented 2 years ago

Thanks for sharing the checkpoints! Is there a plot of perplexity as a function of training steps?

kingoflolz commented 2 years ago

Information collected during training (perplexity, evals, etc.) can be seen here: https://wandb.ai/eleutherai/mesh-transformer-jax/reports/6B-Rotary--Vmlldzo2NDQxNzY