google / maxtext

A simple, performant and scalable Jax LLM!
Apache License 2.0

Sharding the llama2 70b on v5e-16 more efficiently. #706

Closed: zhihaoshan-google closed this 2 months ago

zhihaoshan-google commented 2 months ago

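References:
https://arxiv.org/pdf/2211.05102 (Pope et al., "Efficiently Scaling Transformer Inference")
https://arxiv.org/pdf/1909.08053 (Shoeybi et al., "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism")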
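The PR description only cites the papers, so the following is a hedged illustration of the Megatron-LM (arXiv 1909.08053) tensor-parallel MLP layout, not the change this PR actually made. It is a minimal JAX sketch assuming pure tensor parallelism over a single mesh axis named `"model"`. The function `mlp` and all sizes are hypothetical. The first weight is split by columns and the second by rows, so each device computes a partial output that XLA then reduces across devices.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over all available devices; on a v5e-16 slice this axis would
# span 16 chips (assumption: pure tensor parallelism, no data parallelism).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))
n = devices.size

def mlp(x, w_in, w_out):
    # Column-parallel matmul: with w_in split along its columns, the hidden
    # activations come out sharded along the hidden dimension.
    h = jax.nn.gelu(x @ w_in)
    # Row-parallel matmul: with w_out split along its rows, each device holds
    # a partial sum, and XLA inserts a cross-device reduction to combine them.
    return h @ w_out

# Hypothetical toy sizes; d_hidden is a multiple of the device count so it
# divides evenly across the "model" axis.
batch, d_model, d_hidden = 8, 64, 128 * n

k_x, k_in, k_out = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k_x, (batch, d_model))
w_in = jax.random.normal(k_in, (d_model, d_hidden)) / jnp.sqrt(d_model)
w_out = jax.random.normal(k_out, (d_hidden, d_model)) / jnp.sqrt(d_hidden)

# Place the arrays: w_in by columns, w_out by rows, activations replicated.
w_in_s = jax.device_put(w_in, NamedSharding(mesh, P(None, "model")))
w_out_s = jax.device_put(w_out, NamedSharding(mesh, P("model", None)))
x_s = jax.device_put(x, NamedSharding(mesh, P()))

# jit propagates the input shardings; the result should match the
# unsharded computation up to floating-point reordering.
y = jax.jit(mlp)(x_s, w_in_s, w_out_s)
y_ref = mlp(x, w_in, w_out)
```

This pairing means the hidden activation never has to be gathered between the two matmuls. That layout is the main reason Megatron-style sharding needs only one reduction per MLP block.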

vipannalla commented 2 months ago

Talked offline. George will be OOO soon and doesn't have time right now; he can make these changes once he is back in 2 weeks. I'm OK with merging this as a short-term fix and will let @gobbleturk decide.

zhihaoshan-google commented 2 months ago

Thanks for the review, Matt, Vipan and Morgan!