speed1313 / jax-llm

JAX implementation of Large Language Models. You can train a GPT-2-like model on 青空文庫 (the aozora bunko-clean dataset) or any other text dataset.
https://speed1313.github.io/posts/llm-from-scratch/
MIT License

distributed training #3

Open speed1313 opened 6 months ago

speed1313 commented 6 months ago

Reference

speed1313 commented 5 months ago

Tensor parallelism, pipeline parallelism, FSDP
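A minimal sketch of the FSDP idea in JAX, using `jax.sharding` (`Mesh`, `NamedSharding`): parameters and the batch are both sharded along a single `"data"` mesh axis, and `jax.jit` lets XLA insert the required all-gathers automatically. The shapes, the 8-device CPU mesh via `XLA_FLAGS`, and the toy matmul are illustrative assumptions, not code from this repo.

```python
import os
# Assumption for local testing: simulate 8 CPU devices (must be set
# before JAX is first imported; harmless if JAX is already loaded).
os.environ.setdefault("XLA_FLAGS", "--xla_force_host_platform_device_count=8")

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D device mesh; "data" is the only parallel axis in this sketch.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# FSDP-style layout: the weight is sharded along its first dimension,
# and the batch is sharded along the same "data" axis.
w = jax.device_put(jnp.ones((8, 8)), NamedSharding(mesh, P("data", None)))
x = jax.device_put(jnp.ones((16, 8)), NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x, w):
    # XLA gathers the sharded weight shards as needed for the matmul.
    return x @ w

y = forward(x, w)
print(y.shape)  # (16, 8)
```

Tensor parallelism would instead shard `w` along its output dimension (e.g. `P(None, "data")`), and pipeline parallelism would assign whole layers to different mesh stages; the same `Mesh`/`PartitionSpec` machinery expresses all three.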