speed1313 / jax-llm

JAX implementation of Large Language Models. You can train a GPT-2-like model on 青空文庫 (the aozora bunko-clean dataset) or any other text dataset.
https://speed1313.github.io/posts/llm-from-scratch/
MIT License
10 stars, 2 forks

Try setting dropout to 0 #17

Closed: speed1313 closed this issue 6 months ago