karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License
21.28k stars 2.31k forks

On-device reductions #635

Closed: ngc92 closed this 5 days ago

ngc92 commented 6 days ago

Moves the loss calculation into the backward pass, so that more reductions happen on-device and fewer host<->device transfers are needed. This also enables a micro-optimization: validation no longer calculates dlogits.