tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars, 139 forks

Rename cores to workers and set to optimal #46

Closed. jackos closed this 10 months ago

jackos commented 10 months ago

Renamed cores to workers to avoid confusion.

Setting workers to half the number of threads appears to be the sweet spot. This now achieves 900 tokens/s on a macOS M2 Max with Mojo v0.4.0; v0.3.1, without the global runtime, was running at 700 tokens/s.
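For illustration only, the "half the threads" heuristic can be sketched in Python as below. This is not the code from this PR (the project itself is written in Mojo); the function name `default_workers` and its `logical_threads` parameter are hypothetical, and the sketch assumes "threads" means the logical thread count reported by the OS.

```python
import os


def default_workers(logical_threads=None):
    """Return half the logical thread count, but never less than 1.

    `logical_threads` is a hypothetical override for illustration;
    when omitted, the count reported by the OS is used.
    """
    if logical_threads is None:
        # os.cpu_count() may return None if the count cannot be determined.
        logical_threads = os.cpu_count() or 1
    # Integer division rounds down; clamp so a single-thread machine
    # still gets one worker.
    return max(1, logical_threads // 2)


if __name__ == "__main__":
    print("workers:", default_workers())
```

Whether half is actually the sweet spot likely depends on the machine, so it is worth benchmarking a few worker counts rather than relying on this default.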