tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

Use simd value directly in matmul #1

Closed davors72 closed 1 year ago

davors72 commented 1 year ago

It's possible to just use a SIMD value directly, without needing to allocate/load/store the buffer. This gives me a 10-15% lift on Ubuntu, which takes me to ~300 tok/s.
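For context, here is a minimal sketch of the idea (not the PR's actual diff): accumulating a dot product in a `SIMD` register directly instead of round-tripping partial sums through a scratch buffer with alloc/load/store. The function name, `nelts`, and the early-2023 Mojo API (`DTypePointer`, `simd_load`, `vectorize`) are assumptions here; signatures in the real matmul and in newer Mojo releases may differ.

```mojo
# Sketch only: accumulate in a SIMD register instead of a scratch buffer.
# Assumes the early-2023 Mojo API (DTypePointer, simd_load, vectorize);
# exact names/signatures in llama2.mojo's matmul may differ.
from algorithm import vectorize
from memory.unsafe import DTypePointer

alias nelts = 8  # assumed SIMD width

fn dot_direct_simd(a: DTypePointer[DType.float32],
                   b: DTypePointer[DType.float32], n: Int) -> Float32:
    # Partial sums live in a SIMD value; no temporary buffer is allocated.
    var acc = SIMD[DType.float32, nelts](0)

    @parameter
    fn body[width: Int](j: Int):
        if width < nelts:
            # Tail iteration: fold the narrow product into lane 0.
            acc[0] += (a.simd_load[width](j) * b.simd_load[width](j)).reduce_add()
        else:
            acc += a.simd_load[nelts](j) * b.simd_load[nelts](j)

    vectorize[nelts, body](n)
    return acc.reduce_add()
```

The saving comes from keeping the accumulator in registers for the whole loop, so the per-row allocation and the load/store traffic on the temporary buffer disappear.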

tairov commented 1 year ago

Thanks for the PR! It got me around a 10% speedup in the playground environment, though on another VM I didn't see a speedup. Either way, this change makes sense; I'm happy to merge.