tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars 139 forks

Unroll vectorisation #56

Closed: miili closed this issue 9 months ago

miili commented 10 months ago

Hi there,

Awesome port and demonstrator. Have you compared the performance of vectorize and vectorize_unroll?

While tinkering with demanding algorithms, I saw that unrolling the partial loop 12x gave a 10% performance increase. Maybe enough to beat C++? 😁
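
For readers unfamiliar with the idea, here is a minimal sketch of loop unrolling in general. It is written in plain C rather than Mojo and is not taken from llama2.mojo. The function names `dot_simple` and `dot_unrolled` and the unroll factor of 4 are assumptions chosen for illustration. The sketch does not use Mojo's vectorize_unroll API, and any speedup depends on the compiler and the hardware.

```c
#include <stddef.h>

/* Baseline: one multiply-add per loop iteration. */
float dot_simple(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Unrolled by 4 (illustrative factor) with independent accumulators,
 * so the multiply-adds in one iteration do not depend on each other. */
float dot_unrolled(const float *a, const float *b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: handles as many full blocks of 4 elements as fit in n. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    float sum = (s0 + s1) + (s2 + s3);

    /* Remainder loop: the leftover 0 to 3 elements. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the unrolled version adds the terms in a different order, its floating-point result can differ slightly from the baseline on general inputs, so any benchmark should also check that the outputs still agree within a tolerance.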

tairov commented 10 months ago

Hi @miili! Sounds amazing :)

I was keeping an eye on vectorize_unroll, as well as some other incredible features of Mojo like autotune, tiles and alignment, but I didn't have the bandwidth to try out all the cases. It would be really cool if you could share a PR; I'd love to merge it.