tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars 139 forks

Vectorize temperatures logits #43

Closed rd4com closed 10 months ago

rd4com commented 10 months ago

Hello, this needs some benchmarks to confirm the improvement (work in progress).
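The diff itself is not reproduced in this thread, so the following is only a rough sketch of what vectorizing the temperature scaling of logits could look like, written in plain Python for illustration rather than in Mojo. The names `scale_logits_scalar`, `scale_logits_chunked`, and `SIMD_WIDTH` are hypothetical and are not taken from the repository. The chunked version processes a fixed number of elements per step, the way a SIMD loop would, and handles the leftover tail separately.

```python
from typing import List

# Hypothetical lane count; real SIMD width depends on the CPU and dtype.
SIMD_WIDTH = 8


def scale_logits_scalar(logits: List[float], temperature: float) -> List[float]:
    """Baseline: divide every logit by the temperature, one element at a time."""
    return [x / temperature for x in logits]


def scale_logits_chunked(
    logits: List[float], temperature: float, width: int = SIMD_WIDTH
) -> List[float]:
    """Same result, but walks the buffer in fixed-width chunks plus a scalar tail.

    This only mimics the loop structure of a vectorized kernel; plain Python
    does not execute the chunk in a single instruction.
    """
    n = len(logits)
    out = [0.0] * n
    main_end = n - (n % width)  # last index covered by full chunks

    # Full chunks: each iteration stands in for one SIMD load/divide/store.
    for start in range(0, main_end, width):
        chunk = logits[start:start + width]
        out[start:start + width] = [x / temperature for x in chunk]

    # Tail: remaining elements that do not fill a whole chunk.
    for i in range(main_end, n):
        out[i] = logits[i] / temperature

    return out
```

Both functions should return identical values for any input length; the chunked form just makes the "full chunks plus tail" structure explicit, which is the part a benchmark would exercise.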

PriNova commented 10 months ago

With this change I got an improvement of about 1.8%, from 530 tok/sec to 540 tok/sec. So I think merging this PR makes sense for x86, after some testing on other devices.
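For reference, the relative gain from the two throughput figures above can be checked with a few lines of Python. The helper name `relative_speedup` is hypothetical; the inputs are the 530 and 540 tok/sec numbers quoted in this comment.

```python
def relative_speedup(before: float, after: float) -> float:
    """Return the throughput gain as a percentage of the baseline."""
    return (after - before) / before * 100.0


gain = relative_speedup(530.0, 540.0)
print(f"{gain:.2f}%")  # prints "1.89%"
```

Measured against the 530 tok/sec baseline this comes to roughly 1.9%, in line with the approximate figure quoted above.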

tairov commented 10 months ago

@rd4com would you mind resolving the conflicts?

tairov commented 10 months ago

Still can't merge. Need help? 😄

rd4com commented 10 months ago

Yes, I am quite new to GitHub. If you know what to do, could you tell me? Or you can just incorporate the changes into the repo yourself, no big deal :+1: I use GitHub Desktop.