tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

adapt to mojo 0.4 #41

Closed rd4com closed 10 months ago

rd4com commented 11 months ago

Hope it helps. I also added parallelization to:

Apply the temperature to the logits

https://docs.modular.com/mojo/changelog.html#v0.4.0-2023-10-05
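For context, the step being parallelized is the temperature scaling of the logits before sampling. A minimal sketch of that math in Python (not the actual llama2.mojo code, which is written in Mojo; the function names here are illustrative):

```python
import math

def apply_temperature(logits, temperature):
    # Divide each logit by the temperature. Every element is independent,
    # which is why this loop is a candidate for parallelize/vectorize.
    return [x / temperature for x in logits]

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Temperature < 1 sharpens the distribution (the top token gets more mass).
probs = softmax(apply_temperature([2.0, 1.0, 0.5], 0.8))
```

In the real model this runs over the full vocabulary (32000 logits for Llama 2), so the per-element scaling is worth parallelizing.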

rd4com commented 10 months ago

I implemented the vectorize. As for the benchmarks, I have no idea how to do them; I have Mojo 0.4.0 right now, any ideas? (Keep in mind I don't understand AI at all, I'm just trying to contribute Mojo code and learn.)

mikowals commented 10 months ago

@rd4com, there are examples of how to benchmark in the matmul notebook in the Mojo documentation. For this change, a small benchmark that only tests the changed loop code at a vocab_size of 32000 would be interesting. Also, just running `mojo llama2.mojo stories15M.bin` 5-10 times from this PR branch and again from the master branch, and reporting the results, is useful information.
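The micro-benchmark suggested above can be sketched as follows. This is a hypothetical Python analogue (the real one would be written in Mojo, e.g. using the benchmark utilities shown in the matmul notebook): time only the changed loop, at vocab_size 32000, and take the best of several runs to reduce noise.

```python
import time

VOCAB_SIZE = 32000  # Llama 2 vocabulary size, per the suggestion above
RUNS = 20           # arbitrary; more runs give a more stable best-of

def scale_logits(logits, temperature):
    # The "changed loop": in-place temperature scaling over all logits.
    inv = 1.0 / temperature
    for i in range(len(logits)):
        logits[i] *= inv

def best_of(fn, runs=RUNS):
    # Report the minimum elapsed time; the fastest run is the least
    # disturbed by scheduling noise and caching effects.
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

logits = [0.001 * i for i in range(VOCAB_SIZE)]
elapsed = best_of(lambda: scale_logits(logits, 0.9))
print(f"best of {RUNS} runs: {elapsed * 1e6:.1f} us")
```

Running the same harness against the loop before and after the vectorize change would give the comparison requested here.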

tairov commented 10 months ago

@rd4com, would you mind running some benchmarks? Or better, let's focus on the Mojo 0.4.0 migration; for that, let's not include the vectorize change for now. Feel free to send another PR.

rd4com commented 10 months ago

OK, I updated the repo. Do you want me to close this PR and create another one? Please verify the code runs fine beforehand.