tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars, 139 forks

reduce memory use by streaming file to create weight tensors #81

Closed: mikowals closed this 7 months ago

mikowals commented 8 months ago

Use the new read_bytes function to read the checkpoint in pieces, so that the weight tensors can take ownership of their bytes without copying.
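The idea is to read the file one tensor at a time, with each chunk landing directly in the buffer the tensor will own. The usual alternative loads the whole file and then copies slices out of it, so at peak the process holds roughly two copies of the weights. Streaming keeps it to about one copy.

The sketch below illustrates this in Python. It is not the Mojo code from this PR and does not use Mojo's read_bytes. The checkpoint layout is a hypothetical simplification: a header of two native int32 values (`dim`, `n_layers`) followed by one float32 `dim x dim` matrix per layer. `HEADER`, `read_exact_into`, and `load_checkpoint` are hypothetical names.

```python
import io
import struct
from array import array

# Hypothetical header: (dim, n_layers) as native-endian int32 values.
HEADER = struct.Struct("=ii")


def read_exact_into(f, buf):
    """Fill `buf` completely from file `f`, looping over short reads."""
    view = memoryview(buf)
    filled = 0
    while filled < len(view):
        n = f.readinto(view[filled:])
        if not n:
            raise EOFError("checkpoint ended early")
        filled += n


def load_checkpoint(f):
    """Parse the header, then read each weight tensor straight into its own
    buffer, so no whole-file buffer is ever allocated."""
    header = bytearray(HEADER.size)
    read_exact_into(f, header)
    dim, n_layers = HEADER.unpack(header)
    weights = []
    for _ in range(n_layers):  # hypothetical: one dim x dim matrix per layer
        buf = bytearray(dim * dim * 4)              # float32 = 4 bytes each
        read_exact_into(f, buf)                     # this tensor owns these bytes
        weights.append(memoryview(buf).cast("f"))   # zero-copy float32 view
    return dim, n_layers, weights


# Demo: build a tiny in-memory checkpoint and stream it back.
data = HEADER.pack(2, 2) + array("f", range(8)).tobytes()
dim, n_layers, w = load_checkpoint(io.BytesIO(data))
print(dim, n_layers, w[1].tolist())  # 2 2 [4.0, 5.0, 6.0, 7.0]
```

The memoryview cast uses native byte order. The sketch therefore assumes the checkpoint was written on a machine with the same endianness, which is typically little-endian.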

A couple of minor changes were also made:

mikowals commented 7 months ago

Not sure if anyone has looked at this yet, but my earlier commit was broken. It is fixed now, along with a few other small changes to make it work with Mojo 0.7.0.

tairov commented 7 months ago

Thanks for updating compatibility!