reduce memory use by streaming file to create weight tensors

tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥

MIT License

Closed (mikowals closed 7 months ago)

mikowals commented 8 months ago

Use the new read_bytes function to read the checkpoint in pieces, so that each weight tensor can take ownership of its data without copying.

A couple of minor changes were also made:

The checkpoint file is now read in Config.init and TransformerWeights.init. The config_init function has been removed.
The checkpoint file size is no longer available in main, so I added printing of equivalent output in TransformerWeights.init.
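
To illustrate the idea for readers following along, here is a rough Python analogue of reading a checkpoint in pieces. It is not the Mojo code from this PR. The header layout (seven int32 config fields), the float32 weight encoding, and the names read_config, read_tensor, load_checkpoint and shapes are all assumptions made for the sketch. Each tensor gets its own buffer that is filled directly from the file, so the whole file is never held in memory next to a second copy of the weights. A running byte count stands in for the file size that main used to know.

```python
import array
import io
import struct

# Assumed header: seven int32 config fields in native byte order.
HEADER_FORMAT = "=7i"
FLOAT_SIZE = 4  # assumed: weights are stored as 32-bit floats


def read_config(f):
    """Read only the header bytes and unpack them into a tuple of ints."""
    size = struct.calcsize(HEADER_FORMAT)
    raw = f.read(size)
    if len(raw) != size:
        raise ValueError("checkpoint truncated while reading config")
    return struct.unpack(HEADER_FORMAT, raw)


def read_tensor(f, count):
    """Fill a fresh buffer straight from the file and view it as floats.

    The returned memoryview shares memory with the buffer it wraps, so the
    weight data is never copied after it leaves the file object.
    """
    buf = bytearray(count * FLOAT_SIZE)
    if f.readinto(buf) != len(buf):
        raise ValueError("checkpoint truncated while reading weights")
    return memoryview(buf).cast("f")


def load_checkpoint(f, shapes):
    """Read the config, then each named tensor in file order.

    `shapes` is a hypothetical list of (name, element_count) pairs.
    Returns (config, weights, bytes_read).
    """
    config = read_config(f)
    bytes_read = struct.calcsize(HEADER_FORMAT)
    weights = {}
    for name, count in shapes:
        weights[name] = read_tensor(f, count)
        bytes_read += count * FLOAT_SIZE
    return config, weights, bytes_read


if __name__ == "__main__":
    # Build a tiny in-memory "checkpoint" so the sketch runs without a file.
    data = struct.pack(HEADER_FORMAT, 1, 2, 3, 4, 5, 6, 7)
    data += array.array("f", [0.5, 1.5, 2.5, 3.5, 4.5]).tobytes()
    config, weights, total = load_checkpoint(
        io.BytesIO(data), [("a", 2), ("b", 3)]
    )
    print("config:", config)
    print("checkpoint size (bytes read):", total)
```

With a real file, the same functions would be called on a handle opened in binary mode, for example `open(path, "rb")`.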

mikowals commented 7 months ago

Not sure if anyone has looked at this, but my earlier commit was broken. It is fixed now, along with a few other small changes needed to work with Mojo 0.7.0.

tairov commented 7 months ago

Thanks for updating compatibility!