tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

TODO: Support for gguf models #52

Open babycommando opened 10 months ago

babycommando commented 10 months ago

hey team, incredible work being done here.

Wondering if you only support .bin models, or whether it would also work with GGUF quantized models.

If not, then consider this a real feature request. Most people use GGUF models nowadays, as they are easier to run on consumer-grade hardware.

thanks.

tairov commented 10 months ago

Hi. Thanks for your question. GGUF is not yet supported. AFAIK, GGUF models are just originally .bin models converted to GGUF. It also depends on the exact "bin" model architecture; I think there is no strict agreement on how .bin models should be implemented, and each project has its own format.
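One concrete difference between the two formats: GGUF files are self-describing and start with the magic bytes `GGUF`, while llama2.c-style .bin checkpoints have no magic at all and begin directly with raw config integers. A minimal Python sketch of how a loader could tell them apart (the function name `detect_format` is hypothetical, not part of llama2.mojo):

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file

def detect_format(path):
    """Return 'gguf' if the file starts with the GGUF magic,
    otherwise assume a raw llama2.c-style .bin checkpoint."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == GGUF_MAGIC:
        return "gguf"
    # llama2.c .bin checkpoints carry no magic: they begin directly
    # with int32 config fields (dim, hidden_dim, n_layers, ...),
    # which is why each project's "bin" layout can differ.
    return "bin"
```

This also illustrates tairov's point: because raw .bin checkpoints have no header identifying their layout, every project is free to define its own, whereas GGUF standardizes the metadata.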