tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

Remove falcon style rope #34

Closed magician-blue closed 11 months ago

magician-blue commented 11 months ago

All HF Llama models use Falcon-style RoPE, and we can convert them to the original Llama-style RoPE with a permutation. That pull request fixes the bug that appears when converting HF GQA models to the GGUF format; I learned the idea from it and fixed the similar bug in llama2.c's export.py. Now I have successfully converted TinyLlama-1.1B-Chat to Llama-style RoPE, so we can remove the Falcon RoPE code path. I have uploaded the new export.py and llama2.mojo.

Details: run python export.py tl-chat.bin --hf PY007/TinyLlama-1.1B-Chat-v0.2 --version 0 to convert the model.
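
For reference, the permutation that undoes the HF (Falcon/GPT-NeoX style) RoPE layout is small. Below is a minimal sketch, assuming PyTorch tensors and shapes in the style of llama2.c's export script; the function name permute_reverse and the GQA shape comments are illustrative, not the exact code in this PR:

```python
import torch

def permute_reverse(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Undo the HF (Falcon/GPT-NeoX style) RoPE interleaving so the weight rows
    # match the original Llama pairing of adjacent dimensions.
    return w.view(n_heads, 2, dim1 // n_heads // 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# For a GQA model such as TinyLlama-1.1B, the query and key projections have
# different head counts, so each must be permuted with its own shape
# (getting this wrong is the kind of bug the conversion fix addresses):
#   wq: (n_heads * head_dim, dim)    -> permute_reverse(wq, n_heads,    n_heads * head_dim,    dim)
#   wk: (n_kv_heads * head_dim, dim) -> permute_reverse(wk, n_kv_heads, n_kv_heads * head_dim, dim)
```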

magician-blue commented 11 months ago

Why is there a difference in the README?

magician-blue commented 11 months ago

I have updated my model on Hugging Face.