[Closed] magician-blue closed this 11 months ago
I have updated the model on huggingface.
Hi @magician-blue , so do you mean the tl-chat model on HF is not compatible with this repo anymore ?
@tairov We can still run it with our repo.
Change from
mojo llama2.mojo tl-chat.bin \
-r falcon \
-z tok_tl-chat.bin \
-n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"
to
mojo llama2.mojo tl-chat.bin \
-r llama \
-z tok_tl-chat.bin \
-n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"
If we can convert all HF llama models (they use falcon-style RoPE) to llama-style RoPE, then we only need to implement one type of RoPE in our repo. This is what llama2.c and llama.cpp are doing.
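For context, the two RoPE conventions differ only in which dimension pairs get rotated together: original llama rotates adjacent pairs (2i, 2i+1), while falcon/HF (GPT-NeoX style) rotates pairs split across the two halves (i, i + d/2). A minimal NumPy sketch of the difference (function names are mine, not from the repo):

```python
import numpy as np

def rope_interleaved(x, theta=10000.0):
    """Original llama pairing: rotate dims (0,1), (2,3), ... together.
    x has shape (seq_len, d)."""
    seq, d = x.shape
    pos = np.arange(seq)[:, None]                    # (seq, 1)
    freqs = pos / theta ** (np.arange(0, d, 2) / d)  # (seq, d/2) rotation angles
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * np.cos(freqs) - x2 * np.sin(freqs)
    out[:, 1::2] = x1 * np.sin(freqs) + x2 * np.cos(freqs)
    return out

def rope_half_split(x, theta=10000.0):
    """Falcon / HF (GPT-NeoX) pairing: rotate dims (i, i + d/2) together."""
    seq, d = x.shape
    pos = np.arange(seq)[:, None]
    freqs = pos / theta ** (np.arange(0, d, 2) / d)
    x1, x2 = x[:, :d // 2], x[:, d // 2:]
    return np.concatenate([x1 * np.cos(freqs) - x2 * np.sin(freqs),
                           x1 * np.sin(freqs) + x2 * np.cos(freqs)], axis=-1)
```

Because only the pairing differs, a fixed channel permutation maps one convention onto the other, which is why a one-time permutation at conversion time lets the runtime implement a single RoPE variant.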
Looks cool. Could you share some details on where this convert.py
file came from? I see it has some dependencies. Perhaps we can remove it from the PR and keep only a link to the converted model in the README file, so that the overall process is simpler?
The original convert file comes from llama2.c; I modified parts of it to support GQA. I have already made a pull request to llama2.c, but it has not been merged yet. We can wait for a while.
The next thing I will do is convert openllama3b (12G RAM), llama2-chat-7b (28G RAM), and vicuna-7b to test my converter and our llama2.mojo. Besides that, I'll study the tokenizer parts of llama.cpp and llama2.c to find a way to remove the hardcoded parts of our tokenizer.
In this case I guess convert.py is not needed in the repo. And it's cool that you have plans to research support for other types of models.
The model can be converted using the script from llama2.c, and for llama2.mojo we have a URL in the README file.
thank you!
All HF llama models use falcon-style RoPE, and we can convert them to original llama-style RoPE with a permutation. This pull request solves the bug when converting HF GQA models to gguf format. I learned the idea from it and fixed the similar bug in llama2.c's export.py. Now I have successfully converted TinyLlama-1.1B-Chat to llama-style RoPE, so we can remove the falcon RoPE part. I have uploaded the new export.py and llama2.mojo.
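For the record, the permutation in question reorders the rows of each attention projection so that half-split rows become interleaved rows within every head. Here is a hedged NumPy sketch of the idea (the exact reshape order in llama2.c's export.py may differ; for GQA, `n_heads` here would be the number of KV heads for the K projection):

```python
import numpy as np

def unpermute(w, n_heads):
    """Map a half-split (HF/falcon-style) projection matrix to interleaved
    (original llama) row order, head by head.
    w has shape (n_heads * head_dim, dim)."""
    rows, dim = w.shape
    head_dim = rows // n_heads
    # split each head into its two halves, then interleave them row by row
    return (w.reshape(n_heads, 2, head_dim // 2, dim)
             .swapaxes(1, 2)
             .reshape(rows, dim))

def permute(w, n_heads):
    """Inverse mapping: interleaved row order back to half-split."""
    rows, dim = w.shape
    head_dim = rows // n_heads
    return (w.reshape(n_heads, head_dim // 2, 2, dim)
             .swapaxes(1, 2)
             .reshape(rows, dim))
```

A round trip through both functions should be the identity, which is an easy sanity check when porting the conversion.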
Details:
python export.py tl-chat.bin --hf PY007/TinyLlama-1.1B-Chat-v0.2 --version 0
to convert the model