Closed maxime-fleury closed 11 months ago
Hi, thank you for trying our model. As for the GGUF format, you can find the convert.py file in https://github.com/ggerganov/llama.cpp and run python3 convert.py path/to/TinyLlama
to convert our model into GGUF format. Besides, you can also run ./quantize path/to/TinyLlama/ggml-model-f32.gguf path/to/TinyLlama/ggml-model-q4_0.gguf q4_0
to quantize it to 4-bit. You can do all of this without a GPU.
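Putting the steps above together, the whole pipeline might look like the sketch below. This is only a sketch: it assumes llama.cpp is cloned next to the TinyLlama checkout, that `make` builds the quantize binary on your platform, and that convert.py writes an f32 GGUF next to the original weights; adjust paths to your setup.

```shell
# Hypothetical end-to-end conversion, assuming the TinyLlama weights
# are already downloaded into ../TinyLlama (path is an assumption).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make quantize                      # build the quantize tool (CPU only)

# Convert the HF-format weights to an f32 GGUF file.
python3 convert.py ../TinyLlama

# Quantize the f32 GGUF down to 4-bit (q4_0).
./quantize ../TinyLlama/ggml-model-f32.gguf \
           ../TinyLlama/ggml-model-q4_0.gguf q4_0
```

The q4_0 file should be a fraction of the original size, which is what keeps a 1B model from filling your RAM.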
Hello, TinyLlama takes all my RAM and performs very poorly, worse than 7B models. It takes a very long time to load and is worse than most models. I don't understand what I'm doing wrong. Usually I use the GGML/GGUF versions, but you only provide a .bin that is 4GB for 1B params.... I guess that's the issue. Do you have the GGML or GGUF model somewhere? I'm pretty sure something is wrong... Maybe I can convert it? (The real issue is that I have a high-end AMD GPU, and it's useless.............)
I used the latest version of the base model, not the chat model. Since it's only 1B params, maybe I can convert it to GGUF?