jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0
7.64k stars 446 forks

Very poor performance using Faraday and AMD GPU? #57

Closed · maxime-fleury closed this 11 months ago

maxime-fleury commented 11 months ago

Hello, TinyLlama takes all my RAM and has very poor performance, lower than 7B models. It takes a very long time to load and is worse than most models, and I don't understand what I'm doing wrong. Usually I use the GGML/GGUF version, but you provide a .bin that is 4 GB for a 1B-parameter model... I guess that's the issue. Do you have the GGML or GGUF model somewhere? I'm pretty sure something is wrong... Maybe I can convert it? (The real issue is that I have a high-end AMD GPU, and it's useless.)

I used the latest version of the base model, not the chat model. Since it's only 1B parameters, maybe I can convert it to GGUF myself?

ChaosCodes commented 11 months ago

Hi, thank you for trying our model. For the GGUF format, you can find `convert.py` in https://github.com/ggerganov/llama.cpp and run `python3 convert.py path/to/TinyLlama` to convert our model to GGUF. You can then run `./quantize path/to/TinyLlama/ggml-model-f32.gguf path/to/TinyLlama/ggml-model-q4_0.gguf q4_0` to quantize it to 4-bit. All of this works without a GPU.
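
Putting those steps together, here is a minimal end-to-end sketch. The `path/to/TinyLlama` paths are placeholders for wherever you downloaded the checkpoint, the file sizes are rough estimates, and the final `./main` call is just an optional smoke test, assuming the llama.cpp build layout from that period:

```bash
# Clone and build llama.cpp (CPU-only is fine for conversion and quantization).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the Hugging Face checkpoint to a full-precision GGUF file.
# path/to/TinyLlama is a placeholder for your local model directory.
python3 convert.py path/to/TinyLlama

# Quantize to 4-bit (q4_0); for a 1.1B model this shrinks the ~4 GB
# full-precision file to very roughly 600-700 MB.
./quantize path/to/TinyLlama/ggml-model-f32.gguf \
           path/to/TinyLlama/ggml-model-q4_0.gguf q4_0

# Optional: sanity-check the quantized model with a short prompt.
./main -m path/to/TinyLlama/ggml-model-q4_0.gguf -p "Hello," -n 64
```

The quantized `.gguf` file can then be loaded by any GGUF-compatible runtime instead of the 4 GB `.bin` checkpoint.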