jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0
7.64k stars 446 forks

Very poor performance using Faraday and AMD GPU? #57

Closed · maxime-fleury closed this 11 months ago

maxime-fleury commented 11 months ago

Hello, TinyLlama takes all my RAM and has very poor performance, lower than 7B models. It takes a very long time to load and is worse than most models, and I don't understand what I'm doing wrong. Usually I use the GGML/GGUF version, but you provide a .bin that is 4 GB for a 1B-parameter model... I guess that's the issue. Do you have the GGML or GGUF model somewhere? I'm pretty sure something is wrong... Maybe I can convert it? (The real issue is that I have a high-end AMD GPU, and it's useless.)

I used the latest version of the base model, not the chat model. Since it's only 1B parameters, maybe I can convert it to GGUF myself?

ChaosCodes commented 11 months ago

Hi, thank you for trying our model. For the GGUF format, you can find `convert.py` in https://github.com/ggerganov/llama.cpp and run `python3 convert.py path/to/TinyLlama` to convert our model to GGUF. You can then run `./quantize path/to/TinyLlama/ggml-model-f32.gguf path/to/TinyLlama/ggml-model-q4_0.gguf q4_0` to quantize it to 4-bit. All of this works without a GPU.
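
Putting those steps together, here is a minimal end-to-end sketch. The `path/to/TinyLlama` paths are placeholders for wherever you downloaded the checkpoint, the file sizes are rough estimates, and the final `./main` call is just an optional smoke test, assuming the llama.cpp build layout from that period:

```bash
# Clone and build llama.cpp (CPU-only is fine for conversion and quantization).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the Hugging Face checkpoint to a full-precision GGUF file.
# path/to/TinyLlama is a placeholder for your local model directory.
python3 convert.py path/to/TinyLlama

# Quantize to 4-bit (q4_0); for a 1.1B model this shrinks the ~4 GB
# full-precision file to very roughly 600-700 MB.
./quantize path/to/TinyLlama/ggml-model-f32.gguf \
           path/to/TinyLlama/ggml-model-q4_0.gguf q4_0

# Optional: sanity-check the quantized model with a short prompt.
./main -m path/to/TinyLlama/ggml-model-q4_0.gguf -p "Hello," -n 64
```

The quantized `.gguf` file can then be loaded by any GGUF-compatible runtime instead of the 4 GB `.bin` checkpoint.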