Closed laosuan closed 11 months ago
To support Metal, I need to update the llama.cpp sources. Once I do that, I'll try to load OpenLLaMA with Metal.
Got it. Building with Metal support seems to need a lot of work.
I found the following commands on https://github.com/ggerganov/llama.cpp. Can I quantize this model (openllmplayground/openalpaca_3b_600bt_preview) this way?
```shell
# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```
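For reference, a quantized model produced this way can be sanity-checked with llama.cpp's `main` binary. A minimal sketch, assuming the q4_0 file path from the commands above and default build output locations:

```shell
# run a short completion against the quantized model to verify it loads
# (paths are assumptions based on the quantize command above)
./main -m ./models/7B/ggml-model-q4_0.bin \
       -p "Hello, my name is" \
       -n 64
```

If the model loads and emits tokens, the conversion and quantization steps succeeded.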
Thanks, I tried these commands and they worked. But I found a small problem: it doesn't support generating based on the history of the conversation.
By the way, I recommend an excellent mini model: https://twitter.com/erhartford/status/1672672018779766784
Thanks for the tip, good model. I'm working on context support, similar to interactive mode in llama.cpp.
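For context, llama.cpp's interactive mode keeps the prior exchange in the context window so later replies see earlier turns. A minimal sketch, assuming a quantized model at the path below and a "User:"/"Assistant:" chat template (the prompt text and reverse-prompt string are illustrative, not from this thread):

```shell
# interactive chat: -i enables interactive mode, -r sets the reverse prompt
# that returns control to the user, keeping conversation history in context
./main -m ./models/7B/ggml-model-q4_0.bin \
       -i --interactive-first \
       -r "User:" \
       -p "A conversation between User and Assistant.
User:"
```

Each turn is appended to the running context, which is what gives the model access to the conversation history.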
Please update llama.cpp again - they've recently landed improvements for OpenML + Metal together for the fastest perf yet
https://huggingface.co/openllmplayground/openalpaca_3b_600bt_preview