Closed laosuan closed 11 months ago
To support Metal, I need to update the llama.cpp sources. Once I do that, I'll try to load OpenLLaMA with Metal.
Got it. Building with Metal support seems to need a lot of work.
I found the following commands on https://github.com/ggerganov/llama.cpp. Can I quantize this model (openllmplayground/openalpaca_3b_600bt_preview) this way?
```shell
# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```
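For reference, a quantized model produced this way can be sanity-checked with llama.cpp's `main` binary. A minimal sketch, assuming the q4_0 file path from the commands above and default build output locations:

```shell
# run a short completion against the quantized model to verify it loads
# (paths are assumptions based on the quantize command above)
./main -m ./models/7B/ggml-model-q4_0.bin \
       -p "Hello, my name is" \
       -n 64
```

If the model loads and emits tokens, the conversion and quantization steps succeeded.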
Thanks, I tried these commands and they worked. But I found a small problem: it doesn't support generating based on the history of the conversation.
By the way, I recommend an excellent mini model: https://twitter.com/erhartford/status/1672672018779766784
Thanks for the tip, good model. I'm working on context support, similar to interactive mode in llama.cpp.
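For context, llama.cpp's interactive mode keeps the prior exchange in the context window so later replies see earlier turns. A minimal sketch, assuming a quantized model at the path below and a "User:"/"Assistant:" chat template (the prompt text and reverse-prompt string are illustrative, not from this thread):

```shell
# interactive chat: -i enables interactive mode, -r sets the reverse prompt
# that returns control to the user, keeping conversation history in context
./main -m ./models/7B/ggml-model-q4_0.bin \
       -i --interactive-first \
       -r "User:" \
       -p "A conversation between User and Assistant.
User:"
```

Each turn is appended to the running context, which is what gives the model access to the conversation history.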
Please update llama.cpp again - they've recently landed improvements for OpenML + Metal together for the fastest perf yet
https://huggingface.co/openllmplayground/openalpaca_3b_600bt_preview