Requantizing still works as documented (https://github.com/ggerganov/llama.cpp#prepare-data--run):
# Generate ggml-model-f16.bin from consolidated.*.pth files.
pipenv run python convert-pth-to-ggml.py "${model_dir}/" 1
# Generate ggml-model-q5_0.bin from ggml-model-f16.bin.
./quantize "${model_dir}/ggml-model-f16.bin" "${model_dir}/ggml-model-q5_0.bin" q5_0
I guess the ggml-model-f16.bin doesn't have to be regenerated, so anyone with that file should be able to generate a q5_0 or q4_0 or whatever without much hassle. See the sketch below.
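For example, a minimal sketch of regenerating several quantizations from the same f16 file, assuming the quantize binary and ${model_dir} paths from the commands above:
# Sketch: produce multiple quantization types from one f16 file.
# Only the types mentioned above are used here; quantize accepts others too.
for qtype in q4_0 q5_0; do
  ./quantize "${model_dir}/ggml-model-f16.bin" \
             "${model_dir}/ggml-model-${qtype}.bin" "${qtype}"
done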
I'll probably merge the format change to trunk early next week. Until then, find it on the update branch: https://github.com/rendezqueue/rendezllama/tree/update.
Looks like we'll get an error message about trying to load the old format (https://github.com/ggerganov/llama.cpp/issues/1408) rather than silently failing or giving weird results.
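If you want to check which format a file is in before loading it, one way is to peek at the header. A sketch, assuming the ggjt layout of a 4-byte magic followed by a little-endian uint32 version (the new quantization format should bump the version to 2):
# Sketch: dump the first 8 bytes (magic + version) of a model file.
od -A d -t x1 -N 8 "${model_dir}/ggml-model-q5_0.bin"
# Expect something like: 74 6a 67 67 02 00 00 00  ('ggjt' magic, version 2)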
Let's see how it plays out over the weekend. Hopefully the popular models on Hugging Face get requantized and we can all just assume that everything uploaded after May 11th uses the new format.