rendezqueue / rendezllama

CLI for llama.cpp with various commands to guide, edit, and regenerate tokens on the fly.

build(dep): Update llama.cpp past its May 11th ggml format change #26

Closed by grencez 1 year ago

grencez commented 1 year ago

Looks like we'll get an error message when trying to load the old format (https://github.com/ggerganov/llama.cpp/issues/1408) rather than a silent failure or weird results.
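
For anyone unsure which format a file on disk actually is, sniffing the header is enough to tell. This is just a sketch, not anything from the repo: the on-disk magic ("tjgg", i.e. the ggjt magic stored little-endian) and the rule that version 2 marks the post-May-11th format are my recollection of llama.cpp's loader at the time, so treat both as assumptions:

```shell
# Sketch: guess a ggml model file's format from its first 8 bytes.
# Assumption: ggjt files start with the bytes "tjgg" (magic 0x67676a74,
# little-endian) followed by a uint32 version, where version >= 2 is the
# post-May-11th format. Older magics (ggml/ggmf) fall into the else branch.
model="${1:?usage: sniff-format.sh MODEL_FILE}"
magic=$(head -c4 "$model")
version=$(od -An -j4 -N4 -tu4 "$model" | tr -d ' ')
if [ "$magic" = "tjgg" ] && [ "$version" -ge 2 ]; then
  echo "ggjt v${version}: post-May-11th format"
else
  echo "magic='${magic}' version='${version}': likely pre-May-11th format"
fi
```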

Let's see how it plays out over the weekend. Hopefully the popular models on huggingface get requantized and we can all just assume that everything uploaded after May 11th uses the new format.

grencez commented 1 year ago

Requantizing still works as documented (https://github.com/ggerganov/llama.cpp#prepare-data--run):

```shell
# Generate ggml-model-f16.bin from consolidated.*.pth files.
pipenv run python convert-pth-to-ggml.py "${model_dir}/" 1
# Generate ggml-model-q5_0.bin from ggml-model-f16.bin.
./quantize "${model_dir}/ggml-model-f16.bin" "${model_dir}/ggml-model-q5_0.bin" q5_0
```

I guess ggml-model-f16.bin doesn't have to be regenerated, so anyone who kept that file should be able to generate a q5_0 or q4_0 or whatever without much hassle.
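
As a sketch of that, assuming a `quantize` binary from a post-May-11th build and an existing ggml-model-f16.bin in `${model_dir}` (the path and the list of quantization types here are just examples):

```shell
# Sketch: produce several quantized variants from one f16 model without
# re-running the pth conversion. Assumes ./quantize was built from
# llama.cpp after the May 11th format change.
model_dir="./models/7B"
for qtype in q4_0 q4_1 q5_0 q5_1 q8_0; do
  ./quantize "${model_dir}/ggml-model-f16.bin" \
             "${model_dir}/ggml-model-${qtype}.bin" "${qtype}"
done
```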

I'll probably merge the format change to trunk early next week. Until then, find it on the update branch https://github.com/rendezqueue/rendezllama/tree/update.