Requantizing still works as documented (https://github.com/ggerganov/llama.cpp#prepare-data--run):
# Generate ggml-model-f16.bin from consolidated.*.pth files.
pipenv run python convert-pth-to-ggml.py "${model_dir}/" 1
# Generate ggml-model-q5_0.bin from ggml-model-f16.bin.
./quantize "${model_dir}/ggml-model-f16.bin" "${model_dir}/ggml-model-q5_0.bin" q5_0
I guess the ggml-model-f16.bin doesn't have to be regenerated, so anyone with that file should be able to generate a q5_0 or q4_0 or whatever without much hassle. See the sketch below.
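For example, a minimal sketch of regenerating several quantizations from the same f16 file, assuming the quantize binary and ${model_dir} paths from the commands above:
# Sketch: produce multiple quantization types from one f16 file.
# Only the types mentioned above are used here; quantize accepts others too.
for qtype in q4_0 q5_0; do
  ./quantize "${model_dir}/ggml-model-f16.bin" \
             "${model_dir}/ggml-model-${qtype}.bin" "${qtype}"
done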
I'll probably merge the format change to trunk early next week. Until then, find it on the update branch: https://github.com/rendezqueue/rendezllama/tree/update.
Looks like we'll get an error message about trying to load the old format (https://github.com/ggerganov/llama.cpp/issues/1408) rather than silently failing or giving weird results.
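If you want to check which format a file is in before loading it, one way is to peek at the header. A sketch, assuming the ggjt layout of a 4-byte magic followed by a little-endian uint32 version (the new quantization format should bump the version to 2):
# Sketch: dump the first 8 bytes (magic + version) of a model file.
od -A d -t x1 -N 8 "${model_dir}/ggml-model-q5_0.bin"
# Expect something like: 74 6a 67 67 02 00 00 00  ('ggjt' magic, version 2)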
Let's see how it plays out over the weekend. Hopefully the popular models on Hugging Face get requantized and we can all just assume that everything uploaded after May 11th uses the new format.