ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

Does having the default quant type being Q4_0 (a legacy format) on the model hub still make sense? #5425

Open sammcj opened 5 months ago

sammcj commented 5 months ago

The Ollama model hub still defaults to the Q4_0 quant type, a legacy format that under-performs compared to K-quants (Qn_K, e.g. Q4_K_M, Q6_K, Q5_K_L, etc.).

Reference

[4 attached images]

(Sorry if an issue already exists for this; if it did, my search-fu let me down.)

DuckyBlender commented 4 months ago

I 100% agree with this. This decision should have been made a long time ago; the default on all of my models on Ollama is q4_K_M for this reason.

mahenning commented 2 months ago

Any updates on this? It would be great if the K-quants were handled as the defaults, as I personally see no reason for the Q_0 quants to remain the default. Right now it takes more typing to get the K-quants, and users with less experience in quantization miss out on an arguably better model if they just use the default model names. If the decision went against K-quants as the default, I'd be interested in the reasoning.
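To illustrate the "more typing" point, here is a rough sketch of the difference between a default pull and an explicit K-quant pull. The model name and tag below are assumptions chosen for illustration; the actual tags available for any given model should be checked on https://ollama.com/library.

```shell
# Hypothetical model name and quant tag, for illustration only.
model="llama3.2"
quant_tag="3b-instruct-q4_K_M"

# Default pull: a short name, which currently resolves to the legacy Q4_0 build.
default_cmd="ollama pull ${model}"

# K-quant pull: the full tag, including the quant suffix, must be typed out.
kquant_cmd="ollama pull ${model}:${quant_tag}"

echo "$default_cmd"
echo "$kquant_cmd"
```

If K-quants became the default, the short name alone would resolve to (for example) a Q4_K_M build, and less experienced users would get the better-performing quant without knowing the tag syntax.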