Closed by the-crypt-keeper 3 months ago
vLLM issue: https://github.com/vllm-project/vllm/issues/6689
Gathered some early results, which only confirmed my fears: there are likely bugs. 8B q6k did very poorly, and 70B nf4 also looks suspect. Note that the 70B NF4 did not fit into either 2x24GB or 40GB, only an 80GB.
https://github.com/ggerganov/llama.cpp/commit/b5e95468b1676e1e5c9d80d1eeeb26f542a38f42
GGUF metadata has been extended to support precalculated RoPEs. New GGUFs need to get made.
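For context on why old GGUFs go stale: the new RoPE parameters land as extra key/value pairs in the GGUF header, so files written before the change simply lack those keys. Below is a minimal sketch of how such a key/value pair is laid out on disk per the GGUF spec, using only the standard library. The key name `llama.rope.freq_base` is just an illustrative RoPE-related key; the exact keys added by that commit may differ.

```python
import struct

# GGUF value-type code for FLOAT32, per the GGUF spec.
GGUF_TYPE_FLOAT32 = 6

def build_minimal_gguf(kv_key: str, kv_value: float) -> bytes:
    """Build a toy GGUF blob: header plus one float32 metadata KV pair.

    Layout (all little-endian): magic 'GGUF', uint32 version (3),
    uint64 tensor count, uint64 metadata KV count, then the KV pairs.
    """
    buf = struct.pack('<4sIQQ', b'GGUF', 3, 0, 1)  # 0 tensors, 1 KV pair
    key = kv_key.encode('utf-8')
    buf += struct.pack('<Q', len(key)) + key        # length-prefixed key
    buf += struct.pack('<I', GGUF_TYPE_FLOAT32)     # value type tag
    buf += struct.pack('<f', kv_value)              # the value itself
    return buf

def read_first_kv(blob: bytes) -> tuple[str, int, float]:
    """Parse the header and return (key, value_type, value) of the first KV."""
    magic, version, n_tensors, n_kv = struct.unpack_from('<4sIQQ', blob, 0)
    assert magic == b'GGUF' and n_kv >= 1
    off = struct.calcsize('<4sIQQ')
    (klen,) = struct.unpack_from('<Q', blob, off); off += 8
    key = blob[off:off + klen].decode('utf-8');    off += klen
    (vtype,) = struct.unpack_from('<I', blob, off); off += 4
    (value,) = struct.unpack_from('<f', blob, off)
    return key, vtype, value
```

A loader that looks up a RoPE key and finds it missing has to fall back to defaults, which is why GGUFs quantized before the metadata extension need to be regenerated rather than patched in flight.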
8B works with llama.cpp 705b7ecf and kobold.cpp e47477fd4d; the 70B still looks suspicious.
Going to give this a week to settle; there are always bugs when quants first land.