Closed · ikawrakow closed this 1 month ago
Else some models (e.g., Qwen2-7B-Instruct) produce garbage. Borrowed from PR-9595 in mainline `llama.cpp`.

Strangely enough, `K*Q` is done using `fp16` in my `ARM_NEON` FA implementation, and it works just fine there.
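For context, here is a minimal, self-contained sketch (not the actual FA kernel; the head size and values are made up for illustration) of why accumulating `K*Q` in `fp16` can go wrong: with a head dimension of 128 and moderately large un-normalized activations, the running `fp16` sum exceeds the `fp16` maximum of ~65504 and overflows, while an `fp32` accumulator over the same `fp16` inputs gives the correct score. The real kernel works on tiles rather than a single dot product, but the precision consideration is the same.

```cpp
// Minimal sketch, not the actual FA kernel: it only illustrates why an fp16
// accumulator for the K*Q dot product can blow up while an fp32 accumulator
// over the same fp16 inputs does not. The head size and values are assumed;
// _Float16 is a GCC/Clang extension used here as a stand-in for the kernel's
// half type.
#include <cstdio>

int main() {
    const int D = 128;                 // assumed head dimension
    _Float16 k[D], q[D];
    for (int i = 0; i < D; ++i) {
        // moderately large, un-normalized activations
        k[i] = (_Float16)(30.0f + (float)(i % 7));
        q[i] = (_Float16)(25.0f + (float)(i % 5));
    }

    // fp16 accumulation: each partial sum is rounded back to fp16, and the
    // running total overflows fp16's maximum (~65504) well before i == D
    _Float16 acc16 = (_Float16)0.0f;
    for (int i = 0; i < D; ++i) acc16 += k[i] * q[i];

    // fp32 accumulation of the exact same fp16 inputs
    float acc32 = 0.0f;
    for (int i = 0; i < D; ++i) acc32 += (float)k[i] * (float)q[i];

    std::printf("fp16 accumulator: %g\n", (float)acc16);  // prints inf
    std::printf("fp32 accumulator: %g\n", acc32);         // correct dot product
    return 0;
}
```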