We don't gain as much as on a Zen4 system as there aren't as many vector registers, so we need to load/store data much more often. Still, we do get a small gain in performance.
For now it supports only fp16 kv-cache. Quantized kv-cache will be added later.
We don't gain as much as on a Zen4 system as there aren't as many vector registers, so we need to load/store data much more often. Still, we do get a small gain in performance.
For now it supports only
fp16
kv-cache. Quantized kv-cache will be added later.