Closed barronalex closed 2 weeks ago
No worries! Totally agree with the comment -- I updated it.
The benchmark is with group size 64 since that worked with main. The qmv size is (192, 576), so it should never go to qmv_fast (also, I don't think group size affects the routing at the moment).
This was breaking quantized generation with SmolLM2-135M-Instruct, specifically with group size 32.
No clear difference in performance before and after (tested with group size 64, which still worked on main).
Before:
After: