Thanks for the contribution! Could you please provide some performance comparison w/ and w/o auto search?
The main improvement is that we no longer need to set alpha multiple times, measure the results, and then choose the best alpha by hand.
That manual set-and-test loop takes a lot of time when the model becomes large.
Also, per-layer alpha seems to be a more reasonable solution; a rough sketch of the search is below.
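Roughly, the per-layer search works like the sketch below. This is only a simplified illustration of the idea, not the code in this PR: `fake_quant`, the 0.10–0.95 grid, and MSE as the error metric are my own placeholder choices, and `act_max**alpha / w_max**(1-alpha)` is just one common form of the scaling rule. For each linear layer, every alpha in the grid is turned into a per-channel scale, the layer is fake-quantized, and the alpha with the lowest output error on the calibration activations wins.

```python
import torch

def fake_quant(t, n_bits=8):
    # symmetric per-tensor fake quantization, for illustration only
    qmax = 2 ** (n_bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-5) / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

def search_layer_alpha(weight, calib_x, grid=None):
    # weight: [out_features, in_features]; calib_x: [n_tokens, in_features]
    grid = grid if grid is not None else [0.10 + 0.05 * i for i in range(18)]
    act_max = calib_x.abs().amax(dim=0).clamp(min=1e-5)   # per input channel
    w_max = weight.abs().amax(dim=0).clamp(min=1e-5)      # per input channel
    ref = calib_x @ weight.t()                            # FP reference output
    best_alpha, best_err = 0.5, float("inf")
    for alpha in grid:
        scales = act_max.pow(alpha) / w_max.pow(1.0 - alpha)
        x_q = fake_quant(calib_x / scales)                # scaled + quantized activations (A8)
        w_q = fake_quant(weight * scales)                 # scaled + quantized weights (W8)
        err = (x_q @ w_q.t() - ref).pow(2).mean().item()
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha
```

Since the error is measured per layer on cached calibration activations, this runs once per layer instead of repeating the whole set-alpha/evaluate cycle on the full model.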
On the pileval dataset, tested on 1000 samples, with accuracy measured by last-word prediction (a rough sketch of the measurement follows the results):
chatglm2-6b W8A8
chatglm2-66b W8A8
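For context on the metric: the numbers above are last-word prediction accuracy, which (simplified to the last token) can be computed roughly as below. This is my own approximation using HF transformers, not the eval script used here; a real last-word check would compare the full word rather than a single token, and these particular models may need their own loading code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def last_token_accuracy(model_name, texts):
    # texts: a list of raw eval strings (e.g. 1000 pileval samples)
    tok = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).eval()
    hits, total = 0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids
        if ids.shape[1] < 2:
            continue                                   # nothing to predict
        logits = model(input_ids=ids[:, :-1]).logits   # next-token logits
        pred = logits[0, -1].argmax().item()           # greedy guess for the last token
        hits += int(pred == ids[0, -1].item())
        total += 1
    return hits / max(total, 1)
```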
The accuracy improvement may be minor, and I haven't tested other tasks and models (no time to do this right now); the main improvement is auto search instead of manual setting.
Maybe you can test on your own tasks and models, then decide whether this PR can be merged or not. I think it can at least be an optional feature for users to choose.
Update:
chatglm2-6b W4A8
It seems auto search for per-layer alpha and clip values is important under lower-bit settings (rough sketch below).
I feel lossless W4A8 is within reach. AWQ is a powerful method! 👍
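For the clip value part, a minimal sketch of what a per-channel clip search can look like is below, again my own simplification rather than this PR's code (the ratio grid and MSE objective are assumptions): shrink each output channel's weight range by a ratio, quantize, and keep the ratio with the lowest output error. The source of the claim above is that this trimming matters more as the weight bit width drops to 4.

```python
import torch

def search_layer_clip(weight, calib_x, n_bits=4, ratios=None):
    # weight: [out_features, in_features]; calib_x: [n_tokens, in_features]
    ratios = ratios if ratios is not None else [1.0 - 0.05 * i for i in range(10)]
    qmax = 2 ** (n_bits - 1) - 1
    ref = calib_x @ weight.t()                            # FP reference output
    w_absmax = weight.abs().amax(dim=1, keepdim=True)     # per output channel
    best_ratio, best_err = 1.0, float("inf")
    for r in ratios:
        clip_val = w_absmax * r
        w_c = torch.clamp(weight, -clip_val, clip_val)    # clip the weight range
        scale = clip_val / qmax
        w_q = (w_c / scale).round().clamp(-qmax - 1, qmax) * scale
        err = (calib_x @ w_q.t() - ref).pow(2).mean().item()
        if err < best_err:
            best_ratio, best_err = r, err
    return best_ratio
```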