Closed: ChenMnZ closed this issue 7 months ago.
Adding `--w_clip` solved my problem.
Amazing work, with excellent performance!
Thanks @ChenMnZ for using our code. Glad to hear that you fixed the issue. I will close this issue.
Hi,
As I mentioned before, when running A16W4 quantization without `--w_clip`, plain RTN obtains 6.11 WikiText2 perplexity. However, rotate + RTN obtains a worse result, 6.99 WikiText2 perplexity.
In my understanding, after rotation (incoherence processing), the distribution of the weights should be more uniform and the weights should be easier to quantize. So what are the potential reasons that RTN + rotate achieves worse results than RTN alone?
Thank you.
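As a way to probe this intuition outside the repo, here is a minimal, self-contained sketch (not QuaRot's code; the heavy-tailed synthetic weights and the symmetric per-row 4-bit RTN scheme are assumptions) that measures RTN quantization error with and without an orthonormal Hadamard rotation.

```python
# Minimal sketch (not QuaRot's implementation): compare per-row RTN quantization
# error on a weight matrix before and after an orthonormal Hadamard rotation.
# The heavy-tailed synthetic weights are an assumption standing in for
# outlier-prone LLM weight rows.
import numpy as np
from scipy.linalg import hadamard

def rtn_quantize(w, bits=4):
    """Symmetric per-row round-to-nearest (RTN) fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # full-range, no clipping
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
n = 4096
w = rng.standard_t(df=3, size=(64, n))  # a few large outliers dominate each row max

# Orthonormal Hadamard matrix: h @ h.T == I, so w @ h is an exact reparameterization
# as long as the inverse rotation is folded into the neighboring activations/layer.
h = hadamard(n) / np.sqrt(n)
w_rot = w @ h

err_plain = np.mean((rtn_quantize(w) - w) ** 2)
# Rotate, quantize, rotate back so the error is measured in the original basis.
err_rot = np.mean((rtn_quantize(w_rot) @ h.T - w) ** 2)
print(f"RTN MSE without rotation: {err_plain:.6f}")
print(f"RTN MSE with rotation:    {err_rot:.6f}")
```

Replacing `w` with actual rows from the checkpoint (before and after the fused rotation) is the quickest way to see whether the per-row dynamic range really improves in practice.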
@ChenMnZ
Please check the results in the paper (Tables 1 and 5). 6.10 is the case where we have A4W4 with a 4-bit KV cache (Table 1), while 6.99 is A16W4 with an FP16 KV cache (Table 5). Rotation reduces that number to 6.76, as stated in the paper.
@sashkboos
I know the reported results. I find that `--w_clip` causes a performance degradation for RTN: simply removing `--w_clip` improves the RTN perplexity from 6.99 to 6.11.
Specifically, some reproduced A16W4 results are as follows:

- `w_clip` + `rotate` + `rtn`: 6.76 (same as the paper)
- `rtn` + `w_clip`: 6.99 (same as the paper)
- `rtn`: 6.11 (removing `w_clip` improves RTN performance)
- `rtn` + `rotate`: 9.52 (why does rotation damage RTN performance when `w_clip` is not used?)

So, what are the potential reasons that `rtn` + `rotate` achieves worse results than `rtn` when `w_clip` is not used?
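For context, the `--w_clip` being toggled here is a weight-clipping search. The sketch below is an assumption about the general technique (a grid search over clipping ratios that minimizes quantization MSE), not QuaRot's actual implementation, but it makes the interaction visible: the best clip ratio depends on how heavy-tailed the rows are, and the rotation changes exactly that.

```python
# Minimal sketch of a weight-clipping search for RTN (an assumption about the
# general technique, not a copy of QuaRot's --w_clip): grid-search the clipping
# ratio that minimizes the MSE between the weights and their fake-quantized copy.
import numpy as np

def rtn_quantize_clipped(w, bits=4, clip_ratio=1.0):
    """Symmetric per-row RTN with the scale shrunk by clip_ratio (larger values saturate)."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip_ratio * np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def search_clip_ratio(w, bits=4, grid=np.linspace(0.5, 1.0, 21)):
    """Return the clip ratio (and its MSE) that best trades rounding error against clipping error."""
    best_ratio, best_err = 1.0, np.inf
    for r in grid:
        err = np.mean((rtn_quantize_clipped(w, bits, r) - w) ** 2)
        if err < best_err:
            best_ratio, best_err = r, err
    return best_ratio, best_err

rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=(64, 4096))  # outlier-prone rows, as in the earlier sketch
ratio, err = search_clip_ratio(w)
err_full = np.mean((rtn_quantize_clipped(w, clip_ratio=1.0) - w) ** 2)
print(f"best clip ratio {ratio:.2f}: MSE {err:.6f} vs full-range MSE {err_full:.6f}")
```

Running the same search on the rotated weights from the previous sketch shows how much the chosen ratio shifts once the rows have been Hadamard-rotated, which is one way to reason about why `w_clip`, `rotate`, and plain `rtn` combine so differently in the numbers above.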
@ChenMnZ @sashkboos I also encountered the same issue. It seems that the Hadamard transform makes the weights harder to quantize when they are quantized directly, without any further tricks applied.
Dear Authors,
Thanks for your outstanding work. I like it and have learned a lot from it!
I tried to reproduce the weight-only quantization results in Table 5, but I obtained some results that are inconsistent with your paper. For example:

- A16W4 with RTN: my run gives a WikiText-2 perplexity of 6.11, while your paper reports 6.99.
- A16W4 with QuaRot-GPTQ: my run gives 5.72, while your paper reports 5.60.
- A16W3 with QuaRot-GPTQ: my run gives 7.19, while your paper reports 6.09.

I want to know if I am missing some details. Thank you.
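One thing worth ruling out when numbers disagree like this is the evaluation itself (WikiText-2 variant, context length, how the test split is concatenated and tokenized). Below is a minimal perplexity check with Hugging Face `transformers` and `datasets`; it is not QuaRot's eval script, and the model name and the 2048-token window are assumptions that should be adjusted to match the actual setup before comparing numbers.

```python
# Minimal WikiText-2 perplexity check (not QuaRot's eval script). The model id
# and the 2048-token evaluation window are assumptions; adjust to your setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical model choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).cuda().eval()

# Concatenate the raw test split and tokenize it once, as most PPL scripts do.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen, nlls = 2048, []
with torch.no_grad():
    for i in range(ids.shape[1] // seqlen):
        batch = ids[:, i * seqlen:(i + 1) * seqlen].cuda()
        loss = model(batch, labels=batch).loss  # mean token NLL over the window
        nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```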