Hello,
Would you happen to already have the OWQ (3–4 bit) perplexity results for the Llama-2 models on the WikiText2 dataset? It'd be great to have my results cross-checked against yours.
Best,
Sean
Hello Sean,
I'm sorry for not getting back to you sooner. Here are the WikiText2 perplexity results for the Llama-2 models.
| LLaMA-2 | 7B | 13B |
|---|---|---|
| FP | 5.12 | 4.57 |
| OWQ 3.01 | 6.21 | 5.25 |
| OWQ 3.01 g128 | 5.81 | 4.98 |
| OWQ 3.1 | 5.75 | 5.07 |
| OWQ 4.01 | 5.40 | 4.73 |
| OWQ 4.01 g128 | 5.25 | 4.66 |
| OWQ 4.1 | 5.29 | 4.69 |
I hope this helps you in your cross-checking.
Best regards, Changhun
Thanks Changhun!
Hello again Changhun — are the results posted above for WT2-test? For FP16 I have PPLs of 5.47 (7B) and 4.88 (13B).
Many thanks, Sean
Curiously, the AWQ paper (https://arxiv.org/abs/2306.00978, Table 4) also reports the same FP16 WikiText2 PPLs as me. I also ran your code and it is indeed giving me the FP16 results shown above, so I wonder where the discrepancy is coming from!
Hello Sean,
I used `args.seqlen = 4096` for the results above, as Llama-2 supports a 4K context window! I confirmed that I also got 5.47 (7B) and 4.88 (13B) when using `args.seqlen = 2048`.
You can try this by manually modifying the code here.
I'm not sure which option is more common, so I think you can choose according to your preference. Since Llama-1 only has a 2K context window, this choice does not affect the Llama-1 scores reported in the OWQ paper.
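For context, the evaluation follows the usual GPTQ-style protocol: the tokenized WikiText2 test set is concatenated into one stream, cut into non-overlapping windows of `seqlen` tokens, and the perplexity is computed over those windows. A simplified sketch (not our exact code; the function name and model id are just illustrative):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def wikitext2_ppl(model_name: str, seqlen: int = 2048) -> float:
    """GPTQ-style WikiText2 perplexity; `seqlen` plays the role of args.seqlen."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()

    # Tokenize the whole test split as one stream, then cut it into
    # non-overlapping windows of `seqlen` tokens. Longer windows give each
    # token more context on average, which is why seqlen=4096 yields a
    # lower PPL than seqlen=2048 for the same model.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
    n_windows = ids.numel() // seqlen

    nlls = []
    for i in range(n_windows):
        batch = ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
        loss = model(batch, labels=batch).loss  # mean NLL over the window
        nlls.append(loss.float() * seqlen)

    return torch.exp(torch.stack(nlls).sum() / (n_windows * seqlen)).item()

# e.g. wikitext2_ppl("meta-llama/Llama-2-7b-hf", seqlen=4096)
```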
I hope this answers your question!
Sincerely, Changhun
Makes sense now, thanks so much Changhun!