xvyaward / owq

Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
https://arxiv.org/abs/2306.02272

Llama 2 perplexity results on wikitext2? #3

Closed seannz closed 2 months ago

seannz commented 3 months ago

Hello,

Would you happen to already have the OWQ (3–4 bits) perplexity results for Llama 2 models on the WikiText2 dataset? It'd be great to have my results cross-checked against yours.

Best,

Sean

xvyaward commented 3 months ago

Hello Sean,

I'm sorry for not getting back to you sooner. Here are the WikiText2 perplexity results for the Llama-2 model.

| LLaMA-2 | 7B | 13B |
|---|---|---|
| FP | 5.12 | 4.57 |
| OWQ 3.01 | 6.21 | 5.25 |
| OWQ 3.01 g128 | 5.81 | 4.98 |
| OWQ 3.1 | 5.75 | 5.07 |
| OWQ 4.01 | 5.40 | 4.73 |
| OWQ 4.01 g128 | 5.25 | 4.66 |
| OWQ 4.1 | 5.29 | 4.69 |

I hope this helps you in your cross-checking.

Best regards, Changhun

seannz commented 3 months ago

Thanks Changhun!

seannz commented 3 months ago

Hello again Changhun — are the results posted above for WT2-test? For FP16 I have PPLs of 5.47 (7B) and 4.88 (13B).

Many thanks, Sean

seannz commented 3 months ago

Curiously, the AWQ paper (https://arxiv.org/abs/2306.00978, Table 4) reports the same FP16 WikiText2 PPLs as I get. I also ran your code, and it indeed gives me the FP16 numbers I quoted above (5.47 / 4.88), so I wonder where the discrepancy is coming from!

xvyaward commented 3 months ago

Hello Sean,

I used args.seqlen = 4096 for the results above, as Llama-2 supports a 4K context window! I confirmed that I also got 5.47 (7B) and 4.88 (13B) when using args.seqlen = 2048.

You can try this by manually modifying the code here.

I'm not sure which option is more common, so I think you can choose according to your preference. Since LLaMA-1 has a 2K context window, this change does not affect the LLaMA-1 scores reported in the OWQ paper.
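
For reference, here is a minimal sketch of this style of WikiText2 stream perplexity evaluation (assuming the Hugging Face `transformers` and `datasets` packages; the model id and the `seqlen` value are illustrative, not a drop-in copy of this repo's script). The only thing that changes between the two sets of numbers we discussed is `seqlen`.

```python
# Minimal sketch of a GPTQ/OWQ-style WikiText2 perplexity evaluation.
# Assumptions: Hugging Face `transformers` + `datasets` installed; the checkpoint
# name and seqlen below are illustrative, not taken from this repository.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed FP16 checkpoint
seqlen = 4096                          # use 2048 to reproduce the 5.47 / 4.88 FP16 numbers

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Tokenize the whole WikiText2 test split as one long token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

nsamples = ids.numel() // seqlen
ntok = seqlen - 1  # HF averages the shifted loss over seqlen - 1 predicted tokens
nlls = []
with torch.no_grad():
    for i in range(nsamples):
        batch = ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
        # Mean cross-entropy per predicted token for this chunk.
        loss = model(batch, labels=batch).loss
        nlls.append(loss.float() * ntok)

ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * ntok))
print(f"seqlen={seqlen}  wikitext2 ppl={ppl.item():.2f}")
```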

I hope this answers your question!

Sincerely, Changhun

seannz commented 3 months ago

Makes sense now, thanks so much Changhun!