mit-han-lab / lmquant


evaluate kv4 quantization accuracy #6

Closed: SherrySwift closed this issue 4 months ago

SherrySwift commented 5 months ago

Thanks for your great work! I want to evaluate accuracy when performing only kv4 quantization (i.e., w16 a16 kv4). How should I modify the configuration files to achieve this, or do I need to modify the code? Thanks a lot.

Golden-Wang commented 5 months ago

Disclaimer: I'm not the author so my understanding may not be correct.

To perform w16 a16 kv4 quantization, change the following lines in the yaml file.

quant:
  wgts:
    dtype: sint16  # weights -> w16
  ipts:
    dtype: sint16  # input activations -> a16
  opts:
    dtype: sint4   # output activations, which cover the KV cache -> kv4

Note that sint means symmetric quantization; use zint for asymmetric quantization with a zero point. You can also override the config from the command line (e.g. --opts-dtype sint4). I haven't tested the command below, but I believe it should work with no changes to the config file or code.

python -m lmquant.llm.run configs/llm.yaml --model-name llama2-7b --opts-dtype sint4
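
Building on the zint note above, an asymmetric kv4 config would presumably set the same field to zint4. This is only a sketch based on that note; I haven't checked the zint4 dtype name against lmquant's config schema:

quant:
  opts:
    dtype: zint4   # assumed dtype name: asymmetric 4-bit with zero point, per the sint/zint note above

Presumably the other fields can be overridden from the command line the same way (e.g. --wgts-dtype, --ipts-dtype), but I've only seen --opts-dtype, so those flag names are a guess.
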
SherrySwift commented 4 months ago

Got it, thanks a lot!