Disclaimer: I'm not the author, so my understanding may not be correct.
To perform w16 a16 kv4 quantization, change the following lines in the yaml file.
quant:
  wgts:
    dtype: sint16
  ipts:
    dtype: sint16
  opts:
    dtype: sint4
Note that sint means symmetric quantization; use zint for asymmetric quantization with a zero point. You can also override the config with command-line options (e.g. --opts-dtype sint4). I haven't tested the command below, but I believe it should work without any changes to the config file or the code.
python -m lmquant.llm.run configs/llm.yaml --model-name llama2-7b --opts-dtype sint4
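If the same flag pattern also applies to the weight and input settings, you could set all three dtypes from the command line without touching the YAML at all. The --wgts-dtype and --ipts-dtype flags below are my assumption based on the --opts-dtype example above; I haven't checked them against lmquant's argument parser, so please confirm with the script's --help output first.

# assumed flags, mirroring --opts-dtype above; verify against --help before relying on them
python -m lmquant.llm.run configs/llm.yaml --model-name llama2-7b --wgts-dtype sint16 --ipts-dtype sint16 --opts-dtype sint4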
Got it, thanks a lot!
Thanks for your great work! I want to evaluate accuracy when only performing kv4 quantization (i.e., w16 a16 kv4). To achieve this, how should I modify the configuration files, or should I modify the code? Thanks a lot.