Describe the bug
I tried to run weight-only quantization on OPT models using the scripts in examples/cpu/inference/python/llm:

```shell
OMP_NUM_THREADS=48 numactl -m 0 -C 0-47 python run.py --benchmark -m <path-to-opt-file> --ipex-weight-only-quantization --output-dir ./opt_6dot7_int8_model --int8
```

I got the following error message:

![image](https://github.com/intel/intel-extension-for-pytorch/assets/47907647/8a7be227-152d-4d0a-ae2e-796b6a329a84)
Versions
The version of intel_extension_for_pytorch is 2.1.1, and the version of transformers is 4.31.0.
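For completeness, the exact installed versions can be dumped programmatically with a small stdlib-only snippet; the distribution names below are assumptions about how the packages are published on PyPI:

```python
from importlib import metadata

def pkg_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# PyPI distribution names assumed for the packages mentioned above.
for pkg in ("intel-extension-for-pytorch", "transformers", "torch"):
    print(pkg, pkg_version(pkg) or "not installed")
```

Pasting this output into the report avoids any ambiguity about which wheels are actually active in the environment that reproduces the error.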