Describe the bug
I tried to run weight-only quantization on OPT models using the scripts in examples/cpu/inference/python/llm:

```shell
OMP_NUM_THREADS=48 numactl -m 0 -C 0-47 python run.py --benchmark -m <path-to-opt-file> --ipex-weight-only-quantization --output-dir ./opt_6dot7_int8_model --int8
```

I got the following error message:

![image](https://github.com/intel/intel-extension-for-pytorch/assets/47907647/8a7be227-152d-4d0a-ae2e-796b6a329a84)
Versions
The version of intel_extension_for_pytorch is 2.1.1, and the version of transformers is 4.31.0.
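For completeness, the exact installed versions can be dumped programmatically with a small stdlib-only snippet; the distribution names below are assumptions about how the packages are published on PyPI:

```python
from importlib import metadata

def pkg_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# PyPI distribution names assumed for the packages mentioned above.
for pkg in ("intel-extension-for-pytorch", "transformers", "torch"):
    print(pkg, pkg_version(pkg) or "not installed")
```

Pasting this output into the report avoids any ambiguity about which wheels are actually active in the environment that reproduces the error.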