intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

AssertionError: The optimizer should not be given for inference mode #1520

Closed: jinz2014 closed this issue 7 months ago

jinz2014 commented 7 months ago

Running the script "bertmini_dense_fintune.sh" produces the error below. Any suggestions would be appreciated.

File "/storage/usersb/user/neural-compressor/examples/pytorch/nlp/huggingface_models/question-answering/pruning/eager/run_qa_no_trainer.py", line 1177, in main() File "/storage/usersb/user/neural-compressor/examples/pytorch/nlp/huggingface_models/question-answering/pruning/eager/run_qa_no_trainer.py", line 895, in main model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare( File "/home/users/user/anaconda3/envs/triton-env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1207, in prepare args = self._prepare_ipex(*args) File "/home/users/user/anaconda3/envs/triton-env/lib/python3.10/site-packages/accelerate/accelerator.py", line 1740, in _prepare_ipex model, optimizer = torch.xpu.optimize( File "/home/users/user/anaconda3/envs/triton-env/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/utils.py", line 221, in optimize return frontend.optimize( File "/home/users/user/anaconda3/envs/triton-env/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py", line 476, in optimize assert optimizer is None, "The optimizer should not be given for inference mode" AssertionError: The optimizer should not be given for inference mode

yiliu30 commented 7 months ago

Hi @jinz2014, thanks for bringing up this issue. We have fixed it in https://github.com/intel/neural-compressor/pull/1525.