intel / intel-extension-for-transformers


ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' #1630

Open · junruizh2021 opened this issue 2 weeks ago

junruizh2021 commented 2 weeks ago

I tried to run the TTS (English and Multi Language Text-to-Speech) example on my PC, following this guide:

https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md

It failed with the cannot import name 'WeightOnlyQuantizedLinear' error shown below.

~/WorkSpace/TTS$ python eng-tts.py 
Traceback (most recent call last):
  File "/home/anna/WorkSpace/TTS/eng-tts.py", line 1, in <module>
    from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.tts import TextToSpeech
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/neural_chat/__init__.py", line 26, in <module>
    from .chatbot import build_chatbot
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/neural_chat/chatbot.py", line 19, in <module>
    from intel_extension_for_transformers.transformers.llm.quantization.optimization import Optimization
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/__init__.py", line 59, in <module>
    from .modeling import (
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/__init__.py", line 21, in <module>
    from .modeling_auto import (AutoModel, AutoModelForCausalLM,
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 94, in <module>
    from intel_extension_for_pytorch.nn.utils._quantize_convert import (
ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' (/opt/python-3.10.13/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/utils/_quantize_convert.py)
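Note that the traceback shows _quantize_convert itself was found and loaded; only the name WeightOnlyQuantizedLinear is missing, so the installed intel_extension_for_pytorch build simply does not define that class. A minimal check, assuming only that intel_extension_for_pytorch imports on its own (this snippet is not part of the original script):

# Print the installed IPEX version and the public names exported by
# _quantize_convert, to confirm whether WeightOnlyQuantizedLinear exists
# in this build.
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.nn.utils import _quantize_convert

print(ipex.__version__)
print([name for name in dir(_quantize_convert) if not name.startswith("_")])

If WeightOnlyQuantizedLinear is absent from that list, the installed IPEX version does not match the one this intel_extension_for_transformers release was built against.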
jketreno commented 3 days ago

I am seeing a similar problem when using the intel/intel-extension-for-pytorch:2.1.20-xpu-pip-jupyter image. After installing the needed modules via:

!pip install intel_extension_for_transformers accelerate uvicorn yacs fastapi datasets

And then running the following neural_chat example code:

from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig
hf_access_token = "<put in your huggingface access token to download models>"
config = PipelineConfig(device='xpu', hf_access_token=hf_access_token)

I see the following:

File /usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:94
     90 from typing import Union
     92 if is_ipex_available() and is_intel_gpu_available():
     93     # pylint: disable=E0401
---> 94     from intel_extension_for_pytorch.nn.utils._quantize_convert import (
     95         WeightOnlyQuantizedLinear,
     96     )
     98 torch = LazyImport("torch")
    101 def recover_export_model(model, current_key_name=None):

ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' (/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/nn/utils/_quantize_convert.py)

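The failure here is the same missing symbol, just from the XPU wheel shipped in that image. Until the IPEX and intel_extension_for_transformers versions are aligned, a preflight guard can make the failure explicit before build_chatbot is called (my own sketch, not from this thread; the message text is illustrative):

# Reproduce the exact import that modeling_auto.py attempts, and stop with
# a clearer message if the installed IPEX build lacks the class.
try:
    from intel_extension_for_pytorch.nn.utils._quantize_convert import (
        WeightOnlyQuantizedLinear,
    )
except ImportError as exc:
    raise SystemExit(
        "Installed intel_extension_for_pytorch does not provide "
        "WeightOnlyQuantizedLinear; install an IPEX build that matches "
        "this intel_extension_for_transformers release before running "
        "the neural_chat example."
    ) from exc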