intel / intel-extension-for-transformers


ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' #1630

Open · junruizh2021 opened this issue 2 weeks ago

junruizh2021 commented 2 weeks ago

I tried to run the TTS (English and Multi Language Text-to-Speech) example on my PC, following this guide:

https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md

It failed with the cannot import name 'WeightOnlyQuantizedLinear' error shown below.

~/WorkSpace/TTS$ python eng-tts.py 
Traceback (most recent call last):
  File "/home/anna/WorkSpace/TTS/eng-tts.py", line 1, in <module>
    from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.tts import TextToSpeech
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/neural_chat/__init__.py", line 26, in <module>
    from .chatbot import build_chatbot
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/neural_chat/chatbot.py", line 19, in <module>
    from intel_extension_for_transformers.transformers.llm.quantization.optimization import Optimization
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/__init__.py", line 59, in <module>
    from .modeling import (
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/__init__.py", line 21, in <module>
    from .modeling_auto import (AutoModel, AutoModelForCausalLM,
  File "/home/anna/.local/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 94, in <module>
    from intel_extension_for_pytorch.nn.utils._quantize_convert import (
ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' (/opt/python-3.10.13/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/utils/_quantize_convert.py)
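Note that the traceback shows _quantize_convert itself was found and loaded; only the name WeightOnlyQuantizedLinear is missing, so the installed intel_extension_for_pytorch build simply does not define that class. A minimal check, assuming only that intel_extension_for_pytorch imports on its own (this snippet is not part of the original script):

# Print the installed IPEX version and the public names exported by
# _quantize_convert, to confirm whether WeightOnlyQuantizedLinear exists
# in this build.
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.nn.utils import _quantize_convert

print(ipex.__version__)
print([name for name in dir(_quantize_convert) if not name.startswith("_")])

If WeightOnlyQuantizedLinear is absent from that list, the installed IPEX version does not match the one this intel_extension_for_transformers release was built against.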
jketreno commented 3 days ago

I am seeing a similar problem when using the intel/intel-extension-for-pytorch:2.1.20-xpu-pip-jupyter image. After installing the needed modules via:

!pip install intel_extension_for_transformers accelerate uvicorn yacs fastapi datasets

And then running the following neural_chat example code:

from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig
hf_access_token = "<put in your huggingface access token to download models>"
config = PipelineConfig(device='xpu', hf_access_token=hf_access_token)

I see the following:

File /usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:94
     90 from typing import Union
     92 if is_ipex_available() and is_intel_gpu_available():
     93     # pylint: disable=E0401
---> 94     from intel_extension_for_pytorch.nn.utils._quantize_convert import (
     95         WeightOnlyQuantizedLinear,
     96     )
     98 torch = LazyImport("torch")
    101 def recover_export_model(model, current_key_name=None):

ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' (/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/nn/utils/_quantize_convert.py)

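The failure here is the same missing symbol, just from the XPU wheel shipped in that image. Until the IPEX and intel_extension_for_transformers versions are aligned, a preflight guard can make the failure explicit before build_chatbot is called (my own sketch, not from this thread; the message text is illustrative):

# Reproduce the exact import that modeling_auto.py attempts, and stop with
# a clearer message if the installed IPEX build lacks the class.
try:
    from intel_extension_for_pytorch.nn.utils._quantize_convert import (
        WeightOnlyQuantizedLinear,
    )
except ImportError as exc:
    raise SystemExit(
        "Installed intel_extension_for_pytorch does not provide "
        "WeightOnlyQuantizedLinear; install an IPEX build that matches "
        "this intel_extension_for_transformers release before running "
        "the neural_chat example."
    ) from exc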