Open junruizh2021 opened 5 months ago
I am seeing a similar problem when using intel/intel-extension-for-pytorch:2.1.20-xpu-pip-jupyter. After installing needed modules via:
!pip install intel_extension_for_transformers accelerate uvicorn yacs fastapi datasets
And then running the following neural_chat example code:
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig
hf_access_token = "<put in your huggingface access token to download models>"
config = PipelineConfig(device='xpu', hf_access_token=hf_access_token)
I see the following:
File /usr/local/lib/python3.10/dist-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:94
90 from typing import Union
92 if is_ipex_available() and is_intel_gpu_available():
93 # pylint: disable=E0401
---> 94 from intel_extension_for_pytorch.nn.utils._quantize_convert import (
95 WeightOnlyQuantizedLinear,
96 )
98 torch = LazyImport("torch")
101 def recover_export_model(model, current_key_name=None):
ImportError: cannot import name 'WeightOnlyQuantizedLinear' from 'intel_extension_for_pytorch.nn.utils._quantize_convert' (/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/nn/utils/_quantize_convert.py)
I don't know if this will help the OP; however, I was able to get things to work. I'm trying to use IPEX (intel-extension-for-pytorch) and ITREX (intel-extension-for-transformers) on an Intel Arc A770M, which means I'm using the +xpu build of IPEX, which is older than the +cpu build.
I started looking into the WOQ (WeightOnlyQuantizedLinear) implementation in IPEX and noticed that it had gone through several code changes, so I suspected an API conflict between the more recent ITREX and the older version of IPEX needed for xpu.
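One way to test that suspicion before pinning versions is to probe the installed IPEX for the symbol directly. The snippet below is a minimal sketch: the module path and class name are copied from the traceback above, while `has_symbol` is a hypothetical helper name, not part of IPEX or ITREX.

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical helper for illustration. Only ImportError is caught, so
    other failures (for example a missing runtime library) still surface.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Module path and class name are taken from the traceback in this thread.
    ok = has_symbol(
        "intel_extension_for_pytorch.nn.utils._quantize_convert",
        "WeightOnlyQuantizedLinear",
    )
    print("WeightOnlyQuantizedLinear available:", ok)
```

If this prints `False` while IPEX itself imports fine, the installed IPEX build most likely predates (or postdates) the API that the installed ITREX expects.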
This is what I'm using in my Dockerfile to build an image that seems to work:
FROM ubuntu:jammy
# First, set up Python and install other required packages (pip, venv, git, etc.)
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
git \
less \
nano \
gpg-agent \
python3 \
python3-pip \
python3-venv \
python3-dev \
wget \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install Intel graphics driver for Linux
RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg \
&& echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" \
> /etc/apt/sources.list.d/intel-gpu-jammy.list
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
intel-level-zero-gpu \
intel-opencl-icd \
clinfo \
level-zero \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# For IPEX v2.1.10+xpu
# https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.1.10%2bxpu&os=linux%2fwsl2&package=pip
# * oneAPI 2024.0
ENV oneapi_pkgs="intel-oneapi-dpcpp-cpp-2024.0 intel-oneapi-mkl-devel=2024.0.0-49656"
ENV python_modules="torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"
RUN wget -qO- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor > /usr/share/keyrings/oneapi-archive-keyring.gpg \
&& echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list \
&& apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
${oneapi_pkgs} \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN pip3 install \
${python_modules}
# 1.4.2 has the WOQ bug
# 1.4.1 has the WOQ bug
# 1.4 has the WOQ bug
# 1.3.2 works!
ENV itrex_version=1.3.2
RUN pip install intel-extension-for-transformers==${itrex_version}
# Install system package dependencies that must be met for 1.3.2:
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
libgl1-mesa-glx \
libglib2.0-0 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
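To confirm that a container built from the Dockerfile above really ended up with the pinned versions, something like the following can be run inside it. This is a sketch using only the standard library; `installed_versions` is a hypothetical helper, and the package names are the pip names used in the Dockerfile.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as they appear in the pip install lines above.
    pinned = [
        "torch",
        "intel-extension-for-pytorch",
        "intel-extension-for-transformers",
    ]
    for name, version in installed_versions(pinned).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output against the versions in the Dockerfile makes it easy to spot a case where pip silently resolved a different ITREX or IPEX release.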
I am trying to run the TTS (English and Multi Language Text-to-Speech) example on my PC:
https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md
It fails with the error
cannot import name 'WeightOnlyQuantizedLinear'
Details are below.