intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Converting mistralai/Mistral-7B-Instruct-v0.2 to lower 4 bit running into error #10613

Open tsantra opened 3 months ago

tsantra commented 3 months ago

Hi,

I am trying to convert the model "mistralai/Mistral-7B-Instruct-v0.2" to 4-bit and save it, but I am running into an error. I am using a Flex GPU. Could you please help?

The error:

[image: screenshot of the error, not preserved in this text]

I am using the script "/home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/generate.py" with this command:

python ./generate.py --save-path /home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/int4/ --repo-id-or-model-path "mistralai/Mistral-7B-Instruct-v0.2"

I modified generate.py to use AutoTokenizer and AutoModelForCausalLM.
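For readers without the script at hand, the conversion step being attempted can be sketched roughly as below. This is a minimal sketch, not the actual generate.py: the function name `convert_and_save`, the `low_bit="sym_int4"` default, and the injectable `model_cls`/`tokenizer_cls` parameters are assumptions made for illustration. It relies on ipex-llm's `AutoModelForCausalLM.from_pretrained(..., load_in_low_bit=...)` and `save_low_bit(...)`.

```python
def convert_and_save(model_path, save_path, low_bit="sym_int4",
                     model_cls=None, tokenizer_cls=None):
    """Load a model with low-bit weights, then save the converted copy.

    model_cls / tokenizer_cls are hypothetical hooks (not part of the
    original script) so the flow can be exercised without a GPU.
    """
    if model_cls is None:
        # Imported lazily so this file loads even without ipex-llm installed.
        from ipex_llm.transformers import AutoModelForCausalLM as model_cls
    if tokenizer_cls is None:
        from transformers import AutoTokenizer as tokenizer_cls

    # Quantize the weights while loading.
    model = model_cls.from_pretrained(
        model_path, load_in_low_bit=low_bit, trust_remote_code=True
    )
    tokenizer = tokenizer_cls.from_pretrained(model_path, trust_remote_code=True)

    # Write the low-bit weights and the tokenizer files to the same directory.
    model.save_low_bit(save_path)
    tokenizer.save_pretrained(save_path)
    return model, tokenizer
```

Calling `convert_and_save("mistralai/Mistral-7B-Instruct-v0.2", "./int4/")` would correspond to the command above, assuming ipex-llm and a compatible transformers version are installed.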

This is the output of my env-check.sh:


PYTHON_VERSION=3.9.0

transformers=4.39.2

torch=2.1.0a0+cxx11.abi

ipex-llm Version: 2.1.0b20240326

/home/ceed-user/anaconda3/envs/ipex_llm_gpu/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(

ipex=2.1.10+xpu

CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6430L
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 7
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 3800.00

MemTotal: 131585340 kB

ulimit:
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513519
max locked memory (kbytes, -l) 16448164
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 513519
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Operating System: Ubuntu 22.04.4 LTS \n \l


Environment Variable: SHELL=/bin/bash TBBROOT=/opt/intel/oneapi/tbb/2021.11/env/.. no_proxy=localhost,127.0.0.0/8 ONEAPI_ROOT=/opt/intel/oneapi CONDA_EXE=/home/ceed-user/anaconda3/bin/conda _CE_M= PKG_CONFIG_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.11/lib/pkgconfig:/opt/intel/oneapi/mkl/2024.0/lib/pkgconfig:/opt/intel/oneapi/dpl/2022.3/lib/pkgconfig:/opt/intel/oneapi/dnnl/2024.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2024.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.11/lib/pkgconfig/:/opt/intel/oneapi/tbb/2021.11/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.11/lib/pkgconfig:/opt/intel/oneapi/mkl/2024.0/lib/pkgconfig:/opt/intel/oneapi/dpl/2022.3/lib/pkgconfig:/opt/intel/oneapi/dnnl/2024.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2024.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.11/lib/pkgconfig/ ACL_BOARD_VENDOR_PATH=/opt/Intel/OpenCLFPGA/oneAPI/Boards FPGA_VARS_DIR=/opt/intel/oneapi/compiler/2024.0/opt/oclfpga CCL_ROOT=/opt/intel/oneapi/ccl/2021.11 I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11 FI_PROVIDER_PATH=/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib/prov:/usr/lib/x86_64-linux-gnu/libfabric DNNLROOT=/opt/intel/oneapi/dnnl/2024.0 DIAGUTIL_PATH=/opt/intel/oneapi/debugger/2024.0/etc/debugger/sys_check/sys_check.py:/opt/intel/oneapi/compiler/2024.0/etc/compiler/sys_check/sys_check.sh:/opt/intel/oneapi/debugger/2024.0/etc/debugger/sys_check/sys_check.py:/opt/intel/oneapi/compiler/2024.0/etc/compiler/sys_check/sys_check.sh PWD=/home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load CCL_CONFIGURATION=cpu_gpu_dpcpp LOGNAME=ceed-user DPL_ROOT=/opt/intel/oneapi/dpl/2022.3 XDG_SESSION_TYPE=tty CONDA_PREFIX=/home/ceed-user/anaconda3/envs/ipex_llm_gpu 
MANPATH=/opt/intel/oneapi/mpi/2021.11/share/man:/opt/intel/oneapi/debugger/2024.0/share/man:/opt/intel/oneapi/compiler/2024.0/documentation/en/man/common:/opt/intel/oneapi/mpi/2021.11/share/man:/opt/intel/oneapi/debugger/2024.0/share/man:/opt/intel/oneapi/compiler/2024.0/documentation/en/man/common: MOTD_SHOWN=pam HOME=/home/ceed-user GDB_INFO=/opt/intel/oneapi/debugger/2024.0/share/info/:/opt/intel/oneapi/debugger/2024.0/share/info/ CCL_CONFIGURATION_PATH= LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36: SETVARS_COMPLETED=1 CONDA_PROMPT_MODIFIER=(ipex_llm_gpu) 
CMAKE_PREFIX_PATH=/opt/intel/oneapi/tbb/2021.11/env/..:/opt/intel/oneapi/mkl/2024.0/lib/cmake:/opt/intel/oneapi/dpl/2022.3/lib/cmake/oneDPL:/opt/intel/oneapi/dnnl/2024.0/lib/cmake:/opt/intel/oneapi/compiler/2024.0:/opt/intel/oneapi/tbb/2021.11/env/..:/opt/intel/oneapi/mkl/2024.0/lib/cmake:/opt/intel/oneapi/dpl/2022.3/lib/cmake/oneDPL:/opt/intel/oneapi/dnnl/2024.0/lib/cmake:/opt/intel/oneapi/compiler/2024.0 https_proxy=http://proxy-chain.intel.com:911 SSH_CONNECTION=10.209.101.31 59525 10.72.13.153 22 CMPLR_ROOT=/opt/intel/oneapi/compiler/2024.0 FPGA_VARS_ARGS= INFOPATH=/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib LESSCLOSE=/usr/bin/lesspipe %s %s XDG_SESSION_CLASS=user TERM=xterm _CE_CONDA= LESSOPEN=| /usr/bin/lesspipe %s USER=ceed-user NO_PROXY=127.0.0.1,localhost,192.168.102.1/16,10.0.0.0/8,certificates.intel.com,amr-registry.caas.intel.com,ubit-artifactory-or.intel.com,.maestro.intel.com,files.internal.ledgepark.intel.com,192.168.102.13 LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib/:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/:/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib/:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/ CONDA_SHLVL=1 DISPLAY=localhost:10.0 SHLVL=2 HTTPS_PROXY=http://proxy-dmz.intel.com:912 HTTP_PROXY=http://proxy-dmz.intel.com:911 OCL_ICD_FILENAMES=libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so XDG_SESSION_ID=1 http_proxy=http://proxy-chain.intel.com:911 CONDA_PYTHON_EXE=/home/ceed-user/anaconda3/bin/python 
CLASSPATH=/opt/intel/oneapi/mpi/2021.11/share/java/mpi.jar:/opt/intel/oneapi/mpi/2021.11/share/java/mpi.jar INTELFPGAOCLSDKROOT=/opt/intel/oneapi/compiler/2024.0/opt/oclfpga LD_LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2024.0/opt/compiler/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/:/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2024.0/opt/compiler/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/ XDG_RUNTIME_DIR=/run/user/1000 SSH_CLIENT=10.209.101.31 59525 22 CONDA_DEFAULT_ENV=ipex_llm_gpu MKLROOT=/opt/intel/oneapi/mkl/2024.0 XDG_DATADIRS=/usr/share/gnome:/home/ceed-user/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop NLSPATH=/opt/intel/oneapi/mkl/2024.0/share/locale/%l%t/%N:/opt/intel/oneapi/compiler/2024.0/lib/locale/%l%t/%N:/opt/intel/oneapi/mkl/2024.0/share/locale/%l%t/%N:/opt/intel/oneapi/compiler/2024.0/lib/locale/%l_%t/%N 
PATH=/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/bin:/opt/intel/oneapi/mpi/2021.11/bin:/opt/intel/oneapi/mkl/2024.0/bin/:/opt/intel/oneapi/dev-utilities/2024.0/bin:/opt/intel/oneapi/debugger/2024.0/opt/debugger/bin:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/bin:/opt/intel/oneapi/compiler/2024.0/bin:/home/ceed-user/anaconda3/envs/ipex_llm_gpu/bin:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/bin:/opt/intel/oneapi/mpi/2021.11/bin:/opt/intel/oneapi/mkl/2024.0/bin:/opt/intel/oneapi/dev-utilities/2024.0/bin:/opt/intel/oneapi/debugger/2024.0/opt/debugger/bin:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/bin:/opt/intel/oneapi/compiler/2024.0/bin:/home/ceed-user/.local/bin:/home/ceed-user/bin:/home/ceed-user/bin:/home/ceed-user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin INTEL_PYTHONHOME=/opt/intel/oneapi/debugger/2024.0/opt/debugger DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus SSHTTY=/dev/pts/0 CPATH=/opt/intel/oneapi/tbb/2021.11/env/../include:/opt/intel/oneapi/mpi/2021.11/include:/opt/intel/oneapi/mkl/2024.0/include:/opt/intel/oneapi/dpl/2022.3/include:/opt/intel/oneapi/dnnl/2024.0/include:/opt/intel/oneapi/dev-utilities/2024.0/include:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/include:/opt/intel/oneapi/ccl/2021.11/include:/opt/intel/oneapi/tbb/2021.11/env/../include:/opt/intel/oneapi/mpi/2021.11/include:/opt/intel/oneapi/mkl/2024.0/include:/opt/intel/oneapi/dpl/2022.3/include:/opt/intel/oneapi/dnnl/2024.0/include:/opt/intel/oneapi/dev-utilities/2024.0/include:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/include:/opt/intel/oneapi/ccl/2021.11/include OLDPWD=/home/ceed-user =/usr/bin/printenv

xpu-smi is properly installed.

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-6cf6-5109f1c50433                                       |
|           | PCI BDF Address: 0000:ae:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

qiuxin2012 commented 3 months ago

Please downgrade transformers to 4.34.0:

pip install transformers==4.34.0

Example: https://github.com/qiuxin2012/BigDL/tree/gemma-example/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral
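Before re-running the conversion, it may help to confirm which transformers version the environment actually picks up. The following is a small sketch using only the standard library; the helper names `release_tuple`, `matches_pin`, and `installed_transformers` are made up for illustration, and the check simply compares the numeric release parts against the 4.34.0 version suggested in this thread.

```python
import re
from importlib.metadata import PackageNotFoundError, version

SUGGESTED = "4.34.0"  # version recommended in this thread


def release_tuple(v):
    """Turn '4.39.2' into (4, 39, 2), ignoring local or pre-release suffixes."""
    parts = []
    for piece in v.split("+")[0].split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def matches_pin(installed, pinned=SUGGESTED):
    """True when the installed release numbers equal the pinned ones."""
    return release_tuple(installed) == release_tuple(pinned)


def installed_transformers():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None
```

With the environment reported above, `matches_pin("4.39.2")` is false, which is consistent with the advice to downgrade.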