Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
Converting mistralai/Mistral-7B-Instruct-v0.2 to lower 4 bit running into error #10613
I am trying to convert and save the model "mistralai/Mistral-7B-Instruct-v0.2" in 4-bit and am running into an error. I am using a Flex GPU.
Could you please help?
The error:
I am using the example script "/home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/generate.py" with the following command:
python ./generate.py --save-path /home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/int4/ --repo-id-or-model-path "mistralai/Mistral-7B-Instruct-v0.2"
I modified generate.py to use the generic Auto classes:
AutoTokenizer, AutoModelForCausalLM
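For context, here is a minimal sketch of the convert-and-save flow that a script like this typically performs. It is not the contents of the modified generate.py: the function name `convert_and_save` is hypothetical, and the sketch assumes the installed ipex-llm build provides `ipex_llm.transformers.AutoModelForCausalLM` with the `load_in_4bit` option and the `save_low_bit` method. The imports are deferred into the function so the file can be loaded on a machine without ipex-llm.

```python
def convert_and_save(repo_id_or_model_path, save_path):
    """Load a model with 4-bit weights and write the low-bit copy to save_path.

    Hypothetical helper for illustration; assumes ipex-llm and transformers
    are installed in the active environment.
    """
    # Deferred imports: these are only needed when a conversion actually runs.
    from ipex_llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    # Quantize the weights to 4 bits while loading (assumed ipex-llm option).
    model = AutoModelForCausalLM.from_pretrained(
        repo_id_or_model_path,
        load_in_4bit=True,
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        repo_id_or_model_path,
        trust_remote_code=True,
    )

    # Persist the quantized weights and the tokenizer side by side so the
    # directory can be reloaded later without the original checkpoint.
    model.save_low_bit(save_path)
    tokenizer.save_pretrained(save_path)
    return save_path
```

Calling `convert_and_save("mistralai/Mistral-7B-Instruct-v0.2", "./int4")` would correspond to the `--repo-id-or-model-path` and `--save-path` arguments in the command above. Whether this reproduces the reported error depends on the environment listed below.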
This is the output of my env-check.sh:
PYTHON_VERSION=3.9.0
transformers=4.39.2
torch=2.1.0a0+cxx11.abi
ipex-llm Version: 2.1.0b20240326
/home/ceed-user/anaconda3/envs/ipex_llm_gpu/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
ipex=2.1.10+xpu
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6430L
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Stepping: 7
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 3800.00
MemTotal: 131585340 kB
ulimit:
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513519
max locked memory (kbytes, -l) 16448164
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 513519
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Operating System: Ubuntu 22.04.4 LTS
Environment Variable: SHELL=/bin/bash TBBROOT=/opt/intel/oneapi/tbb/2021.11/env/.. no_proxy=localhost,127.0.0.0/8 ONEAPI_ROOT=/opt/intel/oneapi CONDA_EXE=/home/ceed-user/anaconda3/bin/conda _CE_M= PKG_CONFIG_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.11/lib/pkgconfig:/opt/intel/oneapi/mkl/2024.0/lib/pkgconfig:/opt/intel/oneapi/dpl/2022.3/lib/pkgconfig:/opt/intel/oneapi/dnnl/2024.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2024.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.11/lib/pkgconfig/:/opt/intel/oneapi/tbb/2021.11/env/../lib/pkgconfig:/opt/intel/oneapi/mpi/2021.11/lib/pkgconfig:/opt/intel/oneapi/mkl/2024.0/lib/pkgconfig:/opt/intel/oneapi/dpl/2022.3/lib/pkgconfig:/opt/intel/oneapi/dnnl/2024.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2024.0/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.11/lib/pkgconfig/ ACL_BOARD_VENDOR_PATH=/opt/Intel/OpenCLFPGA/oneAPI/Boards FPGA_VARS_DIR=/opt/intel/oneapi/compiler/2024.0/opt/oclfpga CCL_ROOT=/opt/intel/oneapi/ccl/2021.11 I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.11 FI_PROVIDER_PATH=/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib/prov:/usr/lib/x86_64-linux-gnu/libfabric DNNLROOT=/opt/intel/oneapi/dnnl/2024.0 DIAGUTIL_PATH=/opt/intel/oneapi/debugger/2024.0/etc/debugger/sys_check/sys_check.py:/opt/intel/oneapi/compiler/2024.0/etc/compiler/sys_check/sys_check.sh:/opt/intel/oneapi/debugger/2024.0/etc/debugger/sys_check/sys_check.py:/opt/intel/oneapi/compiler/2024.0/etc/compiler/sys_check/sys_check.sh PWD=/home/ceed-user/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load CCL_CONFIGURATION=cpu_gpu_dpcpp LOGNAME=ceed-user DPL_ROOT=/opt/intel/oneapi/dpl/2022.3 XDG_SESSION_TYPE=tty CONDA_PREFIX=/home/ceed-user/anaconda3/envs/ipex_llm_gpu 
MANPATH=/opt/intel/oneapi/mpi/2021.11/share/man:/opt/intel/oneapi/debugger/2024.0/share/man:/opt/intel/oneapi/compiler/2024.0/documentation/en/man/common:/opt/intel/oneapi/mpi/2021.11/share/man:/opt/intel/oneapi/debugger/2024.0/share/man:/opt/intel/oneapi/compiler/2024.0/documentation/en/man/common: MOTD_SHOWN=pam HOME=/home/ceed-user GDB_INFO=/opt/intel/oneapi/debugger/2024.0/share/info/:/opt/intel/oneapi/debugger/2024.0/share/info/ CCL_CONFIGURATION_PATH= LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36: SETVARS_COMPLETED=1 CONDA_PROMPT_MODIFIER=(ipex_llm_gpu) 
CMAKE_PREFIX_PATH=/opt/intel/oneapi/tbb/2021.11/env/..:/opt/intel/oneapi/mkl/2024.0/lib/cmake:/opt/intel/oneapi/dpl/2022.3/lib/cmake/oneDPL:/opt/intel/oneapi/dnnl/2024.0/lib/cmake:/opt/intel/oneapi/compiler/2024.0:/opt/intel/oneapi/tbb/2021.11/env/..:/opt/intel/oneapi/mkl/2024.0/lib/cmake:/opt/intel/oneapi/dpl/2022.3/lib/cmake/oneDPL:/opt/intel/oneapi/dnnl/2024.0/lib/cmake:/opt/intel/oneapi/compiler/2024.0 https_proxy=http://proxy-chain.intel.com:911 SSH_CONNECTION=10.209.101.31 59525 10.72.13.153 22 CMPLR_ROOT=/opt/intel/oneapi/compiler/2024.0 FPGA_VARS_ARGS= INFOPATH=/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib LESSCLOSE=/usr/bin/lesspipe %s %s XDG_SESSION_CLASS=user TERM=xterm _CE_CONDA= LESSOPEN=| /usr/bin/lesspipe %s USER=ceed-user NO_PROXY=127.0.0.1,localhost,192.168.102.1/16,10.0.0.0/8,certificates.intel.com,amr-registry.caas.intel.com,ubit-artifactory-or.intel.com,.maestro.intel.com,files.internal.ledgepark.intel.com,192.168.102.13 LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib/:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/:/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib/:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/ CONDA_SHLVL=1 DISPLAY=localhost:10.0 SHLVL=2 HTTPS_PROXY=http://proxy-dmz.intel.com:912 HTTP_PROXY=http://proxy-dmz.intel.com:911 OCL_ICD_FILENAMES=libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2024.0/lib/libintelocl.so XDG_SESSION_ID=1 http_proxy=http://proxy-chain.intel.com:911 CONDA_PYTHON_EXE=/home/ceed-user/anaconda3/bin/python 
CLASSPATH=/opt/intel/oneapi/mpi/2021.11/share/java/mpi.jar:/opt/intel/oneapi/mpi/2021.11/share/java/mpi.jar INTELFPGAOCLSDKROOT=/opt/intel/oneapi/compiler/2024.0/opt/oclfpga LD_LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2024.0/opt/compiler/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/:/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib:/opt/intel/oneapi/mpi/2021.11/lib:/opt/intel/oneapi/mkl/2024.0/lib:/opt/intel/oneapi/dpl/2022.3/lib:/opt/intel/oneapi/dnnl/2024.0/lib:/opt/intel/oneapi/debugger/2024.0/opt/debugger/lib:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2024.0/opt/compiler/lib:/opt/intel/oneapi/compiler/2024.0/lib:/opt/intel/oneapi/ccl/2021.11/lib/ XDG_RUNTIME_DIR=/run/user/1000 SSH_CLIENT=10.209.101.31 59525 22 CONDA_DEFAULT_ENV=ipex_llm_gpu MKLROOT=/opt/intel/oneapi/mkl/2024.0 XDG_DATADIRS=/usr/share/gnome:/home/ceed-user/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop NLSPATH=/opt/intel/oneapi/mkl/2024.0/share/locale/%l%t/%N:/opt/intel/oneapi/compiler/2024.0/lib/locale/%l%t/%N:/opt/intel/oneapi/mkl/2024.0/share/locale/%l%t/%N:/opt/intel/oneapi/compiler/2024.0/lib/locale/%l_%t/%N 
PATH=/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/bin:/opt/intel/oneapi/mpi/2021.11/bin:/opt/intel/oneapi/mkl/2024.0/bin/:/opt/intel/oneapi/dev-utilities/2024.0/bin:/opt/intel/oneapi/debugger/2024.0/opt/debugger/bin:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/bin:/opt/intel/oneapi/compiler/2024.0/bin:/home/ceed-user/anaconda3/envs/ipex_llm_gpu/bin:/opt/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/bin:/opt/intel/oneapi/mpi/2021.11/bin:/opt/intel/oneapi/mkl/2024.0/bin:/opt/intel/oneapi/dev-utilities/2024.0/bin:/opt/intel/oneapi/debugger/2024.0/opt/debugger/bin:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/bin:/opt/intel/oneapi/compiler/2024.0/bin:/home/ceed-user/.local/bin:/home/ceed-user/bin:/home/ceed-user/bin:/home/ceed-user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin INTEL_PYTHONHOME=/opt/intel/oneapi/debugger/2024.0/opt/debugger DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus SSHTTY=/dev/pts/0 CPATH=/opt/intel/oneapi/tbb/2021.11/env/../include:/opt/intel/oneapi/mpi/2021.11/include:/opt/intel/oneapi/mkl/2024.0/include:/opt/intel/oneapi/dpl/2022.3/include:/opt/intel/oneapi/dnnl/2024.0/include:/opt/intel/oneapi/dev-utilities/2024.0/include:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/include:/opt/intel/oneapi/ccl/2021.11/include:/opt/intel/oneapi/tbb/2021.11/env/../include:/opt/intel/oneapi/mpi/2021.11/include:/opt/intel/oneapi/mkl/2024.0/include:/opt/intel/oneapi/dpl/2022.3/include:/opt/intel/oneapi/dnnl/2024.0/include:/opt/intel/oneapi/dev-utilities/2024.0/include:/opt/intel/oneapi/compiler/2024.0/opt/oclfpga/include:/opt/intel/oneapi/ccl/2021.11/include OLDPWD=/home/ceed-user =/usr/bin/printenv
xpu-smi is properly installed.
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-6cf6-5109f1c50433                                       |
|           | PCI BDF Address: 0000:ae:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+