intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

ipex-llm fast_tokenizer error when loading the model Mistral-7B-Instruct-v0.3 #11566

Open alexoctob opened 1 month ago

alexoctob commented 1 month ago

found intel-openmp in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so
found tcmalloc in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
+++++ Env Variables +++++
Internal:
    ENABLE_IOMP     = 1
    ENABLE_GPU      = 0
    ENABLE_JEMALLOC = 0
    ENABLE_TCMALLOC = 1
    LIB_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib
    BIN_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/bin
    LLM_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm

Exported:
    LD_PRELOAD = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
    OMP_NUM_THREADS = 32
    MALLOC_CONF =
    USE_XETLA =
    ENABLE_SDP_FUSION =
    SYCL_CACHE_PERSISTENT =
    BIGDL_LLM_XMX_DISABLED =
    SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS =
+++++++++++++++++++++++++
Complete.
2024-07-08 10:34:19,619 - utils.py[line:145] - INFO: Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-07-08 10:34:19,619 - utils.py[line:148] - INFO: Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

==============================Langchain-Chatchat Configuration==============================
Operating system: Linux-5.15.0-107-generic-x86_64-with-glibc2.35.
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Project version: v0.2.10
langchain version: 0.0.354. fastchat version: 0.2.35

Current tokenizer: ChineseRecursiveTextSplitter
Current running LLM: ['Llama-2-7b-chat-hf', 'Mistral-7B-Instruct-v0.3'] @ cpu
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Llama-2-7b-chat-hf', 'model_path_exists': True, 'port': 20002}
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3', 'model_path_exists': True, 'port': 20002}
Current embedding model: bge-large-en-v1.5 @ cpu
==============================Langchain-Chatchat Configuration==============================

2024-07-08 10:34:23,349 - startup.py[line:705] - INFO: Starting the service:
2024-07-08 10:34:23,349 - startup.py[line:706] - INFO: To view the llm_api logs, please go to /workspace/Langchain-Chatchat-ipex-llm/logs
/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The model startup functionality will be rewritten in Langchain-Chatchat 0.3.x to support more modes and accelerate startup. The related functionality in 0.2.x will be deprecated.
  warn_deprecated(
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Started server process [399056]
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Waiting for application startup.
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Application startup complete.
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Mistral-7B-Instruct-v0.3'] on worker c47e1f03, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
2024-07-08 10:34:28 | ERROR | stderr | Process model_worker - Mistral-7B-Instruct-v0.3:
2024-07-08 10:34:28 | ERROR | stderr | Traceback (most recent call last):
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2024-07-08 10:34:28 | ERROR | stderr |     self.run()
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 108, in run
2024-07-08 10:34:28 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 439, in run_model_worker
2024-07-08 10:34:28 | ERROR | stderr |     app = create_model_worker_app(log_level=log_level, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 227, in create_model_worker_app
2024-07-08 10:34:28 | ERROR | stderr |     worker = BigDLLLMWorker(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py", line 99, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/transformers/loader.py", line 78, in load_model
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = tokenizer_cls.from_pretrained(model_path, trust_remote_code=True)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return cls._from_pretrained(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2004, in _from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = cls(*init_inputs, **init_kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 115, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     super().__init__(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
2024-07-08 10:34:28 | ERROR | stderr | Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 6952 column 3
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Llama-2-7b-chat-hf'] on worker 97caf8dd, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:00<00:00, 10.32it/s]
2024-07-08 10:34:29 | INFO | ipex_llm.transformers.utils | Converting the current model to sym_int4 format......
2024-07-08 10:34:31 | INFO | stdout | <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
2024-07-08 10:34:31 | INFO | model_worker | enable benchmark successfully
2024-07-08 10:34:31 | INFO | model_worker | Register to controller

Oscilloscope98 commented 1 month ago

Hi @alexoctob, we are currently working on reproducing this issue and will keep you updated with any progress :)

JinBridger commented 1 month ago

Hi @alexoctob,

This issue can be solved by upgrading transformers to 4.40.0. The `Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper` in your traceback means the installed tokenizers library is too old to parse the tokenizer.json that ships with Mistral-7B-Instruct-v0.3; upgrading transformers to 4.40.0 pulls in a new enough tokenizers.
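
As a quick way to confirm the fix independently of FastChat, here is a minimal sketch that performs the same fast-tokenizer load that failed in the traceback above (the model path is taken from your log; adjust it to your setup):

    # verify_tokenizer.py -- load the Mistral-7B-Instruct-v0.3 fast tokenizer directly.
    # On the broken environment this raises the PyPreTokenizerTypeWrapper exception;
    # with transformers==4.40.0 (and the tokenizers version it pulls in) it succeeds.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3",  # path from the log above
        trust_remote_code=True,  # same flag ipex_llm's loader passes
    )
    print(type(tokenizer).__name__)             # expect a *Fast tokenizer class
    print(tokenizer("sanity check").input_ids)  # tokenization should now work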

Here's the guide to install Langchain-Chatchat on Linux for CPU (a sanity-check snippet follows the steps):

  1. Download Langchain-Chatchat with IPEX-LLM integrations from this link, and unzip the content into a directory, e.g. /home/arda/Langchain-Chatchat-ipex-llm.
  2. Create a new conda environment by running the following commands:
    conda create -n ipex-llm-langchain-chatchat python=3.11
    conda activate ipex-llm-langchain-chatchat
  3. Run the following commands to install ipex-llm:
    pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
    pip3 install torchvision==0.16.2+cpu torchaudio==2.1.2+cpu --index-url https://download.pytorch.org/whl/cpu
  4. Switch to the root directory of the Langchain-Chatchat you've downloaded and run the following commands to install dependencies:
    cd PATH/TO/Langchain-Chatchat-ipex-llm
    pip install -r requirements_ipex_llm.txt 
    pip install -r requirements_api_ipex_llm.txt
    pip install -r requirements_webui.txt
    # install transformers==4.40.0 to use Mistral-7B-Instruct-v0.3
    pip install transformers==4.40.0
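
After these steps, a short sanity check (a sketch; the version expectations follow from the guide above) can confirm the environment is consistent before restarting the Chatchat service:

    # sanity_check.py -- confirm the upgraded stack is in place.
    import transformers
    import tokenizers
    import ipex_llm  # should import cleanly after `pip install ipex-llm[all]`

    print("transformers:", transformers.__version__)  # the guide pins 4.40.0
    print("tokenizers:", tokenizers.__version__)      # transformers 4.40.0 requires tokenizers>=0.19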

Please feel free to ask if there are any further problems :)