intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

ipex-llm fast_tokenizer error when loading the model Mistral-7B-Instruct-v0.3 #11566

Open alexoctob opened 1 month ago

alexoctob commented 1 month ago

found intel-openmp in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so
found tcmalloc in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
+++++ Env Variables +++++
Internal:
    ENABLE_IOMP     = 1
    ENABLE_GPU      = 0
    ENABLE_JEMALLOC = 0
    ENABLE_TCMALLOC = 1
    LIB_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib
    BIN_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/bin
    LLM_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm

Exported:
    LD_PRELOAD = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
    OMP_NUM_THREADS = 32
    MALLOC_CONF =
    USE_XETLA =
    ENABLE_SDP_FUSION =
    SYCL_CACHE_PERSISTENT =
    BIGDL_LLM_XMX_DISABLED =
    SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS =
+++++++++++++++++++++++++
Complete.
2024-07-08 10:34:19,619 - utils.py[line:145] - INFO: Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-07-08 10:34:19,619 - utils.py[line:148] - INFO: Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

==============================Langchain-Chatchat Configuration==============================
Operating system: Linux-5.15.0-107-generic-x86_64-with-glibc2.35.
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Project version: v0.2.10
langchain version: 0.0.354. fastchat version: 0.2.35

Current tokenizer: ChineseRecursiveTextSplitter
Current running LLM: ['Llama-2-7b-chat-hf', 'Mistral-7B-Instruct-v0.3'] @ cpu
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Llama-2-7b-chat-hf', 'model_path_exists': True, 'port': 20002}
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3', 'model_path_exists': True, 'port': 20002}
Current embedding model: bge-large-en-v1.5 @ cpu
==============================Langchain-Chatchat Configuration==============================

2024-07-08 10:34:23,349 - startup.py[line:705] - INFO: Starting the service:
2024-07-08 10:34:23,349 - startup.py[line:706] - INFO: To view the llm_api logs, please go to /workspace/Langchain-Chatchat-ipex-llm/logs
/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The model startup functionality will be rewritten in Langchain-Chatchat 0.3.x to support more modes and accelerate startup. The related functionality in 0.2.x will be deprecated.
  warn_deprecated(
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Started server process [399056]
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Waiting for application startup.
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Application startup complete.
2024-07-08 10:34:27 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Mistral-7B-Instruct-v0.3'] on worker c47e1f03, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
2024-07-08 10:34:28 | ERROR | stderr | Process model_worker - Mistral-7B-Instruct-v0.3:
2024-07-08 10:34:28 | ERROR | stderr | Traceback (most recent call last):
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2024-07-08 10:34:28 | ERROR | stderr |     self.run()
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 108, in run
2024-07-08 10:34:28 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 439, in run_model_worker
2024-07-08 10:34:28 | ERROR | stderr |     app = create_model_worker_app(log_level=log_level, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 227, in create_model_worker_app
2024-07-08 10:34:28 | ERROR | stderr |     worker = BigDLLLMWorker(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py", line 99, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/transformers/loader.py", line 78, in load_model
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = tokenizer_cls.from_pretrained(model_path, trust_remote_code=True)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return cls._from_pretrained(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2004, in _from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = cls(*init_inputs, **init_kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 115, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     super().__init__(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
2024-07-08 10:34:28 | ERROR | stderr | Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 6952 column 3
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Llama-2-7b-chat-hf'] on worker 97caf8dd, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:00<00:00, 10.32it/s]
2024-07-08 10:34:29 | INFO | ipex_llm.transformers.utils | Converting the current model to sym_int4 format......
2024-07-08 10:34:31 | INFO | stdout | <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
2024-07-08 10:34:31 | INFO | model_worker | enable benchmark successfully
2024-07-08 10:34:31 | INFO | model_worker | Register to controller

Oscilloscope98 commented 1 month ago

Hi @alexoctob, we are currently working on reproducing this issue and will keep you updated with any progress :)

JinBridger commented 1 month ago

Hi @alexoctob,

This issue can be solved by upgrading transformers to 4.40.0. The `Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper` in your traceback means the installed tokenizers library is too old to parse the tokenizer.json that ships with Mistral-7B-Instruct-v0.3; upgrading transformers to 4.40.0 pulls in a new enough tokenizers.
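
As a quick way to confirm the fix independently of FastChat, here is a minimal sketch that performs the same fast-tokenizer load that failed in the traceback above (the model path is taken from your log; adjust it to your setup):

    # verify_tokenizer.py -- load the Mistral-7B-Instruct-v0.3 fast tokenizer directly.
    # On the broken environment this raises the PyPreTokenizerTypeWrapper exception;
    # with transformers==4.40.0 (and the tokenizers version it pulls in) it succeeds.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3",  # path from the log above
        trust_remote_code=True,  # same flag ipex_llm's loader passes
    )
    print(type(tokenizer).__name__)             # expect a *Fast tokenizer class
    print(tokenizer("sanity check").input_ids)  # tokenization should now work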

Here's the guide to install Langchain-Chatchat on Linux for CPU (a sanity-check snippet follows the steps):

  1. Download Langchain-Chatchat with IPEX-LLM integrations from this link, and unzip the content into a directory, e.g. /home/arda/Langchain-Chatchat-ipex-llm.
  2. Create a new conda environment by running the following commands:
    conda create -n ipex-llm-langchain-chatchat python=3.11
    conda activate ipex-llm-langchain-chatchat
  3. Run the following commands to install ipex-llm:
    pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
    pip3 install torchvision==0.16.2+cpu torchaudio==2.1.2+cpu --index-url https://download.pytorch.org/whl/cpu
  4. Switch to the root directory of the Langchain-Chatchat you've downloaded and run the following commands to install dependencies:
    cd PATH/TO/Langchain-Chatchat-ipex-llm
    pip install -r requirements_ipex_llm.txt 
    pip install -r requirements_api_ipex_llm.txt
    pip install -r requirements_webui.txt
    # install transformers==4.40.0 to use Mistral-7B-Instruct-v0.3
    pip install transformers==4.40.0
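
After these steps, a short sanity check (a sketch; the version expectations follow from the guide above) can confirm the environment is consistent before restarting the Chatchat service:

    # sanity_check.py -- confirm the upgraded stack is in place.
    import transformers
    import tokenizers
    import ipex_llm  # should import cleanly after `pip install ipex-llm[all]`

    print("transformers:", transformers.__version__)  # the guide pins 4.40.0
    print("tokenizers:", tokenizers.__version__)      # transformers 4.40.0 requires tokenizers>=0.19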

Please feel free to ask if there are any further problems :)