Open alexoctob opened 4 months ago
Hi @alexoctob, we are currently working on reproducing this issue and will keep you updated with any progress :)
Hi @alexoctob,
This issue could be solved by upgrading transformers to 4.40.0.
Here's the guide to install Langchain-Chatchat on Linux (CPU):
conda create -n ipex-llm-langchain-chatchat python=3.11
conda activate ipex-llm-langchain-chatchat
# install ipex-llm
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip3 install torchvision==0.16.2+cpu torchaudio==2.1.2+cpu --index-url https://download.pytorch.org/whl/cpu
cd PATH/TO/Langchain-Chatchat-ipex-llm
pip install -r requirements_ipex_llm.txt
pip install -r requirements_api_ipex_llm.txt
pip install -r requirements_webui.txt
# install transformers==4.40.0 to use Mistral-7B-Instruct-v0.3
pip install transformers==4.40.0
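To confirm that the upgrade took effect in the active conda environment, a small check along these lines may help. It is only a sketch: the helper names (`parse_version`, `installed_version`, `meets_requirement`) are made up for illustration, and printing the `tokenizers` version alongside `transformers` is an assumption that the tokenizer error may also depend on that package.

```python
from importlib import metadata

REQUIRED = (4, 40, 0)  # transformers version recommended in this thread


def parse_version(text):
    """Turn '4.40.0' or '4.41.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    # Pad short versions such as '4.40' and ignore anything past the third field.
    parts = (parts + [0, 0, 0])[:3]
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_requirement(version_text, required=REQUIRED):
    """True if version_text is present and at least the required version."""
    return version_text is not None and parse_version(version_text) >= required


if __name__ == "__main__":
    for name in ("transformers", "tokenizers"):
        print(name, installed_version(name))
    print("transformers OK:", meets_requirement(installed_version("transformers")))
```

Run it with the `ipex-llm-langchain-chatchat` environment activated. If `transformers OK` prints `False`, the `pip install transformers==4.40.0` step probably ran in a different environment.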
Please feel free to ask if there's any further problem :)
found intel-openmp in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so
found tcmalloc in /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
+++++ Env Variables +++++
Internal:
    ENABLE_IOMP = 1
    ENABLE_GPU = 0
    ENABLE_JEMALLOC = 0
    ENABLE_TCMALLOC = 1
    LIB_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib
    BIN_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/bin
    LLM_DIR = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm
Exported:
    LD_PRELOAD = /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/libiomp5.so /home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/libs/libtcmalloc.so
    OMP_NUM_THREADS = 32
    MALLOC_CONF =
    USE_XETLA =
    ENABLE_SDP_FUSION =
    SYCL_CACHE_PERSISTENT =
    BIGDL_LLM_XMX_DISABLED =
    SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS =
+++++++++++++++++++++++++
Complete.
2024-07-08 10:34:19,619 - utils.py[line:145] - INFO: Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-07-08 10:34:19,619 - utils.py[line:148] - INFO: Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
==============================Langchain-Chatchat Configuration==============================
Operating system: Linux-5.15.0-107-generic-x86_64-with-glibc2.35.
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Project version: v0.2.10
langchain version: 0.0.354. fastchat version: 0.2.35

Current tokenizer: ChineseRecursiveTextSplitter
Current running LLM: ['Llama-2-7b-chat-hf', 'Mistral-7B-Instruct-v0.3'] @ cpu
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Llama-2-7b-chat-hf', 'model_path_exists': True, 'port': 20002}
{'device': 'cpu', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3', 'model_path_exists': True, 'port': 20002}
Current embbeding model: bge-large-en-v1.5 @ cpu
==============================Langchain-Chatchat Configuration==============================
2024-07-08 10:34:23,349 - startup.py[line:705] - INFO: Starting the service:
2024-07-08 10:34:23,349 - startup.py[line:706] - INFO: To view the llm_api logs, please go to /workspace/Langchain-Chatchat-ipex-llm/logs
/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The model startup functionality will be rewritten in Langchain-Chatchat 0.3.x to support more modes and accelerate startup. The related functionality in 0.2.x will be deprecated.
  warn_deprecated(
2024-07-08 10:34:27 | ERROR | stderr | INFO: Started server process [399056]
2024-07-08 10:34:27 | ERROR | stderr | INFO: Waiting for application startup.
2024-07-08 10:34:27 | ERROR | stderr | INFO: Application startup complete.
2024-07-08 10:34:27 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Mistral-7B-Instruct-v0.3'] on worker c47e1f03, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
2024-07-08 10:34:28 | ERROR | stderr | Process model_worker - Mistral-7B-Instruct-v0.3:
2024-07-08 10:34:28 | ERROR | stderr | Traceback (most recent call last):
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
2024-07-08 10:34:28 | ERROR | stderr |     self.run()
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/multiprocessing/process.py", line 108, in run
2024-07-08 10:34:28 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 439, in run_model_worker
2024-07-08 10:34:28 | ERROR | stderr |     app = create_model_worker_app(log_level=log_level, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/workspace/Langchain-Chatchat-ipex-llm/startup.py", line 227, in create_model_worker_app
2024-07-08 10:34:28 | ERROR | stderr |     worker = BigDLLLMWorker(
2024-07-08 10:34:28 | ERROR | stderr |              ^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py", line 99, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2024-07-08 10:34:28 | ERROR | stderr |                                  ^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/ipex_llm/transformers/loader.py", line 78, in load_model
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = tokenizer_cls.from_pretrained(model_path, trust_remote_code=True)
2024-07-08 10:34:28 | ERROR | stderr |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2024-07-08 10:34:28 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     return cls._from_pretrained(
2024-07-08 10:34:28 | ERROR | stderr |            ^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2004, in _from_pretrained
2024-07-08 10:34:28 | ERROR | stderr |     tokenizer = cls(*init_inputs, **init_kwargs)
2024-07-08 10:34:28 | ERROR | stderr |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 115, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     super().__init__(
2024-07-08 10:34:28 | ERROR | stderr |   File "/home/miniforge3/envs/ipex-llm-langchain-chatchat/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
2024-07-08 10:34:28 | ERROR | stderr |     fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
2024-07-08 10:34:28 | ERROR | stderr |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-08 10:34:28 | ERROR | stderr | Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 6952 column 3
2024-07-08 10:34:28 | INFO | model_worker | Loading the model ['Llama-2-7b-chat-hf'] on worker 97caf8dd, worker type: BigDLLLM worker...
2024-07-08 10:34:28 | INFO | model_worker | Using low bit format: sym_int4, device: cpu
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|███████████████████████████▌ | 1/2 [00:00<00:00, 8.45it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:00<00:00, 10.32it/s]
2024-07-08 10:34:29 | ERROR | stderr |
2024-07-08 10:34:29 | INFO | ipex_llm.transformers.utils | Converting the current model to sym_int4 format......
2024-07-08 10:34:31 | INFO | stdout | <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
2024-07-08 10:34:31 | INFO | model_worker | enable benchmark successfully
2024-07-08 10:34:31 | INFO | model_worker | Register to controller
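The exception in the traceback is raised while `TokenizerFast.from_file` parses the model's `tokenizer.json`, and the name `PyPreTokenizerTypeWrapper` suggests the `pre_tokenizer` section is the part the installed library could not read. The sketch below lists what that section declares, using only the standard library. It rests on assumptions: that `tokenizer.json` sits directly in the model directory, that it has a top-level `pre_tokenizer` key, and that a `Sequence` entry nests its members under `pretokenizers`. The function name `describe_pre_tokenizer` is hypothetical.

```python
import json
from pathlib import Path


def describe_pre_tokenizer(model_dir):
    """Return the "type" names declared in tokenizer.json's pre_tokenizer section."""
    path = Path(model_dir) / "tokenizer.json"
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)

    pre = config.get("pre_tokenizer")
    if pre is None:
        return []

    names = [pre.get("type")]
    # Assumption: a "Sequence" pre-tokenizer lists its members under "pretokenizers".
    for child in pre.get("pretokenizers", []):
        names.append(child.get("type"))
    return names


if __name__ == "__main__":
    # Model path taken from the configuration dump in the log above.
    model_dir = "/workspace/llm-embed-models/Mistral-7B-Instruct-v0.3"
    if (Path(model_dir) / "tokenizer.json").exists():
        print(describe_pre_tokenizer(model_dir))
```

Comparing the output for `Mistral-7B-Instruct-v0.3` with that for `Llama-2-7b-chat-hf`, which loads without error in the same log, may show which pre-tokenizer definition the older libraries reject.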