zcwang opened this issue 5 months ago
Hi @zcwang, we are reproducing this issue and will let you know when there are any updates :)
Hi @zcwang,
This is because the one-time warm-up of the SYCL cache does not actually take effect on MTL iGPU for Linux.
We just updated the Langchain-Chatchat Setup Guide for Linux with Intel Core Ultra integrated GPU; you could follow this guide and try again with our latest ipex-llm
(>=2.1.0b20240612).
Please note that instead of a one-time warmup as on MTL iGPU for Windows, on MTL iGPU for Linux the warmup of the LLM model is conducted when you start the first conversation, and the warmup of the embedding model happens either when you create a knowledge base or when you start the first Knowledge Base QA/File Chat conversation. Thus, please expect a several-minute warmup during your first conversation with an LLM model, or when you create a new knowledge base with an embedding model.
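For reference, a minimal sketch of the environment setup this warmup relies on (the cache directory value is illustrative; SYCL_CACHE_PERSISTENT is what lets the compiled kernels be reused after the first run, so later runs skip the several-minute compilation):

# Enable the persistent SYCL kernel cache so warmup is a one-time cost
export SYCL_CACHE_PERSISTENT=1
# Optional: where the compiled kernels are cached (illustrative path)
export SYCL_CACHE_DIR=~/.cache/sycl_cache
# Recommended on MTL iGPU (already part of the usual ipex-llm setup)
export BIGDL_LLM_XMX_DISABLED=1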
Please let us know for any further problems :)
@Oscilloscope98 , I skipped the warmup phase by directly running "python startup.py -a" with the following environment settings, but it still failed.
...
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
export BIGDL_IMPORT_IPEX=0
export no_proxy=localhost,127.0.0.1
export FASTCHAT_WORKER_API_TIMEOUT=600
...
Here is the error log.
...
2024-06-20 14:32:51 | INFO | model_worker | Loading the model ['chatglm3-6b'] on worker bcb2cd49, worker type: BigDLLLM worker...
2024-06-20 14:32:51 | INFO | model_worker | Using low bit format: sym_int4, device: xpu
2024-06-20 14:32:51 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting eos_token is not supported, use the default one.
2024-06-20 14:32:51 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting pad_token is not supported, use the default one.
2024-06-20 14:32:51 | WARNING | transformers_modules.chatglm3-6b.tokenization_chatglm | Setting unk_token is not supported, use the default one.
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 14%|██████████████████▏ | 1/7 [00:00<00:00, 6.03it/s]
Loading checkpoint shards: 29%|████████████████████████████████████▎ | 2/7 [00:00<00:00, 6.10it/s]
Loading checkpoint shards: 43%|██████████████████████████████████████████████████████▍ | 3/7 [00:00<00:00, 6.22it/s]
Loading checkpoint shards: 57%|████████████████████████████████████████████████████████████████████████▌ | 4/7 [00:00<00:00, 6.24it/s]
Loading checkpoint shards: 71%|██████████████████████████████████████████████████████████████████████████████████████████▋ | 5/7 [00:00<00:00, 6.22it/s]
Loading checkpoint shards: 86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 6/7 [00:00<00:00, 6.24it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.36it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.27it/s]
2024-06-20 14:32:53 | ERROR | stderr |
2024-06-20 14:32:53 | INFO | ipex_llm.transformers.utils | Converting the current model to sym_int4 format......
2024-06-20 14:33:30 | INFO | stdout | Convert model to half precision...
2024-06-20 14:33:31 | ERROR | stderr | /home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
2024-06-20 14:33:31 | ERROR | stderr | warnings.warn(
2024-06-20 14:33:32 | INFO | stdout | <class 'transformers_modules.chatglm3-6b.modeling_chatglm.ChatGLMForConditionalGeneration'>
2024-06-20 14:33:32 | INFO | model_worker | enable benchmark successfully
2024-06-20 14:33:32 | INFO | model_worker | Register to controller
...
2024-06-20 14:37:07,604 - _client.py[line:1027] - INFO: HTTP Request: POST http://127.0.0.1:7861/chat/knowledge_base_chat "HTTP/1.1 200 OK"
/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain_community.chat_models.openai.ChatOpenAI` was deprecated in langchain-community 0.0.10 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import ChatOpenAI`.
warn_deprecated(
Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.74it/s]
2024-06-20 14:37:08 | INFO | stdout | INFO: 127.0.0.1:50504 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-06-20 14:37:08,423 - _client.py[line:1758] - INFO: HTTP Request: POST http://127.0.0.1:20000/v1/chat/completions "HTTP/1.1 200 OK"
2024-06-20 14:37:08 | INFO | httpx | HTTP Request: POST http://127.0.0.1:20002/worker_generate_stream "HTTP/1.1 200 OK"
/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:407: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
warnings.warn(
LLVM ERROR: Diag: aborted
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: /home/intel/miniconda3/envs/mytest/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=112, pipe_handle=119) --multiprocessing-fork
Uptime: 269.896746 s
2024-06-20 14:37:20 | ERROR | stderr | ERROR: Exception in ASGI application
2024-06-20 14:37:20 | ERROR | stderr | Traceback (most recent call last):
2024-06-20 14:37:20 | ERROR | stderr | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/responses.py", line 261, in __call__
2024-06-20 14:37:20 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-06-20 14:37:20 | ERROR | stderr | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/responses.py", line 257, in wrap
2024-06-20 14:37:20 | ERROR | stderr | await func()
2024-06-20 14:37:20 | ERROR | stderr | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/responses.py", line 234, in listen_for_disconnect
2024-06-20 14:37:20 | ERROR | stderr | message = await receive()
2024-06-20 14:37:20 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-06-20 14:37:20 | ERROR | stderr | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 535, in receive
2024-06-20 14:37:20 | ERROR | stderr | await self.message_event.wait()
2024-06-20 14:37:20 | ERROR | stderr | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-06-20 14:37:20 | ERROR | stderr | await fut
2024-06-20 14:37:20 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fa121b02810
2024-06-20 14:37:20 | ERROR | stderr |
2024-06-20 14:37:20 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-06-20 14:37:20 | ERROR | stderr |
2024-06-20 14:37:20 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
2024-06-20 14:37:20 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-06-20 14:37:20 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-06-20 14:37:20 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | await super().__call__(scope, receive, send)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/applications.py", line 119, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | raise exc
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | await self.app(scope, receive, send)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
2024-06-20 14:37:20 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-06-20 14:37:20 | ERROR | stderr | | raise exc
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-06-20 14:37:20 | ERROR | stderr | | await app(scope, receive, sender)
2024-06-20 14:37:20 | ERROR | stderr | | File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
...
2024-06-20 14:37:20 | ERROR | stderr | +------------------------------------
2024-06-20 14:37:20,032 - utils.py[line:38] - ERROR: peer closed connection without sending complete message body (incomplete chunked read)
Traceback (most recent call last):
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
yield
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpx/_transports/default.py", line 252, in __aiter__
async for part in self._httpcore_stream:
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 367, in __aiter__
raise exc from None
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 363, in __aiter__
async for part in self._stream:
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/http11.py", line 349, in __aiter__
raise exc
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/http11.py", line 341, in __aiter__
async for chunk in self._connection._receive_response_body(**kwargs):
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/http11.py", line 210, in _receive_response_body
event = await self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_async/http11.py", line 220, in _receive_event
with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/intel/miniconda3/envs/mytest/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
...
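If I read the log correctly, the final RemoteProtocolError is just the client side of the crash: the model worker aborted ("LLVM ERROR: Diag: aborted") in the middle of a streamed response, so the httpx stream was cut off before the chunked body completed. A minimal standalone sketch of how that surfaces (hypothetical client code, not the actual Langchain-Chatchat implementation):

import httpx

async def stream_chat(url: str, payload: dict) -> str:
    # Stream a chat response and collect the text chunks.
    text = ""
    try:
        async with httpx.AsyncClient(timeout=600) as client:
            async with client.stream("POST", url, json=payload) as resp:
                async for chunk in resp.aiter_text():
                    text += chunk
    except httpx.RemoteProtocolError as exc:
        # Raised when the server dies mid-stream -> "incomplete chunked read",
        # exactly the error shown in the traceback above.
        print(f"stream interrupted: {exc}")
    return text

# e.g. asyncio.run(stream_chat("http://127.0.0.1:7861/chat/knowledge_base_chat", {...}))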
Hi @zcwang,
Please make sure you have created a new conda environment with the latest ipex-llm
(>=2.1.0b20240612), and use the latest Langchain-Chatchat repo :)
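In case it helps, here is a minimal sketch of what we mean by a fresh environment (the environment name and the wheel index URL are illustrative; please follow the Setup Guide for the exact commands):

# Create and activate a clean conda environment (name/version illustrative)
conda create -n ipex-llm-chatchat python=3.11 -y
conda activate ipex-llm-chatchat
# Install the latest ipex-llm with XPU support; check the install guide for the exact index URL
pip install --pre --upgrade "ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# Confirm the version is >= 2.1.0b20240612
pip show ipex-llm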
Hello Sir, I am running the chatglm3-6b LLM with langchain-chatchat on the iGPU of my MTL 155H, and it is hitting this issue.
The generated logs are provided.
Test Environment:
BTW, warmup.py works well with the iGPU...
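For reference, the standalone warmup I mean is roughly the following (a sketch of a chatglm3-6b smoke test on XPU, not the exact warmup.py; the model path and prompt are illustrative):

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

model_path = "THUDM/chatglm3-6b"  # or a local chatglm3-6b checkout
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load with 4-bit (sym_int4) weights, matching the worker log
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = model.to("xpu")

# One short generation to trigger SYCL kernel compilation (the warmup itself)
inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))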