Closed: dyedd closed this issue 5 months ago
This bug will be fixed in https://github.com/intel-analytics/ipex-llm/pull/10540 :)
You could try again tomorrow with ipex-llm>=2.1.0b20240326. Refer to here for more installation details regarding ipex-llm. And please let us know for any further questions :)
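(For reference, the linked guide's Linux XPU install command was roughly the following at the time; check the guide itself for the current index URL:
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu)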
No, it still has new problems.
- chatglm3/streamchat.py with --disable-stream:
ipex_llm/transformers/models/chatglm2.py", line 432, in chatglm2_attention_forward_8eb45c
    new_cache_k[:] = cache_k
RuntimeError: Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY);
- chatglm3/streamchat.py without --disable-stream:
-------------------- Stream Chat Output --------------------
Traceback (most recent call last):
  File "/home/dyedd/projects/agent/./test/streamchat.py", line 57, in <module>
    for response, history in model.stream_chat(tokenizer, args.question, history=[]):
  File "/home/dyedd/.conda/envs/pt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/dyedd/.cache/huggingface/modules/transformers_modules/chatglm3-6b-base/modeling_chatglm.py", line 1078, in stream_chat
    response, new_history = self.process_response(response, history)
  File "/home/dyedd/.cache/huggingface/modules/transformers_modules/chatglm3-6b-base/modeling_chatglm.py", line 1004, in process_response
    metadata, content = response.split("\n", maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)
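(For context on the last error: process_response in modeling_chatglm.py assumes the raw response contains a metadata line followed by "\n"; when the model emits plain text with no newline, the two-way unpack fails. A minimal standalone sketch of the failing pattern, illustrative only and not the actual model code:

# process_response expects "metadata\ncontent"
response = "Hello!"  # plain output with no "\n" separator
metadata, content = response.split("\n", maxsplit=1)
# raises ValueError: not enough values to unpack (expected 2, got 1)
)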
@Oscilloscope98 I also find that ipex-llm can't support Streamlit now. Note: this code still needs to be updated to the new API.
We haven't been able to reproduce this issue yet on our Arc A770. Would you mind running the python/llm/scripts/env-check.sh script and pasting the output here so that we can have more information regarding your environment?
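(Note: the check script only reports correctly when the oneAPI environment is sourced first, e.g. source /opt/intel/oneapi/setvars.sh and then bash python/llm/scripts/env-check.sh; /opt/intel/oneapi is the default install prefix and is assumed here.)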
No problem.
Based on the provided environment information, it seems that PyTorch and IPEX are not installed. Could you please set up the correct environment and then run the shell script?
You can follow the guide below to set up the environment or check if the environment is correct: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux
I forgot to source the oneAPI environment, so the script didn't check it correctly.
Sorry, we still can't reproduce the error you encountered while running chatglm3/streamchat.py. The error you mentioned is OUT_OF_HOST_MEMORY, indicating a memory overflow on the CPU, but you are actually running model inference on XPU. Therefore, could you please provide further details on the input parameters you used when running chatglm3/streamchat.py, including the question prompt, max_new_token, etc., so that we can further replicate the issue?
My config is the default. If you can't reproduce the error, could I provide SSH access to you?
Sure, you could leave your email address and I'll contact you.
Please ensure that the modeling file for chatglm3 is downloaded from the official repository. You could go to ModelScope to download the corresponding file.
Thanks, maybe I downloaded the base model instead of the chat model.
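(That would match the traceback: stream_chat and process_response rely on the chat model's output format, a metadata line before the content, which chatglm3-6b-base does not emit. A minimal sketch of loading the chat model with ipex-llm, assuming the usual example layout; the model path and prompt here are placeholders:

from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "THUDM/chatglm3-6b"  # the chat model, not chatglm3-6b-base
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True,
                                  trust_remote_code=True)
model = model.to('xpu')  # offload to the Intel GPU

for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    print(response)
)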
Hello, I tried to run the code from https://gitee.com/Pauntech/chat-glm3/blob/master/chatglm3_web_demo.py, but I faced a problem. You can see that it runs on the CPU, even though the code clearly offloads the model to the XPU. Here is the result of sycl-ls:
I used the method from https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/KeyFeatures/multi_gpus_selection.html, but it still fails.
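(One common way to pin the process to a single card is the oneAPI runtime's device selector, matched against the sycl-ls output; this is a sketch and may differ from the exact method in the linked guide, and the device index here is an assumption:

import os
# must be set before torch / IPEX initialize the SYCL runtime
os.environ["ONEAPI_DEVICE_SELECTOR"] = "level_zero:0"  # index from sycl-ls

import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

print(torch.xpu.device_count(), torch.xpu.get_device_name(0))
)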
I also ran the code from https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/streamchat.py, but I faced the problem above. So, how do I run the code on the dGPU?
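(As a quick sanity check that offload actually reaches the dGPU, independent of the examples above and assuming PyTorch plus IPEX are installed:

import torch
import intel_extension_for_pytorch as ipex  # enables the 'xpu' backend

print(torch.xpu.is_available())  # False means the runtime cannot see the GPU
x = torch.randn(2, 2).to('xpu')
print(x.device)                  # expect xpu:0 when offload works
)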