KiwiHana opened this issue 10 months ago
Human: A table has three drawers: left, middle, and right. Zhang San, Li Si, Wang Wu, and Zhao Liu all see a bag of chocolate on the table. After sending Li Si and Wang Wu out, Zhang San puts the bag of chocolate into the right drawer in front of Zhao Liu. After Wang Wu returns, Zhang San sends Zhao Liu out to find Li Si, and in front of Wang Wu takes a box of cookies out of the left drawer and puts it into the middle drawer. Once Li Si and Zhao Liu are back, Zhang San sends Wang Wu and Zhao Liu out to buy soy sauce; after the two leave, he tells Li Si that he has just put a box of cookies into the middle drawer. Having waited a long time without Wang Wu and Zhao Liu returning, Zhang San sends Li Si to look for them, but in the end only Wang Wu and Li Si come back. Wang Wu tells Zhang San that at first they could not find a shop selling soy sauce, so they split up to buy it, and then Zhao Liu got lost; on the way back Wang Wu ran into Li Si, and the two hurried home first. Zhang San then sends the two of them out to find Zhao Liu; to prevent anyone from getting lost again, he instructs Li Si and Wang Wu to stay together at all times and to bring Zhao Liu back even if they cannot buy soy sauce. Li Si and Wang Wu do find Zhao Liu outside, and discover that he has already bought the soy sauce. Annoyed that Zhang San never runs errands himself, the three agree that when they see Zhang San they will not tell him the soy sauce was bought, and have Wang Wu hide it in his backpack. After the three return together, they claim, as planned, that they could not buy soy sauce, and ask that Zhang San come along on future shopping trips instead of slacking off; Zhang San agrees. When everyone finally stands in front of the table, the four each write down the list of items they know about and where those items are. Question: is the item and location information the four of them write down consistent, and why?
> c:\program files\aigc assistant\resources\audiollm\chat_chatglm3_kv.py(155)chatglm3_stream_chat()
-> if user_input == "stop":
(Pdb) c
BigDL-LLM: This is an interesting logic puzzle. We can work through it step by step, following the sequence of events:
Based on these steps, we find:
The question is whether the item lists and location information the three of them wrote down are consistent.
The answer is that they are not consistent, because:
So the item lists and location information the three of them wrote down are not consistent, because the drawer they put the cookies back into on their return differed.
Human: [the same drawer puzzle prompt as above, sent again for the next round]
> c:\program files\aigc assistant\resources\audiollm\chat_chatglm3_kv.py(155)chatglm3_stream_chat()
-> if user_input == "stop":
(Pdb) c
BigDL-LLM: Traceback (most recent call last):
  File "C:\Program Files\AIGC Assistant\resources\audiollm\chat_chatglm3_kv.py", line 308, in <module>
    chatglm3_stream_chat(model=model, tokenizer=tokenizer)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\AIGC Assistant\resources\audiollm\chat_chatglm3_kv.py", line 166, in chatglm3_stream_chat
    for response, chat_history, past_key_values in model.stream_chat(tokenizer, prompt,
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "C:\Users\A380/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 1073, in stream_chat
    for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "C:\Users\A380/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 1160, in stream_generate
    outputs = self(
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\A380/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 153, in chatglm2_model_forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\A380/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 640, in forward
    layer_ret = layer(
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\A380/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 293, in chatglm2_attention_forward_8eb45c
    key_layer, value_layer = append_kv_cache(cache_k, cache_v, key_layer, value_layer)
  File "C:\ProgramData\miniconda3\envs\llmsd_env\lib\site-packages\bigdl\llm\transformers\models\utils.py", line 55, in append_kv_cache
    new_cache_k = cache_k.as_strided(new_size, cache_k.stride(), storage_offset=0)
RuntimeError: setStorage: sizes [1, 32, 1558, 128], strides [5251072, 164096, 128, 1], storage offset 0, and itemsize 4 requiring a storage size of 21145600 are out of bounds for storage of size 21004288
This happens with both Chatglm3-6B and Baichuan2-7B, on Arc and MTL iGPU on Windows.
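To make the numbers in that RuntimeError concrete, here is the storage arithmetic behind the message (pure Python; the 1282-token figure is derived from the reported strides, everything else is copied straight from the error):

# Decode the setStorage error: the largest element offset a strided view
# touches is sum((size_i - 1) * stride_i), plus one element.
sizes = (1, 32, 1558, 128)
strides = (5251072, 164096, 128, 1)
itemsize = 4  # float32 cache entries

needed = (sum((s - 1) * st for s, st in zip(sizes, strides)) + 1) * itemsize
print(needed)                              # 21145600 -> "requiring a storage size of 21145600"
print(21004288 // (32 * 128 * itemsize))   # 1282 -> the cache only has room
# for 1282 tokens, but this round of the chat has grown to 1558 tokens.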
Hi @KiwiHana. Thank you for submitting this issue! :)
We have reproduced this issue on multiple platforms. The root cause is that we did not allocate enough KV cache for multi-round stream_chat, so the growing cache overruns its pre-allocated storage (an out-of-bounds error on the KV cache buffer rather than GPU out-of-memory). This issue also affects the speculative decoding examples.
PR #10006 will fix this issue.
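A minimal sketch of the failure mode, assuming a contiguous pre-allocated cache (the 1282-token capacity is inferred from the strides in the error above; this mirrors the as_strided pattern in bigdl.llm.transformers.models.utils.append_kv_cache rather than reproducing its exact code):

import torch

# Hypothetical pre-allocated KV cache sized for 1282 tokens (inferred from
# the strides [5251072, 164096, 128, 1] reported in the error message).
cache_k = torch.empty(1, 32, 1282, 128)

# Multi-round stream_chat grows the sequence to 1558 tokens; taking a larger
# strided view over the same storage then overruns the allocation.
new_size = (1, 32, 1558, 128)
new_cache_k = cache_k.as_strided(new_size, cache_k.stride(), storage_offset=0)
# RuntimeError: setStorage: ... requiring a storage size of 21145600 are out
# of bounds for storage of size 21004288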
bigdl-llm 20240128: you can work around this by copying libsycl-fallback-bfloat16.spv into your_env\Lib\site-packages\intel_extension_for_pytorch\bin.
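If it helps, that copy can be scripted; a minimal sketch with placeholder paths (where the .spv file lives on your machine depends on your oneAPI/DPC++ installation, so both paths below are assumptions to adjust):

import shutil
from pathlib import Path

# Both paths are placeholders: point src at your copy of the .spv file and
# dst at the intel_extension_for_pytorch\bin folder inside your environment.
src = Path(r"C:\path\to\libsycl-fallback-bfloat16.spv")
dst = Path(r"C:\your_env\Lib\site-packages\intel_extension_for_pytorch\bin")
shutil.copy(src, dst / src.name)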
C:\Program Files\AIGC Assistant\resources\audiollm>..\llmsd_env\python.exe chat_0205.py
C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''. If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
2024-02-05 09:46:23,247 - INFO - intel_extension_for_pytorch auto imported
C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\bigdl\llm\transformers\model.py:401: FutureWarning: replace_embedding is deprecated and will be removed in a future version, please use cpu_embedding instead.
warnings.warn("replace_embedding is deprecated and will be removed in a future version,"
2024-02-05 09:46:23,435 - INFO - Converting the current model to sym_int4 format......
Human: hi
BigDL-LLM: Traceback (most recent call last):
File "C:\Program Files\AIGC Assistant\resources\audiollm\chat_0205.py", line 298, in
I'm on the bigdl 0128 build with ipex 2.1, with the code unchanged (max past token = 512), https://github.com/intel-analytics/BigDL/pull/10007/files. On an MTL machine with 16 GB of RAM, each round has ~400 input tokens and roughly 300 output tokens. Running chatglm3-6B, the 12th round fails with "Native API returns: -999 (Unknown PI error)". From the first round through the 12th, iGPU memory usage stayed between 5.1 and 5.4 GB, so VRAM was no longer growing steadily; the error hit midway through generating the 12th answer.
Below is the 12th round of dialogue, where the error occurred:
Human: [the same drawer puzzle prompt as above]
BigDL-LLM: This is a classic logic puzzle. We can find the answer by analyzing each person's statements.
First, the four people's statements are as follows:
1. Zhang San: did not find a shop selling soy sauce.
2. Li Si: did not find a shop selling soy sauce.
3. Wang Wu: "split the the invoice" [sic].
4. Zhao Liu: did not find a shop selling soy sauce; in the right drawer.
Let's analyze:
- Zhang San said "did not find a shop selling soy sauce", which means he did not find one, so his statement does not match the facts.
- Li Si said "did not find a shop selling soy sauce", which agrees with Zhang San's statement, so his statement is correct.
- Wang Wu said "split the the invoice" [sic], which matches the facts, so his statement is correct.
- Zhao Liu said "did not find a shop selling soy sauce; in the right drawer", which matches the facts.
From this, among the four only Zhao Liu's statement matches reality; the other three are lying. Then why are their items and location
Traceback (most recent call last):
File "C:\Program Files\AIGC Assistant\resources\audiollm\chat_0205.py", line 298, in <module>
chatglm3_stream_chat(model=model, tokenizer=tokenizer)
File "C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Program Files\AIGC Assistant\resources\audiollm\chat_0205.py", line 166, in chatglm3_stream_chat
for response, chat_history, past_key_values in model.stream_chat(tokenizer, prompt,
File "C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 56, in generator_context
response = gen.send(request)
File "C:\Users\test/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 1072, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\torch\utils\_contextlib.py", line 56, in generator_context
response = gen.send(request)
File "C:\Users\test/.cache\huggingface\modules\transformers_modules\chatglm3-6b-int4\modeling_chatglm.py", line 1170, in stream_generate
next_token_scores = logits_warper(input_ids, next_token_scores)
File "C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\transformers\generation\logits_process.py", line 97, in __call__
scores = processor(input_ids, scores)
File "C:\Program Files\AIGC Assistant\resources\llmsd_env\lib\site-packages\transformers\generation\logits_process.py", line 315, in __call__
indices_to_remove = scores < torch.topk(scores, top_k)[0][..., -1, None]
RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Hi, OS: Windows 10, Arc A750, driver: 5081. With chatglm3 and Baichuan2-7B, memory keeps growing as the number of chat rounds increases. Using this KV cache demo does not solve it either: demo link: https://github.com/intel-analytics/BigDL/blob/main/python/llm/portable-zip/chat.py#L201
bigdl-core-xe-21             2.5.0b20240111
bigdl-llm                    2.5.0b20240111
intel-extension-for-pytorch  2.1.10+git8ff85d6
torch                        2.1.0a0+cxx11.abi
torchvision                  0.16.0a0+cxx11.abi
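For anyone chasing the same growth, here is a sketch of the kind of multi-round loop that caps context growth, to help isolate whether retained history drives it. Assumptions to note: the AutoModel loading call follows BigDL-LLM's documented transformers-style API, the stream_chat parameters (history, past_key_values, return_past_key_values) follow the ChatGLM3 modeling code visible in the tracebacks above, and MAX_HISTORY_TOKENS is a made-up knob, not a real BigDL or demo setting:

from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "THUDM/chatglm3-6b"  # or a local chatglm3-6b-int4 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, load_in_4bit=True,
                                  trust_remote_code=True)
model = model.to("xpu")  # Arc / MTL iGPU via intel_extension_for_pytorch

MAX_HISTORY_TOKENS = 512  # hypothetical cap on retained context
history, past_key_values = [], None

while True:
    query = input("Human: ")
    if query == "stop":
        break
    response = ""
    for response, history, past_key_values in model.stream_chat(
            tokenizer, query, history=history,
            past_key_values=past_key_values, return_past_key_values=True):
        pass
    print("BigDL-LLM:", response)
    # Once the retained context passes the cap, drop the oldest turns and the
    # cached KV, so the next round re-encodes only a bounded prefix instead
    # of the whole transcript.
    n_tokens = sum(len(tokenizer.encode(m["content"])) for m in history)
    if n_tokens > MAX_HISTORY_TOKENS:
        history, past_key_values = history[-2:], None

The point of the reset is that both the history list and the cached past_key_values grow with every round; trimming one without the other still lets the KV cache expand unboundedly.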