Hello, thank you for your work. When I run bash launch_chatglm_cmd.sh, I get the following error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Please Enter Your Name:
Username: Tiantian
Welcome, new user Tiantian! I will remember your name so I can greet you by name next time we meet!
Welcome to use SiliconFriend model, please enter your question to start conversation, enter "clear" to clear conversation, enter "stop" to stop program
Tiantian:tell me a joke
Traceback (most recent call last):
File "/home/Hongwei/disk_hdd/Tiantian/LLM_Memory/MemoryBank-SiliconFriend-main/SiliconFriend-ChatGLM-BELLE/cli_demo.py", line 212, in <module>
main()
File "/home/Hongwei/disk_hdd/Tiantian/LLM_Memory/MemoryBank-SiliconFriend-main/SiliconFriend-ChatGLM-BELLE/cli_demo.py", line 198, in main
history_state, history, msg = predict_new(text=query,history=history,top_p=0.95,temperature=1,max_length_tokens=1024,max_context_length_tokens=200,user_name=user_name,user_memory=user_memory,user_memory_index=user_memory_index)
File "/home/Hongwei/disk_hdd/Tiantian/LLM_Memory/MemoryBank-SiliconFriend-main/SiliconFriend-ChatGLM-BELLE/cli_demo.py", line 148, in predict_new
response = chat(model,tokenizer,text,history=history,
File "/home/Hongwei/disk_hdd/Tiantian/LLM_Memory/MemoryBank-SiliconFriend-main/SiliconFriend-ChatGLM-BELLE/cli_demo.py", line 106, in chat
outputs = model.generate(**inputs, **gen_kwargs)
File "/home/Hongwei/anaconda3/lib/python3.9/site-packages/peft/peft_model.py", line 1022, in generate
outputs = self.base_model.generate(**kwargs)
File "/home/Hongwei/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/Hongwei/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 1308, in generate
model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
File "/home/Hongwei/anaconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 603, in _prepare_attention_mask_for_generation
is_pad_token_in_inputs = (pad_token_id is not None) and (pad_token_id in inputs)
File "/home/Hongwei/anaconda3/lib/python3.9/site-packages/torch/_tensor.py", line 703, in __contains__
return (element == self).any().item() # type: ignore[union-attr]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
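I'm not certain this is the cause, but "no kernel image is available for execution on the device" usually means the installed PyTorch wheel was not compiled for this GPU's compute capability. Here is the quick check I can run in the same conda environment (assuming only that torch is importable there):

```python
import torch

# Compare the architectures this PyTorch build ships CUDA kernels for
# with the GPU's actual compute capability.
print("PyTorch:", torch.__version__, "built with CUDA:", torch.version.cuda)
print("Compiled architectures:", torch.cuda.get_arch_list())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU compute capability: sm_{major}{minor}")
```

If the GPU's sm_XX does not appear in the compiled architecture list, would installing a PyTorch build that matches the card resolve this?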