Issue #21: Some model parameters end up on the CPU during fine-tuning
Open · kunzeng-ch opened this issue 1 year ago
Multi-GPU support currently has some issues; you can run it on a single card for now.
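A minimal sketch of that single-GPU workaround, assuming the usual transformers + bitsandbytes loading path; the model id, 4-bit settings, and `device_map` pinning below are illustrative, not the repo's exact `train_qlora.py` code:

```python
# Illustrative single-GPU setup (assumption: typical QLoRA loading path).
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # expose one GPU; set before CUDA is initialized

import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",        # illustrative model path
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map={"": 0},         # pin every module to cuda:0 instead of letting "auto" place them
)
```

Pinning `device_map={"": 0}` bypasses accelerate's automatic placement, which can otherwise offload some layers to the CPU when it judges GPU memory to be insufficient.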
---- Original message ----
Hey, what is going on here? I ran train_qlora.py directly and got this error:

```
  File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 588, in forward
    hidden_states, kv_cache = layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 510, in forward
    attention_output, kv_cache = self.self_attention(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 342, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/peft/tuners/lora.py", line 456, in forward
    after_A = self.lora_A(self.lora_dropout(x))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```
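To narrow this down, a quick hypothetical check of where each parameter actually landed (assuming `model` is the PEFT-wrapped model loaded by `train_qlora.py`):

```python
# Illustrative diagnostic: list any parameters left on the CPU, which is what
# makes F.linear mix a cuda:0 input with a cpu weight in the traceback above.
for name, param in model.named_parameters():
    if param.device.type == "cpu":
        print(name, param.device)
```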
Got it, thanks. I'll keep a close eye on your updates.
@kunzeng-ch Sorry, I misread your report. I thought you were running on two cards and the error was between cuda:0 and cuda:1, but yours is between cuda:0 and cpu. Could you share your hardware environment (GPU model, CPU model, operating system, etc.)? No one seems to have hit this problem before.
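As a side note, PyTorch ships an environment collector that gathers exactly this kind of information (`python -m torch.utils.collect_env`); a minimal Python alternative:

```python
# Print the basic environment details requested above.
import platform
import torch

print(platform.platform())              # operating system
print(torch.__version__)                # PyTorch version
print(torch.version.cuda)               # CUDA version PyTorch was built with
print(torch.cuda.get_device_name(0))    # GPU model
```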