yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)
MIT License

chinese_bloom errors out when training with deepspeed: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm) #123

Open shaoqing404 opened 1 year ago

shaoqing404 commented 1 year ago

Hi, I ran into a problem while trying to train. Could you please take a look? GPU setup: 2× RTX 3090. OS: Ubuntu 20.04. Some solutions I found online may have planted a misleading assumption: they say that because I'm running on two cards, some data may end up on the other card... A single 3090's VRAM is quite small; maybe switching to a card with more VRAM would help?

I would really appreciate your answer.

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

Error 1

Error message and location:

/File "/home/ash404/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2516, in layer_norm return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

yuanzhoulvpi2017 commented 1 year ago

See this answer: https://github.com/yuanzhoulvpi2017/zero_nlp/issues/118#issuecomment-1574763709
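The content of the linked comment is not reproduced here, but this class of error typically means different layers of the model landed on different GPUs, most often because the model was loaded with device_map="auto" (inference-style model parallelism) while also training data-parallel with DeepSpeed or DDP, which expects each process to hold a full copy of the model on its own device. A minimal sketch of that distinction; the model name is a placeholder, and this is not necessarily the exact fix given in #118:

```python
# Hedged sketch: model name and fix are illustrative, not the exact
# resolution from the linked comment.
import os
import torch
from transformers import AutoModelForCausalLM

local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by the deepspeed/torchrun launcher
torch.cuda.set_device(local_rank)

# Pitfall: device_map="auto" shards layers across cuda:0 and cuda:1;
# mixing that with data-parallel training yields exactly this
# cross-device layer_norm error.
# model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b1", device_map="auto")

# For DeepSpeed/DDP training, each rank loads the full model onto its own GPU:
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b1")
model = model.to(f"cuda:{local_rank}")
```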

shaoqing404 commented 1 year ago

See this answer: #118 (comment)

That worked; the earlier error is confirmed gone.

shaoqing404 commented 1 year ago

See this answer: #118 (comment)

But in fact I only have two 3090s, 20 GB of VRAM in total, so I still end up running out of memory and can't verify any further. TT

yuanzhoulvpi2017 commented 1 year ago
1. For models of 3B or smaller, you can actually do full-parameter fine-tuning directly on your two 3090s.
2. If the model you want to train is 7B or larger, then use deepspeed ZeRO-3, or use quantization, LoRA, QLoRA, etc.; see the sketch after this list.
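A minimal sketch of the ZeRO-3 route, not the repo's actual config: the DeepSpeed settings are expressed as a Python dict and handed to the Hugging Face Trainer via TrainingArguments; output path and batch sizes are placeholders.

```python
# Hedged sketch, assuming training goes through the HF Trainer:
# ZeRO-3 shards parameters, gradients, and optimizer states across GPUs,
# and CPU offload trades speed for VRAM on small cards like the 3090.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},  # push optimizer states to CPU RAM
        "offload_param": {"device": "cpu"},      # push idle parameters to CPU RAM
    },
    "bf16": {"enabled": "auto"},                 # "auto" values are filled from TrainingArguments
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_config,  # Trainer initializes DeepSpeed from this dict
)
```

The script would then be launched with the deepspeed CLI (or torchrun) so that one process is spawned per GPU.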