chinese_bloom training with DeepSpeed fails: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm) #123
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
Error message:
File "/home/ash404/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2516, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
Hi, I ran into this problem while trying to train — could you take a look? GPU setup: 2x RTX 3090. OS: Ubuntu 20.04. Some of the solutions I found online may be misleading: they say that because the run uses two GPUs, some of the data may end up on the other card... A single 3090 has fairly limited VRAM — would switching to a card with more VRAM help?
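For what it's worth, this error usually is a device-placement bug rather than a VRAM problem: one tensor (here the LayerNorm weight) lives on cuda:0 while its input lives on cuda:1. A minimal sketch below shows how the mismatch arises and the usual fix of moving each batch to the device the model's parameters actually sit on (the helper name `to_model_device` is illustrative, not from the chinese_bloom code):

```python
import torch
import torch.nn as nn

# How the error typically arises (requires two GPUs, so left commented):
# norm = nn.LayerNorm(8).to("cuda:0")      # weights on cuda:0
# x = torch.randn(2, 8, device="cuda:1")   # input on cuda:1
# norm(x)  # RuntimeError: Expected all tensors to be on the same device

def to_model_device(batch, model):
    """Move every tensor in a batch dict onto the model's own device."""
    device = next(model.parameters()).device
    return {k: v.to(device) if torch.is_tensor(v) else v
            for k, v in batch.items()}
```

With a DeepSpeed launcher, each rank should use its local device (e.g. `torch.device("cuda", local_rank)`) rather than hard-coding `"cuda:0"`; hard-coded `.to("cuda:0")` calls in the training script are a common cause of exactly this traceback on 2-GPU runs.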
I would greatly appreciate your help.