yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)
MIT License

My model files were downloaded from the dddd version and placed under the Chatglm6b_ModelParallel folder. I only changed the CUDA configuration, but training still hits an error. #89

Open · Rorschaaaach opened this issue 1 year ago

Rorschaaaach commented 1 year ago

```
Traceback (most recent call last):
  File "train_model_all.py", line 320, in <module>
    trainer.train()
  File "/home/ubuntu/lirui/Chatglm6b_ModelParallel/MyTrainer.py", line 1600, in train
    return inner_training_loop(
  File "/home/ubuntu/lirui/Chatglm6b_ModelParallel/MyTrainer.py", line 1867, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/lirui/Chatglm6b_ModelParallel/MyTrainer.py", line 2601, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/lirui/Chatglm6b_ModelParallel/MyTrainer.py", line 2634, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/peft/peft_model.py", line 529, in forward
    return self.base_model(
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/thuglm/modeling_chatglm.py", line 1071, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/thuglm/modeling_chatglm.py", line 901, in forward
    layer_ret = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/thuglm/modeling_chatglm.py", line 897, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/thuglm/modeling_chatglm.py", line 571, in forward
    attention_input = self.input_layernorm(hidden_states)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/opt/conda/envs/ChatGLM/lib/python3.8/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper__native_layer_norm)
```
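In this trace, the layer-norm weight lives on cuda:1 while the incoming hidden states are still on cuda:0, which is what happens when a model is split across GPUs but activations are never handed over between stages. A minimal sketch of the mismatch and the usual fix, assuming an illustrative two-layer split rather than the repo's actual device map:

```python
import torch
import torch.nn as nn

# Illustrative two-GPU split -- layer names and the split point are assumptions.
block_on_gpu0 = nn.Linear(8, 8).to("cuda:0")
norm_on_gpu1 = nn.LayerNorm(8).to("cuda:1")

x = torch.randn(2, 8, device="cuda:0")
h = block_on_gpu0(x)        # activations come out on cuda:0
# norm_on_gpu1(h)           # RuntimeError: ... cuda:0 and cuda:1! (as in the trace)
h = h.to("cuda:1")          # hand the activations over to the next stage's device
out = norm_on_gpu1(h)       # weight and input now share a device
```

As the reply below notes, the real cause here was model files from the wrong version, so the forward pass never performed these hand-offs where the parallel split expected them.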

yuanzhoulvpi2017 commented 1 year ago

That definitely won't work. The dddd version is not meant for Chatglm6b_ModelParallel.

Rorschaaaach commented 1 year ago

...You're right. After I switched to the model in Chatglm6b_ModelParallel it runs, but isn't this the version that doesn't really learn anything? Is the new multi-GPU LoRA version about to be released?

yuanzhoulvpi2017 commented 1 year ago

I've already finished the new multi-GPU LoRA version, but I can't be bothered to release it right now 😂. ChatGLM is iterating too fast.

Rorschaaaach commented 1 year ago

Haha, I know that feeling all too well. Code that still ran on Friday gets an upstream update over the weekend and won't run on Monday.

Rorschaaaach commented 1 year ago

My training set has over 16,000 samples and my test set over 4,000, but the progress bar only shows two thousand and something 😥. What's going on here? [screenshot: Screenshot_2023-04-20-21-40-55-661_com.oray.sunlogin.jpg]
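Most likely the progress bar is counting optimizer steps, not individual samples: the HF Trainer's total is roughly len(train_dataset) divided by the effective batch size, times the number of epochs. A quick sanity check, where every setting below is an assumed value for illustration, not read from the repo:

```python
import math

# All settings here are assumptions for illustration -- plug in your own.
num_train_samples = 16000        # roughly the training-set size reported above
per_device_train_batch_size = 8
world_size = 1                   # model parallelism runs one process; DDP would use n_gpus
gradient_accumulation_steps = 1
num_train_epochs = 1

effective_batch = per_device_train_batch_size * world_size * gradient_accumulation_steps
total_steps = math.ceil(num_train_samples / effective_batch) * num_train_epochs
print(total_steps)  # 2000 -> a progress bar "in the two thousands" is expected
```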

juemifuji commented 1 year ago

> I've already finished the new multi-GPU LoRA version, but I can't be bothered to release it right now 😂. ChatGLM is iterating too fast.

Come on, don't say that! We're still waiting for your latest code.

Ardang666 commented 1 year ago

Please push the multi-GPU LoRA update! Right now I'm finding that multi-GPU and single-GPU LoRA training take the same amount of time.

HawkL327 commented 1 year ago

Just launch with DeepSpeed's DDP from the command line and add deepspeed="path/to/deepspeed config" in the trainer arguments. I've tested it myself with no problems, and overall training time does come down.
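For anyone wiring this up, a minimal sketch of what that might look like; the ZeRO stage, batch size, and launch command below are assumptions, not a tested config from this repo:

```python
# Hypothetical minimal DeepSpeed + Trainer wiring -- all values are placeholders.
from transformers import TrainingArguments

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 2},   # ZeRO-2: shard optimizer state across GPUs
    "fp16": {"enabled": True},
}

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=4,
    fp16=True,
    deepspeed=ds_config,  # a dict or a path to a JSON config both work here
)

# Then launch one process per GPU so data parallelism actually kicks in:
#   deepspeed train_model_all.py
```

Unlike the model-parallel setup discussed above, DDP gives each GPU its own slice of every batch, which is why this shortens wall-clock time where splitting one model across GPUs does not.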