liangwq / Chatglm_lora_multi-gpu

ChatGLM multi-GPU with DeepSpeed and ...

RuntimeError: expected scalar type Half but found Float #18

Open lmx760581375 opened 1 year ago

lmx760581375 commented 1 year ago

An error comes up during single-machine multi-GPU training, on V100 32G.

liangwq commented 1 year ago

An error comes up during single-machine multi-GPU training, on V100 32G.

This problem is basically about fp16 and model.half(). Another possibility is whether the tokenizer comes from AutoTokenizer or from ChatGLM's own tokenizer package. Check these yourself first; if you still can't solve it, post the detailed error message and I'll take a look.
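For reference, a minimal sketch of how to locate a Half/Float mismatch like this one: list which parameters are still in fp32 and, if they are not meant to stay in fp32, cast them back to half precision. The helper name is hypothetical and the casting step is an assumption about the usual LoRA setup, not the repo's exact fix.

import torch

# Hypothetical helper: report parameters that are still fp32 so the
# Half/Float mismatch can be located; optionally cast them to fp16.
def report_and_fix_dtypes(model, cast_to_half=False):
    for name, param in model.named_parameters():
        if param.dtype == torch.float32:
            print(f"fp32 param: {name}")
            if cast_to_half:
                param.data = param.data.half()

# Inspect only:
# report_and_fix_dtypes(model)
# Or cast the whole base model before wrapping it with LoRA:
# model = model.half()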

lmx760581375 commented 1 year ago

root@ts-ca62954e04974251981306a5f1766536-launcher:/apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test# python3 multi_gpu_finetune.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//192.168.0.1'), PosixPath('443'), PosixPath('tcp')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 116
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%| 2/2 [00:11<00:00, 5.76s/it]
/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py:191: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py:282: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
0%| | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/multi_gpu_finetune.py:257, in <module>: main(args)
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/multi_gpu_finetune.py:196, in main: outputs = model(batch)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/peft/peft_model.py:575, in forward: return self.base_model(input_ids=input_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, ...)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/modeling_chatglm.py:1044, in forward: transformer_outputs = self.transformer(input_ids=input_ids, position_ids=position_ids, attention_mask=attention_mask, ...)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/modeling_chatglm.py:886, in forward: layer_ret = layer(hidden_states, position_ids=position_ids, attention_mask=attention_mask, ...)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/chatglm_finetune_test/modeling_chatglm.py:570, in forward: attention_input = self.input_layernorm(hidden_states)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py:190, in forward: return F.layer_norm(input, self.normalized_shape, self.weight, self.bias, self.eps)
  /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py:2515, in layer_norm: return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found Float

liangwq commented 1 year ago


Try upgrading your PyTorch and transformers to the latest versions.
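A quick way to check which versions are currently installed before upgrading; the package list is an assumption based on the stack shown in the traceback.

import importlib

# Print the installed version of each library involved in the traceback.
for pkg in ["torch", "transformers", "peft", "accelerate", "deepspeed", "bitsandbytes"]:
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")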

lmx760581375 commented 1 year ago

Hi, I'm now running into a new error. Could you add me so you can help me out? My QQ is the number string after my username. [image]

liangwq commented 1 year ago

Hi, I'm now running into a new error. Could you add me so you can help me out? My QQ is the number string after my username. [image]

Your model hasn't been downloaded. Try setting cache_dir to a path you normally use; most likely you can't reach the external network to auto-download the model.
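A minimal sketch of what setting cache_dir looks like; the model id and cache path here are placeholders, not the exact values from the training script.

from transformers import AutoModel, AutoTokenizer

CACHE_DIR = "/data/hf_cache"        # assumption: any writable path you control
MODEL_NAME = "THUDM/chatglm-6b"     # assumption: the public ChatGLM-6B repo id

# Download once into CACHE_DIR, then reuse the local copy on later runs.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True, cache_dir=CACHE_DIR)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True, cache_dir=CACHE_DIR)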

lmx760581375 commented 1 year ago

No, my model is downloaded. The error occurs at accelerator.prepare.

liangwq commented 1 year ago

No, my model is downloaded. The error occurs at accelerator.prepare.

[image] If you did download the model, check whether the path where you placed it is correct.

lmx760581375 commented 1 year ago

Creating extension directory /root/.cache/torch_extensions/py310_cu116/utils...
Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Traceback (most recent call last):
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/Chatglm_lora_multi-gpu/multi_gpu_fintune_belle.py:357, in <module>: main()
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/Chatglm_lora_multi-gpu/multi_gpu_fintune_belle.py:286, in main: model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
  /opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py:1090, in prepare: result = self._prepare_deepspeed(*args)
  /opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py:1368, in _prepare_deepspeed: engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  /opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py:125, in initialize: engine = DeepSpeedEngine(args=args, model=model, optimizer=optimizer, model_parameters=model_parameters, ...)
  /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py:340, in __init__: self._configure_optimizer(optimizer, model_parameters)
  /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py:1298, in _configure_optimizer: self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py:1547, in _configure_zero_optimizer: optimizer = DeepSpeedZeroOptimizer(optimizer, self.param_names, timers=timers, ...)
  /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py:165, in __init__: util_ops = UtilsBuilder().load()
  /opt/conda/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py:485, in load: return self.jit_load(verbose)
  /opt/conda/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py:520, in jit_load: op_module = load(name=self.name, sources=self.strip_empty_entries(sources), extra_include_paths=self.strip_empty_entries(extra_include_paths), ...)
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1284, in load: return _jit_compile(name, [sources] if isinstance(sources, str) else sources, extra_cflags, ...)
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1508, in _jit_compile: _write_ninja_file_and_build_library(name=name, sources=sources, extra_cflags=extra_cflags or [], ...)
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1597, in _write_ninja_file_and_build_library: get_compiler_abi_compatibility_and_version(compiler)
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:336, in get_compiler_abi_compatibility_and_version: if not check_compiler_ok_for_platform(compiler):
  /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:290, in check_compiler_ok_for_platform: which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  /opt/conda/lib/python3.10/subprocess.py:421, in check_output: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, **kwargs).stdout
  /opt/conda/lib/python3.10/subprocess.py:526, in run: raise CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.

After I changed the model class to load with AutoModel and set load_in_8bit=True, my model loaded successfully, because that was the configuration I used when I originally downloaded it. Unfortunately, running with accelerate launch still fails. One more thing I'm puzzled about: why does your source code load ChatGLM with a different class? I've noticed that many people's code, without exception, imports the modeling_chatglm.py file. Why is that?

liangwq commented 1 year ago


After I changed the model class to load with AutoModel and set load_in_8bit=True, my model loaded successfully, because that was the configuration I used when I originally downloaded it. Unfortunately, running with accelerate launch still fails. One more thing I'm puzzled about: why does your source code load ChatGLM with a different class? I've noticed that many people's code, without exception, imports the modeling_chatglm.py file. Why is that?

modeling_chatglm is ChatGLM's model code. AutoModel can work without importing it because transformers picks up this class itself, but that version has some small bugs, which is why everyone ships their own copy of the model file.
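A sketch of the two loading paths being discussed. The local directory is a placeholder, and the explicit class names are assumed from the checkpoint's modeling_chatglm.py / tokenization_chatglm.py rather than taken from this repo's script.

from transformers import AutoModel, AutoTokenizer

LOCAL_DIR = "/path/to/chatglm-6b"   # assumption: local checkpoint directory

# 1) Auto classes: transformers loads the modeling code bundled with the
#    checkpoint when trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(LOCAL_DIR, trust_remote_code=True)
model = AutoModel.from_pretrained(LOCAL_DIR, trust_remote_code=True).half()

# 2) Explicit classes: import a local modeling_chatglm.py / tokenization_chatglm.py,
#    which may differ slightly from the copy bundled with the checkpoint.
# from modeling_chatglm import ChatGLMForConditionalGeneration
# from tokenization_chatglm import ChatGLMTokenizer
# model = ChatGLMForConditionalGeneration.from_pretrained(LOCAL_DIR).half()
# tokenizer = ChatGLMTokenizer.from_pretrained(LOCAL_DIR)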

lmx760581375 commented 1 year ago

/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py:282: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
0%| | 0/251 [00:00<?, ?it/s]
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
Traceback (most recent call last):
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/Chatglm_lora_multi-gpu/multi_gpu_fintune_belle.py:357, in <module>: main()
  /apdcephfs_cq3/share_1567347/share_info/mingxiaoli/Chatglm_lora_multi-gpu/multi_gpu_fintune_belle.py:313, in main: outputs = model(batch)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/deepspeed/utils/nvtx.py:11, in wrapped_fn: ret_val = func(*args, **kwargs)
  /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py:1846, in forward: loss = self.module(*inputs, **kwargs)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/peft/peft_model.py:575, in forward: return self.base_model(input_ids=input_ids, attention_mask=attention_mask, inputs_embeds=inputs_embeds, ...)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py:1160, in forward: transformer_outputs = self.transformer(input_ids=input_ids, position_ids=position_ids, attention_mask=attention_mask, ...)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py:907, in forward: inputs_embeds = self.word_embeddings(input_ids)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in _call_impl: return forward_call(*input, **kwargs)
  /opt/conda/lib/python3.10/site-packages/accelerate/hooks.py:165, in new_forward: output = old_forward(*args, **kwargs)
  /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py:160, in forward: return F.embedding(input, self.weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse)
  /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py:2210, in embedding: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper__index_select)
loss: 8.75: 0%| | 0/251 [00:01<?, ?it/s]

Following up on the previous problem: I figured it might be an issue with my environment, so I ran the following command: apt-get install build-essential. Then I reran the script and hit the new problem shown above.

lmx760581375 commented 1 year ago

@liangwq I solved this problem, but the way it got solved is completely baffling...

My model was downloaded locally with AutoModel. When I loaded the ChatGLM checkpoint from my shared drive for multi-GPU DDP training on a new container that had never downloaded the model via AutoModel, I got this error. When I pulled out the modeling_chatglm and tokenization_chatglm source code and switched to the ChatGLM model and tokenizer classes, I instead got a dimension-mismatch error. But after doing those two steps and then switching back to AutoModel and AutoTokenizer, amazingly the error was gone and DDP training ran fine.

I went through this process twice; it really does feel like magic...

liangwq commented 1 year ago

@liangwq I solved this problem, but the way it got solved is completely baffling...

My model was downloaded locally with AutoModel. When I loaded the ChatGLM checkpoint from my shared drive for multi-GPU DDP training on a new container that had never downloaded the model via AutoModel, I got this error. When I pulled out the modeling_chatglm and tokenization_chatglm source code and switched to the ChatGLM model and tokenizer classes, I instead got a dimension-mismatch error. But after doing those two steps and then switching back to AutoModel and AutoTokenizer, amazingly the error was gone and DDP training ran fine.

I went through this process twice; it really does feel like magic...

The reason is that the modeling_chatglm integrated on HF is different from the one I provide. If you load with HF's modeling_chatglm and then run my code, there are small format differences (you can open the pretrained model directory and compare the two files). Keeping the imported model code consistent with the pretrained model format referenced in the code avoids this problem.
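A small sketch of the consistency check suggested here: diff the modeling_chatglm.py shipped with the pretrained checkpoint against the copy imported by the training code. Both paths are placeholders.

import filecmp
import os

REPO_COPY = "./modeling_chatglm.py"                          # copy imported by the training script
PRETRAINED_COPY = "/path/to/chatglm-6b/modeling_chatglm.py"  # copy shipped with the checkpoint

# Report whether the two versions of the modeling code are byte-identical.
if os.path.exists(REPO_COPY) and os.path.exists(PRETRAINED_COPY):
    same = filecmp.cmp(REPO_COPY, PRETRAINED_COPY, shallow=False)
    print("modeling_chatglm.py identical:", same)
else:
    print("one of the files is missing; check the paths")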