[Open] Zjq9409 opened this issue 2 months ago
System Info

optimum-habana: 1.13.2
HL-SMI Version: hl-1.17.1-fw-51.5.0
Driver Version: 1.17.1-78932ae
Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Download the Qwen1.5-14B weights from https://huggingface.co/Qwen/Qwen1.5-14B.
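For reproduction, one way to fetch the checkpoint locally is via `huggingface_hub` (a minimal sketch; the target directory is illustrative and should match the `--model_name_or_path` passed to the launch command below):

```python
# Sketch: download the Qwen1.5-14B checkpoint to a local directory.
# The local_dir below is illustrative, not prescribed by this issue.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen1.5-14B",
    local_dir="/data/models/Qwen1.5-14B",
)
```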
cd optimum-habana/examples/language-modeling
python ../gaudi_spawn.py \
    --world_size 8 --use_deepspeed run_clm.py \
    --model_name_or_path /data/models/Qwen1.5-7B-Chat/ \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 6 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm-xl-1 \
    --gaudi_config_name ./gaudi_config.json \
    --use_habana \
    --logging_steps 1 \
    --use_lazy_mode \
    --gradient_checkpointing \
    --use_hpu_graphs_for_inference \
    --throughput_warmup_steps 3 \
    --overwrite_output_dir \
    --deepspeed ./llama2_ds_zero3_config.json
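For context, `llama2_ds_zero3_config.json` selects DeepSpeed ZeRO stage 3 (full parameter sharding), which is what pulls parameter all-gathers into every forward pass. A minimal sketch of the shape such a config takes, expressed as a Python dict (values are illustrative, not necessarily the exact contents of the file shipped with optimum-habana):

```python
# Sketch of a ZeRO stage-3 DeepSpeed config; "auto" entries are resolved by
# the trainer at runtime. Values here are illustrative only.
ds_zero3_config = {
    "steps_per_print": 1,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "overlap_comm": False,
        "contiguous_gradients": False,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```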
The error log from the run is as follows:
[2024-09-17 07:57:31,077] [INFO] [checkpointing.py:542:forward] Activation Checkpointing Information
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:543:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:544:forward] ----contiguous Memory Checkpointing False with None total layers
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:546:forward] ----Synchronization False
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:547:forward] ----Profiling time in checkpointing False
[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/jane/optimum-habana/examples/language-modeling/run_clm.py", line 695, in <module>
[rank3]:     main()
[rank3]:   File "/home/jane/optimum-habana/examples/language-modeling/run_clm.py", line 641, in main
[rank3]:     train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 553, in train
[rank3]:     return inner_training_loop(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 978, in _inner_training_loop
[rank3]:     tr_loss_step = self.training_step(model, inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1575, in training_step
[rank3]:     loss = self.compute_loss(model, inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3363, in compute_loss
[rank3]:     outputs = model(**inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1544, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1885, in forward
[rank3]:     loss = self.module(*inputs, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 789, in forward
[rank3]:     outputs = self.model(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 677, in forward
[rank3]:     layer_outputs = self._gradient_checkpointing_func(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 692, in hpu_deepspeed_checkpointing
[rank3]:     CheckpointFunction.apply(function, all_outputs, *checkpoint_args)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 598, in apply
[rank3]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 568, in forward
[rank3]:     outputs = run_function(*inputs_cuda)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 464, in forward
[rank3]:     hidden_states, self_attn_weights, present_key_value = self.pre_attn(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 515, in pre_attn
[rank3]:     hidden_states, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 401, in pre_attn_forward
[rank3]:     attn_output = self.o_proj(attn_output)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1574, in _call_impl
[rank3]:     args_result = hook(self, args)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/parameter_offload.py", line 278, in _pre_forward_module_hook
[rank3]:     self.pre_sub_module_forward_function(module)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/parameter_offload.py", line 452, in pre_sub_module_forward_function
[rank3]:     param_coordinator.fetch_sub_module(sub_module, forward=True)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 290, in fetch_sub_module
[rank3]:     self.__all_gather_params(params_to_fetch, forward)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 434, in __all_gather_params
[rank3]:     self.__all_gather_params_(nonquantized_params, forward, quantize=self.zero_quantized_weights)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 463, in __all_gather_params_
[rank3]:     handle = param_group[0].all_gather_coalesced(param_group, quantize=quantize)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 1241, in all_gather_coalesced
[rank3]:     handles = _dist_allgather_fn(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 95, in _dist_allgather_fn
[rank3]:     return instrument_w_nvtx(dist.allgather_fn)(output_tensor, input_tensor, group=group, async_op=True)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 320, in allgather_fn
[rank3]:     return all_gather_into_tensor(output_tensor, input_tensor, group=group, async_op=async_op, debug=debug)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
[rank3]:     return cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/torch.py", line 218, in all_gather_into_tensor
[rank3]:     return self.all_gather_function(output_tensor=output_tensor,
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2949, in all_gather_into_tensor
[rank3]:     work = group._allgather_base(output_tensor, input_tensor, opts)
[rank3]: RuntimeError: Graph compile failed. synStatus=synStatus 26 [Generic failure].

All eight ranks (rank0 through rank7) fail with this identical traceback and the same RuntimeError: Graph compile failed. synStatus=synStatus 26 [Generic failure]; only the rank prefix differs.
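The traceback shows the graph-compile failure is raised from DeepSpeed ZeRO-3's parameter all-gather (`all_gather_into_tensor` on the distributed backend) during the pre-forward parameter fetch, not from the Qwen2 model code itself. A minimal sketch for exercising that collective in isolation on HPU (a hypothetical sanity check, assuming `habana_frameworks` is installed and the script is launched with `gaudi_spawn.py`; the script name and tensor sizes are illustrative):

```python
# allgather_check.py -- illustrative sanity check for the HPU all-gather path.
# Launch (hypothetically): python gaudi_spawn.py --world_size 8 allgather_check.py
import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

def main():
    # HCCL is the collective-communication backend used for Gaudi devices.
    dist.init_process_group(backend="hccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    device = torch.device("hpu")

    # Mimic ZeRO-3's pattern: each rank holds one shard, and the output
    # tensor receives the concatenation of all shards.
    shard = torch.full((1 << 20,), float(rank), dtype=torch.bfloat16, device=device)
    gathered = torch.empty(world * shard.numel(), dtype=torch.bfloat16, device=device)

    dist.all_gather_into_tensor(gathered, shard)
    htcore.mark_step()  # flush the lazy-mode graph so compilation actually runs

    print(f"rank {rank}: all_gather_into_tensor completed")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```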
Expected behavior

Full-parameter fine-tuning of Qwen1.5-14B should run successfully.
I can reproduce it, cc @libinta
@Zjq9409 Have you tried the Qwen fine-tuning from the examples/trl side?