Additional error information:
```
Traceback (most recent call last):
  File "/ML-A100/team/align//code/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/ML-A100/team/align//code/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/ML-A100/team/align//code/LLaMA-Factory/src/llamafactory/train/tuner.py", line 52, in run_exp
    run_rm(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/ML-A100/team/align//code/LLaMA-Factory/src/llamafactory/train/rm/workflow.py", line 87, in run_rm
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/ML-A100/team/align//code/LLaMA-Factory/src/llamafactory/train/rm/trainer.py", line 111, in compute_loss
    _, _, values = model(**inputs, output_hidden_states=True, return_dict=True)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1842, in forward
    loss = self.module(*inputs, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/trl/models/modeling_value_head.py", line 171, in forward
    base_model_output = self.pretrained_model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1174, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 950, in forward
    causal_mask = self._update_causal_mask(
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1071, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
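As the message suggests, the stack trace can be made accurate by forcing synchronous kernel launches. A minimal sketch of how one might do this (the variable must be set before CUDA is initialized, i.e. before the first `torch` CUDA call; exporting it in the launch shell works equally well):

```python
import os

# CUDA_LAUNCH_BLOCKING must be set before the first CUDA call, so place this
# at the very top of the entry point (or `export CUDA_LAUNCH_BLOCKING=1`
# in the shell before launching). With it set, the next device-side assert
# is raised at the exact Python line instead of a later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])  # "1"
```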
### System Info

- `llamafactory` version: 0.8.3.dev0
- Platform: Linux-5.4.210-4-velinux1-amd64-x86_64-with-glibc2.31
- Python version: 3.10.12
- PyTorch version: 2.1.0+cu118 (GPU)
- Transformers version: 4.42.3
- Datasets version: 2.20.0
- Accelerate version: 0.30.1
- PEFT version: 0.11.1
- TRL version: 0.8.6
- GPU type: NVIDIA A800-SXM4-80GB
- DeepSpeed version: 0.13.1
- Bitsandbytes version: 0.42.0

### Reproduction
```yaml
### model
model_name_or_path: /ML-A100/team/align/public/models/Yi-34B-Chat-0205

### method
stage: rm
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: dpo_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
use_fast_tokenizer: False

### output
output_dir: saves/llama3-8b/full/reward
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
`use_fast_tokenizer` has already been added to the yaml, but the error still occurs:
```
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [171,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[E ProcessGroupNCCL.cpp:915] [Rank 7] NCCL watchdog thread terminated with exception: CUDA error: device-side assert triggered
```

### Expected behavior

No response
### Others

No response
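As an aside, the `srcIndex < srcSelectDimSize` assertion in `indexSelectLargeIndex` typically fires when some input id is out of range for the embedding table, which is exactly what happens if a chat template injects special token ids the model's vocabulary does not contain (here, a `llama3` template applied to a Yi model). A hypothetical diagnostic sketch, with made-up sizes (a real check would compare the tokenized batch against `model.get_input_embeddings().weight.shape[0]`):

```python
def find_out_of_range_ids(input_ids, embedding_rows):
    """Return the token ids that would trip `srcIndex < srcSelectDimSize`."""
    return [tok for tok in input_ids if tok < 0 or tok >= embedding_rows]

# Illustrative numbers: a 64000-row embedding (Yi-style vocab) fed a
# llama3-template special id such as 128009 asserts on the GPU.
bad = find_out_of_range_ids([1, 5, 128009, 63999], embedding_rows=64000)
print(bad)  # [128009]
```

Running the tokenized dataset through such a check on CPU surfaces the offending ids as a clean Python error instead of an asynchronous CUDA assert.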