lin-dy commented 2 weeks ago

Reminder

[X] I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.1.dev0
Platform: Linux-5.4.119-1-tlinux4-0010.3-x86_64-with-glibc2.38
Python version: 3.11.0
PyTorch version: 2.4.0+cu121 (GPU)
Transformers version: 4.44.2
Datasets version: 2.21.0
Accelerate version: 0.34.2
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: A800-SXM4-80GB
vLLM version: 0.6.0

Reproduction

命令：llamafactory-cli train examples/train_lora/my_lora_sft.yaml 参数：

model

model_name_or_path: /data/model/Qwen2.5-Math-7B-Instruct

method

stage: sft do_train: true finetuning_type: lora lora_target: all

dataset

dataset: *** mix_strategy: interleave_under interleave_probs: 0.8,0.2

dataset: original_train_data_ragprompt

template: empty cutoff_len: 8192 overwrite_cache: true preprocessing_num_workers: 16

output

output_dir: *** logging_steps: 10 save_steps: 500 plot_loss: true overwrite_output_dir: true

train

per_device_train_batch_size: 2 gradient_accumulation_steps: 8 learning_rate: 1.0e-5 num_train_epochs: 5 lr_scheduler_type: cosine warmup_ratio: 0.1 bf16: true ddp_timeout: 180000000

eval

val_size: 0.1 per_device_eval_batch_size: 1 eval_strategy: steps eval_steps: 500 终端输出： 09/25/2024 14:44:11 - INFO - llamafactory.data.loader - Loading dataset rag_data/unrelated_train_data_ragpgpt4ans.json... 09/25/2024 14:44:11 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. 09/25/2024 14:44:11 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. 09/25/2024 14:44:12 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. 09/25/2024 14:44:12 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. 09/25/2024 14:44:12 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. 09/25/2024 14:44:13 - WARNING - llamafactory.data.data_utils - We recommend using mix_strategy=concat inaming mode. [INFO|configuration_utils.py:731] 2024-09-25 14:44:13,103 >> loading configuration file /data/model/Qwen2.-Instruct/config.json [INFO|configuration_utils.py:800] 2024-09-25 14:44:13,104 >> Model config Qwen2Config { "_name_or_path": "/data/model/Qwen2.5-Math-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 4096, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 10000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.44.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 } [INFO|modeling_utils.py:3675] 2024-09-25 14:44:13,131 >> loading weights file /data/model/Qwen2.5-Math-7B-model.safetensors.index.json [INFO|modeling_utils.py:1606] 2024-09-25 14:44:13,132 >> Instantiating Qwen2ForCausalLM model under defaulorch.bfloat16. [INFO|configuration_utils.py:1038] 2024-09-25 14:44:13,132 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 } Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:04<00:00, 09/25/2024 14:44:17 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:17 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:17 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:17 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:17 - INFO - llamafactory.model.model_utils.misc - Found linear modules: up_proj,o_proj,v_oj,gate_proj,down_proj,k_proj 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, [INFO|modeling_utils.py:4507] 2024-09-25 14:44:18,275 >> All model checkpoint weights were used when initiwen2ForCausalLM. [INFO|modeling_utils.py:4515] 2024-09-25 14:44:18,275 >> All the weights of Qwen2ForCausalLM were initialithe model checkpoint at /data/model/Qwen2.5-Math-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2FM for predictions without further training. [INFO|configuration_utils.py:991] 2024-09-25 14:44:18,277 >> loading configuration file /data/model/Qwen2.-Instruct/generation_config.json [INFO|configuration_utils.py:1038] 2024-09-25 14:44:18,277 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643 } 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,gate_proj,proj,down_proj,k_proj,up_proj Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:04<00:00, 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,gate_proj,_proj,q_proj,down_proj,o_proj Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,q_proj,gatproj,up_proj,k_proj,down_proj Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,q_proj,gatwn_proj,o_proj,up_proj,k_proj Detected kernel version 5.4.119, which is below the recommended minimum of 5.5.0; this can cause the proceg. It is recommended to upgrade the kernel to the minimum version or higher. Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: k_proj,gate_proj,,o_proj,v_proj,q_proj,up_proj 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,k_proj,proj,q_proj,up_proj,down_proj Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 4/4 [00:05<00:00, 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster trainiference. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 09/25/2024 14:44:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 09/25/2024 14:44:18 - INFO - llamafactory.model.model_utils.misc - Found linear modules: up_proj,q_proj,k__proj,gate_proj,v_proj,o_proj 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 [INFO|trainer.py:648] 2024-09-25 14:44:18,954 >> Using auto half precision backend 09/25/2024 14:44:18 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 09/25/2024 14:44:19 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 09/25/2024 14:44:19 - INFO - llamafactory.model.loader - trainable params: 20,185,088 || all params: 7,635|| trainable%: 0.2643 [INFO|trainer.py:2134] 2024-09-25 14:44:19,654 >> Running training [INFO|trainer.py:2135] 2024-09-25 14:44:19,654 >> Num examples = 2,360 [INFO|trainer.py:2136] 2024-09-25 14:44:19,654 >> Num Epochs = 5 [INFO|trainer.py:2137] 2024-09-25 14:44:19,654 >> Instantaneous batch size per device = 2 [INFO|trainer.py:2140] 2024-09-25 14:44:19,654 >> Total train batch size (w. parallel, distributed & acc) = 128 [INFO|trainer.py:2141] 2024-09-25 14:44:19,654 >> Gradient Accumulation steps = 8 [INFO|trainer.py:2142] 2024-09-25 14:44:19,654 >> Total optimization steps = 90 [INFO|trainer.py:2143] 2024-09-25 14:44:19,657 >> Number of trainable parameters = 20,185,088 0%| | 0/90 [00:00</root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] /root/miniforge3/envs/newlf/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: tomp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead. with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(ctx.cpu_autocast_kwargs): # typ[attr-defined] 1%|▉ | 1/90

Expected behavior

step进度一直卡在1/90

Others

求大佬帮忙看一下什么原因

hiyouga commented 1 week ago

torch 安装成了 CPU 版本

lin-dy commented 1 week ago