hunterhome opened this issue 4 months ago
### Reminder
- [x] I have read the README and searched the existing issues.
### System Info
[2024-06-07 10:17:14,980] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
- llamafactory version: 0.7.2.dev0
- Platform: Linux-5.10.0-198.0.0.111.oe2203sp3.aarch64-aarch64-with-glibc2.34
- Python version: 3.10.14
- PyTorch version: 2.2.0 (NPU)
- Transformers version: 4.41.2
- Datasets version: 2.19.2
- Accelerate version: 0.30.1
- PEFT version: 0.11.1
- TRL version: 0.9.3
- NPU type: Ascend910B2
- CANN version: 8.0.RC2.alpha001
- DeepSpeed version: 0.13.2
### Reproduction
llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path ZhipuAI/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset disc-law-sft-triplet \
    --cutoff_len 8192 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-44-37 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 \
    --reward_model_type lora \
    --ppo_score_norm True \
    --top_k 0 \
    --top_p 0.9

### Expected behavior

_No response_

### Others

[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING]
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] *****************************************
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-06-07 10:10:55,970] torch.distributed.run: [WARNING] *****************************************
[2024-06-07 10:11:03,623] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,661] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,705] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,818] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,836] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,905] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,955] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
[2024-06-07 10:11:03,991] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 0, device: npu:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
2024-06-07 10:11:17,434 - modelscope - INFO - PyTorch version 2.2.0 Found.
2024-06-07 10:11:17,436 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-06-07 10:11:17,490 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 ceb78a2ac746b5506819a47dbbf0e37c and a total number of 976 components indexed
06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 7, device: npu:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 4, device: npu:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 6, device: npu:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 2, device: npu:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 06/07/2024 10:11:17 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:17 - INFO - llamafactory.hparams.parser - Process rank: 1, device: npu:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 06/07/2024 10:11:18 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:18 - INFO - llamafactory.hparams.parser - Process rank: 5, device: npu:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 [INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,235 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,235 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2106] 2024-06-07 10:11:18,236 >> loading file tokenizer.json 06/07/2024 10:11:18 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. 06/07/2024 10:11:18 - INFO - llamafactory.hparams.parser - Process rank: 3, device: npu:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 [WARNING|logging.py:314] 2024-06-07 10:11:19,288 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. 06/07/2024 10:11:19 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:19 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. 
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:20 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 06/07/2024 10:11:22 - INFO - llamafactory.data.template - Add <|user|>,<|observation|> to stop words. 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... 06/07/2024 10:11:26 - INFO - llamafactory.data.loader - Loading dataset disc-law-sft-triplet.json... Running tokenizer on dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████| 16000/16000 [00:38<00:00, 416.91 examples/s] input_ids: [151331, 151333, 151336, 198, 100698, 103309, 101138, 3837, 113094, 110590, 105177, 99312, 8994, 98379, 106170, 117921, 3837, 98546, 20, 98334, 21, 98424, 99146, 98385, 99082, 117225, 3837, 108592, 98696, 105181, 103757, 117537, 98380, 99043, 100451, 102337, 103273, 106156, 118828, 98798, 105181, 101376, 98314, 117055, 98550, 109534, 3837, 98459, 101247, 105079, 98634, 123900, 98324, 117537, 98595, 101676, 111602, 99916, 98760, 101642, 98335, 3837, 108592, 98696, 105181, 98453, 105529, 109290, 98396, 98381, 103941, 98798, 105181, 99195, 118894, 3837, 103078, 98711, 109534, 105079, 98322, 107801, 98993, 114731, 100129, 101242, 3837, 98547, 110664, 99999, 105181, 109487, 98365, 3837, 108592, 98696, 105181, 98701, 107801, 98993, 114731, 103941, 98798, 105181, 98314, 99527, 113995, 3837, 99704, 124187, 116767, 101806, 98583, 109695, 98829, 110960, 99416, 121952, 109055, 112246, 117442, 101242, 3837, 117442, 101242, 100048, 98875, 121424, 99054, 99893, 98649, 105862, 98433, 112998, 99108, 120250, 106318, 100035, 1773, 98365, 98379, 118828, 98798, 105181, 105420, 3837, 101113, 99131, 100588, 98634, 100059, 98493, 108592, 98696, 105181, 98607, 103278, 98344, 98817, 1773, 98379, 103171, 3837, 109534, 108634, 99532, 102492, 20, 11, 124206, 13, 24, 98575, 3837, 109055, 108634, 99532, 102492, 16, 11, 19, 101474, 13, 102486, 98575, 3837, 117442, 101242, 108634, 99532, 102492, 17, 11, 24, 99951, 13, 99082, 98575, 3837, 99054, 99893, 98649, 106508, 99108, 120250, 108634, 99532, 102492, 24, 11, 102114, 21, 98575, 3837, 111086, 101832, 99532, 106234, 102492, 98729, 11, 101135, 17, 13, 21, 98575, 1773, 101409, 100867, 3837, 108592, 98696, 105181, 98319, 119626, 98322, 100297, 98479, 110416, 3837, 118828, 98798, 105181, 5373, 100547, 105181, 5373, 104464, 105181, 110065, 3837, 
110664, 99999, 105181, 98314, 98697, 98856, 3837, 100059, 111413, 99565, 98990, 3837, 116550, 99304, 3837, 103171, 102622, 98560, 3837, 108592, 98696, 105181, 98314, 127251, 98381, 102070, 98539, 98404, 102243, 105483, 3837, 106144, 102919, 1773, 151337] inputs: [gMASK] <sop> <|user|> 基于下列案件,推测可能的判决结果。 经审理查明,2015年6月21日15时许,被告人白某某在大东区小河沿公交车站乘坐被害人张某某驾驶的133路公交车,当车辆行驶至沈阳市大东区东陵西路26号附近时,被告人白某某因未能下车而与司机张某某发生争执,并在该公交车行驶中用手拉拽档杆,被证人韩某某拉开后,被告人白某某又用手拉拽司机张某某的右胳膊,导致该车失控撞向右侧马路边停放的轿车和一个路灯杆,路灯杆折断后将福锅记炖品店的牌匾砸坏。后经被害人张某某报警,公安人员赶至现场将被告人白某某传唤到案。经鉴定,公交车受损价值人民币5,189.9元,轿车受损价值人民币1,449.57元,路灯杆受损价值人民币2,927.15元,福锅记饭店牌匾受损价值人民币9,776元,本案损失价值共计人民币19,342.6元。上述事实,被告人白某某在庭审中亦无异议,被害人张某某、朱某某、詹某某陈述,证人韩某某的证言,现场勘察笔录,视听资料,鉴定结论书,被告人白某某的供述与辩解等证据证实,足以认定。 <|assistant|> [INFO|configuration_utils.py:731] 2024-06-07 10:12:08,107 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json [INFO|configuration_utils.py:731] 2024-06-07 10:12:08,110 >> loading configuration file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/config.json [INFO|configuration_utils.py:796] 2024-06-07 10:12:08,111 >> Model config ChatGLMConfig { "_name_or_path": "/root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat", "add_bias_linear": false, "add_qkv_bias": true, "apply_query_key_layer_scaling": true, "apply_residual_connection_post_layernorm": false, "architectures": [ "ChatGLMModel" ], "attention_dropout": 0.0, "attention_softmax_in_fp32": true, "auto_map": { "AutoConfig": "configuration_chatglm.ChatGLMConfig", "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration", "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification" }, "bias_dropout_fusion": true, "classifier_dropout": null, "eos_token_id": [ 151329, 151336, 151338 ], "ffn_hidden_size": 13696, "fp32_residual_connection": false, "hidden_dropout": 0.0, "hidden_size": 4096, "kv_channels": 128, "layernorm_epsilon": 1.5625e-07, "model_type": "chatglm", "multi_query_attention": true, "multi_query_group_num": 2, "num_attention_heads": 32, "num_hidden_layers": 40, "num_layers": 40, "original_rope": true, "pad_token_id": 151329, "padded_vocab_size": 151552, "post_layer_norm": true, "rmsnorm": true, "rope_ratio": 500, "seq_length": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 151552 } [INFO|modeling_utils.py:3471] 2024-06-07 10:12:08,159 >> loading weights file /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat/model.safetensors.index.json [INFO|modeling_utils.py:1519] 2024-06-07 10:12:08,160 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.bfloat16. [INFO|configuration_utils.py:962] 2024-06-07 10:12:08,162 >> Generate config GenerationConfig { "eos_token_id": [ 151329, 151336, 151338 ], "pad_token_id": 151329 } Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00, 1.45it/s] [INFO|modeling_utils.py:4280] 2024-06-07 10:12:15,224 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration. [INFO|modeling_utils.py:4288] 2024-06-07 10:12:15,224 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /root/.cache/modelscope/hub/ZhipuAI/glm-4-9b-chat. 
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training. [INFO|modeling_utils.py:3797] 2024-06-07 10:12:15,231 >> Generation config file not found, using a generation config created from the model config. 06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA Loading checkpoint shards: 60%|██████████████████████████████████████████████████████████▏ | 6/10 [00:04<00:02, 1.35it/s]06/07/2024 10:12:15 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:15 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:15 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 Loading checkpoint shards: 70%|███████████████████████████████████████████████████████████████████▉ | 7/10 [00:05<00:02, 1.39it/s]06/07/2024 10:12:16 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:06<00:00, 1.51it/s] 06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:17 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:17 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.42it/s] 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.36it/s] 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 
06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.35it/s] Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:07<00:00, 1.34it/s] 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA Loading checkpoint shards: 90%|███████████████████████████████████████████████████████████████████████████████████████▎ | 9/10 [00:07<00:00, 1.19it/s]06/07/2024 10:12:18 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:18 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:18 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00, 1.19it/s] 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 
06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:19 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:19 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:19 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 
06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:19 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 Loading checkpoint shards: 90%|███████████████████████████████████████████████████████████████████████████████████████▎ | 9/10 [00:09<00:01, 1.02s/it]06/07/2024 10:12:20 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:20 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:20 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 06/07/2024 10:12:20 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:21 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00, 1.05s/it] 06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled. 06/07/2024 10:12:21 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation. 06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32. 06/07/2024 10:12:21 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA 06/07/2024 10:12:22 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Provided path (saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03) does not contain value head weights: saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03 does not appear to have a file named value_head.bin. Checkout 'https://huggingface.co/saves/GLM-4-9B-Chat/lora/train_2024-06-06-15-42-03/tree/None' for available files.. 06/07/2024 10:12:22 - INFO - llamafactory.model.model_utils.valuehead - Ignore the above message if you are not resuming the training of a value head model. 
06/07/2024 10:12:22 - INFO - llamafactory.model.loader - trainable params: 21180417 || all params: 9421131777 || trainable%: 0.2248 06/07/2024 10:12:23 - INFO - llamafactory.train.trainer_utils - Loaded adapter weights of reward model from saves/GLM-4-9B-Chat/lora/train_2024-06-07-09-37-06 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - ***** Running training ***** 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Num examples = 16000 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Num Epochs = 3.0 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Instantaneous batch size per device = 1 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Total train batch size (w. parallel, buffer, distributed & accumulation) = 64 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Gradient Accumulation steps = 8 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Num optimization epochs per batch = 4 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Total training steps = 750 06/07/2024 10:12:23 - INFO - llamafactory.train.ppo.trainer - Number of trainable parameters = 21180417 0%| | 0/750 [00:00<?, ?it/s]/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. 
For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) /data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.) scores_processed = torch.where(scores != scores, 0.0, scores) [rank1]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. 
This may have performance implications. (function npu_cpu_fallback) [rank0]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank7]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank2]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank6]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank3]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank4]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [rank5]:[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. 
(function npu_cpu_fallback)
  0%|          | 0/750 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 213 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 67 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 379 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 390 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 408 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 499 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 501 is out of bounds for dimension 1 with size 1
Traceback (most recent call last):
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in <module>
    launch()
  File "/data/LLaMA-Factory/src/llamafactory/launcher.py", line 5, in launch
    run_exp()
  File "/data/LLaMA-Factory/src/llamafactory/train/tuner.py", line 37, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 59, in run_ppo
    ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 220, in ppo_train
    mini_batch_rewards = self.get_rewards(mini_batch_queries, mini_batch_responses)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 387, in get_rewards
    rewards.append(values[i, end_index].float().detach().cpu())  # use fp32 type
IndexError: index 488 is out of bounds for dimension 1 with size 1
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227860 closing signal SIGTERM
[2024-06-07 10:12:46,085] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227861 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227862 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227863 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227864 closing signal SIGTERM
[2024-06-07 10:12:46,086] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2227865 closing signal SIGTERM
[2024-06-07 10:12:46,451] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2227858) of binary: /data/anaconda3/envs/llama_factory/bin/python
Traceback (most recent call last):
  File "/data/anaconda3/envs/llama_factory/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/data/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2227859)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-06-07_10:12:46
  host      : localhost.localdomain
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2227858)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Hi, did you manage to fine-tune GLM-4 on Ascend successfully? Could you share your script?
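A note on the IndexError above: the failing line looks up `values[i, end_index]`, and the message says dimension 1 only has size 1, i.e. the sequence position is being applied to a dimension that only holds the batch. Below is a minimal, hypothetical sketch of that situation; the shapes and the transposed value-head output are assumptions for illustration, not something confirmed by these logs.

```python
import torch

# Hypothetical reproduction of the failing lookup in get_rewards (trainer.py:387).
# Assumption: with per_device_train_batch_size=1, the value head returns `values`
# shaped (seq_len, batch) instead of (batch, seq_len), so dimension 1 has size 1.
batch_size, seq_len = 1, 512
values = torch.randn(seq_len, batch_size)   # transposed layout (assumed)
i, end_index = 0, 213                        # last response token of sample i

try:
    reward = values[i, end_index].float().detach().cpu()  # use fp32 type
except IndexError as err:
    print(err)  # index 213 is out of bounds for dimension 1 with size 1

# If the layout really is transposed, swapping the dimensions makes the lookup valid:
reward = values.transpose(0, 1)[i, end_index].float().detach().cpu()
```

If the GLM-4 value head does emit values with the sequence dimension first, that would explain why every rank fails with a different out-of-bounds index; this is only a hypothesis to help debugging.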
Do you have an inference script you could share?
Has anyone successfully fine-tuned GLM-4 on Ascend? I keep getting: The param dtype not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_DOUBLE,DT_INT8,DT_UINT8,DT_INT16,DT_INT32,DT_INT64,DT_BOOL,DT_COMPLEX64,DT_COMPLEX128,]. Can anyone help with this?
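For the DT_BFLOAT16 error above, one quick way to check whether the local CANN/driver stack accepts bfloat16 at all is a tiny probe like the sketch below. This is an untested illustration: it assumes torch_npu is installed and registers the `npu` device, and it is not an official workaround.

```python
import torch
import torch_npu  # assumption: Ascend PyTorch adapter is installed and exposes the "npu" device


def npu_supports_bf16(device: str = "npu:0") -> bool:
    """Try a tiny bfloat16 matmul on the NPU; report False if the op is rejected."""
    try:
        a = torch.ones((2, 2), dtype=torch.bfloat16, device=device)
        _ = a @ a
        return True
    except RuntimeError:
        return False


# If this prints False, retraining with --fp16 True instead of --bf16 True may be worth trying.
print(npu_supports_bf16())
```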
I would also like a GLM-4 fine-tuning script, or at least a recommended set of configuration parameters. Thanks.