yuanzhoulvpi2017 / zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)
MIT License
2.81k stars, 351 forks

Training error #154

Open loki1017 opened 1 year ago

loki1017 commented 1 year ago

Parameter details:
--do_train
--do_eval
--train_file /Work/.../train.json
--validation_file /Work/.../valid.json
--preprocessing_num_workers 10
--prompt_column instruction
--query_column input
--response_column output
--overwrite_cache
--model_name_or_path /Work/pre_model/chatglm2-6b
--output_dir output/IM-04
--overwrite_output_dir
--max_source_length 256
--max_target_length 256
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
--gradient_accumulation_steps 1
--predict_with_generate
--num_train_epochs 1
--logging_strategy steps
--logging_steps 10
--eval_steps 50
--evaluation_strategy steps
--save_steps 1000
--save_strategy steps
--learning_rate 2e-5
--lora_r 8
--model_parallel_mode True
--warmup_ratio 0.05
--weight_decay 0.05
--max_train_samples 500000
--max_eval_samples 10000

My training set has roughly 900,000 samples and my test set roughly 100,000. I tried different sample limits: with --max_train_samples 200000 --max_eval_samples 2000 the model trains normally, but with --max_train_samples 500000 --max_eval_samples 10000 the following error appears:

Traceback (most recent call last):
  File "/Work/zhanglongji7036/chatglm_sft/ptuning/main.py", line 566, in <module>
    main()
  File "/Work/zhanglongji7036/chatglm_sft/ptuning/main.py", line 492, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/transformers/trainer.py", line 2019, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/transformers/trainer.py", line 2300, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/Work/zhanglongji7036/chatglm_sft/ptuning/trainer_seq2seq.py", line 78, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/transformers/trainer.py", line 3029, in evaluate
    output = eval_loop(
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/transformers/trainer.py", line 3318, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/Work/zhanglongji7036/chatglm_sft/ptuning/main.py", line 444, in compute_metrics
    scores = rouge.get_scores(
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/rouge_chinese/rouge.py", line 116, in get_scores
    return self._get_scores(hyps, refs)
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/rouge_chinese/rouge.py", line 129, in _get_scores
    sc = fn(
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/rouge_chinese/rouge.py", line 54, in <lambda>
    "rouge-1": lambda hyp, ref, k: rouge_score.rouge_n(hyp, ref, 1, k),
  File "/home/zhanglongji7036/anaconda3/envs/loki/lib/python3.10/site-packages/rouge_chinese/rouge_score.py", line 253, in rouge_n
    raise ValueError("Hypothesis is empty.")
ValueError: Hypothesis is empty.
  0%| | 50/125000 [06:11<258:02:21, 7.43s/it]

Process finished with exit code 1
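
The progress bar at the end of the traceback reads 50/125000, which matches `--eval_steps 50`, so the crash appears to happen during the first evaluation pass rather than during training itself. A rough sketch of that arithmetic follows. The helper name `total_optimizer_steps` is hypothetical, and a single data-parallel replica is an assumption (the issue only shows `--model_parallel_mode True`).

```python
import math


def total_optimizer_steps(num_samples, per_device_batch_size,
                          grad_accum_steps=1, num_replicas=1, epochs=1):
    """Estimate how many optimizer steps a step-based progress bar will show."""
    # Number of batches the dataloader yields per epoch.
    batches_per_epoch = math.ceil(
        num_samples / (per_device_batch_size * num_replicas)
    )
    # One optimizer step is taken every `grad_accum_steps` batches.
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs


# Values taken from the command line in this issue.
steps = total_optimizer_steps(num_samples=500_000, per_device_batch_size=4)
print(steps)  # 125000, the total shown in the progress bar

eval_steps = 50
print(steps // eval_steps)  # 2500 evaluation passes would be scheduled
```

With 10,000 evaluation samples and `--predict_with_generate`, each of those passes generates text for every sample, so a single empty prediction in any pass is enough to trigger the `ValueError`.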

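I also printed the model's outputs and found that in many cases it generated nothing at all. I'm quite puzzled by this and wonder whether anyone else has run into the same problem:

hypothesis: [] reference: ['约翰', '是', '一名', '软件', '工程师', ',', '居住', '在', '旧金山', '。', '他', '在', '一家', '科技', '公司', '工作', '了', '5', '年', ',', '业余时间', '喜欢', '打网球', '。']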


hypothesis: ['《'] reference: ['《', '时间', '旅行者', '的', '妻子', '》', '是', '我', '的', '最', '爱', '之一', ',', '因为', '它', '不仅仅', '是', '一部', '科幻小说', ',', '而是', '一部', '强烈', '的', '情感故事', '。', '它', '讲述', '了', '一位', '时间', '旅行者', '和', '他', '的', '妻子', '之间', '的', '爱情', ',', '跨越', '时空', '交织', '在', '一起', '。', '小说', '的', '结构', '非常', '独特', ',', '以', '非线性', '的', '方式', '呈现', '了', '两位', '主人公', '的', '故事', ',', '让', '读者', '能够', '体会', '到', '他们', '内心', '的', '情感', '和', '思绪', '。', '此外', ',', '作者', 'Audrey', ' ', 'Niffenegger', '深入', '探索', '了', '人类', '的', '情感', '、', '时间', '、', '家庭', '以及', '命运', '等', '主题', ',', '使', '小说', '变得', '更加', '深刻', '和', '有', '意义', '。']


hypothesis: [] reference: ['自然风光', '如此', '迷人', ',', '冬日', '里', '的', '雪花', '在', '阳光', '下', '闪耀', ',', '春天里', '的', '草木', '吐露', '新芽', ',', '秋季', '的', '枫叶', '像', '火焰', '一般', '绚烂', '多彩', ',', '让', '人', '不由得', '沉醉', '其中', '。']


hypothesis: [] reference: ['日本', '发生', '大规模', '地震', ',', '紧急', '启动', '预防措施']
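
Since `rouge_chinese` raises `ValueError("Hypothesis is empty.")` when a prediction tokenizes to nothing, one possible workaround is to guard the inputs before scoring. The sketch below is not taken from the repository: `prepare_rouge_pair` and the `<empty>` placeholder are hypothetical, and it assumes the metric code joins token lists with spaces before calling `rouge.get_scores`. It only avoids the crash and counts the empty predictions; it does not explain why the model generates nothing.

```python
def prepare_rouge_pair(hyp_tokens, ref_tokens, placeholder="<empty>"):
    """Build (hypothesis, reference, was_empty) so a ROUGE scorer never sees an empty string.

    Whitespace-only tokens are dropped first, because a list like [' ']
    would otherwise join into a string with no real content.
    """
    hyp = [tok for tok in hyp_tokens if tok.strip()]
    ref = [tok for tok in ref_tokens if tok.strip()]
    was_empty = not hyp
    if not hyp:
        hyp = [placeholder]  # scores ~0 against any real reference
    if not ref:
        ref = [placeholder]
    return " ".join(hyp), " ".join(ref), was_empty


# Two shortened pairs from the printed output in this issue.
pairs = [
    ([], ["日本", "发生", "大规模", "地震"]),
    (["《"], ["《", "时间", "旅行者"]),
]

empty_count = 0
for hyp_tokens, ref_tokens in pairs:
    hyp, ref, was_empty = prepare_rouge_pair(hyp_tokens, ref_tokens)
    empty_count += was_empty
    # hyp and ref could now be passed to rouge.get_scores(hyp, ref)

print(f"{empty_count} of {len(pairs)} predictions were empty")
```

Tracking `empty_count` alongside the scores keeps the underlying problem visible: a high share of empty predictions points at generation or decoding, not at the metric.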