Error when training the chatglm3-6b model with dpo_training.

Describe the bug

The contents of run_dpo.sh are as follows:

CUDA_VISIBLE_DEVICES=0,1 python dpo_training.py \
    --model_type chatglm \
    --model_name_or_path models/chatglm3-6b \
    --train_file_dir ./data/reward \
    --validation_file_dir ./data/reward \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --max_steps 100 \
    --eval_steps 20 \
    --save_steps 50 \
    --max_source_length 128 \
    --max_target_length 128 \
    --output_dir outputs-dpo-bloom-v1 \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --fp16 True \
    --device_map auto \
    --report_to tensorboard \
    --remove_unused_columns False \
    --gradient_checkpointing True \
    --cache_dir ./cache

The training data is the default data in ./data/reward. The error is as follows:

shibing624 / MedicalGPT

Error when training the chatglm3-6b model with dpo_training. #340
