shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical LLMs, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0
3.34k stars 499 forks

Error when building a reward model with chatglm2-6b: ValueError: Found array with dim 3. None expected <= 2. #122

Closed diaojunxian closed 1 year ago

diaojunxian commented 1 year ago
  1. transformers version 4.31.0, the latest release;
  2. Changes made so far:
    
    --- a/reward_modeling.py
    +++ b/reward_modeling.py
    @@ -34,6 +34,7 @@ from transformers import (
     Trainer,
     TrainingArguments,
     set_seed,
    +    AutoModel
    )
    from transformers.trainer import TRAINING_ARGS_NAME

    @@ -44,6 +45,7 @@ MODEL_CLASSES = {
     "bloom": (AutoConfig, BloomForSequenceClassification, BloomTokenizerFast),
     "llama": (AutoConfig, LlamaForSequenceClassification, LlamaTokenizer),
     "baichuan": (AutoConfig, LlamaForSequenceClassification, AutoTokenizer),
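For context, `MODEL_CLASSES` maps each `--model_type` string to a `(config, model, tokenizer)` triple. An attempted chatglm entry following the same pattern would look like the sketch below — this is an illustration, not the exact line added in the diff above, which is not shown:

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Hypothetical registry entry in the same shape as the existing ones.
# Note that AutoModel carries no classification head, which is what
# breaks the reward-modeling stage later on.
MODEL_CLASSES = {
    "chatglm": (AutoConfig, AutoModel, AutoTokenizer),
}
```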

Traceback (most recent call last):
  File "/home/cloudadmin/test-sample/test_chatglm/MedicalGPT/reward_modeling.py", line 649, in <module>
    main()
  File "/home/cloudadmin/test-sample/test_chatglm/MedicalGPT/reward_modeling.py", line 621, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1901, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2226, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/home/cloudadmin/test-sample/test_chatglm/MedicalGPT/reward_modeling.py", line 264, in evaluate
    return super().evaluate(eval_dataset=eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2934, in evaluate
    output = eval_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 3222, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "/home/cloudadmin/test-sample/test_chatglm/MedicalGPT/reward_modeling.py", line 184, in compute_metrics
    mse = mean_squared_error(labels, preds)
  File "/opt/conda/lib/python3.10/site-packages/sklearn/metrics/_regression.py", line 442, in mean_squared_error
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
  File "/opt/conda/lib/python3.10/site-packages/sklearn/metrics/_regression.py", line 101, in _check_reg_targets
    y_true = check_array(y_true, ensure_2d=False, dtype=dtype)
  File "/opt/conda/lib/python3.10/site-packages/sklearn/utils/validation.py", line 915, in check_array
    raise ValueError(
ValueError: Found array with dim 3. None expected <= 2.
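The bottom frame is sklearn's input validation: `check_array` refuses any array with more than two dimensions. The failure can be reproduced in isolation (the shapes below are illustrative, not the actual eval output):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# A sequence-classification head yields arrays of shape (batch,) or
# (batch, num_labels). A raw LM backbone instead emits 3-D tensors
# like (batch, seq, hidden), which sklearn's check_array rejects.
labels_3d = np.zeros((4, 16, 8))  # illustrative 3-D array
preds = np.zeros((4,))

try:
    mean_squared_error(labels_3d, preds)
except ValueError as err:
    print(err)  # raises: Found array with dim 3 ...
```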

The command executed (on an A100 40G machine) was:

python reward_modeling.py \
    --model_type chatglm \
    --model_name_or_path merged-sft \
    --train_file_dir ./data/reward \
    --validation_file_dir ./data/reward \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --use_peft True \
    --seed 42 \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.001 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --max_source_length 128 \
    --max_target_length 128 \
    --output_dir outputs-rm-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float32 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --remove_unused_columns False \
    --gradient_checkpointing True
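Since the crash happens inside `compute_metrics`, one way to pin down which array is 3-D before sklearn sees it is a small shape check — a debugging sketch only, the helper name is hypothetical:

```python
import numpy as np

def check_metric_inputs(preds, labels):
    """Hypothetical debugging helper: report array shapes and whether
    sklearn metrics (which require ndim <= 2) will accept them."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    ok = preds.ndim <= 2 and labels.ndim <= 2
    print(f"preds: {preds.shape}, labels: {labels.shape}, sklearn-safe: {ok}")
    return ok

# Per-token logits from an LM backbone are 3-D and will be rejected:
check_metric_inputs(np.zeros((1, 128, 4096)), np.zeros((1,)))
```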
shibing624 commented 1 year ago

A reward model is essentially a classification model, so it needs AutoModelForSequenceClassification, which chatglm does not support.

diaojunxian commented 1 year ago

@shibing624 Does that mean chatglm can't go through stage 4, since stage 3 can't be completed?

shibing624 commented 1 year ago