shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

In version 1.6.0, running with the flash_attn argument specified fails with "Some specified arguments are not used by the HfArgumentParser: {remaining_args}" #246

Closed CHAOJICHENG5 closed 11 months ago

CHAOJICHENG5 commented 11 months ago

The README says that version 1.6.0 already supports FlashAttention-2 acceleration for llama, but after adding the flash_attn argument the run fails. Details below (environment: single A100 GPU, base model: llama-2-7b):

```
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--flash_attn']
```
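For context, HfArgumentParser raises this ValueError whenever a command-line flag does not map to a field on any of the dataclasses it was built with; the unmatched flag is left in remaining_args. A minimal sketch of the mechanism, with an illustrative dataclass rather than the repo's real argument classes:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class ScriptArguments:  # illustrative only, not MedicalGPT's actual dataclass
    model_type: str = field(default="llama")
    # Remove this field and the flag below lands in remaining_args, reproducing:
    #   ValueError: Some specified arguments are not used by the HfArgumentParser: ['--flash_attn']
    flash_attn: bool = field(default=False)


parser = HfArgumentParser(ScriptArguments)
(args,) = parser.parse_args_into_dataclasses(["--model_type", "llama", "--flash_attn", "True"])
print(args.flash_attn)  # True when the field is declared
```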

The run_pt.sh script I ran is below; the script itself should be fine:

```bash
CUDA_VISIBLE_DEVICES=1 torchrun --nproc_per_node 1 --master_port 12345 pretraining_origin.py \
    --model_type llama \
    --model_name_or_path /public/home/lfchen/llama/Chinese-LLaMA-Alpaca/llama_models/llama-2-7b-hf \
    --train_file_dir /public/home/lfchen/llama/LLaMA-Factory-main/Mix_up/Domain_10M/1-1 \
    --validation_file_dir /public/home/lfchen/llama/LLaMA-Factory-main/Mix_up/Domain_10M/1-4 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --use_peft True \
    --seed 42 \
    --fp16 \
    --max_train_samples 10 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-4 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --block_size 1024 \
    --output_dir /public/home/lfchen/llama/MedicalGPT-main/1.6.0-update/MedicalGPT-1.6.0-update/llama_output \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True
```

I then tried removing the "--flash_attn" argument, but it still fails, this time with:

```
trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2957573965106688
Traceback (most recent call last):
  File "pretraining.py", line 739, in <module>
    main()
  File "pretraining.py", line 682, in main
    trainer = SavePeftModelTrainer(
  File "/public/home/lfchen/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 347, in __init__
    self.create_accelerator_and_postprocess()
  File "/public/home/lfchen/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 3983, in create_accelerator_and_postprocess
    deepspeed_plugin=self.args.deepspeed_plugin,
AttributeError: 'PeftArguments' object has no attribute 'deepspeed_plugin'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 90413) of binary
```

The only likely cause I could find is that "the PeftArguments object in the code has no attribute named deepspeed_plugin" (for this run the script was the one above with the flash_attn argument removed, everything else unchanged). Could you explain what causes these errors? Many thanks!
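As a side note on the second error: one generic way an AttributeError like this can appear (not necessarily what the 1.6.0 code actually did) is a TrainingArguments subclass whose __post_init__ fails to call super().__post_init__(), since in recent transformers versions the parent's __post_init__ is what sets attributes such as deepspeed_plugin that Trainer.__init__ later reads. A hedged sketch with an illustrative class name:

```python
from dataclasses import dataclass, field

from transformers import TrainingArguments


@dataclass
class BrokenPeftArguments(TrainingArguments):  # hypothetical, not the repo's class
    lora_rank: int = field(default=8)

    def __post_init__(self):
        # Skipping super().__post_init__() means the parent never runs its setup,
        # so attributes the Trainer expects (e.g. deepspeed_plugin) are missing
        # and Trainer.__init__ fails with the AttributeError shown above.
        pass


args = BrokenPeftArguments(output_dir="outputs")
print(hasattr(args, "deepspeed_plugin"))  # False
```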

shibing624 commented 11 months ago

Update the code.

CHAOJICHENG5 commented 11 months ago

After downloading the latest project and overwriting my copy, I still get the error about the unrecognized --flash_attn argument, but the "'PeftArguments' object has no attribute 'deepspeed_plugin'" problem is now fixed.

shibing624 commented 11 months ago

I probably wasn't clear: --flash_attn is only supported in the SFT stage; don't pass flash_attn during PT.
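For reference, the flag ultimately enables FlashAttention-2 inside the model itself; independent of this repo's scripts, recent transformers releases (>= 4.36) let you request it directly when loading a Llama checkpoint, provided the flash-attn package is installed and the GPU supports it (the A100 here does). A minimal sketch, with a hub id substituted for the local path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id used for illustration; a local llama-2-7b-hf directory works the same way.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # FlashAttention-2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # transformers >= 4.36
    device_map="auto",
)
```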