The README says that version 1.6.0 already supports FlashAttention-2 acceleration for LLaMA, but after I add the flash_attn flag the run fails. The error details are below (environment: single A100 GPU; base model: llama-2-7b):
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--flash_attn']
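For context, this ValueError is what HfArgumentParser raises when a CLI flag has no matching field in any of the argument dataclasses, which typically means the installed copy of the script predates the flash_attn option. A minimal, self-contained sketch of that mechanism (the dataclass and parser below are illustrative stand-ins, not the project's real argument classes):

```python
import argparse
from dataclasses import dataclass, fields

# Illustrative stand-in for the script's argument dataclass; the real
# classes (e.g. PeftArguments) live in pretraining.py.
@dataclass
class ModelArguments:
    model_type: str = "llama"
    model_name_or_path: str = ""
    # An older copy of the script simply lacks this field, so the
    # --flash_attn flag ends up unconsumed:
    # flash_attn: bool = False

def parse_into_dataclass(argv):
    """Mimic HfArgumentParser.parse_args_into_dataclasses: flags with
    no matching dataclass field are collected as 'remaining' and
    trigger the exact ValueError reported above."""
    parser = argparse.ArgumentParser()
    for f in fields(ModelArguments):
        parser.add_argument(f"--{f.name}", type=str, default=f.default)
    known, remaining = parser.parse_known_args(argv)
    if remaining:
        raise ValueError(
            "Some specified arguments are not used by the "
            f"HfArgumentParser: {remaining}"
        )
    return ModelArguments(**vars(known))

try:
    parse_into_dataclass(["--model_type", "llama", "--flash_attn"])
except ValueError as err:
    print(err)  # ...not used by the HfArgumentParser: ['--flash_attn']
```

If this is indeed the cause, re-checking that the code on disk actually matches the 1.6.0 release described in the README would be the first thing to verify.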
I then tried running without the --flash_attn flag, but it still fails, with the following error:
trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2957573965106688
Traceback (most recent call last):
File "pretraining.py", line 739, in <module>
main()
File "pretraining.py", line 682, in main
trainer = SavePeftModelTrainer(
File "/public/home/lfchen/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 347, in __init__
self.create_accelerator_and_postprocess()
File "/public/home/lfchen/anaconda3/envs/llama/lib/python3.8/site-packages/transformers/trainer.py", line 3983, in create_accelerator_and_postprocess
deepspeed_plugin=self.args.deepspeed_plugin,
AttributeError: 'PeftArguments' object has no attribute 'deepspeed_plugin'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 90413) of binary
The run_pt.sh script I am running is as follows:

CUDA_VISIBLE_DEVICES=1 torchrun --nproc_per_node 1 --master_port 12345 pretraining_origin.py \
    --model_type llama \
    --model_name_or_path /public/home/lfchen/llama/Chinese-LLaMA-Alpaca/llama_models/llama-2-7b-hf \
    --train_file_dir /public/home/lfchen/llama/LLaMA-Factory-main/Mix_up/Domain_10M/1-1 \
    --validation_file_dir public/home/lfchen/llama/LLaMA-Factory-main/Mix_up/Domain_10M/1-4 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --use_peft True \
    --seed 42 \
    --fp16 \
    --max_train_samples 10 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-4 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --block_size 1024 \
    --output_dir /public/home/lfchen/llama/MedicalGPT-main/1.6.0-update/MedicalGPT-1.6.0-update/llama_output \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

The script itself should be correct.
Searching around, the possible cause I found is: "the PeftArguments object in the code has no attribute named deepspeed_plugin". (For that run I only deleted the flash_attn flag from the script above; everything else was unchanged.) Could you please explain what is causing these errors? Many thanks!
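For what it's worth, this second traceback usually indicates a version mismatch rather than a bug in the script: a newer transformers Trainer reads self.args.deepspeed_plugin, an attribute that its own TrainingArguments sets during initialization, so an args object built by older or custom code may never acquire it. A hedged sketch of the failure mode and a defensive lookup (the class and function names below are illustrative stand-ins, not the project's or transformers' real code):

```python
# Stand-in for an args object whose initialization never set
# deepspeed_plugin (mimicking the reported PeftArguments case).
class StaleArguments:
    output_dir = "llama_output"

def create_accelerator(args):
    # Direct attribute access fails exactly like the traceback:
    #   AttributeError: ... has no attribute 'deepspeed_plugin'
    return args.deepspeed_plugin

def create_accelerator_defensive(args):
    # A defensive lookup tolerates argument classes that predate
    # the attribute, falling back to "no DeepSpeed plugin".
    return getattr(args, "deepspeed_plugin", None)

args = StaleArguments()
try:
    create_accelerator(args)
except AttributeError as err:
    print(err)
print(create_accelerator_defensive(args))  # None
```

In practice the usual remedy is not to patch Trainer but to pin transformers to the version the project's requirements file specifies, so that the argument class and the Trainer come from the same era.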