Closed ben-8878 closed 6 months ago
Training command:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ../Chinese-LLM-Chat/models/Meta-Llama-3-70B-Instruct \
    --dataset sjcy_sft_zh,general_intension_sft_zh,in3_interaction_zh,cot_zh,sharegpt4_local,comparison_gpt4_zh \
    --template llama3 \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir finetuned_models/intention-llama3-70b \
    --cutoff_len 32768 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 5000 \
    --eval_steps 5000 \
    --learning_rate 5e-5 \
    --num_train_epochs 6.0 \
    --plot_loss \
    --ddp_timeout 1800000 \
    --val_size 0.001 \
    --quantization_bit 4 \
    --shift_attn \
    --rope_scaling linear \
    --fp16
```
Inference command:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/cli_demo.py \
    --model_name_or_path ../../model_hub/Meta-Llama-3-70B-Instruct-hf \
    --template llama3 \
    --finetuning_type lora \
    --adapter_name_or_path finetune_models/intention-llama3-70b-4k/checkpoint-15000
```
[Screenshot: inference result example 1] [Screenshot: inference result example 2]
Disable shift_attn.
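For reference, a minimal sketch of the adjusted launch: the same command as the original training run with `--shift_attn` dropped. The flags shown are a representative subset of the original ones; the output directory name is illustrative, not taken from the thread.

```shell
# Same as the original run, except --shift_attn is removed; retraining is
# needed because the adapter was trained against the shifted attention pattern.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ../Chinese-LLM-Chat/models/Meta-Llama-3-70B-Instruct \
    --template llama3 \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir finetuned_models/intention-llama3-70b-no-shift \
    --cutoff_len 32768 \
    --quantization_bit 4 \
    --rope_scaling linear \
    --fp16
```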
Do I need to retrain after disabling shift_attn?
Yes.
shift_attn is meant to extend the context length. Is this option currently unusable, or is it that FSDP and shift_attn are incompatible?