hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
30.26k stars 3.73k forks

Questions about loading checkpoints #3857

Closed sunfan1997 closed 3 months ago

sunfan1997 commented 3 months ago

Reminder

Reproduction

1. A checkpoint was trained on 2×4090 with per_device_train_batch_size=2 and gradient_accumulation_steps=8. After moving to a 4×A100 40G machine with per_device_train_batch_size=8 and gradient_accumulation_steps=8, the steps per epoch dropped from 13000+ to only 6000+. Why? That isn't even a linear relationship; it should have shrunk to 1/8.
2. When the adapter trained on the A100 machine is loaded back onto the 4090 for training, it raises: TypeError: TrainerState.__init__() got an unexpected keyword argument 'stateful_callbacks'.
3. I'm doing LoRA SFT fine-tuning with int4 quantization. Is there any other way to speed it up? On 4×A100 40G with per_device_train_batch_size=8, each card uses 20+ GB; raising it to 10 can OOM.
Thanks in advance for your reply.

Training script:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    /home/tom/fssd/LLaMA-Factory/src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /home/tom/fssd/Baichuan2-Chat-v2 \
    --dataset juhe_self_config,CrimeKgAssitant_92k,legal_advice,legal_counsel_v2,CrimeKgAssitant_52k,zixun_gpt4,lawzhidao_filter,CAIL2018_EC_sentence_pred,DISC-Law-SFT-Pair-judgmentPred,CAIL2022_eventDet,prac_prob,DISC-Law-SFT-Triplet-released,DISC-Law-SFT-Pair,alpaca_gpt4_zh \
    --dataset_dir /home/tom/fssd/LLaMA-Factory/data \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir /home/tom/fssd/LLaMA-Factory/saves/Baichuan2-13B-Chat/lora/train_2024-05-17 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 2000 \
    --eval_steps 2000 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 1e-4 \
    --num_train_epochs 6 \
    --max_samples 100000 \
    --val_size 0.1 \
    --ddp_timeout 180000000 \
    --repetition_penalty 1.2 \
    --plot_loss \
    --quantization_bit 4 \
    --fp16
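Regarding question 1: assuming the Hugging Face Trainer under DDP (which LLaMA-Factory uses), each optimizer step consumes per_device_train_batch_size × num_gpus × gradient_accumulation_steps samples, so steps per epoch should scale with the inverse of that product. A minimal sketch of the arithmetic (the function name and sample counts are illustrative, not from the source):

```python
import math

def steps_per_epoch(num_samples, per_device_bs, num_gpus, grad_accum):
    """Approximate HF Trainer optimizer steps per epoch under DDP.

    Each GPU gets a dataloader shard of ~num_samples / num_gpus samples;
    gradient accumulation then folds grad_accum micro-batches into one step.
    """
    batches_per_epoch = math.ceil(num_samples / (per_device_bs * num_gpus))
    return math.ceil(batches_per_epoch / grad_accum)

# Old setup: 2x4090, bs=2, accum=8 -> effective batch 2*2*8 = 32
# New setup: 4xA100, bs=8, accum=8 -> effective batch 8*4*8 = 256 (8x larger)
# So steps per epoch should indeed drop to ~1/8 for the same dataset; a
# smaller drop usually means the dataset, cutoff_len, or accumulation
# setting also changed between runs (note the script above uses
# gradient_accumulation_steps=1, not 8).
```

If the observed 13000+ → 6000+ change doesn't match this formula, comparing the actual effective batch sizes of the two runs (as logged by the Trainer at startup) is the first thing to check.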

Expected behavior

No response

System Info

No response

Others

No response

hiyouga commented 3 months ago

Acceleration is currently not supported for Baichuan2 models; please use Qwen1.5 instead.