hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

When fine-tuning llama3-8b, eval_loss keeps rising. I have tried mixing multiple datasets, but it did not help. How should I solve this? #4554

Closed MemoryOldTime closed 2 days ago

MemoryOldTime commented 3 days ago

Reminder

System Info

8x Ascend 910A NPUs; the datasets are alpaca_en (21.7 MB) and alpaca_gpt4_en (41.3 MB), mixed together for LoRA fine-tuning.
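
For reference, dataset mixing in LLaMA-Factory is configured by listing the dataset names, comma-separated, under the dataset key of the training YAML. A minimal excerpt in the style of the repo's example configs (the two dataset names are taken from this report; all other values are illustrative, not from the actual config):

### model (illustrative path)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora

### dataset: comma-separated names are mixed into one training set
dataset: alpaca_en,alpaca_gpt4_en
template: llama3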

Reproduction

#!/bin/bash

NPROC_PER_NODE=8 NNODES=1 RANK=0

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
    --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $RANK \
    src/train.py examples/train_lora/llama3_lora_sft_ds0.yaml

Expected behavior

It does not seem to be caused by insufficient data, and adjusting the model's hyperparameters clearly makes no difference anymore, so this does not look like normal behavior. Is there any other way to solve this kind of problem?
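
For context, an eval_loss that rises steadily while train_loss keeps falling is the classic signature of overfitting, and mixing in more SFT data of the same style (alpaca_en plus alpaca_gpt4_en) will not necessarily fix it. The usual knobs are a lower learning rate, fewer epochs, LoRA regularization, and evaluating often enough to keep the best checkpoint. A minimal sketch of the relevant YAML keys, assuming the key names used in the repo's example configs (all values are illustrative, not taken from this report):

### train (illustrative values)
learning_rate: 5.0e-5
num_train_epochs: 1.0
lora_rank: 8
lora_dropout: 0.1

### eval (hold out a validation split and evaluate periodically)
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

Keeping the checkpoint with the lowest eval_loss (effectively early stopping) is another common way to sidestep the late-training rise.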

Others

No response