hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

When fine-tuning llama3-8b, eval_loss keeps rising. I have tried mixing multiple datasets, but it did not help. How should I solve this? #4554

Closed MemoryOldTime closed 2 days ago

MemoryOldTime commented 3 days ago

Reminder

System Info

8x Ascend 910A NPUs; the datasets are alpaca_en (21.7 MB) and alpaca_gpt4_en (41.3 MB), mixed together for LoRA fine-tuning.
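
For reference, dataset mixing in LLaMA-Factory is configured by listing the dataset names, comma-separated, under the dataset key of the training YAML. A minimal excerpt in the style of the repo's example configs (the two dataset names are taken from this report; all other values are illustrative, not from the actual config):

### model (illustrative path)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora

### dataset: comma-separated names are mixed into one training set
dataset: alpaca_en,alpaca_gpt4_en
template: llama3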

Reproduction

#!/bin/bash

NPROC_PER_NODE=8 NNODES=1 RANK=0

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
    --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $RANK \
    src/train.py examples/train_lora/llama3_lora_sft_ds0.yaml

Expected behavior

It does not seem to be caused by insufficient data, and adjusting the model's hyperparameters clearly makes no difference anymore, so this does not look like normal behavior. Is there any other way to solve this kind of problem?
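
For context, an eval_loss that rises steadily while train_loss keeps falling is the classic signature of overfitting, and mixing in more SFT data of the same style (alpaca_en plus alpaca_gpt4_en) will not necessarily fix it. The usual knobs are a lower learning rate, fewer epochs, LoRA regularization, and evaluating often enough to keep the best checkpoint. A minimal sketch of the relevant YAML keys, assuming the key names used in the repo's example configs (all values are illustrative, not taken from this report):

### train (illustrative values)
learning_rate: 5.0e-5
num_train_epochs: 1.0
lora_rank: 8
lora_dropout: 0.1

### eval (hold out a validation split and evaluate periodically)
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

Keeping the checkpoint with the lowest eval_loss (effectively early stopping) is another common way to sidestep the late-training rise.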

Others

No response