cc @pacman100 @muellerzr
@amyeroberts Sorry, I've actually solved this problem myself. It was caused by using fp16 together with a large learning rate. When fine-tuning LLaMA with LoRA, that combination is fine, but with full-parameter fine-tuning it's necessary to use bf16 and a smaller learning rate (I use 5e-6; 5e-5 also works but is sometimes unstable).
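For reference, a minimal sketch of the configuration change described above, using the `TrainingArguments` from `transformers`. Only `bf16` and `learning_rate` reflect the fix; the output path, epoch count, and batch size are placeholders, not values from the original report:

```python
from transformers import TrainingArguments

# Full-parameter fine-tuning was stable with bf16 and a small LR;
# fp16 plus a large LR made the loss collapse to 0.
training_args = TrainingArguments(
    output_dir="llama-full-ft",     # placeholder path
    bf16=True,                      # use bfloat16 instead of fp16
    fp16=False,
    learning_rate=5e-6,             # 5e-5 also works but can be unstable
    num_train_epochs=3,             # placeholder
    per_device_train_batch_size=1,  # placeholder
)
```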
@TomasAndersonFang thanks for replying and detailing what the issue was!
System Info
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
My script:
Commands used to launch the script:
Accelerate config
Log
Expected behavior
I don't know why the loss converges to 0 so quickly, so I suspect there is a problem somewhere in my setup.
Additional info:
My question: