Open likeucode opened 8 months ago
That's one of the reasons I posted the checkpoint. It may be the plateau phenomenon: the loss oscillates for 2-3 hours and only then starts to converge. To mitigate this, I tried warm-up schedulers, optimizer changes, and so on, and training improved.
I've committed the refactored code for training the CP and FITB task models. I haven't tested it because I don't have a GPU yet, but it includes some optimizations I didn't have before, so I think you will see an improvement if you use it.
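For readers wondering what a warm-up scheduler looks like in practice, here is a minimal sketch. It is not the scheduler used in this repository (the thread does not show that code). The function name `warmup_cosine_factor` and all of its parameters are hypothetical, and pairing warm-up with cosine decay is an assumption made for illustration.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return a multiplier for the base learning rate at a given step.

    Hypothetical helper: linear warm-up for `warmup_steps` steps,
    then cosine decay down to `min_factor` by `total_steps`.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_factor + (1.0 - min_factor) * cosine


if __name__ == "__main__":
    # Print the multiplier at a few steps of a 100-step run with 10 warm-up steps.
    for s in (0, 5, 9, 10, 55, 100):
        print(s, round(warmup_cosine_factor(s, 10, 100), 4))
```

In PyTorch, a function like this could be wrapped in a lambda and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor each time the scheduler steps. Ramping the learning rate up gradually is a common way to reduce early oscillation in the loss, though whether it helps here depends on the model and data.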
Thank you for your prompt reply. I will try it, thanks!
I checked my wandb and found this log! Hope this will help you.
After modifying the code, I updated the required checkpoints for CP and CIR. Please take a look!
Hi! How many epochs did you train for?
Hi, when I try to train a CP model, the accuracy always stays at about 0.5 and the training loss does not decrease. Has anything like that ever happened to you?