pfeatherstone closed this issue 4 years ago
@pfeatherstone I was not able to get it to work well. AdaBound transitions between Adam and SGD, but on COCO at least, Adam has never worked well for me.
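For readers unfamiliar with the transition mentioned above, here is a minimal sketch of how AdaBound is usually described: the per-parameter Adam step size is clipped between a lower and an upper bound that both converge to a single SGD-like rate `final_lr`. The function names are hypothetical, and the defaults `final_lr=0.1` and `gamma=1e-3` are assumptions based on the commonly cited reference implementation, not values taken from this repository.

```python
def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Return the (lower, upper) clipping bounds at a 1-based optimizer step.

    Assumed form: both bounds approach final_lr as step grows, so the
    update behaves like Adam early on and like SGD later.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def clipped_step_size(adam_step_size, step, final_lr=0.1, gamma=1e-3):
    """Clip an Adam-style per-parameter step size into the current bounds."""
    lower, upper = adabound_bounds(step, final_lr, gamma)
    return min(max(adam_step_size, lower), upper)


if __name__ == "__main__":
    for step in (1, 1_000, 100_000, 10_000_000):
        lower, upper = adabound_bounds(step)
        print(f"step={step:>10d}  lower={lower:.6f}  upper={upper:.6f}")
```

Early in training the bounds are wide, so the Adam step size passes through unchanged; late in training they pinch together around `final_lr`, which is where the SGD-like behaviour comes from.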
So it works fine for me. It converges quickly, like Adam. I didn't leave it running long enough to compare the final weights against yours, but after 4 or 5 epochs you get a pretty good result.
@pfeatherstone oh, that's very interesting! Well good, I'm glad it works for you.
I have seen Adam perform better than SGD on smaller custom datasets sometimes, but I've never been able to use it on COCO unfortunately.
Can you not get it to detect anything, or just not very well?
@pfeatherstone oh, no, there are no real technical problems; it trains fine. It just underperforms SGD in terms of mAP on COCO, at least in all the experiments I've tried.
Ah ok, thought so. Have you had any luck with the OneCycleLR scheduler?
No, haven’t had time. If you’d like to help I can review a PR!
This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.
I noticed in your code you tried using AdaBound. How did it compare to SGD + cosine annealing + burn-in? Presumably you didn't need the burn-in for AdaBound?
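For context, the "SGD + cosine annealing + burn-in" baseline in the question can be sketched as a learning-rate schedule like the one below. This is an illustration under stated assumptions, not the repository's actual code: the function name `lr_at`, the linear shape of the burn-in ramp, and the default values are all hypothetical.

```python
import math


def lr_at(step, total_steps, base_lr=0.01, burnin_steps=1000, min_lr=0.0):
    """Learning rate at a 0-based step: linear burn-in, then cosine decay.

    Assumption: burn-in ramps linearly up to base_lr; real implementations
    may use a different ramp (for example a polynomial one).
    """
    if step < burnin_steps:
        # Burn-in phase: ramp from base_lr / burnin_steps up to base_lr.
        return base_lr * (step + 1) / burnin_steps
    # Cosine phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - burnin_steps) / max(1, total_steps - burnin_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 500, 999, 1000, 5500, 10000):
        print(step, lr_at(step, total_steps=10000))
```

The burn-in keeps early SGD updates small while gradients are still noisy, which is the role the question presumes AdaBound's adaptive early phase might cover on its own.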