cnuernber opened 6 years ago
So you want to implement the approach from this publication, or are you open to any ideas which might improve solver gains?
Any ideas. Especially research into why Adam doesn't perform as well as SGD on some (important) problems.
I believe the choice of optimizer depends on the class of problem; it's not an across-the-board "this one is best", so this is not at all surprising. I am assuming this is to be done at the Cortex layer and not in CUDA or TensorFlow? Downloaded the paper to read.
It surprised a lot of very experienced practitioners in machine learning at NIPS; for a long time we were all trying to get rid of hyperparameters, and there is a large class of problems where Adam and friends provably converge faster; just not overparameterized machine learning problems. Here is the paper I found quite interesting: https://arxiv.org/abs/1710.09278

Oh, and if you can figure out concretely why this is and fix it for hyperparameter-free optimizers, then you have your Ph.D., I think :-); so if I were you I wouldn't worry about Cortex vs. TF.
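For anyone following the thread, here is a minimal NumPy sketch contrasting the two update rules under discussion on a toy overparameterized least-squares problem. This is illustrative only, not Cortex code; all names, constants, and the problem setup are my own assumptions, and the Adam updates are the textbook ones from Kingma & Ba (2014).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))   # 20 samples, 50 parameters -> overparameterized
y = rng.normal(size=20)

def grad(w):
    # gradient of the loss 0.5 * ||X w - y||^2
    return X.T @ (X @ w - y)

# --- SGD: a single hyperparameter, the learning rate ---
w = np.zeros(50)
lr = 1e-3
for _ in range(1000):
    w -= lr * grad(w)
print("SGD loss:", 0.5 * np.sum((X @ w - y) ** 2))

# --- Adam: per-parameter adaptive step sizes (Kingma & Ba 2014) ---
w = np.zeros(50)
m = np.zeros(50)                # first-moment (mean) estimate
v = np.zeros(50)                # second-moment (uncentered variance) estimate
lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
for t in range(1, 1001):
    g = grad(w)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction for the zero initialization
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print("Adam loss:", 0.5 * np.sum((X @ w - y) ** 2))
```

The contrast is the point of the thread: the SGD loop has one knob, while Adam maintains per-parameter moment estimates that rescale each coordinate's step, which is exactly the adaptivity the paper above examines on overparameterized problems.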