Hi. This is not an issue, but a question. I suspect you compared several optimizers for SGD before settling on Adam. Was it better than the others? I also have an FM implementation, which uses Adagrad so far. I tried RMSprop, but it was much, much worse. My question is: did you compare Adam with Adagrad? Was it better?
I didn't compare Adagrad with Adam. I use Adam because it always performs better than Adagrad on my NN models. I might run a benchmark if I have time to implement Adagrad.