Open fabianp opened 8 years ago
I've observed that SAG increases the objective function in the first epoch. This would be OK occasionally, except that I'm seeing this behaviour consistently across different datasets, which leads me to think that there might be a bug in the implementation.

I'm not seeing this behaviour with SAGA (see also http://fa.bianp.net/blog/2016/saga-algorithm-in-the-lightning-library/).

Indeed, I don't think SAG is monotonic, but if the behavior appears repeatedly, this might be a bug. One explanation would be that in the first epoch SAG has accumulated only very few stochastic gradients, and therefore the automatic step size might be too large.
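For anyone who wants to check the per-epoch behaviour themselves, here is a minimal sketch. It uses scikit-learn's `'sag'` solver rather than lightning's `SAGClassifier` (so it is not the original reproduction setup), and the dataset, `C` value, and `objective` helper below are illustrative assumptions. The trick is `warm_start=True` with `max_iter=1`, so each `.fit()` call runs one more SAG epoch and the objective can be printed in between:

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

warnings.simplefilter("ignore", ConvergenceWarning)  # max_iter=1 triggers it on purpose

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
C = 1.0  # inverse regularization strength (illustrative choice)

# warm_start=True + max_iter=1 makes each .fit() call run one more SAG epoch.
clf = LogisticRegression(solver="sag", C=C, warm_start=True, max_iter=1, tol=0.0)

def objective(clf, X, y):
    """Mean logistic loss plus L2 penalty, proportional to the
    objective that scikit-learn's 'sag' solver minimizes."""
    w = clf.coef_.ravel()
    z = X.dot(w) + clf.intercept_[0]
    margins = np.where(y == 1, z, -z)            # y is in {0, 1}
    loss = np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-margin))
    return loss + w.dot(w) / (2.0 * C * X.shape[0])

for epoch in range(1, 6):
    clf.fit(X, y)
    print(f"epoch {epoch}: objective = {objective(clf, X, y):.6f}")
```

If the objective printed after epoch 1 is higher than after later epochs across several random seeds and datasets, that would match the reported non-monotonic first epoch.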