Open arthurmensch opened 7 years ago
OK, so looking at the code, I believe that the gradient is monitored and compared to the gradient computed at the first epoch in SAGClassifier?
Yes, we are looking at the residuals of the KKT conditions, "normalized" by the residuals at the first iteration, if I remember correctly. Anyway, it would be cool to have the stopping criterion you mention.
I haven't been able to find appropriate documentation for the stopping criteria that are used in SAG(A) and coordinate descent. Is it a violation of the KKT conditions? It would be great to make this explicit in the documentation, and to verify the consistency of the stopping criteria across solvers.
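For illustration, the "residuals normalized by the residuals at the first iteration" idea can be sketched as follows. This is a minimal toy example on a ridge objective with plain gradient descent, not scikit-learn's actual SAG(A) or coordinate descent code; the names `grad`, `solve_gd`, and `tol` are all illustrative:

```python
import numpy as np

def grad(w, X, y, alpha):
    # Gradient of the ridge objective 0.5*||Xw - y||^2 / n + 0.5*alpha*||w||^2.
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + alpha * w

def solve_gd(X, y, alpha=0.1, lr=0.1, tol=1e-6, max_iter=1000):
    w = np.zeros(X.shape[1])
    # Residual (gradient norm) at the first iteration, used for normalization.
    g0 = np.linalg.norm(grad(w, X, y, alpha))
    for it in range(max_iter):
        g = grad(w, X, y, alpha)
        # Stop when the current residual, normalized by the initial one,
        # drops below the tolerance (a relative stopping criterion).
        if np.linalg.norm(g) <= tol * g0:
            break
        w -= lr * g
    return w, it

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)
w, n_iter = solve_gd(X, y)
```

For a smooth unconstrained problem like this, a vanishing gradient is exactly the KKT condition, so "KKT residual" and "gradient norm" coincide; with an L1 penalty the residual would instead measure the violation of the subgradient conditions.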
Optionally, it would also be nice to be able to monitor the loss on a validation set to do early stopping, as is done with specific callbacks in e.g. keras -- but this is a feature that should appear in scikit-learn.
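The validation-loss early stopping mentioned above could look something like the sketch below: after each epoch, evaluate the loss on a held-out set and stop once it has not improved for `patience` consecutive epochs. This is a self-contained toy (full-batch gradient steps on a least-squares model), not keras's or scikit-learn's actual callback machinery; `fit_early_stopping`, `patience`, and the other names are illustrative:

```python
import numpy as np

def val_loss(w, X, y):
    # Mean squared error on the validation set.
    return np.mean((X @ w - y) ** 2)

def fit_early_stopping(X_tr, y_tr, X_val, y_val,
                       lr=0.1, patience=5, max_epochs=500):
    w = np.zeros(X_tr.shape[1])
    best_loss, best_w, bad_epochs = np.inf, w.copy(), 0
    for epoch in range(max_epochs):
        # One full-batch gradient step per "epoch", for simplicity.
        w -= lr * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        loss = val_loss(w, X_val, y_val)
        if loss < best_loss - 1e-12:
            best_loss, best_w, bad_epochs = loss, w.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # early stop: validation loss has plateaued
    # Return the weights that achieved the best validation loss.
    return best_w, epoch

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.5 * rng.randn(200)
X_tr, X_val, y_tr, y_val = X[:150], X[150:], y[:150], y[150:]
w, last_epoch = fit_early_stopping(X_tr, y_tr, X_val, y_val)
```

Keeping the best weights rather than the final ones is the usual choice in this pattern, since the last `patience` epochs by definition did not improve the validation loss.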