We can turn on verbose=1 and check whether the f or the g criterion reaches convergence: with the larger tol we reach the gtol convergence (CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL), while with the smaller tol we reach the ftol convergence (CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH).
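For example, something along these lines (a minimal sketch with assumed random data, not the original reproduction script; verbose is forwarded to the optimizer's reporting):

```python
# Minimal sketch with assumed random data: verbose=1 makes L-BFGS-B print its own
# convergence report, so we can see which stopping criterion fires for a given tol.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 1000)
y = rng.randint(2, size=100)

for tol in (1e-2, 1e-8):
    clf = LogisticRegression(solver="lbfgs", tol=tol, max_iter=10_000, verbose=1)
    clf.fit(X, y)
    # the report should end with ...PGTOL for the larger tol (gtol reached)
    # and with ...FACTR*EPSMCH for the smaller tol (ftol reached first)
```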
Indeed, I am not sure how we should expose these tolerances. ping @jnothman @agramfort @GaelVaroquaux @ogrisel @amueller
It doesn't solve the convergence issues of lbfgs, but note that the given example is an optimization problem of 1000+1 parameters (n_features + intercept), of which 1000 do not influence the objective at all (or I have terribly missed something), as y does not depend on x.
Edit: The L2 penalty term of the objective function does depend on the 1000 parameters.
@lorentzenchr That's correct. This is random data with random labels, meant simply to demonstrate the convergence issue. If you instead use labels y = (x.mean(axis=1) > .5), you get the same issue that lbfgs terminates after a fixed number of iterations regardless of how small you make tol (though the absolute difference in fit coefficients is less dramatic).
eps is not relevant here, as the grad function is provided (eps is only used for the finite-difference approximation of the gradient when no jac is passed).
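As a quick illustration of that point (toy objective, not scikit-learn code):

```python
# Toy check: eps only matters when scipy has to approximate the gradient by
# finite differences; when jac is provided it is never used.
import numpy as np
from scipy.optimize import minimize

def f(w):
    return 0.5 * np.dot(w, w)

def grad(w):
    return w

# analytic gradient supplied -> eps is ignored
res_with_jac = minimize(f, np.ones(3), jac=grad, method="L-BFGS-B", options={"eps": 1e-3})
# no gradient supplied -> gradient approximated with step size eps
res_without_jac = minimize(f, np.ones(3), method="L-BFGS-B", options={"eps": 1e-3})
print(res_with_jac.nfev, res_without_jac.nfev)
```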
Passing ftol: tol in options would fix the problem. Whether it should be tol itself or some fraction of it is not obvious, though.
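For reference, a minimal sketch of what that could look like, with a toy objective standing in for the penalized logistic loss (not the actual scikit-learn code):

```python
# Sketch: forward the user-facing tol to both L-BFGS-B stopping criteria
# instead of only the gradient one.
import numpy as np
from scipy.optimize import minimize

def f_and_grad(w):
    # toy objective with an analytic gradient
    return 0.5 * np.dot(w, w), w

tol = 1e-6
res = minimize(
    f_and_grad,
    np.ones(5),
    method="L-BFGS-B",
    jac=True,
    options={
        "maxiter": 100,
        "gtol": tol,  # projected-gradient criterion (what sklearn sets today)
        "ftol": tol,  # relative objective-decrease criterion (currently left at scipy's default)
    },
)
print(res.message)
```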
I would be fine adding ftol: tol
This might also be related to #24752.
I opened #27191 where I set gtol to tol (as previously) and ftol proportional to gtol, such that its default value stays the same as previously for the default value of tol=1e-4. It seems to work fine in the sense that none of the existing tests break.
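For concreteness, a sketch of that scaling idea (the constants below are my reading of the comment, not necessarily the exact code in #27191):

```python
# Assumed calibration: scale ftol with the user-facing tol so that scipy's default
# ftol (~2.22e-9) is recovered at scikit-learn's default tol=1e-4.
SCIPY_DEFAULT_FTOL = 2.220446049250313e-09  # scipy's L-BFGS-B default (factr=1e7 * machine eps)
DEFAULT_TOL = 1e-4                          # LogisticRegression's default tol

def ftol_from_tol(tol):
    # proportional to tol, with ftol_from_tol(DEFAULT_TOL) == SCIPY_DEFAULT_FTOL
    return SCIPY_DEFAULT_FTOL / DEFAULT_TOL * tol

print(ftol_from_tol(1e-4))  # ~2.22e-09, unchanged default behaviour
print(ftol_from_tol(1e-7))  # ~2.22e-12, tightens as tol shrinks
```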
With recent master, i.e. #26721 merged, and extending the sweep to tol=1e-7, I get:
tol=0.01
Optimizer iterations, forward order: 12, reverse order: 12.
Mean absolute diff in coefficients: 7.475020494078499e-16
tol=0.001
Optimizer iterations, forward order: 61, reverse order: 61.
Mean absolute diff in coefficients: 9.457888598980843e-10
tol=0.0001
Optimizer iterations, forward order: 116, reverse order: 110.
Mean absolute diff in coefficients: 0.0029428192009488627
tol=1e-05
Optimizer iterations, forward order: 332, reverse order: 323.
Mean absolute diff in coefficients: 0.000762248896287633
tol=1e-06
Optimizer iterations, forward order: 423, reverse order: 401.
Mean absolute diff in coefficients: 8.578505782753616e-05
tol=1e-07
Optimizer iterations, forward order: 633, reverse order: 473.
Mean absolute diff in coefficients: 6.8633468157778965e-06
This looks good, so I consider the issue solved.
LogisticRegression with the lbfgs solver terminates early, even when tol is decreased and max_iter has not been reached.

Code to Reproduce
We fit random data twice, changing only the order of the examples. Ideally, example order should not matter; the fit coefficients should be the same either way. I produced the results below with this code in colab.
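Below is a rough sketch of that experiment (the data shape, seed, and max_iter are assumptions; the linked colab notebook has the exact code):

```python
# Fit the same random data in forward and reverse example order and compare
# the fitted coefficients for a range of tolerances.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 1000)
y = rng.randint(2, size=100)

for tol in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7):
    clf_fwd = LogisticRegression(solver="lbfgs", tol=tol, max_iter=100_000).fit(X, y)
    clf_rev = LogisticRegression(solver="lbfgs", tol=tol, max_iter=100_000).fit(X[::-1], y[::-1])
    print(f"tol={tol}")
    print(
        f"Optimizer iterations, forward order: {clf_fwd.n_iter_[0]}, "
        f"reverse order: {clf_rev.n_iter_[0]}."
    )
    print(f"Mean absolute diff in coefficients: {np.abs(clf_fwd.coef_ - clf_rev.coef_).mean()}")
```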
Expected Results
As tol is reduced, the difference between coefficients continues to decrease, provided that max_iter is not being hit. When solver is changed to 'newton-cg', we get the expected behavior.

Actual Results
As tol is reduced, the optimizer does not take more steps despite not having converged.

Versions
Output of sklearn.show_versions():

Diagnosis
I'm pretty sure the issue is in the call to scipy.optimize.minimize at this line in linear_model/_logistic.py. The value of tol is passed to minimize as gtol, but ftol and eps are left at their default values. In the example above, I think the optimizer is hitting the ftol termination condition. Possible solutions:
1. Scale ftol and eps by some multiple of tol.
2. Scale eps by some multiple of tol and set ftol to zero.
3. Expose ftol and eps through additional kwargs (see the sketch below).
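For illustration only, a rough sketch of what the third option could look like; the optimizer_options argument below is hypothetical and is not part of scikit-learn's API:

```python
# Hypothetical sketch of solution 3 (this keyword does NOT exist in scikit-learn):
# let the caller pass extra optimizer options such as ftol and eps straight
# through to scipy.optimize.minimize.
from scipy.optimize import minimize

def fit_lbfgs(func, w0, tol=1e-4, max_iter=100, optimizer_options=None):
    # hypothetical helper standing in for the lbfgs branch of _logistic.py
    options = {"maxiter": max_iter, "gtol": tol}
    options.update(optimizer_options or {})  # e.g. {"ftol": 1e-12, "eps": 1e-10}
    return minimize(func, w0, method="L-BFGS-B", jac=True, options=options)
```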