CDClassifier does not converge

pprett commented 10 years ago

This example uses synthetic data from one of the unit tests. CD fails to converge and starts oscillating after iteration 25.

Here is the example: https://gist.github.com/pprett/44d8bb3cbfe84c06a158

@mblondel Is this to be expected, or is it a poor choice of hyperparameters?

mblondel commented 10 years ago

Coordinate descent is known to converge slowly for very loosely regularized problems (small alpha). alpha=0.01 doesn't seem small at first sight, but it could be for that particular problem. One way to check is to plot test MSE as a function of alpha. Since MSE is an error, the plot should have a U shape (too regularized = underfit, loosely regularized = overfit, with the lowest error in between). Perhaps alpha=0.01 is in the overfit region.
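To make the suggested check concrete, here is a minimal sketch that tabulates test MSE against alpha. It is not lightning code: it swaps in a tiny pure-Python ridge regression solved by cyclic coordinate descent, and the data, the helper names (`ridge_cd`, `make_data`, `mse`) and the alpha grid are all assumptions chosen for illustration. The alpha here is just the penalty weight of this toy objective and may be scaled differently from CDClassifier's.

```python
import random

def ridge_cd(X, y, alpha, n_sweeps=100):
    """Minimize 0.5*||Xw - y||^2 + 0.5*alpha*||w||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Exact minimizer along coordinate j with all other weights fixed.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_wj = rho / (col_sq[j] + alpha)
            delta = new_wj - w[j]
            for i in range(n):
                residual[i] -= delta * X[i][j]
            w[j] = new_wj
    return w

def make_data(n, w_true, noise, rng):
    """Gaussian features with a linear target plus Gaussian noise (toy data)."""
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, w_true)) + rng.gauss(0, noise) for row in X]
    return X, y

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [sum(a * b for a, b in zip(row, w)) for row in X]
    return sum((pr - t) ** 2 for pr, t in zip(preds, y)) / len(y)

rng = random.Random(0)
w_true = [2.0] * 5 + [0.0] * 15            # 20 features, only 5 informative
X_train, y_train = make_data(30, w_true, 1.0, rng)
X_test, y_test = make_data(200, w_true, 1.0, rng)

alphas = [10.0 ** k for k in range(-3, 4)]  # 1e-3 ... 1e3
test_mse = [mse(X_test, y_test, ridge_cd(X_train, y_train, a)) for a in alphas]
for a, m in zip(alphas, test_mse):
    print(f"alpha={a:g}  test MSE={m:.3f}")
```

The alpha with the lowest test MSE marks the boundary between the two regimes: values well below it are candidates for the loosely regularized region where coordinate descent tends to converge slowly. On real data the same sweep would be run with the actual estimator and a held-out set.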

That said, the oscillation is a bit strange, so this could be a bug. I would be curious to see how the scikit-learn or liblinear solvers behave on the same data.

Also, does the problem occur only with L2 regularization, or with L1 regularization as well?

pprett commented 10 years ago

@mblondel I confirm: the solver terminates much more quickly with the L1 penalty. However, I can see some oscillation there too, especially once I set tol very low:

In [23]: est = CDClassifier(C=10.0, alpha=1.0, random_state=0, penalty="l1", loss="log", verbose=3, max_iter=25, tol=0.000001)
In [24]: est.fit(bin_dense[:10000,:], bin_target[:10000])
Iteration 0
.
Active size: 100
Violation sum ratio: 1.000000 (tol=0.000000)
Iteration 1
.
Active size: 100
Violation sum ratio: 0.119174 (tol=0.000000)
Iteration 2
.
Active size: 100
Violation sum ratio: 0.016049 (tol=0.000000)
Iteration 3
.
Active size: 100
Violation sum ratio: 0.002891 (tol=0.000000)
Iteration 4
.
Active size: 100
Violation sum ratio: 0.000715 (tol=0.000000)
Iteration 5
.
Active size: 100
Violation sum ratio: 0.000306 (tol=0.000000)
Iteration 6
.
Active size: 100
Violation sum ratio: 0.000166 (tol=0.000000)
Iteration 7
.
Active size: 100
Violation sum ratio: 0.000043 (tol=0.000000)
Iteration 8
.
Active size: 100
Violation sum ratio: 0.000030 (tol=0.000000)
Iteration 9
.
Active size: 100
Violation sum ratio: 0.000013 (tol=0.000000)
Iteration 10
.
Active size: 100
Violation sum ratio: 0.000008 (tol=0.000000)
Iteration 11
.
Active size: 100
Violation sum ratio: 0.000008 (tol=0.000000)
Iteration 12
.
Active size: 100
Violation sum ratio: 0.000011 (tol=0.000000)
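For reference, the stall can be located mechanically. The sketch below copies the violation sum ratios from the log above and reports the first iteration at which the ratio fails to decrease; the helper name `first_non_decrease` is made up for this illustration. Note that the logged values are rounded to six decimals, so a tie between two iterations may be a rounding artifact.

```python
# Violation sum ratios copied from the log above, iterations 0 to 12.
ratios = [1.000000, 0.119174, 0.016049, 0.002891, 0.000715, 0.000306,
          0.000166, 0.000043, 0.000030, 0.000013, 0.000008, 0.000008,
          0.000011]

def first_non_decrease(values):
    """Return the first index whose value is not below the previous one, or None."""
    for i in range(1, len(values)):
        if values[i] >= values[i - 1]:
            return i
    return None

tol = 1e-6  # the tol passed to CDClassifier in the session above
stall = first_non_decrease(ratios)
print("first non-decreasing iteration:", stall)
print("tolerance reached within the log:", min(ratios) <= tol)
```

On these numbers the ratio stops improving at iteration 11 and rises at iteration 12, while the smallest logged ratio is still above the requested tol.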

pprett commented 10 years ago

liblinear (sklearn LogisticRegression) with the L2 penalty terminates after 4 iterations.

mblondel commented 10 years ago

I pushed b07bf0305ccb604d32d52aee4eaba6f6eb8a3342, which I believe should fix or improve the situation. The L2 solver now uses the same stopping criterion as the other solvers. You can also now use termination="violation_sum" or termination="violation_max". On my laptop, the solver now stops in ~5 iterations with tol=1e-3, ~10 iterations with tol=1e-4, and ~30 iterations with tol=1e-5.
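As an illustration of what a violation-based stopping rule can look like, here is a minimal pure-Python sketch. It is not the code from the commit: it assumes a ridge regression objective instead of the classifier losses, takes the per-coordinate violation to be the absolute partial gradient, and stops when the aggregated violation divided by its first-iteration value drops below tol, which is how the "Violation sum ratio" lines in the log earlier in this thread read. The function name `ridge_cd_with_stopping` and the toy data are hypothetical; only the two termination values mirror the option names above.

```python
import random

def ridge_cd_with_stopping(X, y, alpha, tol=1e-3,
                           termination="violation_sum", max_iter=100):
    """Cyclic CD on 0.5*||Xw - y||^2 + 0.5*alpha*||w||^2 with a violation-ratio stop.

    Returns (weights, number of sweeps performed).
    """
    if termination not in ("violation_sum", "violation_max"):
        raise ValueError("unknown termination: %r" % termination)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    first = None
    for it in range(max_iter):
        violations = []
        for j in range(p):
            # Partial gradient at the current point; zero at the optimum.
            grad = alpha * w[j] - sum(X[i][j] * residual[i] for i in range(n))
            violations.append(abs(grad))
            # Exact coordinate-wise minimization step.
            delta = -grad / (col_sq[j] + alpha)
            for i in range(n):
                residual[i] -= delta * X[i][j]
            w[j] += delta
        if termination == "violation_sum":
            v = sum(violations)
        else:
            v = max(violations)
        if first is None:
            first = v  # the ratio is 1.0 on the first sweep by construction
        if first == 0.0 or v / first <= tol:
            return w, it + 1
    return w, max_iter

# Toy, well-conditioned problem (assumed data, not the data from the gist).
rng = random.Random(0)
w_true = [1.0, -2.0, 0.5, 0.0, 3.0, 0.0, -1.0, 2.0, 0.0, 1.5]
X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(200)]
y = [sum(a * b for a, b in zip(row, w_true)) + rng.gauss(0, 0.1) for row in X]

sweeps = []
for tol in (1e-3, 1e-4, 1e-5):
    _, n_sweeps = ridge_cd_with_stopping(X, y, alpha=1.0, tol=tol)
    sweeps.append(n_sweeps)
    print(f"tol={tol:g}: stopped after {n_sweeps} sweeps")
```

Because the ratio is relative to the first sweep, the same tol behaves comparably across problems of different scale. Using the maximum instead of the sum makes the rule sensitive to the single worst coordinate rather than the aggregate.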

scikit-learn-contrib / lightning
