scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

[BUG] Results from liblinear and SAGAClassifier do not match #113

Open arthurmensch opened 7 years ago

arthurmensch commented 7 years ago

I wanted to check the differences between the output of SAGAClassifier and the output of liblinear, via LogisticRegression in sklearn. It turns out that the two estimators return very different coefficients when solving what I thought was the same problem. I suspect that I am not setting the regularisation parameters correctly; would you be able to enlighten me?


    import numpy as np
    from scipy import sparse
    from sklearn.datasets import load_iris, make_classification
    from sklearn.linear_model import LogisticRegression
    from lightning.classification import SAGAClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = np.concatenate([X] * 10)
    y = np.concatenate([y] * 10)

    X_bin = X[y <= 1]
    y_bin = y[y <= 1] * 2 - 1

    X_sparse, y_sparse = make_classification(n_samples=100, n_features=50,
                                             random_state=0)
    X_sparse = sparse.csr_matrix(X_sparse)

    for penalty in ['l2', 'l1']:
        for (X, y) in ((X_bin, y_bin), (X_sparse, y_sparse)):
            n_samples = X.shape[0]
            for alpha in [1e-4, 0.001, 0.01, 0.1, 1, 10, 100]:
                liblinear = LogisticRegression(
                    C=1. / (n_samples * alpha),
                    solver='liblinear',
                    multi_class='ovr',
                    max_iter=500,
                    fit_intercept=False,
                    penalty=penalty, random_state=0, tol=1e-24)

                if penalty == 'l1':
                    lalpha = 0
                    lbeta = alpha
                    lpenalty = 'l1'
                else:
                    lalpha = alpha
                    lbeta = 0
                    lpenalty = None
                lsaga = SAGAClassifier(loss='log',
                                       beta=lbeta, penalty=lpenalty,
                                       alpha=lalpha,
                                       max_iter=5000,
                                       random_state=0)

                lsaga.fit(X, y)
                liblinear.fit(X, y)
                print('[penalty=%s, alpha=%s, solver=liblinear]' % (penalty, alpha),
                      liblinear.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning]' % (penalty, alpha),
                      lsaga.coef_[0, :4])
                print('-------------------------------')

returns

[penalty=l2, alpha=0.0001, solver=liblinear] [-1.02531282 -3.15400719  4.88951892  2.4742867 ]
[penalty=l2, alpha=0.0001, solver=lightning] [-0.96940347 -2.93327997  4.42703281  2.06674462]
-------------------------------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.70002361 -2.29196084  3.47376739  1.6533943 ]
[penalty=l2, alpha=0.001, solver=lightning] [-0.81731435 -2.54960375  3.90450615  1.82232541]
-------------------------------
[penalty=l2, alpha=0.01, solver=liblinear] [-0.44234397 -1.48544888  2.23987699  1.01147785]
[penalty=l2, alpha=0.01, solver=lightning] [-0.48432713 -1.47725009  2.27903945  1.0521148 ]
-------------------------------
[penalty=l2, alpha=0.1, solver=liblinear] [-0.22184315 -0.77409488  1.17300667  0.51035061]
[penalty=l2, alpha=0.1, solver=lightning] [-0.24775234 -0.78256783  1.16843028  0.50506395]
-------------------------------
[penalty=l2, alpha=1, solver=liblinear] [-0.05236005 -0.22889097  0.36543505  0.15461335]
[penalty=l2, alpha=1, solver=lightning] [-0.03478164 -0.22241039  0.38099032  0.15941418]
-------------------------------
[penalty=l2, alpha=10, solver=liblinear] [ 0.00451328 -0.02524653  0.05703947  0.0231134 ]
[penalty=l2, alpha=10, solver=lightning] [ 0.00504563 -0.02507184  0.05748536  0.02325224]
-------------------------------
[penalty=l2, alpha=100, solver=liblinear] [ 0.00194671 -0.0018153   0.00675508  0.00263555]
[penalty=l2, alpha=100, solver=lightning] [ 0.00196397 -0.00180612  0.00676533  0.00263853]
-------------------------------
[penalty=l2, alpha=0.0001, solver=liblinear] [-0.82022128  0.92745245  0.42971564 -1.32700661]
[penalty=l2, alpha=0.0001, solver=lightning] [-0.2659946   0.39512438  0.2258429  -0.81652081]
-------------------------------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.46367915  0.56980419  0.26686545 -0.89167593]
[penalty=l2, alpha=0.001, solver=lightning] [-0.22980838  0.36281145  0.20797067 -0.74839268]
-------------------------------
[penalty=l2, alpha=0.01, solver=liblinear] [-0.14942186  0.2984446   0.13926796 -0.51601786]
[penalty=l2, alpha=0.01, solver=lightning] [-0.1077949   0.26593897  0.11853815 -0.47146184]
-------------------------------
[penalty=l2, alpha=0.1, solver=liblinear] [-0.01941695  0.15948197  0.04128331 -0.15189835]
[penalty=l2, alpha=0.1, solver=lightning] [-0.02838471  0.15804439  0.02516528 -0.14709557]
-------------------------------
[penalty=l2, alpha=1, solver=liblinear] [-0.00669644  0.04957467 -0.00320788 -0.00985278]
[penalty=l2, alpha=1, solver=lightning] [-0.0062626   0.04967    -0.00297072 -0.00952412]
-------------------------------
[penalty=l2, alpha=10, solver=liblinear] [-0.00093277  0.00708531 -0.00238096  0.00041233]
[penalty=l2, alpha=10, solver=lightning] [-0.0009186   0.00712768 -0.00248968  0.00046571]
-------------------------------
[penalty=l2, alpha=100, solver=liblinear] [ -9.60703453e-05   7.45333077e-04  -2.91400764e-04   7.26364227e-05]
[penalty=l2, alpha=100, solver=lightning] [ -9.66109354e-05   7.45361637e-04  -2.92248813e-04   7.30893945e-05]
-------------------------------
[penalty=l1, alpha=0.0001, solver=liblinear] [-0.32907481 -5.48237105  6.84810188  0.        ]
[penalty=l1, alpha=0.0001, solver=lightning] [-0.97892471 -2.97072992  4.4728889   2.07649284]
-------------------------------
[penalty=l1, alpha=0.001, solver=liblinear] [ 0.         -4.09888584  4.5791252   0.        ]
[penalty=l1, alpha=0.001, solver=lightning] [-0.88400508 -2.86859767  4.29820189  1.88896483]
-------------------------------
[penalty=l1, alpha=0.01, solver=liblinear] [ 0.         -2.51730703  2.81037122  0.        ]
[penalty=l1, alpha=0.01, solver=lightning] [-0.23747537 -2.27213624  2.99581196  0.24480634]
-------------------------------
[penalty=l1, alpha=0.1, solver=liblinear] [ 0.         -1.05958628  1.22152771  0.        ]
[penalty=l1, alpha=0.1, solver=lightning] [ 0.         -1.07317208  1.23880475  0.        ]
-------------------------------
[penalty=l1, alpha=1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=10, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=10, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=100, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=100, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=0.0001, solver=liblinear] [-0.44078639  0.01458268  1.32293892 -1.10445545]
[penalty=l1, alpha=0.0001, solver=lightning] [-0.25885727  0.38927315  0.22696332 -0.81394116]
-------------------------------
[penalty=l1, alpha=0.001, solver=liblinear] [-0.07313686  0.06330551  0.64000195 -0.76230004]
[penalty=l1, alpha=0.001, solver=lightning] [-0.16498383  0.31122609  0.21626616 -0.72676074]
-------------------------------
[penalty=l1, alpha=0.01, solver=liblinear] [ 0.          0.          0.09711175 -0.36113837]
[penalty=l1, alpha=0.01, solver=lightning] [ 0.          0.11207056  0.08627249 -0.30509918]
-------------------------------
[penalty=l1, alpha=0.1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=0.1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=1, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=1, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=10, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=10, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
[penalty=l1, alpha=100, solver=liblinear] [ 0.  0.  0.  0.]
[penalty=l1, alpha=100, solver=lightning] [ 0.  0.  0.  0.]
-------------------------------
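As a sanity check on the mapping itself: assuming lightning's alpha multiplies 0.5 * ||w||^2 on top of a mean (per-sample) loss, setting C = 1 / (n_samples * alpha) makes the liblinear objective exactly 1/alpha times the lightning one, so the two problems should share a minimizer. A quick numpy check of that proportionality (a sketch under the stated assumption, not lightning's actual code):

```python
import numpy as np

# liblinear:  C * sum_i log(1 + exp(-y_i w.x_i)) + 0.5 * ||w||^2
# lightning:  (1/n) * sum_i log(1 + exp(-y_i w.x_i)) + 0.5 * alpha * ||w||^2
# With C = 1 / (n * alpha), the first equals (1/alpha) times the second.
rng = np.random.RandomState(0)
n, d = 100, 4
X = rng.randn(n, d)
y = rng.choice([-1.0, 1.0], size=n)
w = rng.randn(d)

alpha = 0.1
C = 1.0 / (n * alpha)

losses = np.log1p(np.exp(-y * X.dot(w)))
obj_liblinear = C * losses.sum() + 0.5 * w.dot(w)
obj_lightning = losses.mean() + 0.5 * alpha * w.dot(w)

print(np.isclose(alpha * obj_liblinear, obj_lightning))  # True
```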
arthurmensch commented 7 years ago

I should add that SAGClassifier (for l2 penalty) seems to behave the same way.

    import numpy as np
    from scipy import sparse
    from sklearn.datasets import load_iris, make_classification
    from sklearn.linear_model import LogisticRegression
    from lightning.classification import SAGClassifier, SAGAClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = np.concatenate([X] * 10)
    y = np.concatenate([y] * 10)

    X_bin = X[y <= 1]
    y_bin = y[y <= 1] * 2 - 1

    X_sparse, y_sparse = make_classification(n_samples=1000, n_features=50,
                                             random_state=0)
    X_sparse = sparse.csr_matrix(X_sparse)

    for (X, y) in ((X_bin, y_bin), (X_sparse, y_sparse)):
        for penalty in ['l2']:
            n_samples = X.shape[0]
            for alpha in np.logspace(-3, 3, 5):
                liblinear = LogisticRegression(
                    C=1. / (n_samples * alpha),
                    solver='liblinear',
                    multi_class='ovr',
                    max_iter=500,
                    fit_intercept=False,
                    penalty=penalty, random_state=0, tol=1e-24)

                if penalty == 'l1':
                    lalpha = 0
                    lbeta = alpha
                    lpenalty = 'l1'
                else:
                    lalpha = alpha
                    lbeta = 0
                    lpenalty = None
                lsag = SAGClassifier(loss='log',
                                     beta=lbeta, penalty=lpenalty,
                                     alpha=lalpha,
                                     max_iter=3000,
                                     random_state=0)

                lsaga = SAGAClassifier(loss='log',
                                       beta=lbeta, penalty=lpenalty,
                                       alpha=lalpha,
                                       max_iter=3000,
                                       random_state=0)

                lsaga.fit(X, y)
                lsag.fit(X, y)
                liblinear.fit(X, y)
                print('[penalty=%s, alpha=%s, solver=liblinear]' % (penalty, alpha),
                      liblinear.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning saga]' % (penalty, alpha),
                      lsaga.coef_[0, :4])
                print('[penalty=%s, alpha=%s, solver=lightning sag]' % (penalty, alpha),
                      lsag.coef_[0, :4])
                print('-------------------------------')
--------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.70002361 -2.29196084  3.47376739  1.6533943 ]
[penalty=l2, alpha=0.001, solver=lightning saga] [-0.81731435 -2.54960375  3.90450615  1.82232541]
[penalty=l2, alpha=0.001, solver=lightning sag] [-1.20407646 -4.19130767  6.18008358  2.65898967]
-------------------------------
[penalty=l2, alpha=0.0316227766017, solver=liblinear] [-0.32789047 -1.11596898  1.68437735  0.74534804]
[penalty=l2, alpha=0.0316227766017, solver=lightning saga] [-0.37339261 -1.12060485  1.68188194  0.74885022]
[penalty=l2, alpha=0.0316227766017, solver=lightning sag] [-0.33469617 -1.11047128  1.6849291   0.74818174]
-------------------------------
[penalty=l2, alpha=1.0, solver=liblinear] [-0.05236005 -0.22889097  0.36543505  0.15461335]
[penalty=l2, alpha=1.0, solver=lightning saga] [-0.03478164 -0.22241039  0.38099032  0.15941418]
[penalty=l2, alpha=1.0, solver=lightning sag] [-0.05261102 -0.2286055   0.36511032  0.15468706]
-------------------------------
[penalty=l2, alpha=31.6227766017, solver=liblinear] [ 0.00430706 -0.00666999  0.02016419  0.00797502]
[penalty=l2, alpha=31.6227766017, solver=lightning saga] [ 0.00447114 -0.00657897  0.02025385  0.00799994]
[penalty=l2, alpha=31.6227766017, solver=lightning sag] [ 0.00431818 -0.00666541  0.02017375  0.00797861]
-------------------------------
[penalty=l2, alpha=1000.0, solver=liblinear] [ 0.0002283  -0.00016419  0.00069642  0.00026974]
[penalty=l2, alpha=1000.0, solver=lightning saga] [ 0.00022848 -0.00016409  0.00069653  0.00026977]
[penalty=l2, alpha=1000.0, solver=lightning sag] [ 0.00022836 -0.00016415  0.00069646  0.00026975]
-------------------------------
--------
[penalty=l2, alpha=0.001, solver=liblinear] [-0.07378359 -0.08176338  0.15239206  0.150337  ]
[penalty=l2, alpha=0.001, solver=lightning saga] [-0.11194055 -0.08520462  0.17792485  0.13378968]
[penalty=l2, alpha=0.001, solver=lightning sag] [-0.10485795 -0.0808454   0.17528227  0.15793832]
-------------------------------
[penalty=l2, alpha=0.0316227766017, solver=liblinear] [-0.00666195 -0.057333    0.07570911  0.06161208]
[penalty=l2, alpha=0.0316227766017, solver=lightning saga] [-0.00618323 -0.07040822  0.10739897  0.01089067]
[penalty=l2, alpha=0.0316227766017, solver=lightning sag] [-0.00214345 -0.05492471  0.07050465  0.06115048]
-------------------------------
[penalty=l2, alpha=1.0, solver=liblinear] [ 0.00666119 -0.00775256  0.01907765  0.01085946]
[penalty=l2, alpha=1.0, solver=lightning saga] [ 0.00667705 -0.0076178   0.01907703  0.01098435]
[penalty=l2, alpha=1.0, solver=lightning sag] [ 0.00669436 -0.00763707  0.01888839  0.01085222]
-------------------------------
[penalty=l2, alpha=31.6227766017, solver=liblinear] [ 0.00041758 -0.00028051  0.00096995  0.00054382]
[penalty=l2, alpha=31.6227766017, solver=lightning saga] [ 0.00041997 -0.00028113  0.0009699   0.00054591]
[penalty=l2, alpha=31.6227766017, solver=lightning sag] [ 0.00041891 -0.00028166  0.00096977  0.00054561]
-------------------------------
[penalty=l2, alpha=1000.0, solver=liblinear] [  1.35809058e-05  -8.90077970e-06   3.13036897e-05   1.75473968e-05]
[penalty=l2, alpha=1000.0, solver=lightning saga] [  1.35823360e-05  -8.90193905e-06   3.13035749e-05   1.75491991e-05]
[penalty=l2, alpha=1000.0, solver=lightning sag] [  1.35823230e-05  -8.90193147e-06   3.13035523e-05   1.75492055e-05]
fabianp commented 7 years ago

Add tol=-1 to the kwargs of SAG*Classifier :-)

Also, note that lightning uses a definition of the logistic loss that is very slightly different from liblinear's. In particular, the one in lightning is an approximation based on some heuristics. See https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/sgd_fast.pyx , class Log . This typically only matters if you look beyond 10^-4 precision (e.g. if you plot convergence curves).
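For the curious, this kind of heuristic is usually the classic Bottou-style piecewise logistic loss; the sketch below is my reading of that trick, with assumed thresholds, rather than lightning's exact code. The pointwise error is tiny, which is why it only shows up in high-precision convergence plots:

```python
import numpy as np

def log_loss_exact(z):
    """Exact logistic loss log(1 + exp(-z)), computed stably."""
    return np.logaddexp(0.0, -z)

def log_loss_approx(z):
    """Bottou-style piecewise approximation (sketch; thresholds assumed)."""
    if z > 18.0:
        return np.exp(-z)   # log(1 + exp(-z)) ~= exp(-z) for large z
    if z < -18.0:
        return -z           # log(1 + exp(-z)) ~= -z for very negative z
    return np.log(1.0 + np.exp(-z))

zs = [-25.0, -1.0, 0.0, 1.0, 25.0]
errs = [abs(log_loss_exact(z) - log_loss_approx(z)) for z in zs]
print(max(errs))  # very small pointwise error
```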

arthurmensch commented 7 years ago

Indeed, sorry for the noise; it works now. Do you use this approximation for speed or for underflow control?

fabianp commented 7 years ago

Mathieu did this, but I would say speed, as underflow can be controlled in other ways (liblinear does this pretty well).

mblondel commented 7 years ago

If I am not mistaken, the log loss trick comes from scikit-learn, which itself comes from Leon Bottou's SGD code.