scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

DOC: sometimes the Lasso solution is the same as sklearn, sometimes not #186

Open mathurinm opened 2 years ago

mathurinm commented 2 years ago

Hi @mblondel @fabianp, I think this will be quick to answer: why is the solution sometimes equal to sklearn's, and sometimes not?

This should be quick to reproduce; look at the 1st and 3rd results over 5 seeds:

import numpy as np
from numpy.linalg import norm
from lightning.regression import CDRegressor
from sklearn.linear_model import Lasso

np.random.seed(0)
X = np.random.randn(200, 500)
beta = np.ones(X.shape[1])
beta[20:] = 0
y = X @ beta + 0.3 * np.random.randn(X.shape[0])
alpha = norm(X.T @ y, ord=np.inf) / 10

def p_obj(X, y, alpha, w):
    return norm(y - X @ w) ** 2 / 2 + alpha * norm(w, ord=1)

for seed in range(5):
    print('-' * 80)
    clf = CDRegressor(C=0.5, alpha=alpha, penalty='l1',
                      tol=1e-30, random_state=seed)
    clf.fit(X, y)

    las = Lasso(fit_intercept=False, alpha=alpha/len(y), tol=1e-10).fit(X, y)
    print(norm(clf.coef_[0] - las.coef_))

    light_o = p_obj(X, y, alpha, clf.coef_[0])
    sklea_o = p_obj(X, y, alpha, las.coef_)

    print(light_o - sklea_o)

ping @qb3 @agramfort

mathurinm commented 2 years ago

Setting permute=False fixes the issue. There may be a bug, because permuting the feature order is not a heuristic that should prevent convergence (I may be missing what permute does)
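To illustrate why the update order alone should be harmless, here is a pure-NumPy coordinate-descent sketch for the Lasso (this is not lightning's implementation; `cd_lasso` and its arguments are made up for illustration). Since the objective is strictly convex here (n > p, X full column rank almost surely), cyclic and permuted coordinate orders must land on the same unique minimizer:

```python
import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))
y = X @ np.r_[np.ones(5), np.zeros(25)] + 0.1 * rng.standard_normal(50)
alpha = norm(X.T @ y, ord=np.inf) / 10


def st(x, t):
    # soft-thresholding operator, prox of t * |.|
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)


def cd_lasso(X, y, alpha, n_iter=500, permute=False, seed=0):
    """Coordinate descent on 0.5 * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    R = y.copy()                      # residual y - X @ w, kept up to date
    lips = (X ** 2).sum(axis=0)       # per-coordinate Lipschitz constants
    order_rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        order = order_rng.permutation(p) if permute else np.arange(p)
        for j in order:
            old = w[j]
            w[j] = st(old + X[:, j] @ R / lips[j], alpha / lips[j])
            if w[j] != old:
                R += (old - w[j]) * X[:, j]
    return w


w_cyc = cd_lasso(X, y, alpha, permute=False)
w_perm = cd_lasso(X, y, alpha, permute=True)
print(norm(w_cyc - w_perm))  # the two orders agree up to tolerance
```

So, taken in isolation, permuting is as benign as sklearn's `selection='random'`; the question is what it interacts with inside lightning.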

agramfort commented 2 years ago

I suspect it's like the selection='random' option in sklearn's Lasso


mblondel commented 2 years ago

Thanks for the repro @mathurinm!

The permute option indeed just permutes the coordinates: https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/primal_cd_fast.pyx#L1306

permute=True, shrinking=False works, so it seems to be the combination of permute=True and shrinking=True that is problematic.

mathurinm commented 2 years ago

I also thought the issue appeared only when the two of them were True, but in fact for a small L1 penalty strength, permute=False, shrinking=True gives different results too:

import numpy as np
from numpy.linalg import norm
from lightning.regression import CDRegressor
from sklearn.linear_model import Lasso

np.random.seed(0)
X = np.random.randn(200, 500)
beta = np.ones(X.shape[1])
beta[20:] = 0
y = X @ beta + 0.3 * np.random.randn(X.shape[0])
alpha = norm(X.T @ y, ord=np.inf) / 100

def p_obj(X, y, alpha, w):
    return norm(y - X @ w) ** 2 / 2 + alpha * norm(w, ord=1)

for shrinking in (True, False):
    seed = 0
    print('-' * 80)
    print(f"With shrinking={shrinking} and permute=False")
    clf = CDRegressor(C=0.5, alpha=alpha, penalty='l1',
                      tol=1e-30, random_state=seed, permute=False,
                      shrinking=shrinking)
    clf.fit(X, y)

    las = Lasso(fit_intercept=False, alpha=alpha/len(y), max_iter=100_000,
                tol=1e-10).fit(X, y)
    print(f'distance between coeffs: {norm(clf.coef_[0] - las.coef_)}')

    light_o = p_obj(X, y, alpha, clf.coef_[0])
    sklea_o = p_obj(X, y, alpha, las.coef_)

    print(f"lightning obj - sklearn_obj : {light_o - sklea_o:.7f}")

mblondel commented 2 years ago

Thanks a lot for the investigation @mathurinm! So it seems that shrinking=True is 'unsafe'. Maybe the right thing to do would be to set it to False by default?
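For context on why shrinking can be unsafe if not safeguarded: the heuristic temporarily drops coordinates that look inactive (at zero with gradient strictly inside [-alpha, alpha]) from the working set. A sketch of a safeguarded variant in pure NumPy (not lightning's actual implementation; `cd_lasso_shrink` and its details are invented for illustration) rebuilds the working set from the full gradient on each pass and checks the KKT conditions over all coordinates, not just the working set, before declaring convergence:

```python
import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))
y = X @ np.r_[np.ones(5), np.zeros(25)] + 0.1 * rng.standard_normal(50)
alpha = norm(X.T @ y, ord=np.inf) / 100   # weak penalty, as in the failing case


def st(x, t):
    # soft-thresholding operator
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)


def cd_lasso_shrink(X, y, alpha, n_iter=500):
    """CD on 0.5 * ||y - Xw||^2 + alpha * ||w||_1 with a shrinking working set."""
    n, p = X.shape
    w = np.zeros(p)
    R = y.copy()                     # residual y - X @ w
    lips = (X ** 2).sum(axis=0)      # per-coordinate Lipschitz constants
    active = np.arange(p)            # working set, initially everything
    for _ in range(n_iter):
        for j in active:
            old = w[j]
            w[j] = st(old + X[:, j] @ R / lips[j], alpha / lips[j])
            if w[j] != old:
                R += (old - w[j]) * X[:, j]
        # shrink: keep nonzero coords and zeros whose correlation is near alpha.
        # Rebuilding from the FULL gradient lets dropped coordinates re-enter;
        # real solvers do this check infrequently for speed.
        grad = X.T @ R
        active = np.flatnonzero((w != 0) | (np.abs(grad) >= 0.999 * alpha))
    # KKT violation over ALL coordinates, not just the working set
    grad = X.T @ R
    viol = np.where(w != 0,
                    np.abs(grad - alpha * np.sign(w)),
                    np.maximum(np.abs(grad) - alpha, 0))
    return w, viol.max()


w, max_viol = cd_lasso_shrink(X, y, alpha)
print(f"max KKT violation: {max_viol:.2e}")
```

If the working set is never re-validated against the full gradient, wrongly shrunk coordinates stay frozen at zero and the solver stops at a suboptimal point, which is consistent with the objective gap observed above.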