mathurinm / andersoncd

This code is no longer maintained. The codebase has been moved to https://github.com/scikit-learn-contrib/skglm. This repository only serves to reproduce the results of the AISTATS 2021 paper "Anderson acceleration of coordinate descent" by Quentin Bertrand and Mathurin Massias.
BSD 3-Clause "New" or "Revised" License

robustify against 0 norm features #63

Closed: mathurinm closed this 2 years ago

mathurinm commented 2 years ago

Something weird is happening: without this fix, the solver fails when X has an all-zero column (which is expected, since there is a division by 0).

But such a column should never be selected in the working set (WS), right?

Reproduce with:

import libsvmdata
import numpy as np
from numpy.linalg import norm
from andersoncd import Lasso
X, y = libsvmdata.fetch_libsvm("rcv1.binary")
alpha_max = norm(X.T @ y, ord=np.inf) / len(y)
clf = Lasso(fit_intercept=False, alpha=alpha_max/10, verbose=1).fit(X, y)
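The failure described in this comment can be illustrated without downloading rcv1. The sketch below is a plain cyclic coordinate descent for the Lasso, assuming the usual per-coordinate step 1 / L_j with L_j = ||X_j||^2 / n; for an all-zero column L_j = 0, so that step divides by 0. The hypothetical helper `cd_lasso` (not the andersoncd solver) guards against this by skipping such coordinates, which is safe because the objective depends on w_j only through the penalty, so w_j = 0 is optimal.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * |.| evaluated at the scalar x."""
    return np.sign(x) * max(abs(x) - tau, 0.0)


def cd_lasso(X, y, alpha, n_epochs=100):
    """Cyclic coordinate descent for min_w ||y - Xw||^2 / (2 n) + alpha ||w||_1.

    Hypothetical helper, not the andersoncd solver. Coordinates whose column
    has zero norm are skipped: their Lipschitz constant L_j is 0, so the
    step 1 / L_j would be a division by 0.
    """
    n_samples, n_features = X.shape
    lipschitz = (X ** 2).sum(axis=0) / n_samples  # L_j = ||X_j||^2 / n
    w = np.zeros(n_features)
    R = y.astype(float)  # residual y - X @ w (w = 0 initially), a copy of y
    for _ in range(n_epochs):
        for j in range(n_features):
            if lipschitz[j] == 0.0:
                continue  # all-zero column: w_j stays at its optimal value 0
            old = w[j]
            grad_j = -(X[:, j] @ R) / n_samples
            w[j] = soft_threshold(old - grad_j / lipschitz[j],
                                  alpha / lipschitz[j])
            if w[j] != old:
                R -= (w[j] - old) * X[:, j]  # keep R = y - X @ w up to date
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
X[:, 2] = 0.0  # an all-zero feature
y = rng.standard_normal(20)
alpha = np.max(np.abs(X.T @ y)) / len(y) / 10
w = cd_lasso(X, y, alpha)
print(w[2])  # 0.0
```

Skipping the coordinate is one way to robustify the update; the alternative discussed further down is to never let such a feature enter the working set.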
codecov-commenter commented 2 years ago

Codecov Report

Merging #63 (ea11649) into master (b8ce7d3) will increase coverage by 8.90%. The diff coverage is 67.84%.


@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   56.19%   65.09%   +8.90%     
==========================================
  Files          12       11       -1     
  Lines        1098      742     -356     
  Branches      242      117     -125     
==========================================
- Hits          617      483     -134     
+ Misses        406      228     -178     
+ Partials       75       31      -44     
Impacted Files                                   Coverage Δ
andersoncd/tests/test_docstring_parameters.py    73.91% <ø> (-0.73%) ↓
andersoncd/penalties.py                          44.76% <44.76%> (ø)
andersoncd/datafits.py                           52.23% <52.23%> (ø)
andersoncd/solver.py                             66.44% <66.44%> (ø)
andersoncd/data/synthetic.py                     72.41% <69.23%> (-18.50%) ↓
andersoncd/estimators.py                         92.39% <92.39%> (ø)
andersoncd/__init__.py                           100.00% <100.00%> (ø)
andersoncd/data/__init__.py                      100.00% <100.00%> (ø)
andersoncd/tests/test_estimators.py              100.00% <100.00%> (ø)
andersoncd/utils.py                              28.76% <0.00%> (-4.11%) ↓
... and 1 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1d83596...ea11649.
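As a sanity check of the Coverage Diff in this report: assuming Codecov's ratio is hits / (hits + misses + partials), the short sketch below reproduces the 56.19%, 65.09% and +8.90% figures. The ratio moves both because the number of hits changed and because the total line count dropped from 1098 to 742. The helper `coverage` is hypothetical.

```python
def coverage(hits, misses, partials):
    """Coverage in percent, assuming partials count as not covered."""
    lines = hits + misses + partials
    return round(100 * hits / lines, 2)


before = coverage(617, 406, 75)  # master: 1098 lines
after = coverage(483, 228, 31)   # #63: 742 lines
print(before, after, round(after - before, 2))  # 56.19 65.09 8.9
```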

mathurinm commented 2 years ago

Investigated a bit further: at the iteration where the solver fails, kkt has only 82 nonzero values (I'm surprised).

Due to the growth policy, it turns out that we select 114 features in the subproblem. We are unlucky: among the 114 - 86 features with 0 KKT violation that we pick, one is a 0 column, hence the failure.
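Why a zero column ends up with 0 KKT violation: for the Lasso objective ||y - Xw||^2 / (2n) + alpha ||w||_1, the gradient of the datafit along an all-zero column is 0 whatever w is, so the optimality condition |grad_j| <= alpha always holds. The hypothetical helper `lasso_kkt_violation` below computes one common per-feature violation measure (the distance of -grad_j to alpha times the subdifferential of |w_j|); andersoncd's own score may be computed differently.

```python
import numpy as np


def lasso_kkt_violation(X, y, w, alpha):
    """Per-feature KKT violation for min_w ||y - Xw||^2 / (2 n) + alpha ||w||_1.

    Hypothetical helper: distance of -grad_j to alpha * d|w_j|.
    """
    n_samples = X.shape[0]
    grad = -X.T @ (y - X @ w) / n_samples
    return np.where(
        w != 0,
        np.abs(grad + alpha * np.sign(w)),      # must be 0 at the optimum
        np.maximum(np.abs(grad) - alpha, 0.0),  # 0 iff |grad_j| <= alpha
    )


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
X[:, 5] = 0.0  # zero column: grad_5 is 0 for every w
y = rng.standard_normal(30)
alpha = np.max(np.abs(X.T @ y)) / len(y) / 10
kkt = lasso_kkt_violation(X, y, np.zeros(8), alpha)
print(kkt[5])  # 0.0
print((kkt != 0).sum())  # number of features that actually violate KKT
```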

It's wild that so many features have 0 KKT violation, no? I see the easy fix of selecting at most (kkt != 0).sum() features in the WS, but I'm surprised the violation is 0 for such a large number of features.
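The easy fix proposed here, capping the working-set size at (kkt != 0).sum(), can be sketched as follows. `select_ws` is a hypothetical helper that assumes the working set is made of the features with the largest KKT violation; it does not reproduce andersoncd's exact growth policy.

```python
import numpy as np


def select_ws(kkt, ws_size):
    """Indices of the (at most) ws_size features with largest KKT violation.

    The size is capped at the number of nonzero violations, so features with
    0 violation (e.g. all-zero columns) are never added to the working set.
    """
    ws_size = min(ws_size, int(np.count_nonzero(kkt)))
    if ws_size == 0:
        return np.array([], dtype=int)  # no violation: nothing to add
    # argpartition places the ws_size largest entries in the last ws_size slots
    return np.argpartition(kkt, -ws_size)[-ws_size:]


kkt = np.array([0.3, 0.0, 0.0, 0.1, 0.0, 0.2])  # index 4 could be a zero column
ws = select_ws(kkt, ws_size=4)
print(np.sort(ws))  # [0 3 5]
```

Without the cap, asking for 4 features would pull in one of the zero-violation indices 1, 2 or 4, which is how a zero column slipped into the 114-feature subproblem.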

What's your take on this, @QB3?

QB3 commented 2 years ago

> I'm surprised it's 0 for such a large number of features

I had already observed that the uncleaned rcv1 has a large number of zero columns. +1 for (kkt != 0).sum().
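Cleaning the data is a complementary fix. A minimal sketch, assuming X is a SciPy sparse matrix such as the one returned by `libsvmdata.fetch_libsvm`; `drop_zero_columns` is a hypothetical helper, not part of libsvmdata. With fit_intercept=False, dropping all-zero columns does not change the Lasso solution on the remaining features, since the optimal coefficient of a zero column is 0.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as sparse_norm


def drop_zero_columns(X):
    """Return X without its all-zero columns, and the kept column indices."""
    col_norms = sparse_norm(X, axis=0)  # Euclidean norm of each column
    keep = np.flatnonzero(col_norms)
    return X[:, keep], keep


X = sparse.csc_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]))
X_clean, keep = drop_zero_columns(X)
print(keep, X_clean.shape)  # [0 2] (2, 2)
```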