mathurinm / andersoncd

This code is no longer maintained. The codebase has been moved to https://github.com/scikit-learn-contrib/skglm. This repository only serves to reproduce the results of the AISTATS 2021 paper "Anderson acceleration of coordinate descent" by Quentin Bertrand and Mathurin Massias.
BSD 3-Clause "New" or "Revised" License

ENH: use Datafit class #29

Closed mathurinm closed 3 years ago

mathurinm commented 3 years ago

I guess we should maintain Xw rather than R (for logreg, for example). This may require storing Xj @ y in the datafit class to avoid recomputing it at every gradient step, but I am not sure how to handle that for CV, or more generally when new X and y are passed.
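To make the "store Xj @ y in the datafit class" idea concrete, here is a minimal sketch for a quadratic datafit; the class and method names below are purely illustrative and not the repository's actual API:

```python
import numpy as np


class QuadraticDatafit:
    """Hypothetical datafit for f(w) = ||y - Xw||^2 / (2 n). Illustrative only."""

    def initialize(self, X, y):
        # Precompute Xj @ y for all j once, so gradient steps avoid
        # recomputing it. This must be called again whenever new X and y
        # are passed (e.g. on each CV fold), which is the caveat above.
        self.Xty = X.T @ y
        self.n = len(y)

    def gradient_j(self, X, w, Xw, j):
        # grad_j = Xj @ (Xw - y) / n, using the cached Xj @ y
        return (X[:, j] @ Xw - self.Xty[j]) / self.n


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
w = rng.standard_normal(5)

df = QuadraticDatafit()
df.initialize(X, y)
g = df.gradient_j(X, w, X @ w, 2)
# Matches the direct computation Xj @ (Xw - y) / n
assert np.isclose(g, X[:, 2] @ (X @ w - y) / 50)
```

With this design only Xw needs to be kept up to date during coordinate descent; the residual R is never formed.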

I don't know how much more costly it is to do Xj @ (y - Xw) than Xj @ R
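As a quick sanity check of the two options, a NumPy sketch (variable names are illustrative) comparing the coordinate-wise gradient computed from a maintained residual R against one computed from a maintained Xw:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = rng.standard_normal(p)

j = 3
Xw = X @ w   # maintained if we store Xw
R = y - Xw   # maintained if we store the residual R

# Option 1: maintain R; the gradient is one O(n) dot product
grad_from_R = -X[:, j] @ R
# Option 2: maintain Xw; one extra O(n) subtraction before the dot product
grad_from_Xw = -X[:, j] @ (y - Xw)

assert np.isclose(grad_from_R, grad_from_Xw)
```

Both expressions agree; the difference is one extra O(n) vector subtraction per coordinate update when Xw is maintained.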

QB3 commented 3 years ago

> I don't know how much more costly it is to do Xj @ (y - Xw) than Xj @ R

I would bet on an additional O(n) cost per coordinate descent update. IMO the two questions become:

codecov-commenter commented 3 years ago

Codecov Report

Merging #29 (ef00529) into master (b8ce7d3) will increase coverage by 7.10%. The diff coverage is 65.72%.


@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
+ Coverage   56.19%   63.29%   +7.10%     
==========================================
  Files          12       11       -1     
  Lines        1098      613     -485     
  Branches      242      101     -141     
==========================================
- Hits          617      388     -229     
+ Misses        406      191     -215     
+ Partials       75       34      -41     
Impacted Files | Coverage Δ
-------------- | ----------
andersoncd/tests/test_docstring_parameters.py | 73.91% <ø> (-0.73%) ↓
andersoncd/penalties.py | 43.11% <43.11%> (ø)
andersoncd/solver.py | 61.53% <61.53%> (ø)
andersoncd/datafits.py | 64.70% <64.70%> (ø)
andersoncd/data/synthetic.py | 72.41% <69.23%> (-18.50%) ↓
andersoncd/__init__.py | 100.00% <100.00%> (ø)
andersoncd/data/__init__.py | 100.00% <100.00%> (ø)
andersoncd/estimators.py | 100.00% <100.00%> (ø)
andersoncd/tests/test_estimators.py | 100.00% <100.00%> (ø)
andersoncd/utils.py | 28.76% <0.00%> (-4.11%) ↓
... and 7 more

Last update 6174070...ef00529.

QB3 commented 3 years ago

WDYT of doing it properly and putting n_samples in the Lipschitz constant, the gradient, etc.? IMO this will be easier for external contributors, and may save us some headaches without being more costly.

Seems better than what we currently have, indeed. I propose to do it in another PR.
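For a quadratic datafit, folding n_samples into both the gradient and the Lipschitz constant leaves the (unpenalized) coordinate update unchanged, which supports the "without being more costly" claim. A minimal check (names are illustrative; with a penalty, the regularization strength would have to be rescaled consistently too):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = rng.standard_normal(p)
j = 1
R = y - X @ w

# Unscaled convention: f(w) = ||y - Xw||^2 / 2
grad = -X[:, j] @ R
lipschitz = X[:, j] @ X[:, j]

# Scaled convention: f(w) = ||y - Xw||^2 / (2 * n_samples)
grad_n = grad / n
lipschitz_n = lipschitz / n

# The coordinate step grad / L is identical in both conventions,
# since the 1/n factors cancel.
assert np.isclose(grad / lipschitz, grad_n / lipschitz_n)
```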