Closed by Iron-Stark 7 years ago
Ok, I will review this one when you say it is ready.
Is this PR ready for review?
@zoq
Yes. I have added the other MATLAB implementations in a different PR.
This looks ok to me, but the tests are hanging on Jenkins. Have you run the tests successfully locally?
The tests fail when AbsTol and MaxIter are added to the lasso call:
Invalid parameter name: MaxIter.
Error in lasso (line 257) = internal.stats.parseArgs(pnames, dflts, varargin{:});
Invalid parameter name: AbsTol.
Error in lasso (line 257) = internal.stats.parseArgs(pnames, dflts, varargin{:});
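A minimal sketch of one way to keep a script running on releases that reject these names: try the call with the extra options, and fall back to a plain call only when the error is the "Invalid parameter name" one shown above. The data, option values, and variable names are illustrative assumptions, not taken from this PR, and the sketch assumes newer releases accept both names for in-memory data.

```matlab
% Illustrative data (assumption); the real script would load its own X and y.
rng(0);
X = randn(50, 5);
y = X * [1; 0; 2; 0; 0] + 0.1 * randn(50, 1);

% Options that older releases reject; the values here are assumptions.
extraOpts = {'MaxIter', 1e4, 'AbsTol', 1e-4};

try
    [B, stats] = lasso(X, y, extraOpts{:});
catch err
    % Fall back only for the unsupported-option error; rethrow anything else.
    if isempty(strfind(err.message, 'Invalid parameter name'))
        rethrow(err);
    end
    [B, stats] = lasso(X, y);
end
```

Checking the error message rather than the release number avoids hard-coding which toolbox version introduced the options, which this thread does not pin down beyond "2017".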
I have added the lasso implementation to PR #94 itself. So closing this one.
It looks like MaxIter and AbsTol were introduced in 2017, so for now let's remove the options from the script.
>> help lasso
LASSO Perform lasso or elastic net regularization for linear regression.
[B,STATS] = lasso(X,Y,...) Performs L1-constrained linear least
squares fits (lasso) or L1- and L2-constrained fits (elastic net)
relating the predictors in X to the responses in Y. The default is a
lasso fit, or constraint on the L1-norm of the coefficients B.
Positional parameters:
X A numeric matrix (dimension, say, NxP)
Y A numeric vector of length N
Optional input parameters:
'Weights' Observation weights. Must be a vector of non-negative
values, of the same length as columns of X. At least
two values must be positive. (default ones(N,1) or
equivalently (1/N)*ones(N,1)).
'Alpha' Elastic net mixing value, or the relative balance
between L2 and L1 penalty (default 1, range (0,1]).
Alpha=1 ==> lasso, otherwise elastic net.
Alpha near zero ==> nearly ridge regression.
'NumLambda' The number of lambda values to use, if the parameter
'Lambda' is not supplied (default 100). Ignored
if 'Lambda' is supplied. LASSO may return fewer
fits than specified by 'NumLambda' if the residual
error of the fits drops below a threshold percentage
of the variance of Y.
'LambdaRatio' Ratio between the minimum value and maximum value of
lambda to generate, if the parameter "Lambda" is not
supplied. Legal range is [0,1). Default is 0.0001.
If 'LambdaRatio' is zero, LASSO will generate its
default sequence of lambda values but replace the
smallest value in this sequence with the value zero.
'LambdaRatio' is ignored if 'Lambda' is supplied.
'Lambda' Lambda values. Will be returned in return argument
STATS in ascending order. The default is to have LASSO
generate a sequence of lambda values, based on 'NumLambda'
and 'LambdaRatio'. LASSO will generate a sequence, based
on the values in X and Y, such that the largest LAMBDA
value is just sufficient to produce all zero coefficients B.
You may supply a vector of real, non-negative values of
lambda for LASSO to use, in place of its default sequence.
If you supply a value for 'Lambda', 'NumLambda' and
'LambdaRatio' are ignored.
'DFmax' Maximum number of non-zero coefficients in the model.
Can be useful with large numbers of predictors.
Results only for lambda values that satisfy this
degree of sparseness will be returned. Default is
to not limit the number of non-zero coefficients.
'Standardize' Whether to scale X prior to fitting the model
sequence. This affects whether the regularization is
applied to the coefficients on the standardized
scale or the original scale. The results are always
presented on the original data scale. Default is
TRUE, do scale X.
Note: X and Y are always centered.
'RelTol' Convergence threshold for coordinate descent algorithm.
The coordinate descent iterations will terminate
when the relative change in the size of the
estimated coefficients B drops below this threshold.
Default: 1e-4. Legal range is (0,1).
'CV' If present, indicates the method used to compute MSE.
When 'CV' is a positive integer K, LASSO uses K-fold
cross-validation. Set 'CV' to a cross-validation
partition, created using CVPARTITION, to use other
forms of cross-validation. You cannot use a
'Leaveout' partition with LASSO.
When 'CV' is 'resubstitution', LASSO uses X and Y
both to fit the model and to estimate the mean
squared errors, without cross-validation.
The default is 'resubstitution'.
'MCReps' A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
If 'CV' is 'resubstitution' or a cvpartition of type
'resubstitution', 'MCReps' must be 1. If 'CV' is a
cvpartition of type 'holdout', then 'MCReps' must be
greater than one.
'PredictorNames' A cell array of names for the predictor variables,
in the order in which they appear in X.
Default: {}
'Options' A structure that contains options specifying whether to
conduct cross-validation evaluations in parallel, and
options specifying how to use random numbers when computing
cross validation partitions. This argument can be created
by a call to STATSET. CROSSVAL uses the following fields:
'UseParallel'
'UseSubstreams'
'Streams'
For information on these fields see PARALLELSTATS.
NOTE: If supplied, 'Streams' must be of length one.
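As a rough illustration of the options listed in the help text above, the sketch below calls lasso using only parameters that this older release documents ('Alpha', 'NumLambda', 'CV', 'RelTol'). The synthetic data and variable names are assumptions made for the example.

```matlab
% Synthetic regression problem (assumption): only predictors 1 and 3 matter.
rng(0);
X = randn(100, 8);
y = X(:, 1) - 2 * X(:, 3) + 0.5 * randn(100, 1);

% Elastic net (Alpha < 1) over at most 20 lambda values with 5-fold cross-validation.
[B, stats] = lasso(X, y, 'Alpha', 0.5, 'NumLambda', 20, 'CV', 5, 'RelTol', 1e-4);

% With cross-validation, STATS records the index of the lambda with minimum MSE.
bestCoef = B(:, stats.IndexMinMSE);
bestLambda = stats.Lambda(stats.IndexMinMSE);
```

Each column of B corresponds to one lambda value in stats.Lambda, so bestCoef is the coefficient vector for the lambda with the lowest cross-validated error.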
Added the lasso implementation to PR #94. Closing this one.
@zoq @rcurtin
I will keep on adding some implementations here.