mlpack / benchmarks

Machine Learning Benchmark Scripts

Update Matlab implementations 2 #93

Closed by Iron-Stark 7 years ago

Iron-Stark commented 7 years ago

@zoq @rcurtin

I will keep adding implementations here.

rcurtin commented 7 years ago

Ok, I will review this one when you say it is ready.

zoq commented 7 years ago

Is this PR ready for review?

Iron-Stark commented 7 years ago

@zoq

Yes. I have added the other MATLAB implementations in a different PR.

rcurtin commented 7 years ago

This looks ok to me, but the tests are hanging on Jenkins. Have you run the tests successfully locally?

Iron-Stark commented 7 years ago

The tests fail when AbsTol and MaxIter are added to the lasso call:

Invalid parameter name: MaxIter.

Error in lasso (line 257)
    = internal.stats.parseArgs(pnames, dflts, varargin{:});

Invalid parameter name: AbsTol.

Error in lasso (line 257)
    = internal.stats.parseArgs(pnames, dflts, varargin{:});

I have added the lasso implementation to PR #94, so I am closing this one.

zoq commented 7 years ago

It looks like MaxIter and AbsTol were introduced in 2017, so for now let's remove the options from the script.

>> help lasso
 LASSO Perform lasso or elastic net regularization for linear regression.
    [B,STATS] = lasso(X,Y,...) Performs L1-constrained linear least  
    squares fits (lasso) or L1- and L2-constrained fits (elastic net)
    relating the predictors in X to the responses in Y. The default is a
    lasso fit, or constraint on the L1-norm of the coefficients B.

    Positional parameters:

      X                A numeric matrix (dimension, say, NxP)
      Y                A numeric vector of length N

    Optional input parameters:  

      'Weights'        Observation weights.  Must be a vector of non-negative
                       values, of the same length as columns of X.  At least
                       two values must be positive. (default ones(N,1) or 
                       equivalently (1/N)*ones(N,1)).
      'Alpha'          Elastic net mixing value, or the relative balance
                       between L2 and L1 penalty (default 1, range (0,1]).
                       Alpha=1 ==> lasso, otherwise elastic net.
                       Alpha near zero ==> nearly ridge regression.
      'NumLambda'      The number of lambda values to use, if the parameter
                       'Lambda' is not supplied (default 100).  Ignored
                       if 'Lambda' is supplied.  LASSO may return fewer
                       fits than specified by 'NumLambda' if the residual
                       error of the fits drops below a threshold percentage 
                       of the variance of Y.
      'LambdaRatio'    Ratio between the minimum value and maximum value of
                       lambda to generate, if the  parameter "Lambda" is not 
                       supplied.  Legal range is [0,1). Default is 0.0001.
                       If 'LambdaRatio' is zero, LASSO will generate its
                       default sequence of lambda values but replace the
                       smallest value in this sequence with the value zero.
                       'LambdaRatio' is ignored if 'Lambda' is supplied.
      'Lambda'         Lambda values. Will be returned in return argument
                       STATS in ascending order. The default is to have LASSO
                       generate a sequence of lambda values, based on 'NumLambda'
                       and 'LambdaRatio'. LASSO will generate a sequence, based
                       on the values in X and Y, such that the largest LAMBDA                 
                       value is just sufficient to produce all zero coefficients B.
                       You may supply a vector of real, non-negative values of 
                       lambda for LASSO to use, in place of its default sequence.
                       If you supply a value for 'Lambda', 'NumLambda' and 
                       'LambdaRatio' are ignored.
      'DFmax'          Maximum number of non-zero coefficients in the model.
                       Can be useful with large numbers of predictors.
                       Results only for lambda values that satisfy this
                       degree of sparseness will be returned. Default is
                       to not limit the number of non-zero coefficients.
      'Standardize'    Whether to scale X prior to fitting the model
                       sequence. This affects whether the regularization is
                       applied to the coefficients on the standardized
                       scale or the original scale. The results are always
                       presented on the original data scale. Default is
                       TRUE, do scale X.
                       Note: X and Y are always centered.
      'RelTol'         Convergence threshold for coordinate descent algorithm.
                       The coordinate descent iterations will terminate
                       when the relative change in the size of the
                       estimated coefficients B drops below this threshold.
                       Default: 1e-4. Legal range is (0,1).
      'CV'             If present, indicates the method used to compute MSE.
                       When 'CV' is a positive integer K, LASSO uses K-fold
                       cross-validation.  Set 'CV' to a cross-validation 
                       partition, created using CVPARTITION, to use other
                       forms of cross-validation. You cannot use a
                       'Leaveout' partition with LASSO.                
                       When 'CV' is 'resubstitution', LASSO uses X and Y 
                       both to fit the model and to estimate the mean 
                       squared errors, without cross-validation.  
                       The default is 'resubstitution'.
      'MCReps'         A positive integer indicating the number of Monte-Carlo
                       repetitions for cross-validation.  The default value is 1.
                       If 'CV' is 'resubstitution' or a cvpartition of type
                       'resubstitution', 'MCReps' must be 1.  If 'CV' is a
                       cvpartition of type 'holdout', then 'MCReps' must be
                       greater than one.
      'PredictorNames' A cell array of names for the predictor variables,
                       in the order in which they appear in X. 
                       Default: {}
      'Options'        A structure that contains options specifying whether to
                       conduct cross-validation evaluations in parallel, and
                       options specifying how to use random numbers when computing
                       cross validation partitions. This argument can be created
                       by a call to STATSET. CROSSVAL uses the following fields:
                         'UseParallel'
                         'UseSubstreams'
                         'Streams'
                       For information on these fields see PARALLELSTATS.
                       NOTE: If supplied, 'Streams' must be of length one.
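
A minimal sketch of the suggestion above, assuming a try/catch fallback would be acceptable in the benchmark script: it first passes MaxIter and AbsTol and, if the installed lasso rejects them, retries with only options that appear in the help text. The synthetic data, the lambda value, and the option values are illustrative assumptions, not taken from the PR.

```matlab
% Sketch only: data, lambda, and option values are hypothetical.
rng(0);                                   % fixed seed for a repeatable run
X = randn(100, 5);                        % 100 observations, 5 predictors
y = X * [1; 0; 2; 0; 0] + 0.1 * randn(100, 1);
lambda = 0.1;                             % single regularization value

try
    % Releases that know these options (per the discussion) accept this call.
    B = lasso(X, y, 'Lambda', lambda, 'MaxIter', 1e5, 'AbsTol', 1e-5);
catch err
    % Older releases fail with "Invalid parameter name"; retry using only
    % options documented in the help output ('Lambda', 'RelTol').
    warning('lasso:fallback', 'Retrying without MaxIter/AbsTol: %s', err.message);
    B = lasso(X, y, 'Lambda', lambda, 'RelTol', 1e-4);
end

disp(B');                                 % one coefficient per predictor
```

Catching every error is deliberately coarse here; a real script might simply drop the two options, as proposed in the comment.
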

Iron-Stark commented 7 years ago

Added the lasso implementation to PR #94. Closing this one.