treder / MVPA-Light

Matlab toolbox for classification and regression of multi-dimensional data
MIT License

Regression coefficient #16

Closed paulsowman closed 4 years ago

paulsowman commented 4 years ago

Thanks for making this resource available. Is there a method for returning regression coefficients? I'm not quite sure how to interpret the result struct if I set cfg.metric to 'none'.

Kind regards, Paul

treder commented 4 years ago

If you need regression coefficients, the easiest way is to use the train function directly. E.g. for ridge regression, this would do the job:

param = mv_get_hyperparameter('ridge');  % get default hyperparameters for ridge regression
model = train_ridge(param, X, Y)

where X is your matrix of predictors and Y is your vector of targets. Then model.w contains the regression coefficients and model.b is the intercept. The higher-level functions (e.g. mv_regress) do not currently return all the different models trained in cross-validation but this could be easily implemented.

W.r.t. your second question, setting cfg.metric = 'none' returns the predictions ('y-hat') of the regression model on the test sets. If you use cross-validation, you do not get one vector of predictions but rather a cell array of vectors. The dimensions of this cell array are the number of repetitions and number of folds in your cross-validation settings (cfg.k and cfg.repeat) or the defaults if you haven't set anything.
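To make the shape of the output concrete, here is a minimal sketch (the data, cfg.model field, and the [repetitions x folds] ordering of the cell array are assumptions for illustration):

```matlab
X = randn(100, 20);          % 100 samples x 20 features (made-up data)
Y = randn(100, 1);           % continuous target

cfg        = [];
cfg.model  = 'ridge';
cfg.metric = 'none';         % return raw test-set predictions instead of a metric
cfg.k      = 5;              % 5 folds
cfg.repeat = 2;              % 2 repetitions

perf = mv_regress(cfg, X, Y);
% perf should then be a cell array with one cell per (repetition, fold) pair,
% e.g. perf{r, f} holding the y-hat vector for fold f of repetition r
```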

If you only need the predictions on the training set (without cross-validation), I would use the test_ridge function, since functions such as mv_regress are mainly for doing the book-keeping around cross-validation. Hope this helps.
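A hedged sketch of that suggestion, assuming test_ridge takes the trained model and a data matrix and returns one prediction per row:

```matlab
param = mv_get_hyperparameter('ridge');  % default ridge hyperparameters
model = train_ridge(param, X, Y);        % fit on the full data set
yhat  = test_ridge(model, X);            % predictions on the training data itself
```

Note that these are in-sample predictions; for an unbiased estimate of performance you would still want the cross-validated route via mv_regress.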

paulsowman commented 4 years ago

Hi Matthias, thanks so much for getting back so swiftly.

I did indeed try your suggested approach, but it doesn't seem to work when X has more than 2 dimensions, whereas mv_regress does.

My data is samples (trials) x features (channels) x time (samples), so X is 3-dimensional and cannot be transposed:

Transpose on ND array is not defined. Use PERMUTE instead.

Error in train_ridge (line 87)

    model.w = X' * ((X*X' + param.lambda * eye(N)) \ Y);   % dual

Is it the case that train_ridge doesn't work over time? Or have I misunderstood how it should be applied?

Kind regards, P


treder commented 4 years ago

The division of labour between the high-level functions and the train functions is as follows: each train function trains a single model (i.e. it takes a [samples x features] data matrix), and the high-level functions take care of looping over additional dimensions, doing all the book-keeping, etc.

Do I understand correctly that you want to train a model for every time point and then obtain a different set of coefficients for every time point? If so, the easiest way is to use a for loop for the time dimension and repeatedly call train_ridge.
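The per-time-point loop could look like this (a sketch assuming X is [samples x channels x time] and Y is [samples x 1]):

```matlab
[n_samples, n_channels, n_time] = size(X);
param = mv_get_hyperparameter('ridge');

W = zeros(n_channels, n_time);   % one coefficient vector per time point
b = zeros(1, n_time);            % one intercept per time point
for t = 1:n_time
    model   = train_ridge(param, squeeze(X(:, :, t)), Y);
    W(:, t) = model.w;
    b(t)    = model.b;
end
```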

Alternatively, if you want to train a single model that uses all (channels x time points) combinations as features, you can reshape your 3D array into a 2D [samples x (channels x time points)] matrix and then call train_ridge once.
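The reshape approach might be sketched as follows (again assuming X is [samples x channels x time]); note that MATLAB's column-major reshape keeps all channels of time point 1 first, then all channels of time point 2, and so on:

```matlab
[n_samples, n_channels, n_time] = size(X);
X2d = reshape(X, n_samples, n_channels * n_time);  % [samples x (channels*time)]

param = mv_get_hyperparameter('ridge');
model = train_ridge(param, X2d, Y);  % model.w has (channels*time) coefficients
```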