MIT License

MVPA-Light

Matlab toolbox for classification and regression of multi-dimensional data

Treder, M. S. (2020). MVPA-Light: A Classification and Regression Toolbox for Multi-Dimensional Data. Frontiers in Neuroscience, 14, 289. https://doi.org/10.3389/FNINS.2020.00289

Table of contents

  1. Installation
  2. Overview
  3. Classification
  4. Regression
  5. Examples

Installation

In Linux/Mac, open a terminal and check out the repository by typing

git clone https://github.com/treder/MVPA-Light.git

In Windows, you might prefer to perform these steps using a Git client. Alternatively, you can simply download the toolbox. Git makes it easier to keep your local version up-to-date using git pull, but it is not essential. Next, the toolbox needs to be added to Matlab's search path. In Matlab, add these lines to your startup.m file:

addpath('C:\git\MVPA-Light\startup')
startup_MVPA_Light

This assumes that the repository is located in C:\git\MVPA-Light, so change the path if necessary. The function startup_MVPA_Light adds the relevant folders and it avoids adding the .git subfolder.

If you do not want to use the startup.m file, you can directly add the MVPA-Light folder and its subfolders to the path using MATLAB's Path tool. The toolbox has been tested with Matlab R2012a and newer. There may be issues with earlier Matlab versions.
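
To check that the path has been set up, one can test whether a toolbox function is visible to Matlab. The following is a minimal sketch that uses only the built-in functions exist and which together with the function name mv_classify mentioned in this document; the messages are illustrative.

```matlab
% Check whether a high-level MVPA-Light function is on the search path.
% exist returns 2 if the name resolves to a file on the path.
on_path = (exist('mv_classify', 'file') == 2);
if on_path
    fprintf('MVPA-Light found at: %s\n', which('mv_classify'));
else
    warning('mv_classify was not found. Add MVPA-Light to the Matlab path.');
end
```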

Overview

Multivariate pattern analysis (MVPA) is an umbrella term that covers multivariate methods such as classification and regression, as well as related approaches such as Representational Similarity Analysis. MVPA-Light provides functions for the classification and regression of neuroimaging data. It is meant to address the basic issues in MVPA (such as classification across time and generalization) in a fast and robust way while retaining a slim and readable codebase. For FieldTrip users, the use of the toolbox will be familiar: the first argument to the main functions is a configuration struct cfg that contains all the parameters. The toolbox does not require or use FieldTrip, but a FieldTrip integration is available (see tutorial).

Classifiers and regression models (jointly referred to as models) can be trained and tested by hand using the train_* and test_* functions. All classifiers and regression models are available in the model folder.

Training

In order to learn which features in the data discriminate between the experimental conditions or predict a response variable, a model needs to be exposed to training data. During training, the model's parameters are optimized (e.g. determining the betas in linear regression). All training functions start with train_ (e.g. train_lda).

Testing

Model performance is evaluated by applying the model to samples from the test data. The predictions of the model (i.e., predicted class labels by a classifier or predicted responses by a regression model) can then be compared to the true class labels / responses in order to quantify predictive performance. All test functions start with test_ (e.g. test_lda).
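
The training and testing steps can be combined by hand as in the sketch below. It uses only functions that appear in the Examples section of this document (load_example_data, mv_get_hyperparameter, train_lda, test_lda, mv_calculate_performance). The split into odd and even trials is an illustrative assumption, not a recommendation; it is neither randomized nor stratified.

```matlab
% Load example ERP data (helper from the examples folder)
[dat, clabel] = load_example_data('epoched3');
X = dat.trial(:,:,100);            % features at the 100th time sample

% Illustrative split: odd trials for training, even trials for testing
n = size(X, 1);
train_idx = 1:2:n;
test_idx  = 2:2:n;

% Train an LDA classifier on the training trials only
param = mv_get_hyperparameter('lda');
cf = train_lda(param, X(train_idx,:), clabel(train_idx));

% Predict labels for the held-out trials and compare with the true labels
predlabel = test_lda(cf, X(test_idx,:));
acc = mv_calculate_performance('accuracy', 'clabel', predlabel, clabel(test_idx));
```

In practice, the high-level functions perform this kind of splitting automatically via cross-validation.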

Cross-validation

To obtain a realistic estimate of model performance and control for overfitting, a model should be tested on an independent dataset that has not been used for training. In most neuroimaging experiments, there is only one dataset with a restricted number of trials. K-fold cross-validation makes efficient use of this data by splitting it into k different folds. In each iteration, one of the k folds is held out and used as test set, whereas all other folds are used for training. This is repeated until every fold has been used as test set once. See [Lemm2011] for a discussion of cross-validation and potential pitfalls. Cross-validation is implemented in all high-level functions in MVPA-Light, i.e. mv_classify, mv_regress, mv_classify_across_time, and mv_classify_timextime. It is controlled by parameters such as cfg.cv, cfg.k and cfg.repeat (see the Examples section for their use).
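
The fold logic of k-fold cross-validation can be sketched in plain Matlab as follows. This is an illustration of the principle only, not the toolbox's implementation; the number of trials, the number of folds and all variable names are hypothetical.

```matlab
% Illustrative k-fold split in plain Matlab (not the toolbox's implementation)
rng(1);                          % fix the random seed for reproducibility
n = 20;                          % number of trials (hypothetical)
k = 5;                           % number of folds

% Randomly assign each trial to one of k folds of (nearly) equal size
fold = mod(randperm(n), k) + 1;  % fold(i) is the fold index of trial i

n_tested = zeros(1, n);          % counts how often each trial is in a test set
for f = 1:k
    test_idx  = find(fold == f); % held-out fold
    train_idx = find(fold ~= f); % all remaining folds
    % ... train the model on train_idx and evaluate it on test_idx here ...
    n_tested(test_idx) = n_tested(test_idx) + 1;
end
```

After the loop, every trial has been part of a test set exactly once.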

Hyperparameter

Hyperparameters are model parameters that have to be specified by the user. Examples are the kernel in SVM and the amount of regularization. These parameters are passed on to the train functions of the models. They can be controlled by setting the cfg.hyperparameter field before calling any of the high-level functions. To this end, initialize the field using cfg.hyperparameter = []. Then, add the desired parameters, e.g. cfg.hyperparameter.lambda = 0.5 for setting the regularization parameter or cfg.hyperparameter.kernel = 'polynomial' for defining a polynomial kernel for SVM. The hyperparameters for each model are specified in the documentation for each train_ function in the folder model.
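
As a configuration sketch, the steps described above look as follows. Combining the 'lda' classifier with the lambda hyperparameter is an assumption made for illustration; check the documentation of the respective train_ function for the hyperparameters a model actually accepts.

```matlab
cfg = [];
cfg.classifier            = 'lda';
cfg.hyperparameter        = [];    % initialize the field
cfg.hyperparameter.lambda = 0.5;   % amount of regularization
```

The resulting cfg struct is then passed as first argument to one of the high-level functions, e.g. mv_classify.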

Preprocessing

Preprocessing refers to operations applied to the data before training the classifier. In some cases, preprocessing operations such as oversampling, PCA, or Common Spatial Patterns (CSP) need to be performed as nested operations within a cross-validation analysis. In nested preprocessing, parameters are estimated on the train data and then applied to the test data. This avoids possible information flow from the test set to the train set. A preprocessing pipeline can be added by setting the cfg.preprocess and cfg.preprocess_param fields. Currently implemented preprocessing functions are collected in the preprocess subfolder. See code snippet below and examples/understanding_preprocessing.m for examples.
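
The idea of nested preprocessing can be illustrated in plain Matlab. The sketch below uses z-scoring as a simple stand-in for the operations named above; the data, the split and the variable names are hypothetical, and this is not the toolbox's own code.

```matlab
% Illustrative nested preprocessing in plain Matlab (z-scoring as example)
rng(1);
X = randn(40, 5) * 3 + 10;       % hypothetical data: 40 samples x 5 features
train_idx = 1:30;                % training samples
test_idx  = 31:40;               % held-out test samples

% Estimate the parameters (mean and standard deviation) on the train data only
mu    = mean(X(train_idx,:), 1);
sigma = std(X(train_idx,:), 0, 1);

% Apply the same parameters to both train and test data
Xtrain = bsxfun(@rdivide, bsxfun(@minus, X(train_idx,:), mu), sigma);
Xtest  = bsxfun(@rdivide, bsxfun(@minus, X(test_idx,:),  mu), sigma);
```

Because mu and sigma never see the test samples, no information flows from the test set into the training stage.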

Metrics and statistical significance

Classification and regression performance is typically measured using metrics such as accuracy and AUC for classification, or mean squared error for regression. These performance metrics do not come with p-values, however. To establish statistical significance, the function mv_statistics can be used. It implements a binomial test and permutation testing (including a cluster permutation test). See understanding_statistics for code to get started.
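
To give an intuition for the binomial test, the following plain Matlab sketch computes a one-sided p-value for a classification accuracy against chance level. It does not call mv_statistics and is not its implementation; the trial counts and the chance level are hypothetical.

```matlab
% Illustrative one-sided binomial test in plain Matlab (not mv_statistics itself)
n        = 100;    % number of test trials (hypothetical)
ncorrect = 60;     % number of correctly classified trials (hypothetical)
p0       = 0.5;    % chance level for two balanced classes

% P(X >= ncorrect) under Binomial(n, p0); gammaln avoids overflow in the
% binomial coefficients
s    = ncorrect:n;
logp = gammaln(n+1) - gammaln(s+1) - gammaln(n-s+1) ...
       + s * log(p0) + (n-s) * log(1-p0);
pval = sum(exp(logp));
```

A small p-value indicates that the observed accuracy is unlikely under the null hypothesis of chance-level classification.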

Classification

Introduction to classification

A classifier is one of the main workhorses of MVPA. The input brain data, e.g. channels or voxels, is referred to as features, whereas the output data is a class label. Classification is the process of taking a feature vector as input and assigning it to a class. In MVPA-Light, class labels must be coded as 1 (for class 1), 2 (for class 2), 3 (for class 3), and so on.

Example: Assume that in an ERP-based memory paradigm, the goal is to predict whether an item is remembered or forgotten based on 128-channel EEG data. The target is single-trial ERPs at t=700 ms. Then, the feature vector for each trial consists of a 128-element vector representing the activity at 700 ms for each electrode. Class labels are "remembered" (coded as 1) and "forgotten" (coded as 2). The exact order of coding the conditions does not affect the classification performance.
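
The example can be made concrete with simulated data. In the sketch below, the number of trials, the time axis and all variable names are hypothetical; only the layout (trials x channels x time) and the label coding follow the text.

```matlab
% Hypothetical data dimensions: 200 trials x 128 channels x 101 time points
rng(1);
time  = linspace(0, 1, 101);           % time axis in seconds (0 to 1000 ms)
trial = randn(200, 128, numel(time));  % simulated single-trial ERPs

% Find the time sample closest to 700 ms
[~, t_idx] = min(abs(time - 0.7));

% Feature matrix: one 128-element feature vector per trial
X = trial(:, :, t_idx);                % [200 x 128]

% Class labels: 1 = remembered, 2 = forgotten
remembered = rand(200, 1) > 0.5;       % simulated behavioural outcome
clabel = 2 - remembered;               % true -> 1, false -> 2
```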

Classifiers for two classes

Multi-class classifiers (two or more classes)

Classification across time

Many neuroimaging datasets have a 3D structure (trials x channels x time). The start of the trial (t=0) typically corresponds to stimulus or response onset. Classification across time can help identify at which time point in a trial discriminative information shows up. To this end, classification is performed across trials, for each time point separately. This is implemented in the function mv_classify_across_time. It returns classification performance calculated for each time point in a trial. mv_plot_result can be used to plot the result.
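
A usage sketch with the example data from the Examples section is given below. The second output argument result and its use as input to mv_plot_result are assumptions not spelled out in this document; consult the function help for the exact interface.

```matlab
% Load example ERP data (trials x channels x time)
[dat, clabel] = load_example_data('epoched3');

cfg = [];
cfg.classifier = 'lda';
cfg.metric     = 'auc';

% perf is expected to hold one AUC value per time point
[perf, result] = mv_classify_across_time(cfg, dat.trial, clabel);

% Plot classification performance across time
mv_plot_result(result);
```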

Time x time generalization

Classification across time does not give insight into whether information is shared across different time points. For example, is the information that the classifier uses early in a trial (t=80 ms) the same that it uses later (t=300 ms)? In time generalization, this question is answered by training the classifier at a certain time point t. The classifier is then tested at the same time point t but it is also tested at all other time points in the trial [King2014]. mv_classify_timextime implements time generalization. It returns a 2D matrix of classification performance, with performance calculated for each combination of training time point and testing time point. mv_plot_result can be used to plot the result.
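
A usage sketch for time generalization follows the same pattern as the other high-level functions. As before, the second output argument result and its use with mv_plot_result are assumptions not stated in this document.

```matlab
% Load example ERP data (trials x channels x time)
[dat, clabel] = load_example_data('epoched3');

cfg = [];
cfg.classifier = 'lda';
cfg.metric     = 'accuracy';

% perf is expected to be a 2D matrix:
% training time points x testing time points
[perf, result] = mv_classify_timextime(cfg, dat.trial, clabel);

% Plot the time x time generalization matrix
mv_plot_result(result);
```

Performance along the diagonal of the matrix corresponds to training and testing at the same time point.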

Classification of multi-dimensional data

Neuroimaging datasets can be high dimensional. For instance, time-frequency data can have 4 (e.g. samples x channels x frequencies x times) or more dimensions. The function mv_classify deals with data of an arbitrary number and order of dimensions. It combines and generalizes the capabilities of the other high-level functions and allows for flexible tailoring of classification analysis including frequency x frequency generalization.

mv_classify also implements searchlight analysis, which investigates which features contribute most to classification performance. The answer to this question can be used to better interpret the data or to perform feature selection. If there is a spatial structure in the features (e.g. neighbouring electrodes, neighbouring voxels), groups of features rather than single features can be considered. The result is a classification performance measure for each feature. If the features are e.g. channels, the result can be plotted as a topography. See getting_started_with_classification.m for code to get you started.
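
A sketch of how 4D time-frequency data might be passed to mv_classify is shown below. The data are simulated, and the field names cfg.sample_dimension and cfg.feature_dimension are assumptions that do not appear elsewhere in this document; verify them against the help of mv_classify and getting_started_with_classification.m before use.

```matlab
% Hypothetical time-frequency data: samples x channels x frequencies x times
X = randn(60, 16, 8, 20);
clabel = [ones(30,1); 2*ones(30,1)];   % 30 trials per class

cfg = [];
cfg.classifier        = 'lda';
cfg.metric            = 'accuracy';
cfg.sample_dimension  = 1;   % assumed field name: dimension holding the samples
cfg.feature_dimension = 2;   % assumed field name: dimension holding the features

% Under these assumptions, classification is performed separately for
% each frequency x time combination
perf = mv_classify(cfg, X, clabel);
```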

Classification performance metrics

Classifier output comes in the form of decision values (=distances to the hyperplane for linear methods) or directly in the form of class labels. However, rather than looking at the raw classifier output, one is often only interested in a performance metric that summarizes how well the classifier discriminates between the classes. Metrics such as accuracy ('accuracy'), AUC ('auc') and the confusion matrix ('confusion') can be calculated by the function mv_calculate_performance.

There is usually no need to call mv_calculate_performance directly. By setting the cfg.metric field, the performance metric is calculated automatically in mv_classify_across_time, mv_classify_timextime and mv_classify. You can provide a cell array of metrics, e.g. cfg.metric = {'accuracy', 'confusion'} to calculate multiple metrics at once.
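
To illustrate what two of these metrics measure, the following plain Matlab sketch computes accuracy and a confusion matrix of counts from hypothetical labels. It does not call mv_calculate_performance, and the toolbox's output format for 'confusion' may differ (e.g. in normalization).

```matlab
% Illustrative computation in plain Matlab (not mv_calculate_performance itself)
clabel    = [1 1 1 2 2 2 3 3]';   % true class labels (hypothetical)
predlabel = [1 1 2 2 2 1 3 3]';   % predicted class labels (hypothetical)

% Accuracy: fraction of correctly predicted labels
acc = mean(predlabel == clabel);

% Confusion matrix of counts: rows = true class, columns = predicted class
nclasses  = max(clabel);
confusion = accumarray([clabel predlabel], 1, [nclasses nclasses]);
```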

Regression

Introduction to regression

A regression model performs statistical predictions, similar to a classifier. The main difference between both types of models is that a classifier predicts class labels (a discrete variable) whereas a regression model predicts responses (a continuous variable). The function mv_regress deals with regression data of an arbitrary number and order of dimensions. It implements cross-validation, generalization, searchlight analysis, and so on. See getting_started_with_regression.m for code to get you started.

Example: We hypothesize that reaction time (RT) in a task is predicted by the magnitude of the BOLD response in brain areas. To investigate this, we perform a searchlight analysis: In every iteration, the single-trial BOLD response in a subset of contiguous voxels serves as features, whereas RT serves as response variable. We assume that this relationship is non-linear, so we use kernel ridge regression to predict RT.

Regression models

Regression performance metrics

Often, one is not interested in the concrete predictions of the regression model, but rather in a performance metric that quantifies how good the predictions are. Metrics such as the mean absolute error ('mae') and the mean squared error ('mse') can be calculated by the function mv_calculate_performance.

There is usually no need to call mv_calculate_performance directly. By setting the cfg.metric field, the performance metric is calculated automatically in mv_regress. You can provide a cell array of metrics, e.g. cfg.metric = {'mae', 'mse'} to calculate multiple metrics at once.
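
For reference, the two metrics named above can be computed by hand as in the following plain Matlab sketch. The responses and predictions are hypothetical values, and the code does not call the toolbox.

```matlab
y     = [0.42 0.55 0.61 0.38]';   % true responses, e.g. RT in seconds (hypothetical)
ypred = [0.40 0.60 0.55 0.45]';   % model predictions (hypothetical)

err = ypred - y;          % prediction errors
mae = mean(abs(err));     % mean absolute error
mse = mean(err.^2);       % mean squared error
```

MSE penalizes large errors more strongly than MAE because the errors are squared before averaging.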

Examples

This section gives some basic examples. More detailed examples and data can be found in the examples/ subfolder.

Getting started with classification

% Load ERP data (in /examples folder)
[dat,clabel] = load_example_data('epoched3');

% Perform classification for each time point
% Calculate AUC using 10-fold cross-validation with 2 repetitions and an LDA classifier
cfg = [];
cfg.metric     = 'auc';
cfg.cv         = 'kfold';
cfg.k          = 10;
cfg.repeat     = 2;
cfg.classifier = 'lda'; 
acc = mv_classify(cfg, dat.trial, clabel);

See examples/getting_started_with_classification.m for more details.

Getting started with regression


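% Perform Ridge Regression with 10-fold cross-validation
% and 2 repetitions.
% Sets regularization hyperparameter lambda to 0.1
% and calculates MSE as a performance metric.
% X (data, samples along the first dimension) and y (vector of
% responses) are assumed to be in the workspace.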
cfg                = [];
cfg.metric         = 'mse';
cfg.cv             = 'kfold';
cfg.k              = 10;
cfg.repeat         = 2;
cfg.model          = 'ridge';
cfg.hyperparameter = [];
cfg.hyperparameter.lambda = 0.1;

mse = mv_regress(cfg, X, y);

See examples/getting_started_with_regression.m for more details.

Training and testing by hand

% Fetch the data from the 100th time sample
X = dat.trial(:,:,100);

% Get default hyperparameters for the classifier
param = mv_get_hyperparameter('lda');

% Train an LDA classifier
cf = train_lda(param, X, clabel);

% Test classifier on the same data and get the predicted labels
predlabel = test_lda(cf, X);

% Calculate classification accuracy
acc = mv_calculate_performance('accuracy','clabel',predlabel,clabel)

Many MVPA problems can be addressed with the high-level functions such as mv_classify and mv_regress. However, for more specialized analyses you may need low-level access to the train and test functions. examples/understanding_train_and_test_functions.m gives more details on how to use the train/test functions directly.

Preprocessing pipeline

% Apply PCA followed by sample averaging as preprocessing steps,
% then classify across time
cfg = [];
cfg.preprocess = {'pca' 'average_samples'};
acc = mv_classify_across_time(cfg, dat.trial, clabel);

See examples/understanding_preprocessing.m for more details.

References

[Bla2011] Blankertz, B., Lemm, S., Treder, M., Haufe, S., & Müller, K.-R. (2011). Single-trial analysis and classification of ERP components - A tutorial. NeuroImage, 56(2), 814–825.

[King2014] King, J.-R., & Dehaene, S. (2014). Characterizing the dynamics of mental representations: the temporal generalization method. Trends in Cognitive Sciences, 18(4), 203–210.

[Lemm2011] Lemm, S., Blankertz, B., Dickhaus, T., & Müller, K.-R. (2011). Introduction to machine learning for brain imaging. NeuroImage, 56(2), 387–399.

[Tre2016] Treder, M. S., Porbadnigk, A. K., Shahbazi Avarvand, F., Müller, K.-R., & Blankertz, B. (2016). The LDA beamformer: Optimal estimation of ERP source time series using linear discriminant analysis. NeuroImage, 129, 279–291.