mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

ENH: encoding models #2796

Closed choldgraf closed 7 years ago

choldgraf commented 8 years ago

Most of what I've been working on for my thesis has been so-called "encoding" models of electrode activity. This is quite similar to the decoding models already implemented in MNE, but where the output is the electrode activity, and inputs are time-lagged stimulus features. Is this something that would be useful to have in MNE?

I don't think it'll need to be as complicated as the decoding module because there isn't generalization across time etc, but I'm thinking of having a similar structure where you supply an epochs-like object and a tmin/tmax to pull specific times.

The output would be a list of regression models and the user could pull coefficients from these are necessary. It would rely heavily on scikit-learn throughout.

Let me know if this might be of interest and I can think up how it'd look in more detail if so.

dengemann commented 8 years ago

This would then be a more contextual API for some of our regression models. Estimation of multivariate timecourses based on external events. Sounds good to me. Would be good to think of the APIs for that.

On 19 Jan 2016, at 19:15, Chris Holdgraf notifications@github.com wrote:

Most of what I've been working on for my thesis has been so-called "encoding" models of electrode activity. This is quite similar to the decoding models already implemented in MNE, but where the output is the electrode activity, and inputs are time-lagged stimulus features. Is this something that would be useful to have in MNE?

I don't think it'll need to be as complicated as the decoding module because there isn't generalization across time etc, but I'm thinking of having a similar structure where you supply an epochs-like object and a tmin/tmax to pull specific times.

The output would be a list of regression models and the user could pull coefficients from these are necessary. It would rely heavily on scikit-learn throughout.

Let me know if this might be of interest and I can think up how it'd look in more detail if so.

— Reply to this email directly or view it on GitHub.

jona-sassenhagen commented 8 years ago

What would you say is missing from stats.regression.linear_regression and stats.regression.linear_regression_raw?

jasmainak commented 8 years ago

Do we have data on which we could demo these models?

jona-sassenhagen commented 8 years ago

I rrrrrreally need to make available a dataset to demonstrate continuous covariates with rERPs ...

choldgraf commented 8 years ago

@jona-sassenhagen other than my knowledge of their existence, I am not sure ;) I'll take a look now

larsoner commented 8 years ago

You mean like a STRF? That would indeed be great. It would be nice if we had that as a way to look at data.

@rkmaddox I assume this would be relevant to you...?

choldgraf commented 8 years ago

It looks like the linear_regression code is a more general OLS function, is that correct?

In this case, it'd be using Ridge Regression most likely (maybe offer support for L1 norm or Elastic Net, though that takes a lot longer to fit), and would incorporate cross-validation internally as well. This can be kind of tricky to do correctly, as most people would be inputting a timeseries which you can't just chunk up randomly. The way I imagine it, people would input epochs-shaped objects, and the cross validation iterator would default to doing a LabelShuffleSplit or LeavePLabelOut iteration, so that datapoints from a single epoch would always stay together in either the train or test splits.

Other stuff that would build on the linear_regression functions would be a function for creating time-lagged matrices. I was also thinking of incorporating specific viz for doing spatio / spectro-temporal receptive fields since it's a bit thing in systems neuro, but that may be too specific.

choldgraf commented 8 years ago

yep @Eric89GXL, my life in grad school has been STRFs :)

jona-sassenhagen commented 8 years ago

https://github.com/mne-tools/mne-python/pull/2641 is very similar to STRFs. I think it's a more general model.

Also it's bad (because I coded it).

Other stuff that would build on the linear_regression functions would be a function for creating time-lagged matrices.

stats.linear_regression_raw

n this case, it'd be using Ridge Regression most likely

stats.linear_regression_raw can take a custom solver.

larsoner commented 8 years ago

yep @Eric89GXL, my life in grad school has been STRFs :)

Mine was too :)

2641 is very similar to STRFs. I think it's a more general model.

Cool. @choldgraf want to close this and comment over there? (Can reopen here if somehow that isn't sufficient for what you're looking to do.)

choldgraf commented 8 years ago

yep, I'll take a look at #2641

jona-sassenhagen commented 8 years ago

First check if I'm correct. My grad school life was looking at object-before-subject sentences, I know very little about STRFs.

rkmaddox commented 8 years ago

I've implemented a similar thing, although it is specifically coded for a continuous input and continuous output: https://github.com/rkmaddox/trf. It does ridge regression, or you can have a smoothness prior. It is pretty inflexible, but is also extremely fast due to optimization and the potential to run with cuda.

I'd be happy for it to become a part of mne-python, but there are a number of interface issues that would need to be worked out first, particularly because there is a tradeoff between interface complexity and speed for some scenarios.

choldgraf commented 8 years ago

Just put some comments in PR #2641. The tl;dr is that it seems to me that the code is specifically for fitting a single linear model, and in the same way the "generalization across time" module is useful because it gives the machinery to automate the model fitting / predicting process, an explicit set of STRF or encoding code could be similarly useful. AKA, the stuff in stats.linear_regression as well as @rkmaddox is great for the fitting itself, but doesn't really address the general PITA that is doing your cross validation, feature making, model scaling, etc correctly. WDYT everybody?

jona-sassenhagen commented 8 years ago

IMO make an encoding module, but refactor, code share, DRY wrt. the stats.regression functions.

larsoner commented 8 years ago

Sounds reasonable to me. @choldgraf @jona-sassenhagen @rkmaddox I trust you all to figure out the best way to do it. I'll just come in at the end with a bunch of nitpicks and efficiency suggestions :)

jona-sassenhagen commented 8 years ago

Can't you just come in now and fix (=completely redo and finish off) my horrible PR? :( I'm lazy.

jasmainak commented 8 years ago

I like the idea of adding more encoding models. However, I ask again what data you would demo this on? We need corresponding examples.

From a design perspective, I prefer the CSP way of doing things rather than the GAT way of doing things. That is, keep the cross-validation iterator out of the estimator object.

larsoner commented 8 years ago

@jasmainak the code @rkmaddox linked to above actually has a demo.py file that uses random data. That could be a start.

jasmainak commented 8 years ago

Random data is fine with me. But is this something that MEG or EEG people would actually find useful? Or are we planning to support ECoG etc. in a bigger way? AFAIK, STRF results are not super-convincing with MEG.

rkmaddox commented 8 years ago

There see some great results with eeg only, though many of them lose the frequency component to be just the TRF. On phone right now, but check out Ed Lalor's recent work.

choldgraf commented 8 years ago

Well I wouldn't finish any contribution until I re-submit my STRF paper (or else my boss will destroy me), but I think I'll have to make some data public as a part of that submission. If that's the case then I can try to clean it up and make it available for MNE. The only challenge is that it'll be an ECoG dataset, which A. is a bit different from MNE typical examples, and B. might be challenging to anonymize...but if I used the speaker output rather than a microphone from the room, that might work.

larsoner commented 8 years ago

I think Simon's MEG lab still uses STRF-like techniques pretty extensively (and definitely has in the past), too.

jasmainak commented 8 years ago

yes I know of his work, he was in Paris recently. But, by his own admission (I might be misquoting), these papers take notoriously long to publish. Is there a "landmark paper" for MEG/EEG using STRF?

jona-sassenhagen commented 8 years ago

@choldgraf wouldn't it be cool if there was an ECoG example data set?

choldgraf commented 8 years ago

I took some time to re-organize my personal code into a format that might work here. It'd be something like this (forewarning that this code is incomplete and messy and the docstrings are messed up because I changed the code/args, but just for the general idea and to decide whether it provides enough usefulness over the linear_regression implementation):

def _scorer_corr(x, y):
    return np.corrcoef(x, y)[1, 0]

def _check_time(X, time):
    if isinstance(time, (int, float)):
        time = np.repeat(time, X.shape[0])
    elif time.shape[0] != X.shape[0]:
        raise ValueError('time lims and X must have the same shape')
    return time

def _build_design_matrix(X, y, sfreq, times, delays, tmin, tmax):
    if X.shape[-1] != y.shape[-1]:
        raise ValueError('X and y have different time dimension')
    if X.shape[0] != y.shape[0]:
        raise ValueError('X and y have different number of epochs')
    if tmin + np.min(delays) < np.min(times):
        raise ValueError('Data will be cut off w delays, use longer epochs')
    tmin = _check_time(X, tmin)
    tmax = _check_time(X, tmax)

    # Iterate through epochs with custom tmin/tmax if necessary
    X_out, y_out, lab_out = [[] for _ in range(3)]
    for i, (epX, epy, tmin, tmax) in enumerate(zip(X, y, tmin, tmax)):
        msk_time = _time_mask(times, tmin, tmax)

        epX_del = delay_timeseries(ep, sfreq, delays)
        epX_out = X_del[:, msk_time]
        epy_out = epy[:, msk_time]
        ep_lab = np.repeat(i + 1, epy_out.shape[-1])

        X_out.append(epX_out)
        y_out.append(epy_out)
        lab_out.append(ep_lab)
    return np.hstack(X_out), np.hstack(y_out), np.hstack(lab_out)

def _check_cv(X, labels, cv):
    cv = 5 if cv is None else cv
    if isinstance(cv, float):
        raise ValueError('cv must be an int or instance of sklearn cv')
    if isinstance(cv, int):
        if len(np.unique(labels)) == 1:
            # Assume single continuous data, do KFold
            cv = KFold(labels.shape[-1], cv)
        else:
            # Assume trials structure, do LabelShufleSplit
            cv = LabelShufleSplit(labels, n_iter=cv, n_test=.2)
    return cv

def delay_timeseries(ts, sfreq, delays):
    """Include time-lags for a timeseries.

    Parameters
    ----------
    ts: array, shape(n_feats, n_times)
        The timeseries to delay
    sfreq: int
        The sampling frequency of the series
    delays: list of floats
        The time (in seconds) of each delay
    Returns
    -------
    delayed: array, shape(n_feats*n_delays, n_times)
        The delayed matrix
    """
    delayed = []
    for delay in delays:
        roll_amount = int(delay * sfreq)
        rolled = np.roll(ts, roll_amount, axis=1)
        if delay < 0:
            rolled[:, roll_amount:0] = 0
        elif delay > 0:
            rolled[:, 0:roll_amount] = 0
        delayed.append(rolled)
    delayed = np.vstack(delayed)
    return delayed

def fit_strf(X, y, sfreq, delays, tmin=None, tmax=None, alphas=1,
             cv_outer=None, cv_inner=None, scorer_outer=_scorer_corr,
             X_names=None, scale_data=True):
    """Fit a STRF model.

    This implementation uses Ridge regression and scikit-learn. It creates time
    lags for the input matrix, then does cross validation to fit a STRF model.

    Parameters
    ----------
    X : array, shape (n_epochs, n_feats, n_times)
        The input data for the regression
    y : array, shape (n_times,)
        The output data for the regression
    sfreq : float
        The sampling frequency for the time dimension
    delays : array, shape (n_delays,)
        The delays to include when creating time lags. The input array X will
        end up having shape (n_feats * n_delays, n_times)
    tmin : float | array, shape (n_epochs,)
        The beginning time for each epoch. Optionally a different time for each
        epoch may be provided.
    tmax : float | array, shape (n_epochs,)
        The end time for each epoch. Optionally a different time for each
        epoch may be provided.
    alphas : float | array, shape (n_alphas)
        The alpha values to choose in the Ridge regression. If len(alphas) > 1,
        it corresponds to using an inner CV loop to choose alpha.
    cv_outer : instance of (KFold, LabelShuffleSplit)
        The cross validation object to use for the outer loop
    cv_inner : instance of same type as cv_outer
        The cross validation object to use for the inner loop,
        if len(alphas) > 1
    scorer_outer : function
        The scorer to use when evaluating on the held-out test set.
        It must accept two 1-d arrays as inputs, and output a scalar value.
    X_names : list of strings/ints/floats, shape (n_feats,) : None
        A list of values corresponding to input features. Useful for keeping
        track of the coefficients in the model after time lagging.
    scale_data : bool
        Whether or not to scale the data to 0 mean and unit var before fit.

    Outputs
    -------
    ests : list of sklearn estimators, length (len(cv_outer),)
        The estimator fit on each cv_outer loop. If len(alphas) > 1, then this
        will be the chosen model using GridSearch on each loop.
    scores: np.array, shape (len(cv_outer),)
        The scores on the held out test set on each loop of cv_outer
    X_names : list of strings, shape (n_feats * n_delays)
        A list of names for each coefficient in the model. It is of structure
        'name_timedelay'.
    """
    # Checks
    alphas = np.atleast_1d(alphas)
    if len(X_names) != X.shape[0]:
        raise ValueError('X_names and X.shape[0] must be the same size')
    time_mask = np.ones_like(y, dtype=bool) if time_mask is None else time_mask
    if time_mask.shape != y.shape:
        raise ValueError('time_mask and y must be same shape')
    if not X.shape[-1] == y.shape[-1] == time_mask.shape[-1]:
        raise ValueError('X and y and time mask must be same shape')

    # Delay X
    X, y, labels = _build_design_matrix(X, y, sfreq, delays, tmin, tmax)
    cv_outer = _check_cv(X, labels, cv)

    if scale_data:
        # Scale features
        print('Scaling features...')
        X_delayed = scale(X_delayed, axis=-1)
        y = scale(y, axis=-1)

    X_names = [str(i) for i in range(len(X))]if X_names is None else X_names
    X_names = ['{0}_{1}'.format(feat, delay)
               for delay in delays for feat in X_names]

    # Fit the models
    ests, scores = [[] for _ in range(2)]
    for i, (tr, tt) in enumerate(cv_outer):
        print('\nCV: {0}'.format(i))
        X_tr = X_delayed[:, tr].T
        X_tt = X_delayed[:, tt].T
        y_tr = y[tr]
        y_tt = y[tt]
        lab_tr = labels[tr]
        lab_tt = labels[tt]

        # Set up model and cross-validation
        mod = Ridge(fit_intercept=False, alpha=alphas)
        if len(alphas) > 1:
            if not isinstance(cv_outer, type(cv_inner)):
                raise ValueError('cv objects must be of same type')
            if isinstance(cv_outer, LabelShuffleSplit):
                cv_inner = LabelShuffleSplit(lab_tr, n_iter=5, test_size=.2)
            else:
                # Assume kfold
                cv_inner = KFold(len(y_tr), n_folds)
            mod = GridSearchCV(mod, param_grid={'alpha': alphas}, cv=cv_inner)

        # Fit model + make predictions
        mod.fit(X_tr, y_tr)
        pred = mod.predict(X_tt)
        scr = scorer_outer(pred, y_tt)

        if len(alphas) > 1:
            print('Alpha scores\n{}'.format(
                '\n'.join([str(s) for s in mod.grid_scores_])))
            mod = mod.best_estimator_
        else:
            print('Score: {0}'.format(scr))

        scores.append(scr)
        ests.append(mod)
    return ests, scores, np.array(X_names)
jasmainak commented 8 years ago

I agree with @jona-sassenhagen that an ECoG dataset would be cool :) For an example, clean results with real data would be the best.

choldgraf commented 8 years ago

and FWIW I haven't seen a "canonical" paper doing STRF stuff in EEG/MEG yet. There are a few that do it, but nothing that tries to do it in a systemic / methods-y way. (e.g., nothing like the fMRI predictive model paper from the Gallant lab, or the general STRF paper by Theunissen and Gallant

choldgraf commented 8 years ago

yeah I agree @jasmainak, it is purely a matter of legality, boss being open to it, and time. Once my paper is out the door then I can focus on non-paper things for a little while, including working that code into MNE and/or making datasets useful for other people :)

jasmainak commented 8 years ago

sounds like a great plan to me :)

jona-sassenhagen commented 8 years ago

@choldgraf would it help you if I made the part of the rERP code that builds time-lagged predictor matrices into a private function? If yes, what kind of input/output would you want?

choldgraf commented 8 years ago

I think it'd be useful as a private function if we'll build STRF stuff around it. I think an ideal delay function would let you supply your own delays, because you don't necessarily want all of the indices w/in a time window, and you also don't necessarily want to decimate time. So it's useful to be able to say "give me 20 ms time bins from 100ms to -400ms".

That said, it sounds like w/r/t STRF stuff, some stuff would be good as additions to the linear_regression stuff, and others might be better as a specific STRF codebase. I think we could co-opt the code that's there for doing TRFs in the name of DRY, though my inclination is to utilize scikit-learn as much as I can to get the flexibility of using their pipelines / lots of estimators / etc.

choldgraf commented 8 years ago

So what do you think would be the best way for me to contribute something here @jona-sassenhagen? Wait for your PR to finish and then build things on top of that? The main points that I see as different between the encoding stuff I had in mind and the linear_regression module would be:

  1. Incorporating prediction into the codebase (e.g., having an object with a fit and predict method, similar to the TimeGeneralization code)
  2. Extending the time-lag function to allow arbitrary time lags
  3. Incorporating cross-validation into the fitting, w/ ability to chunk epochs together in the CVs
  4. Returning lists of coefficients (one per outer CV) rather than a single set of coefficients.
  5. Maybe implementing some under-the-hood preprocessing stuff (e.g., the time decoding module uses StandardScaler I think.
  6. Allowing the user to pass scikit-learn model instances for continuous data (though as you say, maybe it's easier to just add that kind of functionality in the string_types section. AKA, let you supply 'skl_ridge' or 'skl_lasso' and it would define a solver that would return just the fitted coefficients rather than the full estimator)
  7. allowing input data that is continuous (linear_regression gets close but it doesn't give the flexibility of lots of solvers etc that linear_regression_raw does)
jona-sassenhagen commented 8 years ago

That said, it sounds like w/r/t STRF stuff, some stuff would be good as additions to the linear_regression stuff, and others might be better as a specific STRF codebase. I think we could co-opt the code that's there for doing TRFs in the name of DRY, though my inclination is to utilize scikit-learn as much as I can to get the flexibility of using their pipelines / lots of estimators / etc.

With the stuff I care about, the regression problems are easily so massive that scikit-learn becomes insufficient - and also because in the case of the truly interesting case of overlap, for now raw coefficients are by far the most interesting component.

I think it'd be useful as a private function if we'll build STRF stuff around it. I think an ideal delay function would let you supply your own delays, because you don't necessarily want all of the indices w/in a time window, and you also don't necessarily want to decimate time. So it's useful to be able to say "give me 20 ms time bins from 100ms to -400ms".

I think that should be covered by the current function. You can give tmin and tmax for various features, and it constructs a design matrix.

Wait for your PR to finish

No :( I can't promise I'll finish it any time soon. Also I'm a horrible coder.

Incorporating prediction into the codebase (e.g., having an object with a fit and predict method, similar to the TimeGeneralization code)

Under the hood, both fitting and predicting require the design matrix right? And fitting also means solving it? That's where you'd have overlap with linear_regression_raw, cause that's what it basically does.

Extending the time-lag function to allow arbitrary time lags

I'm not sure I understand this point. Currently, since we're primarily regressing on raw files, and since you can pass arbitrary tmin and tmax for each feature, there doesn't seem to be any boundary.

Incorporating cross-validation into the fitting, w/ ability to chunk epochs together in the CVs Returning lists of coefficients (one per outer CV) rather than a single set of coefficients.

Zero overlap here - after all, the solvers and predictors don't know where their input came from.

Maybe implementing some under-the-hood preprocessing stuff (e.g., the time decoding module uses StandardScaler I think.

Yeah we'd have to settle on an API for the solver.

Allowing the user to pass scikit-learn model instances for continuous data (though as you say, maybe it's easier to just add that kind of functionality in the string_types section. AKA, let you supply 'skl_ridge' or 'skl_lasso' and it would define a solver that would return just the fitted coefficients rather than the full estimator)

Currently, the solver keyword allows a function arg.

allowing input data that is continuous (linear_regression gets close but it doesn't give the flexibility of lots of solvers etc that linear_regression_raw does)

So we'd have to extend linear_regression (which is Tal Linzen's code).

I think it would be best if you started looking at the two functions in stats.regression as well as maybe XDawn to see what you can use. If you decide there are parts you can use, we'd factor them out.

kingjr commented 8 years ago

You go away from your laptop for 2 hours, and there's a crazy long discussion to read...

I rrrrrreally need to make available a dataset to demonstrate continuous covariates with rERPs ...

@jona-sassenhagen How about you make an example with simulated data? :]

From a design perspective, I prefer the CSP way of doing things rather than the GAT way of doing things. That is, keep the cross-validation iterator out of the estimator object.

@jasmainak +1, this would makes the sklearn integration easier (because sklearn also keep everything out the cross val). Plus, I don't think there's a way to generalize encoding models (unlike decoding), which was one of the motivations to keep the CV inside the GAT object.

I'm actually unclear on whether the cross val is systematically necessary in encoding models. It isn't used for linear_regression_raw right?

Allowing the user to pass scikit-learn model instances for continuous data (though as you say, maybe it's easier to just add that kind of functionality in the string_types section. AKA, let you supply 'skl_ridge' or 'skl_lasso' and it would define a solver that would return just the fitted coefficients rather than the full estimator)

@choldgraf Why is this easier than passing the object? We'll have to continuously support for sklearn EHN.?

jona-sassenhagen commented 8 years ago

I'm actually unclear on whether the cross val is systematically necessary in encoding models. It isn't used for linear_regression_raw right?

No - I guess you could wrap it in a CV loop, but right now, it's supposed to give uniased straight coefs.

@jona-sassenhagen How about you make an example with simulated data? :]

It would also be good to have either a Brainvision EEG or an EEGLAB EEG file in general. Which all of my files are. And one with RTs.

choldgraf commented 8 years ago

thanks for comments both @kingjr and @jona-sassenhagen ...I'll get back in a few, got some meetings right now. I think in particular @kingjr will have useful insight here, as he's got more experience trying to integrate w/ sklearn. Perhaps it is better to keep things separate, lemme look through the CSP module and I'll give me thoughts in a bit.

agramfort commented 8 years ago

Ed Lalor has a matlab toolbox with real demo data. That could be a start. I am sure Ed would be fine if MNE uses his data if we cite his papers

choldgraf commented 8 years ago

With the stuff I care about, the regression problems are easily so massive that scikit-learn becomes insufficient

In what way does scikit-learn become insufficient? It just takes too long to run the code, or takes too much memory? In either case, have you played around with either using different solvers for the regression problem (e.g., see the "solver" parameter here. That can change efficiency drastically. Alternatively there are other regression algorithms that go way faster (e.g., Stochastic Gradient Descent which works much faster for higher dimensional data).

and also because in the case of the truly interesting case of overlap, for now raw coefficients are by far the most interesting component.

What do you mean most interesting? And what do you mean by "overlap"? In terms of the coefficients, you'd get something very similar from a sklearn (or from whatever fit_encoding_model function I've been imagining here)

I think that should be covered by the current function. You can give tmin and tmax for various features, and it constructs a design matrix.

It looks like right now it just says "give me a tmin and a tmax, and I will create all lags within that window". Not "give me a tmin and a tmax, and the step of each lag". Maybe that's a simple contribution I could make either way.

No :( I can't promise I'll finish it any time soon. Also I'm a horrible coder.

Hmm, so I'm not sure what's the best course of action then. It'll be a while before I get to this either way.

from @jasmainak and @kingjr:

From a design perspective, I prefer the CSP way of doing things rather than the GAT way of doing things. That is, keep the cross-validation iterator out of the estimator object. I'm actually unclear on whether the cross val is systematically necessary in encoding models. It isn't used for linear_regression_raw right?

Yeah, you're right that it's not strictly necessary, though I think that in general it's a good idea to have. If you do no cross-validation, then an encoding model is basically just a matter of feature extraction + time lags + regression, which alltogether is pretty easy to do just with sklearn (or with the linear_regression module). That said, the cross validation etc can be tricky to do, and at least for me it was helpful to have a high-level function for handling the model fitting under the hood.

The feeling that I'm getting from you guys is that an "encoding model" set of code isn't worth adding above the linear_regression code that's already there. Is that correct? I won't spend the time making a PR if people think it's not enough added value to do so.

kingjr commented 8 years ago

The feeling that I'm getting from you guys is that an "encoding model" set of code isn't worth adding above the linear_regression code that's already there.

I personally see two options: either we enhance @jona-sassenhagen linear_regression or we adapt something like @trachelr's CSP patterns to do encoding in a new encoding class.

IMO we should ultimately maximize the homogeneity of the stats API within the package. Users will learn more quickly that all of these analyses are based on similar concepts, and some may become more inclined to make additional PRs if we follow a systematic design. ATM, the MNE-Python stats (including the ANOVA, the GAT etc) feel a bit like a bunch of disconnected tricks.

choldgraf commented 8 years ago

I agree. That's why I was thinking of modeling this after GAT, eg give it a fit and predict method etc. On Jan 20, 2016 7:54 PM, "Jean-Rémi KING" notifications@github.com wrote:

The feeling that I'm getting from you guys is that an "encoding model" set of code isn't worth adding above the linear_regression code that's already there.

I personally see two options: either we enhance @jona-sassenhagen https://github.com/jona-sassenhagen linear_regression or we adapt something like @trachelr https://github.com/trachelr's CSP patterns to do encoding in a new encoding class.

IMO we should ultimately maximize the homogeneity of the stats API within the package. Users will learn more quickly that all of these analyses are based on similar concepts, and some may become more inclined to make additional PRs if we follow a systematic design. ATM, the MNE-Python stats (including the ANOVA, the GAT etc) feel a bit like a bunch of disconnected tricks.

— Reply to this email directly or view it on GitHub https://github.com/mne-tools/mne-python/issues/2796#issuecomment-173443888 .

jona-sassenhagen commented 8 years ago

In what way does scikit-learn become insufficient? It just takes too long to run the code, or takes too much memory?

Both. Mostly because fitting/getting the coefs takes so long.

What do you mean most interesting? And what do you mean by "overlap"? In terms of the coefficients, you'd get something very similar from a sklearn (or from whatever fit_encoding_model function I've been imagining here)

Overlap means: the function is intended for multiple regressions, with multiple factors simultaneously (e.g. frequency of prime word plus frequency of target word plus response time at target plus ...) in a setting where it can be assumed the responses to these various predictors overlap. And for such cases, it's not quite clear what temporal dependencies to assume, so linear estimation should be privileged.

It looks like right now it just says "give me a tmin and a tmax, and I will create all lags within that window". Not "give me a tmin and a tmax, and the step of each lag". Maybe that's a simple contribution I could make either way.

There's the decim parameter too, woulnd't that be equivalent?

Hmm, so I'm not sure what's the best course of action then. It'll be a while before I get to this either way.

Start with the code, and IMO a Pythonesque fit/predict methods on an encoder model is super fine. But re-use and refactor as much of Tal's linear_regression and my linear_regression_raw code and hopefully, API as possible.

jona-sassenhagen commented 8 years ago

You should really reopen this issue IMO :)

What I just think would be very inefficient would be having very similar code in 3 functions (the regression functions in stats.regression, XDawn (which would be my job to refactor), and the dedicated encoder. There's already a lot of questions why there are both linear_regression and linear_regression_raw.

choldgraf commented 8 years ago

Both. Mostly because fitting/getting the coefs takes so long.

Gotcha - I've run into this problem too but it helped a lot to use different fitting methods or regression models. What's the dimensionality of the data you usually use? Mine tend to be in the ballpark of ~400 x 10000 or something in there. I've scaled that up by a factor of 10 on the number of datapoints and it does start to slow down, but doing things like Stochastic GD helped a lot.

Overlap means: the function is intended for multiple regressions, with multiple factors simultaneously (e.g. frequency of prime word plus frequency of target word plus response time at target plus ...) in a setting where it can be assumed the responses to these various predictors overlap. And for such cases, it's not quite clear what temporal dependencies to assume, so linear estimation should be privileged.

Gotcha - I think we're on the same page there. If you're fitting a STRF model, then you're fitting hundreds of coefficicients, all of which are completely overlapping in time and many of which have lots of relationships between one another, thus the heavy reliance on cross-validation and regularization.

There's the decim parameter too, woulnd't that be equivalent?

But the decim parameter decimates the data, no? To me that poses 2 concerns:

  1. I use time lags of 20ms steps, but I want to keep my data sampled at 10ms bins (or whatever). Decim would change that.
  2. It looks like the decimation is just done on the data itself, not by calling a low_pass_filter or anything. Wouldn't that introduce artifacts?

Start with the code, and IMO a Pythonesque fit/predict methods on an encoder model is super fine. But re-use and refactor as much of Tal's linear_regression and my linear_regression_raw code and hopefully, API as possible.

OK, I will try to leverage the code that's there, or if possible let people supply sklearn estimators somehow. @kingjr thoughts on how to do that while still supporting the linear_regression code in the stats module as well?

You should really reopen this issue IMO :)

hehe, fair enough :)

What I just think would be very inefficient would be having very similar code in 3 functions (the regression functions in stats.regression, XDawn (which would be my job to refactor), and the dedicated encoder. There's already a lot of questions why there are both linear_regression and linear_regression_raw.

That is a good point. I'm not sure what is going on inside the XDawn module, but I'll have a look later to see what it does. Most of my code is already written (since I'd already done it for myself), so maybe the thing to do is just to make a PR and see what you guys think. I'll assume that the linear_regression code won't change too much with your PR @jona-sassenhagen ?

jona-sassenhagen commented 8 years ago

Gotcha - I've run into this problem too but it helped a lot to use different fitting methods or regression models. What's the dimensionality of the data you usually use? Mine tend to be in the ballpark of ~400 x 10000 or something in there

(n samples, so for 30 min of experiment downsampled to 100 Hz, 180.000) * (n predictors, so in the tens * time lags, in the 100s) for the predictor matrix. The data itself isn't the problem, because dot products are really fast. But you see that for example for a simple half-hour experiment at 100 Hz, you're looking at a 180.000 * 1.500 predictor matrix.

Gotcha - I think we're on the same page there. If you're fitting a STRF model, then you're fitting hundreds of coefficicients, all of which are completely overlapping in time and many of which have lots of relationships between one another, thus the heavy reliance on cross-validation and regularization.

How bad is the ratio of samples to collinearity really though?

I use time lags of 20ms steps, but I want to keep my data sampled at 10ms bins (or whatever). Decim would change that.

Well if you don't have a predictor for a data point, the effect is the same as dropping it by decimation right?

It looks like the decimation is just done on the data itself, not by calling a low_pass_filter or anything. Wouldn't that introduce artifacts?

Ideally, you'd low pass or the data first, or just downsample. You can also low pass the results though. The decim param is mostly there because otherwise you quickly get ungodly matrices because sampling rate shows up in both dimensions of the matrix (e.g., 10 predictors, 30 min sampled at 1000 Hz, 1 sec time window: 10.000 * 1.800.000).

OK, I will try to leverage the code that's there, or if possible let people supply sklearn estimators somehow. @kingjr thoughts on how to do that while still supporting the linear_regression code in the stats module as well?

I guess API can be very different - if people expect STRF to behave in a certain way, that's an API MNE should provide. But actual functionality should be shared IMO.

maybe the thing to do is just to make a PR and see what you guys think

I guess so - then we can see if it would make sense to refactor. Maybe the commonalities are much smaller than I had expected. Either way, I'm interested in the code.

choldgraf commented 8 years ago

(n samples, so for 30 min of experiment downsampled to 100 Hz, 180.000) * (n predictors, so in the tens * time lags, in the 100s) for the predictor matrix. The data itself isn't the problem, because dot products are really fast. But you see that for example for a simple half-hour experiment at 100 Hz, you're looking at a 180.000 * 1.500 predictor matrix.

Yeah, I usually try to downsample to 50Hz for the same reasons (I'm usually fitting models on high-frequency amplitude traces, so not such a big deal to downsample). That's why I ended up building regression code that assumed an epochs-like structure. That way as long as you can define event starts / stops, then you remove any timepoints where nothing is happening. Though probably a lot of experiments don't have clear "time on / time off" points for events.

How bad is the ratio of samples to collinearity really though?

I'm not sure I can give a quantifiable answer to that question...it's true that we've got a lot of samples which helps, but the STRFs look a lot noisier without any kind of regularization. I usually just choose a regularization parameter with some held out data and then use that for the rest of the fitting.

Well if you don't have a predictor for a data point, the effect is the same as dropping it by decimation right?

Yeah, though you'd be losing data for the model fitting

Ideally, you'd low pass or the data first, or just downsample. You can also low pass the results though. The decim param is mostly there because otherwise you quickly get ungodly matrices because sampling rate shows up in both dimensions of the matrix (e.g., 10 predictors, 30 min sampled at 1000 Hz, 1 sec time window: 10.000 * 1.800.000).

I see what you mean. That's why I like to allow for arbitrary time lags. E.g., if you say give me time lags for all indices between tmin/tmax, then that will grow larger as the sampling rate goes higher. If you say "give me time lags between tmin/tmax, each spaced at 20ms", then the number of lags stays the same regardless of the sampling rate.

I guess so - then we can see if it would make sense to refactor. Maybe the commonalities are much smaller than I had expected. Either way, I'm interested in the code

Sounds good, when I get some time I'll write in an encoding module

choldgraf commented 8 years ago

Oops, I never actually reopened this. Either way, just an update, I'm keeping my current version of EncodingModels in my repository here: https://github.com/choldgraf/ecogtools/blob/master/strf.py

It's there because I want to freeze the code when I submit my paper. Once that's done and if people are interested, I can port it over to MNE. I tried to follow a sklearn-style as much as possible. Comments are totally welcome whether it tries to do too much (the EncodingModel class would be the main contribution, not the MPS or TORC stuff).

jona-sassenhagen commented 8 years ago

+1 :D

choldgraf commented 8 years ago

Hey all - what's the status on the encoding model module for MNE? Are new changes merged yet? I'm looking through my STRF code (and the STRF literature a bit), and I'm wondering if it's worth having a separate module or repository specifically for STRFs, as opposed to the general linear modeling module.

Looking through the code now, it looks like the linear modeling stuff is rERP focused. So I suppose the question is whether this is sufficiently different from the task of estimating continuous receptive field weights from a continuous output signal. Any thoughts on that?

I am still more than happy to merge the stuff in ecogtools but it seemed like there may be too many cooks in the kitchen before.

For reference, my latest encoding model code is here. It tries to mimic the fit and predict style of the other machine learning-based methods in MNE.

Thoughts? @jona-sassenhagen @Eric89GXL @kingjr?

jona-sassenhagen commented 8 years ago

Have you checked if you get the same results from the rERP code as from yours? In principle, it should be, right? I use it for continuous (dense) predictors actually.

Your code is great though and since I'm a lazy ass, maybe it'd just be best to try and get yours merged.