Open larsoner opened 7 years ago
Hi Eric, would here be the place to suggest another addition to ReceptiveField?
I've been using it a bit lately, especially for backward models (i.e., stimulus reconstruction), and there's something that (I think) is missing that would be useful. You may have seen this paper by Haufe et al. (2014), where they show how to forward-transform a backward model (for which the model weights are not easily interpretable).
Ed Lalor's mTRF toolbox does this (see fig. 4E here), but unless I'm mistaken there's no quick way to do this directly in MNE/sklearn with the current code?
PS: pinging @choldgraf as well, since we've briefly discussed this on Gitter.
It's just a matter of multiplying coef_ by the covariance of the design:
patterns = np.dot(X.T.dot(X), coef_)
Maybe you can give it a try? @larsoner and I will offer feedback for sure.
I agree it doesn't seem that hard to implement. The only issue that I foresee might be how to deal with the intercept before np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure)? But I wasn't sure that was something you'd find interesting/useful at all.
You may also need to invert Y, i.e., Sx W Sy^-1:
np.cov(X.T).dot(coef_.T.dot(np.linalg.pinv(np.cov((Y-Y.mean(0)).T)))).T
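For intuition, that Sx W Sy^-1 transform can be sketched end-to-end in plain NumPy. The toy data and the plain least-squares fit below are illustrative assumptions, not the MNE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 1000, 5, 2

# toy data: Y depends linearly on X plus a little noise
X = rng.standard_normal((n_samples, n_features))
Y = X @ rng.standard_normal((n_features, n_targets))
Y += 0.1 * rng.standard_normal((n_samples, n_targets))

# plain least-squares fit, standing in for a fitted estimator's coef_
coef_, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Haufe et al. (2014): patterns = cov(X) @ W @ pinv(cov(Y))
patterns = np.cov(X.T) @ coef_ @ np.linalg.pinv(np.cov((Y - Y.mean(0)).T))
print(patterns.shape)  # (5, 2): one pattern weight per feature and target
```

Demeaning Y before taking its covariance is one simple way to keep the intercept out of the computation.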
thanks for giving it a try
I agree it doesn't seem that hard to implement. The only issue that I foresee might be how to deal with the intercept before np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure)? But I wasn't sure that was something you'd find interesting/useful at all.
+1 for having that feature computed upon request.
@nbara would you like to give it a shot? Feel free to ping me for a review if you like.
Quick questions:

1. At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)
2. Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that does the matrix computation when needed and errors if the model hasn't been fit yet?
At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)
We should store the X covariance, it is the time-consuming computation IIRC.
Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that does the matrix computation when needed and errors if the model hasn't been fit yet?
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation I think.
Yeah I think so too. Can we think of edge cases where we'd regret this? E.g. gigantic X matrices? Could be a silent memory issue if the object is suddenly storing an N x N matrix if N is really large...
I would recommend to add the feature, and make it optional once we identify a clear usecase where memory is a problem.
E.g. gigantic X matrices?
The covariance (what we'd need to store) is just (n_feat, n_feat), and X is (n_samp, n_feat) with n_samp >> n_feat -- so if you're able to compute the fit at all, you probably won't be hurt too much by storing the cov. It's still possible (especially if we start to be smart about mem usage with the computations) to hit memory limits, but I agree with @kingjr we can deal with that when we hit it (and hope/assume YAGNI).
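To put rough numbers on that argument (array sizes here are arbitrary, purely for illustration):

```python
import numpy as np

# arbitrary illustrative sizes with n_samp >> n_feat
n_samp, n_feat = 100_000, 128
X = np.zeros((n_samp, n_feat))      # the design matrix itself
cov = np.zeros((n_feat, n_feat))    # what we'd actually store

# the stored covariance is smaller than X by a factor of n_samp / n_feat
print(X.nbytes // cov.nbytes)  # 781
```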
That's fine w/ me - just bringing it up as worth thinking about. My concern is more that users won't necessarily anticipate this, so it's silently storing an N x N matrix in each model object... what if somebody doesn't know about this and is doing cross-validation with 100s of models, storing the object each time? I think it'd be fine to solve this with a good mention in the docstring for now, though.
We should store the X covariance, it is the time-consuming computation IIRC
Actually no, ...
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation I think.
... from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.
I've just given a jab at this. Here's a gist with a function that does the job (not fully tested yet but seems to give sensible weights). https://gist.github.com/nbara/c0f2f05ad5db1d1710e591ce89d7b645
It's not lightweight at all, since I've got to _delay_time_series and _reshape_for_est before the final dot product...
PS: Feel free to comment on the gist if you don't want to bloat this thread.
from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.
It probably does not need to be reshaped. See how predict etc. are done -- the equivalence between convolution and matrix multiplication is used instead.
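That equivalence is easy to check in plain NumPy (a toy sketch with made-up sizes, not the actual predict code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)  # one feature's time series
h = rng.standard_normal(3)   # kernel over lags 0..2

# lagged design matrix: column k holds x delayed by k samples
X_del = np.column_stack(
    [np.r_[np.zeros(k), x[: len(x) - k]] for k in range(3)]
)

# multiplying by the lagged matrix equals convolving with the kernel
assert np.allclose(X_del @ h, np.convolve(x, h)[: len(x)])
```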
The best thing to do is probably to open a WIP PR, rather than a gist. Let's get something that works properly with unit tests, then worry about addressing the bottlenecks.
any movement on this? @nbara wanna give a shot at a PR?
Yes, I'm happy to try, but it's likely that I'll be needing some help at some point, as I'm not sure a) how to make the code work for TimeDelayingRidge and b) what a proper unit test for this would be.
Happy to help out :-)
Ok, starting to work on it now. Are we sure we want to have this computed by default in fit? Because I don't think it would be terribly useful for forward models (why would one want to backward-transform forward weights?)
sounds good - I think it would need to be computed in fit... that's the only situation in which the object would have access to the X and the y variables, no?
I guess another option would be to have a special function for this, e.g.,:
mne.decoding.inverse_transform(coefs, X, y)
What's the cost compared to estimation of coef_? If it's small, let's compute it all the time.
I guess another option would be to have a special function for this, e.g.,:
mne.decoding.inverse_transform(coefs, X, y)
Yeah, something like this. Initially I imagined this becoming a class method, such as rf.inverse_coefs(X, y). In my mind this was not something that every user would need and use often; it's mostly a convenience for those who do backward modelling.
What's the cost compared to estimation of coef_? If it's small, let's compute it all the time.
Depends how well I manage to code this :) More seriously, I need to work on this a bit more before I can answer this. In principle this could be pretty lightweight, but I need to take a closer look at TimeDelayingRidge to be sure.
I don't have the exact use cases and constraints in mind, but just to be sure we are on the same line of thought, here is what we do for classic decoding analyses:
# don't compute patterns if not necessary
clf = Ridge()
clf.fit(X, y)

# to get the patterns
clf = mne.decoding.LinearModel(Ridge())
clf.fit(X, y)
clf.patterns_

# to have this in a pipeline
clf = make_pipeline(StandardScaler(), LinearModel(Ridge()))
clf.fit(X, y)
mne.decoding.get_coef(clf, 'patterns_', inverse_transform=True)
And with MNE wrapper compatibility, e.g.:

clf = mne.decoding.SlidingEstimator(LinearModel(Ridge()))
clf.fit(X, y)
get_coef(clf, 'patterns_')
In other words, we classically don't compute the patterns (cov etc.) at fitting; if users want to have the patterns, they use the additional LinearModel step, and get_coef to get interpretable coefficients when multiple preprocessing steps are used. It would be great to have an analogous API.
@kingjr so the idea is that the only thing that LinearModel does is expose a patterns_ attribute upon the call to fit? If the computation is non-trivial, then I'd be fine with doing that. If it is relatively trivial (in terms of time), then I think it's also fine to just compute it at fit...
@choldgraf Just ran a few quick tests. Using your tutorial data, I spend an extra ~1.3 seconds in fit to compute the inverse coefs on my MacBook. (Note that I've only got it to work with estimator=Ridge() so far, and it might take more/less with TimeDelayingRidge.)
@nbara what size of inputs/outputs were you using? 1.3 seconds isn't too bad...
oops - just saw you said using my tutorial data. So this was the data in tutorials and not examples, yeah?
These exactly. So X.shape was (7677, 128).
EDIT: Will do more tests with bigger arrays tomorrow morning.
hmmm - could you try the same comparison, but playing around with the size of each dimension? E.g., how long when you cut the samples in half? How long when you double the # of features?
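One way to probe just the covariance step (a toy benchmark on synthetic data, with sizes based on the shapes mentioned above; this is not a measurement of the full fit):

```python
import time

import numpy as np

rng = np.random.default_rng(0)
# baseline, half the samples, double the features
for n_samp, n_feat in [(7677, 128), (7677 // 2, 128), (7677, 256)]:
    X = rng.standard_normal((n_samp, n_feat))
    t0 = time.perf_counter()
    cov = X.T @ X  # the (n_feat, n_feat) covariance-like term
    dt = time.perf_counter() - t0
    print(f"n_samp={n_samp}, n_feat={n_feat}: {dt:.4f} s, cov {cov.shape}")
```

The X.T @ X step scales linearly in n_samp and quadratically in n_feat, so doubling the features should hurt more than doubling the samples.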
If the current code uses Ridge, the auto- and cross-covariances need to be recalculated, so it will add time. With TDR (or a custom Ridge), where these are stored, I doubt it will add much time.
I am fine with exposing a patterns_ attribute in terms of API.
@agramfort would patterns_ only be for the inverted coefficients? In that case, is the word "patterns" standard in the eeg/meg decoding world? Just wanna make sure we're being semantically consistent here...
Yes, patterns is standard these days. filters_ is semantically the same as coef_.
So coefs_ == filters_ == "receptive field" and patterns_ == "electrode weights"?
I'm +1 on calling it patterns_ in this case. I have a slight preference for it just storing the covariance matrices and computing patterns_ lazily when people ask for it. Though I also think @kingjr's approach is pretty clever (though it complicates the API a bit).
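A minimal sketch of what that lazy variant could look like, as a hypothetical pure-NumPy wrapper (names like PatternedRidge are invented for illustration; this is not the actual MNE class):

```python
import numpy as np

class PatternedRidge:
    """Hypothetical sketch: store covariances at fit, compute patterns_ lazily."""

    def __init__(self, alpha=0.0):
        self.alpha = alpha

    def fit(self, X, y):
        # center so the intercept drops out of the covariance terms
        Xc, yc = X - X.mean(0), y - y.mean(0)
        n_feat = X.shape[1]
        # ridge solution: (X'X + alpha * I)^-1 X'y; y is (n_samples, n_targets)
        self.coef_ = np.linalg.solve(
            Xc.T @ Xc + self.alpha * np.eye(n_feat), Xc.T @ yc
        )
        self._cov_X = np.cov(Xc.T)
        self._cov_y = np.atleast_2d(np.cov(yc.T))
        self._patterns = None
        return self

    @property
    def patterns_(self):
        if not hasattr(self, "coef_"):
            raise RuntimeError("Model has not been fit yet.")
        if self._patterns is None:  # computed on first access, then cached
            self._patterns = (
                self._cov_X @ self.coef_ @ np.linalg.pinv(self._cov_y)
            )
        return self._patterns

rng = np.random.default_rng(0)
rf = PatternedRidge(alpha=1.0).fit(
    rng.standard_normal((200, 6)), rng.standard_normal((200, 3))
)
print(rf.patterns_.shape)  # (6, 3)
```

The trade-off discussed above is visible here: fit only stores the two small covariance matrices, and the pattern computation is deferred until someone actually reads patterns_.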
Can we think of any other situation where we'd want to do this set of operations? (Storing the covariance of X/y in the call to fit, and then using them to invert coefficients.) If so, then I think that's a good case to make this another class that can wrap around ReceptiveField in the way JR describes above.
whatever works but KISS
@choldgraf @nbara I've been thinking about it, and I think it makes more sense to use positive lags in the forward/encoding (stimulus->brain) models than negative lags. For example, when X is speech and y is brain/EEG (which it is in both of our examples), the system h in the equation y = X * h needs to be active at positive/causal lags. But in both of our examples we do it the other way. Even the last STRF plot in this example is time-reversed relative to the one in the paper.
It's not too late to fix this as we haven't released. WDYT?
The way I think about it is in terms of experiment time: where do those features in X exist relative to each timepoint in y? In an encoding model, they are negative, since they happen at t - N timepoints relative to y.
I think the problem is that when we call them "time delays" (or lags), we're not saying "where relative to y does this feature of X exist"; we're saying "what is the value of N" (in the equation above).
Personally, I think it's more intuitive to use negative numbers to represent "things that happen in the past", but if the field has standards around positive numbers, I'm not going to bikeshed this one. I can see a case for both.
I don't wanna get into a situation like EEG has where half of their plots have the y-axis flipped, that is just the dumbest thing ever. :-)
if the field has standards around positive numbers
I think the standard might be to use positive numbers for causal behavior in forward models. Do you agree? I am no expert here. The Lalor paper seems to present their forward model that way, at least.
It might be that by convention STRF plots are sometimes (probably half the time :) ) shown with negative lags, but in terms of the mathematics of the model being convolution of a signal (speech) with a kernel (STRF) to get an output (brain activation), the lags should be positive I think.
I am not really interested in what is technically correct, as much as what is intuitively correct to most people (the vast majority of which don't really know what convolution is mathematically).
Re: standards, here's a quick lit summary:
so in short, I think you're right, we should use positive time lags :-)
(FWIW, my initial intention was to find papers that used negative lags in their plots, but it was actually kind of difficult...including for my own paper...which tells me that I'm probably wrong in my intuition haha)
Okay, I'll mark this one for 0.15 so we change the tmin/tmax def assuming nobody else has objections. I should be able to do it this week.
+1 for positive lags. If you think of the TRF (no S) when a stimulus is a bunch of clicks (so, x = click train, y = EEG signal), then the brain response should be at positive lags, because a click causes the brain signal to wobble after the click happens.
I agree that it gets a little hairier with the STRF, but positive lags still make by far the most sense to me.
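The click-train intuition above is easy to simulate (a toy sketch, not MNE code):

```python
import numpy as np

# a causal kernel living at lags 0..2: the "brain response" to one click
h = np.array([1.0, 0.5, 0.25])
x = np.zeros(20)
x[5] = 1.0                       # single click at sample 5
y = np.convolve(x, h)[: len(x)]  # y = x * h

print(np.nonzero(y)[0])  # [5 6 7]: the response follows the click
```

The response can only appear at or after the click, which is exactly what describing the kernel with positive lags expresses.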
(And sorry in advance to any early dev adopters who might have code that will break due to this change.)
@larsoner this was a good catch before 0.15... it would have been way more annoying to fix this after users had started incorporating it using the pip version. IMO when it's on master anything goes in terms of breaking changes :-)
+1 for positive lags as well
@nbara before you put too much time/effort into updating an example in #4546 (which I just bothered you about) let me take care of the lags. Because yours should then be reversed / negative while plotting.
looks like we're on the same page...in that case: https://youtu.be/U3HMZsDioB8?t=4m17s
good we don't release often enough :)
@nbara are you still using ReceptiveField much? Any interest in adding the inverse transform bits? Sounds like we have converged on setting it to patterns_. I say we just compute it when doing fit, since it should be a small cost compared to the other steps.
- score method for use with GridSearchCV
- Epochs class
- plot method that is smart enough to at least deal with 1D (STA) and 2D (STRF) plotting, probably using the NonUniformImage class rather than pcolormesh
- linear_regression_raw, XDawn (categorical data, #4940)