Open larsoner opened 7 years ago
Hi Eric, would here be the place to suggest another addition to ReceptiveField?
I've been using it a bit lately, especially for backward models (i.e., stimulus reconstruction), and there's something that (I think) is missing that would be useful. You may have seen this paper by Haufe et al. (2014), where they show how to forward-transform a backward model (for which the model weights are not easily interpretable).
Ed Lalor's mTRF toolbox does this (see fig. 4E here), but unless I'm mistaken there's no quick way to do this directly in MNE/sklearn with the current code?
PS: pinging @choldgraf as well, since we've briefly discussed this on Gitter.
It's just a matter of multiplying coef_ by the covariance of the design:
patterns = np.dot(X.T.dot(X), coef_)
Maybe you can give it a try? @larsoner and I will offer feedback for sure.
I agree it doesn't seem that hard to implement. The only issue that I foresee might be how to deal with the intercept before np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure)? But I wasn't sure that was something you'd find interesting/useful at all.
You may also need to invert Y, i.e., Sx W Sy^-1:
np.cov(X.T).dot(coef_.T.dot(np.linalg.pinv(np.cov((Y-Y.mean(0)).T)))).T
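For intuition, that Sx W Sy^-1 transform can be sketched end-to-end in plain NumPy. The toy data and the plain least-squares fit below are illustrative assumptions, not the MNE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 1000, 5, 2

# toy data: Y depends linearly on X plus a little noise
X = rng.standard_normal((n_samples, n_features))
Y = X @ rng.standard_normal((n_features, n_targets))
Y += 0.1 * rng.standard_normal((n_samples, n_targets))

# plain least-squares fit, standing in for a fitted estimator's coef_
coef_, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Haufe et al. (2014): patterns = cov(X) @ W @ pinv(cov(Y))
patterns = np.cov(X.T) @ coef_ @ np.linalg.pinv(np.cov((Y - Y.mean(0)).T))
print(patterns.shape)  # (5, 2): one pattern weight per feature and target
```

Demeaning Y before taking its covariance is one simple way to keep the intercept out of the computation.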
thanks for giving it a try
I agree it doesn't seem that hard to implement. The only issue that I foresee might be how to deal with the intercept before np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure)? But I wasn't sure that was something you'd find interesting/useful at all.
+1 for having that feature computed upon request.
@nbara would you like to give it a shot? Feel free to ping me for a review if you like.
Quick questions:

1. At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)
2. Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that does the matrix computation when needed and errors if the model hasn't been fit yet?
At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)
We should store the X covariance, it is the time-consuming computation IIRC.
Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that does the matrix computation when needed and errors if the model hasn't been fit yet?
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation I think.
Yeah I think so too. Can we think of edge cases where we'd regret this? E.g. gigantic X matrices? Could be a silent memory issue if the object is suddenly storing an N x N matrix if N is really large...
I would recommend to add the feature, and make it optional once we identify a clear usecase where memory is a problem.
E.g. gigantic X matrices?
The covariance (what we'd need to store) is just (n_feat, n_feat), and X is (n_samp, n_feat) with n_samp >> n_feat -- so if you're able to compute the fit at all, you probably won't be hurt too much by storing the cov. It's still possible (especially if we start to be smart about mem usage with the computations) to hit memory limits, but I agree with @kingjr we can deal with that when we hit it (and hope/assume YAGNI).
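To put rough numbers on that argument (array sizes here are arbitrary, purely for illustration):

```python
import numpy as np

# arbitrary illustrative sizes with n_samp >> n_feat
n_samp, n_feat = 100_000, 128
X = np.zeros((n_samp, n_feat))      # the design matrix itself
cov = np.zeros((n_feat, n_feat))    # what we'd actually store

# the stored covariance is smaller than X by a factor of n_samp / n_feat
print(X.nbytes // cov.nbytes)  # 781
```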
That's fine w/ me - just bringing it up as worth thinking about. My concern is more that users won't necessarily anticipate this, so it's silently storing an N x N matrix in each model object... what if somebody doesn't know about this and is doing cross-validation with 100s of models, storing the object each time? I think it'd be fine to solve this with a good mention in the docstring for now, though.
We should store the X covariance, it is the time-consuming computation IIRC
Actually no, ...
We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation I think.
... from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.
I've just given a jab at this. Here's a gist with a function that does the job (not fully tested yet but seems to give sensible weights). https://gist.github.com/nbara/c0f2f05ad5db1d1710e591ce89d7b645
It's not lightweight at all, since I've got to _delay_time_series and _reshape_for_est before the final dot product...
PS: Feel free to comment on the gist if you don't want to bloat this thread.
from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.
It probably does not need to be reshaped. See how predict etc. are done -- the equivalence between convolution and matrix multiplication is used instead.
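That equivalence is easy to check in plain NumPy (a toy sketch with made-up sizes, not the actual predict code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)  # one feature's time series
h = rng.standard_normal(3)   # kernel over lags 0..2

# lagged design matrix: column k holds x delayed by k samples
X_del = np.column_stack(
    [np.r_[np.zeros(k), x[: len(x) - k]] for k in range(3)]
)

# multiplying by the lagged matrix equals convolving with the kernel
assert np.allclose(X_del @ h, np.convolve(x, h)[: len(x)])
```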
The best thing to do is probably to open a WIP PR, rather than a gist. Let's get something that works properly with unit tests, then worry about addressing the bottlenecks.
any movement on this? @nbara wanna give a shot at a PR?
Yes, I'm happy to try, but it's likely that I'll be needing some help at some point, as I'm not sure a) how to make the code work for TimeDelayingRidge and b) what a proper unit test for this would be.
Happy to help out :-)
Ok, starting to work on it now. Are we sure we want to have this computed by default in fit? Because I don't think it would be terribly useful for forward models (why would one want to backward-transform forward weights?)
sounds good - I think it would need to be computed in fit... that's the only situation in which the object would have access to the X and the y variables, no?
I guess another option would be to have a special function for this, e.g.,:
mne.decoding.inverse_transform(coefs, X, y)
What's the cost compared to estimation of coef_? If it's small, let's compute it all the time.
I guess another option would be to have a special function for this, e.g.,:
mne.decoding.inverse_transform(coefs, X, y)
Yeah, something like this. Initially I imagined this becoming a class method, such as rf.inverse_coefs(X, y). In my mind this was not something that every user would need and use often; it's mostly a convenience for those who do backward modelling.
What's the cost compared to estimation of coef_? If it's small, let's compute it all the time.
Depends how well I manage to code this :) More seriously, I need to work on this a bit more before I can answer this. In principle this could be pretty lightweight, but I need to take a closer look at TimeDelayingRidge to be sure.
I don't have the exact use cases and constraints in mind, but just to be sure we are on the same line of thought, here is what we do for classic decoding analyses:
# don't compute patterns if not necessary
clf = Ridge()
clf.fit(X, y)

# to get the patterns
clf = mne.decoding.LinearModel(Ridge())
clf.fit(X, y)
clf.patterns_

# to have this in a pipeline
clf = make_pipeline(StandardScaler(), LinearModel(Ridge()))
clf.fit(X, y)
mne.decoding.get_coef(clf, 'patterns_', inverse_transform=True)
And with MNE wrapper compatibility, e.g.:

clf = mne.decoding.SlidingEstimator(LinearModel(Ridge()))
clf.fit(X, y)
get_coef(clf, 'patterns_')
In other words, we classically don't compute the patterns (cov etc.) at fitting; if users want to have the patterns, they use the additional LinearModel step, and get_coef to get interpretable coefficients when multiple preprocessing steps are used. It would be great to have an analogous API.
@kingjr so the idea is that the only thing that LinearModel does is expose a patterns_ attribute upon the call to fit? If the computation is non-trivial, then I'd be fine with doing that. If it is relatively trivial (in terms of time), then I think it's also fine to just compute it at fit...
@choldgraf Just ran a few quick tests. Using your tutorial data, I spend an extra ~1.3 seconds in fit to compute the inverse coefs on my MacBook. (Note that I've only got it to work with estimator=Ridge() so far, and it might take more/less with TimeDelayingRidge.)
@nbara what size of inputs/outputs were you using? 1.3 seconds isn't too bad...
oops - just saw you said using my tutorial data. So this was the data in tutorials and not examples, yeah?
These exactly. So X.shape was (7677, 128).
EDIT: Will do more tests with bigger arrays tomorrow morning.
hmmm - could you try the same comparison, but playing around with the size of each dimension? E.g., how long when you cut the samples in half? How long when you double the # of features?
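One way to probe just the covariance step (a toy benchmark on synthetic data, with sizes based on the shapes mentioned above; this is not a measurement of the full fit):

```python
import time

import numpy as np

rng = np.random.default_rng(0)
# baseline, half the samples, double the features
for n_samp, n_feat in [(7677, 128), (7677 // 2, 128), (7677, 256)]:
    X = rng.standard_normal((n_samp, n_feat))
    t0 = time.perf_counter()
    cov = X.T @ X  # the (n_feat, n_feat) covariance-like term
    dt = time.perf_counter() - t0
    print(f"n_samp={n_samp}, n_feat={n_feat}: {dt:.4f} s, cov {cov.shape}")
```

The X.T @ X step scales linearly in n_samp and quadratically in n_feat, so doubling the features should hurt more than doubling the samples.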
If the current code uses Ridge, the auto- and cross-covariances need to be recalculated, so it will add time. With TDR (or a custom Ridge), where these are stored, I doubt it will add much time.
I am fine with exposing a patterns_ attribute in terms of API.
@agramfort would patterns_ only be for the inverted coefficients? In that case, is the word "patterns" standard in the eeg/meg decoding world? Just wanna make sure we're being semantically consistent here...
Yes, patterns is standard these days. filters_ is semantically the same as coef_.
So coefs_ == filters_ == "receptive field" and patterns_ == "electrode weights"?
I'm +1 on calling it patterns_ in this case. I have a slight preference for it just storing the covariance matrices and computing patterns_ lazily when people ask for it. Though I also think @kingjr's approach is pretty clever (though it complicates the API a bit).
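A minimal sketch of what that lazy variant could look like, as a hypothetical pure-NumPy wrapper (names like PatternedRidge are invented for illustration; this is not the actual MNE class):

```python
import numpy as np

class PatternedRidge:
    """Hypothetical sketch: store covariances at fit, compute patterns_ lazily."""

    def __init__(self, alpha=0.0):
        self.alpha = alpha

    def fit(self, X, y):
        # center so the intercept drops out of the covariance terms
        Xc, yc = X - X.mean(0), y - y.mean(0)
        n_feat = X.shape[1]
        # ridge solution: (X'X + alpha * I)^-1 X'y; y is (n_samples, n_targets)
        self.coef_ = np.linalg.solve(
            Xc.T @ Xc + self.alpha * np.eye(n_feat), Xc.T @ yc
        )
        self._cov_X = np.cov(Xc.T)
        self._cov_y = np.atleast_2d(np.cov(yc.T))
        self._patterns = None
        return self

    @property
    def patterns_(self):
        if not hasattr(self, "coef_"):
            raise RuntimeError("Model has not been fit yet.")
        if self._patterns is None:  # computed on first access, then cached
            self._patterns = (
                self._cov_X @ self.coef_ @ np.linalg.pinv(self._cov_y)
            )
        return self._patterns

rng = np.random.default_rng(0)
rf = PatternedRidge(alpha=1.0).fit(
    rng.standard_normal((200, 6)), rng.standard_normal((200, 3))
)
print(rf.patterns_.shape)  # (6, 3)
```

The trade-off discussed above is visible here: fit only stores the two small covariance matrices, and the pattern computation is deferred until someone actually reads patterns_.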
Can we think of any other situation where we'd want to do this set of operations? (Storing the covariance of X/y in the call to fit, and then using them to invert coefficients.) If so, then I think that's a good case to make this another class that can wrap around ReceptiveField in the way JR describes above.
whatever works but KISS
@choldgraf @nbara I've been thinking about it, and I think it makes more sense to use positive lags in the forward/encoding (stimulus->brain) models than negative lags. For example, when X is speech and y is brain/EEG (which it is in both of our examples), the system h in the equation y = X * h needs to be active at positive/causal lags. But in both of our examples we do it the other way. Even the last STRF plot in this example is time-reversed relative to the one in the paper.
It's not too late to fix this as we haven't released. WDYT?
The way I think about it is in terms of experiment time: where do those features in X exist relative to each timepoint in y? In an encoding model, they are negative, since they happen at t - N timepoints relative to y.
I think the problem is that when we call them "time delays" (or lags), we're not saying "where relative to y does this feature of X exist"; we're saying "what is the value of N" (in the equation above).
Personally, I think it's more intuitive to use negative numbers to represent "things that happen in the past", but if the field has standards around positive numbers, I'm not going to bikeshed this one. I can see a case for both.
I don't wanna get into a situation like EEG has where half of their plots have the y-axis flipped, that is just the dumbest thing ever. :-)
if the field has standards around positive numbers
I think the standard might be to use positive numbers for causal behavior in forward models. Do you agree? I am no expert here. The Lalor paper seems to present their forward model that way, at least.
It might be that by convention STRF plots are sometimes (probably half the time :) ) shown with negative lags, but in terms of the mathematics of the model being convolution of a signal (speech) with a kernel (STRF) to get an output (brain activation), the lags should be positive I think.
I am not really interested in what is technically correct, as much as what is intuitively correct to most people (the vast majority of which don't really know what convolution is mathematically).
Re: standards, here's a quick lit summary:
so in short, I think you're right, we should use positive time lags :-)
(FWIW, my initial intention was to find papers that used negative lags in their plots, but it was actually kind of difficult...including for my own paper...which tells me that I'm probably wrong in my intuition haha)
Okay, I'll mark this one for 0.15 so we change the tmin/tmax def assuming nobody else has objections. I should be able to do it this week.
+1 for positive lags. If you think of the TRF (no S) when a stimulus is a bunch of clicks (so, x = click train, y = EEG signal), then the brain response should be at positive lags, because a click causes the brain signal to wobble after the click happens.
I agree that it gets a little hairier with the STRF, but positive lags still make by far the most sense to me.
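The click-train intuition above is easy to simulate (a toy sketch, not MNE code):

```python
import numpy as np

# a causal kernel living at lags 0..2: the "brain response" to one click
h = np.array([1.0, 0.5, 0.25])
x = np.zeros(20)
x[5] = 1.0                       # single click at sample 5
y = np.convolve(x, h)[: len(x)]  # y = x * h

print(np.nonzero(y)[0])  # [5 6 7]: the response follows the click
```

The response can only appear at or after the click, which is exactly what describing the kernel with positive lags expresses.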
(And sorry in advance to any early dev adopters who might have code that will break due to this change.)
@larsoner this was a good catch before 0.15... it would have been way more annoying to fix this after users had started incorporating it using the pip version. IMO when it's on master anything goes in terms of breaking changes :-)
+1 for positive lags as well
@nbara before you put too much time/effort into updating an example in #4546 (which I just bothered you about) let me take care of the lags. Because yours should then be reversed / negative while plotting.
looks like we're on the same page...in that case: https://youtu.be/U3HMZsDioB8?t=4m17s
good we don't release often enough :)
@nbara are you still using ReceptiveField much? Any interest in adding the inverse transform bits? Sounds like we have converged on setting it to patterns_. I say we just compute it when doing fit, since it should be a small cost compared to the other steps.
- score method for use with GridSearchCV
- Epochs class
- plot method that is smart enough to at least deal with 1D (STA) and 2D (STRF) plotting, probably using the NonUniformImage class rather than pcolormesh
- linear_regression_raw, XDawn (categorical data, #4940)