ENH: add support for cross generalization in time generalization

mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

https://mne.tools

BSD 3-Clause "New" or "Revised" License

ENH: add support for cross generalization in time generalization #1302

Closed dengemann closed 10 years ago

dengemann commented 10 years ago

cc @kingjr @jdsitt @agramfort

This is to allow time generalization analyses between datasets. To me, this would require adding a second epochs list parameter, e.g. epochs_list_other, to mne.decoding.time_generalization.

agramfort commented 10 years ago

Why not. As long as it's the same subject, it makes sense to me.

kingjr commented 10 years ago

Yep, it does make sense; see http://goo.gl/kET8ez, Figure 4, for comments on this type of analysis.

I would use epochs_list_train, epoch_list_generalize or epoch_list_test as input names.

However, I would recommend:

- not only using categorical outputs from the SVM, but also continuous ones (e.g. using predict_proba). This is because in time generalization you often get prediction biases, e.g. the classifier always predicts category A whether trials are from A or B. We can discuss the reason for that if you want.
- not just reporting a score but actually returning the SVM output too. This is important because in many cases, when we train on one dataset and generalize to another, there is no right or wrong answer. For instance, one can train on face versus house and see how the classifier generalizes to cats, dogs, chairs, etc.
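The bias point above can be illustrated with a minimal sketch on made-up numbers (not taken from any real dataset). The continuous outputs separate the two classes perfectly, but they all sit above 0.5, so thresholded predictions always name the same category. Accuracy then lands at chance, while a rank-based AUC is unaffected. The `accuracy` and `auc` helpers are hypothetical, hand-rolled functions for illustration only.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of trials whose thresholded score matches the label."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: probability that a random positive trial gets a
    higher score than a random negative trial (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy outputs (assumed values): well ordered, but all shifted above 0.5.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.60, 0.65, 0.70, 0.80, 0.85, 0.90]

print(accuracy(y_true, scores))  # 0.5: every trial is labelled as class 1
print(auc(y_true, scores))       # 1.0: the ordering is still perfect
```

This is why a threshold-free score computed on continuous outputs is the safer summary when a classifier is applied to time points or datasets it was not trained on.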

So overall, there needs to be a redesign of the time_generalization function.

I have my little code for that, if you want to work from it let me know.

agramfort commented 10 years ago

@kingjr would you consider giving us a hand to add this feature?

dengemann commented 10 years ago

@kingjr @agramfort I think with @jdsitt we know what to do here. The only thing you need to add is support for returning a second score matrix for X2 and y2. The rest is a matter of the inputs. If you pass SVC(probability=True) as clf and use the roc_auc scorer it should do exactly what JR suggests.
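A rough sketch of what this could look like with plain scikit-learn, assuming data arrays of shape (n_trials, n_channels, n_times) and binary labels coded 0/1. The function name `cross_time_generalization` and the toy data are hypothetical; this is not the mne.decoding.time_generalization API, only an illustration of training on (X1, y1) and scoring on (X2, y2) with SVC(probability=True) and a ROC AUC score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def cross_time_generalization(X1, y1, X2, y2):
    """Train at each time point of X1, score at each time point of X2.

    Returns an AUC matrix of shape (n_train_times, n_test_times).
    """
    n_train_times, n_test_times = X1.shape[2], X2.shape[2]
    scores = np.zeros((n_train_times, n_test_times))
    for t_train in range(n_train_times):
        clf = SVC(kernel="linear", probability=True, random_state=0)
        clf.fit(X1[:, :, t_train], y1)
        for t_test in range(n_test_times):
            # Continuous output: probability of the second class (label 1).
            proba = clf.predict_proba(X2[:, :, t_test])[:, 1]
            scores[t_train, t_test] = roc_auc_score(y2, proba)
    return scores


# Toy data (assumed sizes): two datasets sharing the same class effect.
rng = np.random.RandomState(42)
n_trials, n_channels, n_times = 40, 5, 4
y1 = np.repeat([0, 1], n_trials // 2)
y2 = np.repeat([0, 1], n_trials // 2)
X1 = rng.randn(n_trials, n_channels, n_times)
X2 = rng.randn(n_trials, n_channels, n_times)
X1[y1 == 1] += 1.0
X2[y2 == 1] += 1.0

scores = cross_time_generalization(X1, y1, X2, y2)
print(scores.shape)  # (4, 4)
```

Because the second dataset is never used for fitting, no cross-validation is needed for the cross-dataset matrix in this sketch; the within-dataset matrix would still need folds.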

dengemann commented 10 years ago

@agramfort one question. I just realized that parallelization is performed over folds instead of over times. Any reason for that?

agramfort commented 10 years ago

Sklearn way, but no strong feeling. I burn all my CPUs the way it is done now.

dengemann commented 10 years ago

> Sklearn way but no strong feeling. I burn all my CPUs the way it is done now

But what if you have 5 folds and 24 CPUs?

kingjr commented 10 years ago

The main reason for me was that the initial pipeline I had made allows you to use different window widths (@jdsitt uses this code, so you can check it out with him). In the extreme scenario, you can specify that the classifier is trained on all channels x all time points, in which case there is nothing left to parallelize over time. The number of folds, on the other hand, is constant. The second reason is that parallelization increases memory requirements, and my computers did not have enough memory for, say, 24 CPUs * 500 trials * 306 sensors * n time samples, and potentially * m frequency bins.

It would be nice to have an option that allows you to choose the level of parallelization.
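The trade-off discussed here can be put into numbers with a back-of-the-envelope sketch. It assumes the worst case where every worker holds its own float64 copy of the data, and it assumes n_times = 200, which is not stated in the thread; the helper names are hypothetical.

```python
def job_memory_gb(n_jobs, n_trials, n_sensors, n_times, bytes_per_value=8):
    """Memory (GB) if each of n_jobs workers holds a full copy of the data.

    bytes_per_value=8 assumes float64 samples.
    """
    return n_jobs * n_trials * n_sensors * n_times * bytes_per_value / 1e9


def busy_workers(n_tasks, n_cpus):
    """Workers that can be busy at once given n_tasks independent tasks."""
    return min(n_tasks, n_cpus)


# 24 CPUs, 500 trials, 306 sensors from the thread; 200 samples is assumed.
print(job_memory_gb(24, 500, 306, 200))

# Parallelizing over 5 folds leaves most of 24 CPUs idle...
print(busy_workers(5, 24))    # 5
# ...while parallelizing over 200 time points can keep all of them busy.
print(busy_workers(200, 24))  # 24
```

Under these assumptions, parallelizing over folds bounds both the number of busy workers and the number of data copies, while parallelizing over time points uses more CPUs at a higher memory cost. An option to choose the level would let the user pick per machine.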

[The rationale for choosing different window widths is detailed in this paper. The general point is that i) noise and ii) signal may be partially independent across time samples, so samples can be combined to increase SNR. However, the larger the window width, the lower the temporal resolution of the effect of interest.]
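As a minimal sketch of the window-width idea, assuming data of shape (n_trials, n_channels, n_times): each window concatenates the samples of all channels over `width` consecutive time points into one feature vector per trial. The function name and parameters are hypothetical and are not taken from the pipeline mentioned above.

```python
import numpy as np


def sliding_window_features(X, width, step=1):
    """Cut X (n_trials, n_channels, n_times) into overlapping time windows.

    Returns a list with one 2D array per window, each of shape
    (n_trials, n_channels * width), ready to be fed to a classifier.
    """
    n_trials, n_channels, n_times = X.shape
    starts = range(0, n_times - width + 1, step)
    return [X[:, :, s:s + width].reshape(n_trials, -1) for s in starts]


X = np.zeros((10, 3, 8))  # toy data: 10 trials, 3 channels, 8 samples

windows = sliding_window_features(X, width=4)
print(len(windows), windows[0].shape)  # 5 (10, 12)

# Extreme case: one window spanning all time points, so a single classifier
# and nothing left to parallelize over time.
full = sliding_window_features(X, width=8)
print(len(full), full[0].shape)  # 1 (10, 24)
```

Wider windows give each classifier more samples to combine, at the price of fewer, coarser time steps.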

kingjr commented 10 years ago

@dengemann if you're at the ICM on Friday or Saturday, perhaps I could come & see all this with you in person.

dengemann commented 10 years ago

@kingjr I will be at ICM over the next 5-6 days, including Fri, Sat and probably Sun ;-) Would be cool to meet. I was thinking the same with regard to the level of parallelization. The rest should be straightforward to implement. It would be great to test against your old script with some dataset to make sure we get it right.

kingjr commented 10 years ago

FYI: see #1309

agramfort commented 10 years ago

cf #1309