mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

StratifiedKFold with shuffling in the decoding step? #141

Closed · hoechenberger closed this issue 4 years ago

hoechenberger commented 4 years ago

Currently we create the cross-validation object in our decoding step (08) of the pipeline via: https://github.com/mne-tools/mne-study-template/blob/b61a5ca66aaef1f631d7ce2def3b1cde5d611729/08-sliding_estimator.py#L80-L81

By default, StratifiedKFold does not shuffle, meaning that the passed random_state doesn't have any effect (it produces a warning, though).

So – should we enable shuffling? Intuitively I would say yes, but want to hear your opinion, @agramfort
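For illustration, enabling it would just be (the number of splits and the seed here are placeholders, not the pipeline's actual settings):

```python
from sklearn.model_selection import StratifiedKFold

# shuffle=False (the default) ignores random_state; with shuffle=True,
# the fold assignment becomes reproducible via random_state.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```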

agramfort commented 4 years ago

we could. But I would even favor using StratifiedShuffleSplit

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html

but we should do this after showing empirical impact on a few datasets
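For reference, a minimal sketch of that alternative (the parameter values are illustrative only):

```python
from sklearn.model_selection import StratifiedShuffleSplit

# Unlike StratifiedKFold, test sets across iterations may overlap, and
# the number of iterations is decoupled from the test-set size.
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
```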

jasmainak commented 4 years ago

Do we want shuffling? Nearby epochs could make the decoding easier for the classifier, e.g., if there was a temporary DC offset.

agramfort commented 4 years ago

True, that can be a bad idea if you have a short ISI (inter-stimulus interval).

hoechenberger commented 4 years ago

Maybe we shouldn't create our own cv object in the first place, and simply go with the MNE-Python defaults instead?

https://mne.tools/dev/generated/mne.decoding.cross_val_multiscore.html#mne.decoding.cross_val_multiscore

cv : int | cross-validation generator | iterable

Determines the cross-validation splitting strategy. Possible inputs for cv are:

- None, to use the default 3-fold cross-validation,
- an integer, to specify the number of folds in a (Stratified)KFold,
- an object to be used as a cross-validation generator,
- an iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.
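Relying on those defaults would then boil down to something like this (the dummy data and base estimator here are placeholders, not what the pipeline uses):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from mne.decoding import SlidingEstimator, cross_val_multiscore

# Placeholder data: 40 epochs, 32 channels, 100 time points.
rng = np.random.default_rng(42)
X = rng.standard_normal((40, 32, 100))
y = rng.integers(0, 2, size=40)

clf = SlidingEstimator(LogisticRegression(), scoring='roc_auc')
# cv=None: for a classifier with binary/multiclass y, an unshuffled
# (Stratified)KFold is used under the hood.
scores = cross_val_multiscore(clf, X, y, cv=None)  # (n_folds, n_times)
```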

agramfort commented 4 years ago

+1 to default to mne defaults

hoechenberger commented 9 months ago

It appears that with #472, we changed the behavior again and have been shuffling epochs since, before feeding them into the sliding estimator. This deviates from MNE's default behavior.
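Schematically (this is not the pipeline's actual code), that shuffling amounts to:

```python
import numpy as np

# Permute epochs data and labels jointly, so fold membership no longer
# follows acquisition order.
rng = np.random.default_rng(42)          # placeholder seed
X = rng.standard_normal((240, 32, 100))  # stand-in for the epochs data
y = np.repeat([0, 1], 120)               # stand-in for the labels
order = rng.permutation(len(X))
X, y = X[order], y[order]
```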

If I disable shuffling, I get worse decoding scores. I suppose the better scores with shuffling could be indicative of information leakage, hence overly optimistic performance? (bad!)

Or would one argue that with shuffling, the estimator learns to generalize better across all trials, hence performs better (and this would be a positive thing)?

In my EEG recordings, data quality typically gets worse as the experiment progresses: ERPs in, say, blocks 7 and 8 are much noisier than in blocks 1 and 2. Now, should I shuffle or not?

larsoner commented 9 months ago

Are you sure it's the shuffling rather than the stratification? To me it's weird that shuffling would make any difference, but I could see how stratification could

hoechenberger commented 9 months ago

I'm sorry, I hijacked this thread and actually started talking about an analysis I conducted outside of the pipeline (which pushed me to look at the pipeline code to see how we approach things here)

I'm solving a regression problem with a Ridge regressor, both on entire epochs and on a time-by-time basis. In both cases, shuffling decreases performance. I then got concerned about what we're doing in MNE-BIDS-Pipeline.

If you're interested, I may be able to share code and data
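For context, the time-by-time variant of such an analysis could look roughly like this; the data shapes and the continuous target are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from mne.decoding import SlidingEstimator, cross_val_multiscore

rng = np.random.default_rng(0)
X = rng.standard_normal((240, 32, 100))  # stand-in for the epochs data
y = rng.standard_normal(240)             # continuous regression target

# Time-by-time regression: one Ridge model fitted per time point.
reg = SlidingEstimator(Ridge(), scoring='r2')
scores = cross_val_multiscore(reg, X, y, cv=5)  # cv=5 -> unshuffled KFold
```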

agramfort commented 9 months ago

To me, when you shuffle epochs, performance should be better, as you risk leaking future data into the training set. To avoid block/run effects, you typically split train and test across runs/blocks.

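Splitting across runs/blocks could be expressed with group-aware CV, e.g. something like this (the run labels are illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Placeholder: 240 epochs carrying a run label (8 runs x 30 epochs each).
X = np.zeros((240, 32, 100))
groups = np.repeat(np.arange(8), 30)

cv = LeaveOneGroupOut()
for train_idx, test_idx in cv.split(X, groups=groups):
    ...  # train on 7 runs, test on the held-out run
```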

hoechenberger commented 9 months ago

> To me, when you shuffle epochs, performance should be better, as you risk leaking future data into the training set.

Yes, this was exactly my concern!

> To avoid block/run effects, you typically split train and test across runs/blocks.

Interesting thought regarding the block structure.

We currently don't keep track of this in the pipeline

And the blocks I have in my experiments are typically a bit short (for example, 8 blocks of 30 trials each)

The natural thing here would be running an unshuffled 8-fold CV, I suppose? Is that what you're getting at?

Edit: No, wait. You're saying we run CV on each block separately?

I suppose with 30 trials per block, I can only really do leave-one-out (LOO) instead of k-fold ... thoughts?
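To make the two readings concrete: leaving one block out per fold is simply an unshuffled 8-fold group CV (as sketched above), whereas per-block CV with 30 trials might look like this (placeholder data and classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_blocks, n_per_block = 8, 30            # the layout described above
X = rng.standard_normal((n_blocks * n_per_block, 20))
y = rng.integers(0, 2, size=n_blocks * n_per_block)
blocks = np.repeat(np.arange(n_blocks), n_per_block)

# Per-block leave-one-out: run CV separately within each block.
for b in range(n_blocks):
    mask = blocks == b
    scores = cross_val_score(LogisticRegression(), X[mask], y[mask],
                             cv=LeaveOneOut())
```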

hoechenberger commented 9 months ago

I now tried TimeSeriesSplit to generate a window spanning one experimental block per fold. Performance is horrible; maybe there's just no clear effect in my data, or there's simply not enough data per block to properly train the classifier.
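For reference, aligning the folds with the block layout mentioned earlier could be done like this (the sizes assume 8 blocks of 30 epochs; TimeSeriesSplit's test_size parameter is one way to pin each test window to a single block):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((240, 32, 100))  # 8 blocks x 30 epochs, in acquisition order
tscv = TimeSeriesSplit(n_splits=7, test_size=30)
for train_idx, test_idx in tscv.split(X):
    ...  # each test window is one block; training uses only earlier epochs
```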