mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

`threshold='tfce'`? #5534

Closed · jona-sassenhagen closed this issue 6 years ago

jona-sassenhagen commented 6 years ago

Arguably, TFCE with default parameters is better than "regular" cluster-based permutation tests with "default" thresholds. Maybe we could allow `threshold='tfce'` to make it easier for users. We could use a comparatively low starting threshold, warn that it will be slow, and advise users to set the parameters manually if they just want to prototype.

Conceptually, this would turn the threshold parameter into a speed gauge for those who want to make TFCE faster while prototyping (see the sketch below).
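
For concreteness, a hypothetical sketch of the proposal next to what MNE already accepts today (the `'tfce'` string is the suggestion here, not an existing option; `X` is assumed to be an observations × times array, and the numbers are illustrative):

```python
from mne.stats import permutation_cluster_1samp_test

# Today: TFCE is requested by passing explicit start/step values.
t_obs, clusters, p_vals, h0 = permutation_cluster_1samp_test(
    X, threshold=dict(start=0.2, step=0.2))

# Proposed: a string sentinel that picks reasonable TFCE defaults.
t_obs, clusters, p_vals, h0 = permutation_cluster_1samp_test(
    X, threshold='tfce')
```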

larsoner commented 6 years ago

My fear is that whatever we choose as the default will probably not work very well for most people. The optimal parameters in terms of the speed/accuracy tradeoff are data-dependent. Basically, TFCE is like a Riemann-sum approximation of an integral -- clearly the start and step will depend on what you're trying to integrate.
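
To make the analogy concrete, a minimal sketch of the threshold ladder TFCE walks (the numbers are illustrative assumptions, not defaults):

```python
import numpy as np

# TFCE integrates cluster mass over a ladder of thresholds, like the
# partition points of a Riemann sum: a finer step approximates the
# integral better, but costs one clustering pass per threshold.
start, step, max_stat = 0.5, 0.5, 4.0  # illustrative values only
thresholds = np.arange(start, max_stat, step)  # [0.5, 1.0, ..., 3.5]
```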

We could try to guess them by, say, taking the `max(abs(stat))` and choosing the spacing based on that, like:

```python
import numpy as np

# Heuristic: scale the TFCE start/step to the observed statistic's range.
# stat_fun and X are as passed to the clustering function.
max_ = np.abs(stat_fun(X)).max()
start = step = max_ / 5.
```

But then the TFCE params are data-dependent, which itself is not so nice.

I think I'd rather show people how to use the params properly than try to be smart here.

jona-sassenhagen commented 6 years ago

Well again, you assume this will, all things considered, lead to the average result being more reliable? What if it means most people will instead use "normal" manually thresholded cluster-based permutation tests? I'm not so sure.

larsoner commented 6 years ago

> Well again, you assume this will, all things considered, lead to the average result being more reliable?

Yes, but this is not the only consideration. There is a speed/accuracy tradeoff, and it's not obvious how to make it.

To make things completely reliable/accurate we could set the start to zero and the step to some tiny number, but that operation will essentially run forever. The other extreme is essentially what we do now, which is probably equivalent (I'd need to think about it) to using TFCE with `start=threshold, step=np.inf`: guaranteed to complete in a small amount of time (one clustering iteration) but not so accurate. So in this sense, what we do now is well defined, essentially as the "guaranteed fast but not so great" approximation (see the sketch below).
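
Concretely, here is how those two extremes map onto MNE's clustering API (a hedged sketch; `X` is assumed to be an observations × times array, and the specific numbers are illustrative):

```python
import numpy as np
from mne.stats import permutation_cluster_1samp_test

# "Accurate but essentially runs forever": start at zero, tiny step.
t_obs, clusters, p_slow, h0 = permutation_cluster_1samp_test(
    X, threshold=dict(start=0., step=0.001))

# "Fast but coarse": a single clustering pass at one fixed threshold,
# which is roughly what the current default behavior amounts to.
t_obs, clusters, p_fast, h0 = permutation_cluster_1samp_test(
    X, threshold=2.0)
```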

In between these two extremes there are approximations that probably make good tradeoffs between accuracy and speed. Robustly choosing a good one is not necessarily easy, though, because it depends on the data. Maybe something like starting at the 10th percentile and stepping in increments relative to the difference between the 90th and 10th percentiles would work in 90% of cases...?
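
A rough sketch of that percentile heuristic (the function name and the number of steps are my own untested assumptions):

```python
import numpy as np

def tfce_params_from_percentiles(stats, n_steps=20):
    """Guess TFCE start/step from the spread of the observed statistics."""
    lo, hi = np.percentile(np.abs(stats), [10, 90])
    # Start at the 10th percentile; step in fractions of the 10th-90th
    # percentile spread (more steps = more accurate but slower).
    return dict(start=lo, step=(hi - lo) / n_steps)
```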

If you are motivated to extensively test these tradeoffs and can find one that works in such a high percentage of cases, then it could make sense to use it as an automated `threshold='tfce'`.

As for changing this to the default value for `threshold`: thinking about it more, I'm actually not opposed to it. I imagine the vast majority of people already override the threshold value in their analyses. Changing the default to TFCE will make things take a lot longer, but should be more accurate.

jona-sassenhagen commented 6 years ago

Ok now you've made it sound like work :|

larsoner commented 6 years ago

Maybe it won't be -- won't know until you try :)

larsoner commented 6 years ago

Closing, as I've mentioned this in the stats revamping issue #4859.