My fear is that whatever we choose as the default will probably not work very well for most people. The optimal parameters in terms of the speed/accuracy tradeoff are data dependent. Basically, TFCE is like a Riemann sum approximation of an integral -- clearly the `start` and `step` will depend on what you're trying to integrate.
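For concreteness, here is a minimal (and deliberately naive) sketch of that integral for 1D data, using the conventional extent**E * height**H weighting from Smith & Nichols (2009); this is only illustrative, not MNE's implementation:

```python
import numpy as np

def tfce_score_1d(stat, start, step, E=0.5, H=2.0):
    """Naive TFCE for a 1D statistic map; clusters are runs of samples >= h."""
    score = np.zeros_like(stat, dtype=float)
    # Riemann sum over thresholds h, controlled by `start` and `step`
    for h in np.arange(start, stat.max() + step, step):
        supra = stat >= h
        # label contiguous supra-threshold runs: cumulative count of run starts
        labels = np.cumsum(np.diff(np.r_[0, supra.astype(int)]) == 1) * supra
        for lab in np.unique(labels[labels > 0]):
            mask = labels == lab
            score[mask] += mask.sum() ** E * h ** H * step  # extent**E * height**H * dh
    return score
```

A smaller `step` (and lower `start`) approximates the integral better but multiplies the number of clustering passes, which is exactly the tradeoff at issue here.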
We could try to guess them by, say, taking the `max(abs(stat))` and choosing the spacing based on that, like:

```python
import numpy as np

max_ = np.abs(stat_fun(X)).max()  # stat_fun and X as passed to the cluster test
start = step = max_ / 5.
```
But then the TFCE params are data dependent, which itself is not so nice.
I think I'd rather show people how to use the params properly than try to be smart here.
Well again, you assume this will, all things considered, lead to the average result being more reliable? What if it means most people will instead use "normal" manually thresholded cluster-based permutation tests? I'm not so sure.
> Well again, you assume this will, all things considered, lead to the average result being more reliable?
Yes, but this is not the only consideration. There is a speed/accuracy tradeoff, and it's not obvious how to make it.
To make things completely reliable/accurate, we could set the start to zero and the step to some tiny number, but this operation will essentially run forever. The other extreme is essentially what we do now, which is probably equivalent (I'd need to think about it) to using TFCE with `start=threshold, step=np.inf`, which is guaranteed to complete in a small amount of time (one clustering iteration) but not be so accurate. So in this sense, what we do now is well defined, essentially as the "guaranteed fast but not so great" approximation.
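To make the two extremes concrete, here is how they would look with MNE's cluster functions, which accept a dict of TFCE parameters as `threshold`; `X` is assumed to be an observations-by-times array, and the specific `step` value is only illustrative:

```python
from mne.stats import permutation_cluster_1samp_test

# Fast extreme (current behavior): one fixed threshold, one clustering pass.
t_obs, clusters, cluster_pv, h0 = permutation_cluster_1samp_test(X, threshold=None)

# Accurate-but-slow extreme: start near zero with a tiny step.
t_obs, clusters, cluster_pv, h0 = permutation_cluster_1samp_test(
    X, threshold=dict(start=0, step=0.05))
```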
In between these two there are approximations that probably make good tradeoffs between accuracy and speed. Robustly choosing a good one is not necessarily an easy problem, though, because it depends on the data. Maybe using something like starting at the 10th percentile and going in increments relative to the difference between the 90th and 10th percentiles would work in 90% of cases...?
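A sketch of that heuristic might look like the following; the 10th/90th percentiles come from the suggestion above, but the 50-step resolution is an arbitrary illustrative choice, not a tested default:

```python
import numpy as np

stats = np.abs(stat_fun(X))  # stat_fun and X as in the earlier snippet
p10, p90 = np.percentile(stats, [10, 90])
# smaller step -> more clustering passes: slower but more accurate
threshold = dict(start=p10, step=(p90 - p10) / 50.)
```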
If you are motivated to extensively test these tradeoffs and can find one that works in such a high percentage of cases, then it could make sense to use it as an automated `threshold='tfce'`.
As for changing this to the default value for `threshold`, thinking about it more, I'm actually not opposed to it. I imagine the vast majority of people already override the `threshold` value in their analyses. Changing it to `'tfce'` will make things take a lot longer but should be more accurate.
Ok now you've made it sound like work :|
Maybe it won't be -- won't know until you try :)
Closing, as I've mentioned this in the stats revamping issue #4859.
Arguably, TFCE with default parameters is better than "regular" cluster-based permutation tests with "default" thresholds. Maybe we could allow `threshold='tfce'` to make it easier for users. We could use a comparatively low number, give a warning that it will be slow, and advise users to manually use different settings if they just want to prototype. Conceptually, this would turn the `threshold` parameter into a speed gauge -- for those who want to make TFCE faster for prototyping.
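As a hedged sketch of what that convenience could look like (the helper name and the default `step` are hypothetical, not existing MNE API):

```python
import warnings

def _resolve_threshold(threshold, stat_max):
    """Hypothetical helper mapping 'tfce' to concrete TFCE parameters."""
    if threshold == 'tfce':
        warnings.warn("TFCE with automatic parameters can be very slow; "
                      "pass dict(start=..., step=...) to trade accuracy for speed.")
        return dict(start=0., step=stat_max / 50.)  # illustrative defaults only
    return threshold
```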