sappelhoff / pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License

PREP can't always flag low amplitude channels as bad-by-deviation #83

Open a-hurst opened 3 years ago

a-hurst commented 3 years ago

I've been picking away at refactoring the NoisyChannels unit tests so that each 'bad-by' type is covered by a separate test. In the process, though, I think I've discovered a weak spot in the original PREP logic: if I use a different test file, the 'bad-by-low-deviation' test below stops working:

https://github.com/sappelhoff/pyprep/blob/bc46978b912737bc3b843438a3cbe1cd9d4dc33d/tests/test_find_noisy_channels.py#L51-L60

This holds true even if I replace the "divide by 10" with a "divide by 100,000": the "robust channel deviation" z-score seems to plateau at around -4, never reaching the +/- 5 threshold for being bad by deviation.

Looking at the actual math, this makes sense:

  1. PREP calculates the variance within each channel as 0.7413 * the interquartile range of its signal (strictly speaking a robust estimate of the standard deviation, which PREP calls the channel's "deviation").
  2. PREP then takes the median and variance (again, 0.7413 * the IQR) of those per-channel values and uses them to compute a non-traditional Z-score for each channel (i.e., (variance - median_variance) / variance_of_the_variances).
  3. PREP compares the absolute values of those Z-scores to a threshold (default = 5) and flags any channels that exceed it as "bad-by-deviation".

The problem here is that in step 2, the variances calculated in step 1 have a hard minimum of zero. As such, even a channel with an amplitude of zero won't be detected as bad-by-deviation unless the median variance is at least 5x the variance of the variances.
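
To make the failure mode concrete, here's a minimal sketch of the procedure described above, using simulated data and a stand-in robust_sd helper (this is not pyprep's actual code):

import numpy as np

IQR_TO_SD = 0.7413  # scales an IQR to a robust estimate of the standard deviation

def robust_sd(x, axis=-1):
    # Robust spread estimate: 0.7413 * the interquartile range
    q75, q25 = np.percentile(x, [75, 25], axis=axis)
    return (q75 - q25) * IQR_TO_SD

rng = np.random.default_rng(0)
scales = rng.uniform(5, 50, size=32)                  # channels differ in amplitude
eeg = rng.normal(size=(32, 10000)) * scales[:, None]  # 32 fake channels
eeg[0] *= 1e-5                                        # channel 0: near-zero amplitude

chan_dev = robust_sd(eeg, axis=1)                     # step 1: per-channel "variance"
med, spread = np.median(chan_dev), robust_sd(chan_dev)
dev_z = (chan_dev - med) / spread                     # step 2: robust Z-score
bad_by_deviation = np.abs(dev_z) > 5.0                # step 3: threshold at |Z| > 5

# dev_z has a hard floor at -(med / spread), so channel 0 is only flagged
# when the median variance exceeds 5x the variance of the variances.
print(dev_z[0], bad_by_deviation[0])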

A quick-and-dirty fix is to z-score the log of the channel variances, which makes the variances more normally distributed and thus makes detecting low-amplitude channels much easier. However, this also increases the threshold for detecting high-amplitude bad-by-deviation channels, with the required multiplication factor going from 2.4 to 3.9 for my test channel.
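
As a rough sketch, that quick-and-dirty fix amounts to inserting a log before the Z-scoring step (reusing chan_dev and robust_sd from the toy example above; not a proposed final implementation):

log_dev = np.log(chan_dev)  # log-transform the per-channel variances first
log_z = (log_dev - np.median(log_dev)) / robust_sd(log_dev)
bad_by_deviation = np.abs(log_z) > 5.0
# Near-zero amplitudes now map to strongly negative values, so the Z-score no
# longer has a hard floor near -(median / spread) and low-amplitude channels
# can exceed the threshold (at the cost of a higher bar for high amplitudes).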

Any ideas on how to better handle this?

sappelhoff commented 3 years ago

Wouldn't channels with suspiciously low deviation be flagged by the bad-by-flat check? Otherwise, would decreasing the threshold from 5 to 3.x help? (3.x being that canonical value whose meaning, and what x actually is, I always forget)

a-hurst commented 3 years ago

They'd only be flagged as bad-by-flat if their standard deviations were extremely low (i.e., practically no signal). This check seems to be for channels that still have a signal, but whose variance is considerably lower than the median variance.

Decreasing the z-score threshold to 3.29 (which I think is the value you're thinking of; it corresponds to a two-tailed test at p < 0.001) would definitely make detecting the low-amplitude channels easier, but even then the Z-test still rests on the assumption that you're applying it to a normally distributed population. Since the population here is a bunch of variance measures (which will be right-skewed), the right thing to do would be to either use a different test that doesn't make that assumption, or to try and transform the variances so they can be treated as roughly normal.
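
For reference, 3.29 is the two-tailed critical Z-value at p = 0.001, which can be checked with SciPy:

from scipy.stats import norm
print(norm.ppf(1 - 0.001 / 2))  # ~3.2905, the two-tailed critical Z for p < .001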

I guess the most important question here is: how weak or strong should a signal need to be, relative to other channels in the same dataset, in order to warrant exclusion from the average reference signal? If we can figure out a good rule of thumb for that, it should be easier to figure out how to improve the stats.

sappelhoff commented 3 years ago

the right thing to do would be to either use a different test that doesn't make that assumption, or to try and transform the variances so they can be treated as roughly normal.

I think the PREP devs didn't think of what they did as a statistical test that matches certain assumptions (maybe I am wrong). --> Rather, I think they tried a bunch of thresholds on known robust metrics and "creative" robust metrics to see "what works".

Now apparently they did not notice that low-deviation channels are not detected as easily, but it's also possible that they were relying on other methods to catch these channels.

If we can improve it, great. However, for your question:

how weak or strong should a signal need to be, relative to other channels in the same dataset, in order to warrant exclusion from the average reference signal?

I think that's highly context-dependent, and finding a good metric here could be its own scientific paper. --> If a threshold of 3.29 works for us for now, let's take it, document it, and stick with it until we find something better.

a-hurst commented 3 years ago

I think the PREP devs didn't think of what they did as a statistical test that matches certain assumptions

Probably, but I think it still makes sense to treat it like one: we're testing to see whether each channel's amplitude meaningfully deviates from all the others, so it's still a statistical procedure (like outlier removal for reaction times) if not exactly a statistical test.

Regardless, I think you're right that this deserves some simulation tests or something: maybe I'll write an R script for reading in the eegbci dataset and visualizing the effects of some different outlier exclusion techniques. Not likely to touch that for a while, though: fixing the remaining MATLAB inconsistencies definitely takes priority!

sappelhoff commented 1 month ago

Reading back on this, I think the log transform may be a good change indeed. Did you mean changing these lines: https://github.com/sappelhoff/pyprep/blob/88978048cb0de0a1264aeb6a77b5edbd07dcb932/pyprep/find_noisy_channels.py#L288-L294

to:

-chan_amplitudes = _mat_iqr(self.EEGData, axis=1) * IQR_TO_SD
+chan_amplitudes = np.log(_mat_iqr(self.EEGData, axis=1) * IQR_TO_SD)
...

? :-)
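
For comparison, here's a self-contained sketch of what the flagging could look like with that change applied, following the steps from the first comment (the function name and data handling are illustrative only, not pyprep's actual API):

import numpy as np

IQR_TO_SD = 0.7413

def bad_by_deviation_log(eeg, threshold=5.0):
    # Robust Z-score of the log channel amplitudes; flag channels with |Z| > threshold.
    q75, q25 = np.percentile(eeg, [75, 25], axis=1)
    amplitudes = np.log((q75 - q25) * IQR_TO_SD)  # note: a perfectly flat channel (IQR == 0) gives -inf here
    med = np.median(amplitudes)
    spread = (np.percentile(amplitudes, 75) - np.percentile(amplitudes, 25)) * IQR_TO_SD
    dev_z = (amplitudes - med) / spread
    return np.where(np.abs(dev_z) > threshold)[0]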