sappelhoff / pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License

Strange PyPrep/Ransac outputs #100

Closed — golnousha closed this issue 3 years ago

golnousha commented 3 years ago

Hi there, I am building a large pipeline for EEG resting-state preprocessing. I am using PyPREP to detect bad channels on some clinical datasets (quite noisy to start with). When I run the pipeline, I check the following outputs:

  1. noisy_channels_original
  2. interpolated_channels
  3. still_noisy_channels

I would expect (2) and (3) to each be a subset of (1). However, I sometimes find the interpolated list (2) to be longer than the original list of channels (1), and the final list (3) is often empty. At other times, (3) doesn't contain the channels flagged in (1) that were not interpolated.

Have others encountered this issue? Thanks for the feedback
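For concreteness, the relationship I was expecting looks roughly like this (channel names are made up for illustration):

```python
# Rough sketch of the sanity check, with made-up channel names:
noisy_original = {"Fp1", "T7"}               # (1) noisy_channels_original
interpolated = {"Fp1", "T7", "O2", "Cz"}     # (2) interpolated_channels
still_noisy = set()                          # (3) still_noisy_channels

# What I expected to hold for both (2) and (3):
print(interpolated <= noisy_original)  # False -- (2) is longer than (1)
print(still_noisy <= noisy_original)   # True, but only because (3) is empty
```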

a-hurst commented 3 years ago

Hi @golnousha, thanks for letting us know this is a point of confusion! It helps us know where to focus in the documentation.

First of all, are your EEG recordings average-referenced prior to PyPREP? If they aren't, that likely explains the discrepancies you're seeing: noisy_channels_original contains the noisy channels detected *before* PyPREP's internal average referencing. This initial pass exists because PyPREP wants to make sure that any flat or NaN-containing channels are ignored when calculating the initial average reference. I've encountered this big-time with BioSemi recordings, where the initial CMS/DRL referencing masks most bad-by-correlation and bad-by-RANSAC channels on the first pass.
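To see why those channels have to be excluded up front, here's a toy stdlib-only example (made-up numbers, not PyPREP code): a single NaN-containing channel poisons a naive average reference for every channel, whereas computing the reference from the good channels only leaves the rest usable.

```python
import math

# Three toy channels, 4 samples each; channel "C" contains a NaN.
channels = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 1.0, 2.0, 1.0],
    "C": [float("nan"), 0.5, 0.5, 0.5],
}

def average_reference(chans):
    """Subtract the per-sample mean of all given channels from each channel."""
    names = list(chans)
    n_samp = len(next(iter(chans.values())))
    ref = [sum(chans[n][t] for n in names) / len(names) for t in range(n_samp)]
    return {n: [chans[n][t] - ref[t] for t in range(n_samp)] for n in names}

# Naive reference including the NaN channel: every channel's first sample is NaN.
naive = average_reference(channels)
print(any(math.isnan(v[0]) for v in naive.values()))  # True

# Reference computed from good channels only: the good channels stay clean.
good = {n: v for n, v in channels.items() if not any(map(math.isnan, v))}
clean = average_reference(good)
print(math.isnan(clean["A"][0]))  # False
```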

Also, even if the original signals are average-referenced, (2) will still generally be longer than (1). This is because of PyPREP's iterative approach to finding a "clean" reference signal (copied here from the docs):

  1. First, an initial pass of noisy channel detection is performed to identify channels bad by NaN values, flat signal, or low SNR: the data is then average-referenced excluding these channels. These channels are subsequently marked as “unusable” and are excluded from any future average referencing.
  2. Noisy channel detection is performed on a copy of the re-referenced signal, and any newly detected bad channels are added to the full set of channels to be excluded from the reference signal.
  3. After noisy channel detection, all bad channels detected so far are interpolated, and a new estimate of the robust average reference is calculated using the mean signal of all good channels and all interpolated bad channels (except those flagged as “unusable” during the first step).
  4. A fresh copy of the re-referenced signal from Step 1 is re-referenced using the new reference signal calculated in Step 3.
  5. Steps 2 through 4 are repeated until either two iterations have passed and no new noisy channels have been detected since the previous iteration, or the maximum number of reference iterations has been exceeded (default: 4).
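The steps above can be sketched structurally like this (not PyPREP's actual code; it only tracks the channel *sets*, and `detect_noisy` is a hypothetical stand-in for PyPREP's per-iteration noisy-channel detection):

```python
# Structural sketch of the iterative robust-referencing loop described above.
def robust_reference_sets(detect_noisy, max_iterations=4):
    # Step 1: channels bad by NaN / flat signal / low SNR are "unusable".
    unusable = set(detect_noisy(iteration=0))
    bad_so_far = set(unusable)
    previous = None
    for it in range(1, max_iterations + 1):
        # Steps 2-4: detect noisy channels on the freshly re-referenced copy
        # and add them to the running set of bad channels.
        bad_so_far |= set(detect_noisy(iteration=it))
        # Step 5: stop after two iterations with no new detections.
        if it >= 2 and bad_so_far == previous:
            break
        previous = set(bad_so_far)
    return unusable, bad_so_far

# Hypothetical detection results: cleaning up the reference signal reveals
# new bad channels on later passes, then the detections stabilize.
rounds = [{"TP8"}, {"TP8", "T7"}, {"TP8", "T7", "O2"}, {"TP8", "T7", "O2"}]
detect = lambda iteration: rounds[min(iteration, len(rounds) - 1)]

original, interpolated = robust_reference_sets(detect)
print(sorted(original))      # ['TP8']
print(sorted(interpolated))  # ['O2', 'T7', 'TP8']
```

Note how the final set of interpolated channels is a superset of, and larger than, the initial detections, which is exactly the relationship between outputs (1) and (2).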

In a nutshell, this means that PyPREP will usually end up with more detected bad channels after re-referencing than before, since removing the influence of some noisy channels on the average reference signal can reveal others that weren't detected earlier (hence the iterative approach).

Hope that helps! Also, please make sure you're using the latest GitHub version of PyPREP for any production work: we've recently fixed a lot of bugs that are still present in the current PyPI release.