raphaelvallat / yasa

YASA (Yet Another Spindle Algorithm): a Python package to analyze polysomnographic sleep recordings.
https://raphaelvallat.com/yasa/
BSD 3-Clause "New" or "Revised" License

Bad channel detection #44

Closed Pierre-Bartet closed 10 months ago

Pierre-Bartet commented 3 years ago

Nice job, and a very honest comparison to other models in the article. Do you plan to implement some bad channel detection? Improperly connected or highly noisy leads are currently not identified by the artifact detection (the same is true for pyprep and MNE).

raphaelvallat commented 3 years ago

Thank you @Pierre-Bartet, appreciate it! I agree that it would be a nice feature, i.e. detecting disconnected/flat or very noisy channels, not only for sleep staging but as a general preprocessing function in YASA (e.g. yasa.detect_bad_channels). Do you know of existing algorithms that work well and that we could re-implement in YASA? Or should we design something from scratch?
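For the record, a minimal heuristic sketch of what such a function could look like. This is not YASA's actual API; the function name `detect_bad_channels`, the thresholds, and the robust z-score criterion are all assumptions for illustration:

```python
import numpy as np

def detect_bad_channels(data, flat_thresh=1e-8, noise_z=5.0):
    """Flag flat or abnormally noisy channels (hypothetical sketch).

    ``data`` is a (n_channels, n_samples) array. A channel is "flat"
    if its standard deviation is near zero, and "noisy" if the
    robust z-score of its log-variance is an outlier relative to
    the other channels. Thresholds are illustrative, not validated.
    """
    data = np.asarray(data, dtype=np.float64)
    stds = data.std(axis=1)
    flat = stds < flat_thresh
    # Robust z-score of the log-variance across channels (median/MAD)
    logv = np.log(np.maximum(stds, flat_thresh) ** 2)
    med = np.median(logv)
    mad = np.median(np.abs(logv - med)) * 1.4826 + 1e-12
    noisy = (logv - med) / mad > noise_z
    return flat | noisy
```

The median/MAD normalization keeps a single extreme channel from inflating the noise baseline, which a plain mean/std z-score would suffer from.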

Pierre-Bartet commented 3 years ago

For very simple cases (for example flat channels), MNE and pyprep seem OK, but they can fail on cases that are obvious to human reviewers. Are there epochs labeled as "bad EEG signal" in the training datasets you used for the sleep stage classification? They could be used to train some simple models.

raphaelvallat commented 3 years ago

Not epochs, but we do have some markers of overall EEG signal quality (e.g. https://sleepdata.org/datasets/mesa/variables/quo2m15). However, I think I'd prefer a more heuristic approach for this one and to stay away from ML models if possible.

Pierre-Bartet commented 3 years ago

> Not epochs, but we do have some markers of overall EEG signal quality

OK, labeled epochs would still be really helpful, at least to evaluate performance, whether it's an ML model or a heuristic approach. I think one problem is that clinical study data is often much cleaner than real-world data, because bad signals are usually rejected manually right away.

> However, I think I'd prefer a more heuristic approach for this one and to stay away from ML models if possible.

Whatever gives the best and most robust performance is best, but I agree that a good priority order is:

  1. No need for a model at all, for example because the hardware makes noise physically impossible or rejects it immediately
  2. Simple hard coded rules, which physically / physiologically make sense and are expected to be robust
  3. Simple models such as regressions or random forests
  4. Complex models such as neural networks
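To make tier 2 concrete, here is a sketch of the kind of hard-coded, physiologically motivated rule I have in mind. The function name, epoch length, and the 1 µV / 500 µV peak-to-peak thresholds are illustrative assumptions, not part of any existing API:

```python
import numpy as np

def rule_based_flags(data_uv, sf, epoch_sec=30, flat_ptp=1.0, clip_ptp=500.0):
    """Tier-2 hard-coded rules (hypothetical example).

    Rationale: scalp EEG rarely stays below ~1 µV peak-to-peak
    within a 30-s epoch (suggests a disconnected lead) or above
    ~500 µV (suggests movement or electrode artifact).
    ``data_uv`` is a (n_channels, n_samples) array in microvolts.
    Returns a (n_channels, n_epochs) boolean mask of flagged epochs.
    """
    n_chan, n_samp = data_uv.shape
    spe = int(epoch_sec * sf)              # samples per epoch
    n_ep = n_samp // spe
    epochs = data_uv[:, :n_ep * spe].reshape(n_chan, n_ep, spe)
    ptp = epochs.max(axis=2) - epochs.min(axis=2)
    return (ptp < flat_ptp) | (ptp > clip_ptp)
```

The appeal of this tier is that each threshold can be defended on physical/physiological grounds and inspected directly, which is much harder with a trained model.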