ucdavis / erplab

ERPLAB Toolbox is a free, open-source Matlab package for analyzing ERP data. It is tightly integrated with EEGLAB Toolbox, extending EEGLAB’s capabilities to provide robust, industrial-strength tools for ERP processing, visualization, and analysis. A graphical user interface makes it easy for beginners to learn, and Matlab scripting provides enormous power for intermediate and advanced users.
http://erpinfo.org/erplab

Data quality metric for a 500 ms-long time window #188

Open martyna-spyra opened 7 months ago

martyna-spyra commented 7 months ago

Hi, I'm looking for help with data quality metrics. Any advice will be greatly appreciated!

I'm interested in comparing the quality of various EEG datasets used to train machine learning models. The datasets I'm comparing were preprocessed, epoched into 500 ms-long time windows time-locked to the stimulus onset, and bin-listed into 40 different classes. Given that the recordings come from a typical ERP experimental setup (although with a high number of classes), I was wondering if any of the ERPLAB quality metrics would be suitable here. I find the clarity offered by the SME metric very intuitive for short time windows, but I would assume that it probably wouldn't remain reliable for longer time windows, such as 500 ms. Are any of the existing methods of measuring signal quality suitable for longer time spans? Would it, for example, be suitable to apply SME with a sliding window (overlapping or not?), or with peak-latency finding? Or should I look into a completely different approach for measuring SNR?

Also please feel free to redirect me elsewhere with my question. Thank you!

stevenjluck commented 7 months ago

That’s a great question!

I think you’ll want to obtain the SME values in a way that corresponds to how the machine learning models will be trained. If you are training the models with the averaged voltage over the 500-ms period, then you could simply get the SME for the averaged voltage over this period.
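
As a minimal sketch of this first option, assuming the analytic form of the SME for mean-amplitude scores (the standard deviation of the single-trial mean amplitudes divided by the square root of the number of trials), the calculation for one bin at one electrode site might look like the following. The function name and the toy data are hypothetical, and this is plain Python for illustration rather than ERPLAB's own code.

```python
import math
import statistics

def sme_mean_amplitude(epochs):
    """SME of the mean amplitude over a window.

    epochs: list of trials; each trial is a list of voltages (in µV)
    sampled within the measurement window (e.g. 0-500 ms).
    """
    # One score per trial: the mean voltage across the window.
    scores = [statistics.fmean(trial) for trial in epochs]
    # Standard error of the mean of those single-trial scores
    # (sample standard deviation, n - 1 in the denominator).
    return statistics.stdev(scores) / math.sqrt(len(scores))

# Toy data (hypothetical): 4 trials x 3 samples.
epochs = [
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
sme = sme_mean_amplitude(epochs)
print(f"SME of mean amplitude: {sme:.4f} µV")
```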

If you’re going to train the model on individual time points, I would recommend getting the SME at each time point (which is equivalent to the standard error of the mean at each time point), and then taking the RMS of the single-point SME values to get an aggregate measure of data quality across the time period.
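
The per-time-point approach can be sketched roughly as follows, assuming the epochs for one bin at one electrode site are available as a trials-by-samples list. The function names and the toy data are hypothetical, and this is an illustration in plain Python rather than ERPLAB's own implementation.

```python
import math
import statistics

def pointwise_sme(epochs):
    """Standard error of the mean across trials at each time point.

    epochs: list of trials; each trial is a list of voltages (in µV),
    one per time point, all trials having the same length.
    """
    n_trials = len(epochs)
    # zip(*epochs) yields, for each time point, the voltages across trials.
    return [statistics.stdev(column) / math.sqrt(n_trials)
            for column in zip(*epochs)]

def rms(values):
    """Root mean square of a sequence of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))

# Toy data (hypothetical): 3 trials x 2 time points.
epochs = [
    [0.0, 0.0],
    [2.0, 4.0],
    [4.0, 8.0],
]
sme_per_point = pointwise_sme(epochs)
aggregate_sme = rms(sme_per_point)
print(sme_per_point, aggregate_sme)
```

Taking the RMS rather than the plain mean amounts to averaging the squared standard errors (variances) across time points and then taking the square root, so time points with larger standard errors weigh more heavily in the aggregate.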

It seems like the same issue would arise when considering your multiple electrode sites. To obtain a single aggregate value across electrode sites, you could take the RMS of the single-site SME values.
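
Aggregating across electrode sites follows the same pattern. In this small sketch, the channel labels and SME values are made up for illustration; in practice each value would be the SME already computed for that site.

```python
import math

def rms(values):
    """Root mean square of a sequence of numbers."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical single-site SME values (in µV) for the sites of interest.
site_sme = {"P3": 3.0, "Pz": 4.0}

# One aggregate data quality value for this set of sites.
roi_sme = rms(site_sme.values())
print(f"Aggregate SME across sites: {roi_sme:.4f} µV")
```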

I hope this helps!

Steve


Steve Luck, Ph.D. (he/him/his)
Distinguished Professor, Department of Psychology (http://psychology.ucdavis.edu/)
Core Faculty, Center for Mind & Brain (https://mindbrain.ucdavis.edu)
University of California, Davis
267 Cousteau Place, Room 126, Davis, CA 95618 (directions: https://mindbrain.ucdavis.edu/directions)
(530) 754-4524
http://mindbrain.ucdavis.edu/people/sjluck
http://lucklab.ucdavis.edu
http://erpinfo.org


martyna-spyra commented 6 months ago

Hi Steve,

This was very helpful and I really appreciate your reply on this! Computing the SME for each time point, then taking the RMS over an entire epoch for each electrode site, and then for selected electrode sites within a ROI really worked for our use case.

Best wishes, Martyna