ycq091044 / ContraWR

JMIR AI'23: EEG dataset processing and EEG Self-supervised Learning

Open EEG Data Preprocessing and SSL Baselines

This repo provides

1. Folder Tree

2. Data Preparation

2.1 Instructions for Sleep EDF

2.2 Instructions for SHHS

3. Running the Experiments

3.1 Run the supervised model

cd ./src
# run on the SLEEP-EDF dataset
python -W ignore supervised.py --dataset SLEEP --n_dim 128
# run on the SHHS dataset
python -W ignore supervised.py --dataset SHHS --n_dim 256

3.2 Run the self-supervised learning model

# run on the SLEEP-EDF dataset
python -W ignore self_supervised.py --dataset SLEEP --model ContraWR --n_dim 128
# run on the SHHS dataset
python -W ignore self_supervised.py --dataset SHHS --model ContraWR --n_dim 256
# to try other self-supervised models, set --model to "MoCo", "SimCLR", "BYOL", or "SimSiam"

Citation

@article{yang2023self,
  title={Self-supervised EEG Representation Learning for Automatic Sleep Staging},
  author={Yang, Chaoqi and Xiao, Danica and Westover, M Brandon and Sun, Jimeng},
  journal={JMIR AI},
  year={2023}
}
@article{yang2023self_arxiv,
  title={Self-supervised EEG Representation Learning for Automatic Sleep Staging},
  author={Yang, Chaoqi and Xiao, Danica and Westover, M Brandon and Sun, Jimeng},
  journal={arXiv preprint arXiv:2110.15278},
  year={2023}
}

If you find this repo useful, please cite our paper. Feel free to contact me at chaoqiy2@illinois.edu or open an issue if you run into any problem.

Clarification on Bandpass Filtering

The intuition is that both the low-frequency and the high-frequency parts of the signal might be useful. So the broader idea is to keep the low-frequency information, the high-frequency information, or both, as a data augmentation. My primary thinking is to design, for each dataset, a low-frequency pass band (a, b) and a high-frequency pass band (c, d), where a < b < c < d.
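The idea above can be sketched as follows. This is a minimal illustration, not the code used in this repo: it assumes the four values are frequencies in Hz, it uses an ideal FFT mask as a stand-in for a real filter design, and the function names, sampling rate, and band values are hypothetical.

```python
import numpy as np

def band_filter(signal, fs, low_hz, high_hz):
    """Keep only the components in [low_hz, high_hz] using an ideal FFT mask."""
    n = signal.shape[-1]
    spectrum = np.fft.rfft(signal, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=n, axis=-1)

def bandpass_augment(signal, fs, bands, rng):
    """Randomly keep the low band (a, b), the high band (c, d), or both."""
    a, b, c, d = bands
    if not a < b < c < d:
        raise ValueError("bands must satisfy a < b < c < d")
    low = band_filter(signal, fs, a, b)
    high = band_filter(signal, fs, c, d)
    choice = rng.integers(3)  # 0: low only, 1: high only, 2: both
    if choice == 0:
        return low
    if choice == 1:
        return high
    return low + high  # the bands are disjoint, so the sum keeps both

# Illustrative values only (not the ones used in the paper).
fs = 100                                  # assumed sampling rate in Hz
t = np.arange(1000) / fs                  # 10 seconds of samples
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 30 * t)
rng = np.random.default_rng(0)
x_aug = bandpass_augment(x, fs, bands=(1, 10, 20, 40), rng=rng)
```

Each call returns a signal of the same length as the input, so the augmented view can be fed to the encoder in place of the raw epoch.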

In theory, these four values are hyperparameters and should be tuned on the validation set. In our paper, they were set in a more ad-hoc way: the datasets are fairly large, so running a grid search for a perfect (a, b, c, d) combination is impractical. What I did instead was first choose a combination and get the validation results. Based on those results and some intuition, I refined the combination, got new validation results, and repeated this until converging on the current values.
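For readers who want to automate a similar search, here is a rough sketch of a greedy refinement loop. It is an assumption-laden stand-in, not the procedure used for the paper (which was manual and guided by intuition): `score_fn` is a hypothetical callback that trains with a given (a, b, c, d) and returns a validation score, and the step size and toy objective are made up for illustration.

```python
def is_valid(bands):
    """Check the ordering constraint a < b < c < d (with a non-negative)."""
    a, b, c, d = bands
    return 0 <= a < b < c < d

def refine(bands, score_fn, step=1.0, max_rounds=20):
    """Greedy search: nudge one value at a time and keep any improvement."""
    best, best_score = tuple(bands), score_fn(tuple(bands))
    for _ in range(max_rounds):
        improved = False
        for i in range(4):
            for delta in (-step, step):
                cand = list(best)
                cand[i] += delta
                cand = tuple(cand)
                if not is_valid(cand):
                    continue
                score = score_fn(cand)  # hypothetical: one validation run
                if score > best_score:
                    best, best_score, improved = cand, score, True
        if not improved:
            break  # no single nudge helps any more
    return best, best_score

# Toy objective standing in for a real validation run.
def toy_score(bands, target=(2, 8, 20, 40)):
    return -sum((x - t) ** 2 for x, t in zip(bands, target))

best_bands, best_score = refine((1, 10, 15, 35), toy_score)
```

Every call to `score_fn` would cost one full training and validation run in practice, which is why a small number of hand-picked refinements can be preferable to an exhaustive grid.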