This repo provides:
- `./preprocess` (data preprocessing scripts for the SHHS and Sleep-EDF datasets)
- `./src` (supervised and self-supervised training scripts)
```shell
# create the data folder and enter it
mkdir SLEEP_data; cd SLEEP_data
# download the Sleep-EDF dataset
wget -r -N -c -np https://physionet.org/files/sleep-edfx/1.0.0/
# enter the preprocessing folder and run preprocessing
cd ../preprocess
python sleepEDF_cassette_process.py --windowsize 30 --multiprocess 8
```

`--windowsize` is the length of each "signal epoch" in seconds (usually 30), and `--multiprocess` is the number of worker processes to use. The same options apply below.
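To make the `--windowsize` option concrete, here is a minimal sketch (not the repo's actual preprocessing code) of splitting a recording into fixed-length epochs; the 100 Hz sampling rate is an assumption for illustration only.

```python
import numpy as np

def segment_epochs(signal, sampling_rate=100, window_size=30):
    """Split a 1-D signal into non-overlapping epochs of window_size
    seconds each; any trailing remainder is dropped."""
    samples_per_epoch = sampling_rate * window_size
    n_epochs = len(signal) // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

# example: 95 seconds of dummy signal at 100 Hz -> three 30-second epochs
epochs = segment_epochs(np.zeros(9500), sampling_rate=100, window_size=30)
print(epochs.shape)  # (3, 3000)
```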
```shell
# create the data folder and enter it
mkdir SHHS_data; cd SHHS_data
# [THEN DOWNLOAD YOUR DATASET HERE, NAME THE DATA FOLDER "SHHS"]
# enter the preprocessing folder and run preprocessing
cd ../preprocess
python shhs_process.py --windowsize 30 --multiprocess 8
```

```shell
cd ./src
# run supervised training on the Sleep-EDF dataset
python -W ignore supervised.py --dataset SLEEP --n_dim 128
# run supervised training on the SHHS dataset
python -W ignore supervised.py --dataset SHHS --n_dim 256
# run self-supervised training on the Sleep-EDF dataset
python -W ignore self_supervised.py --dataset SLEEP --model ContraWR --n_dim 128
# run self-supervised training on the SHHS dataset
python -W ignore self_supervised.py --dataset SHHS --model ContraWR --n_dim 256
# try other self-supervised models: "MoCo", "SimCLR", "BYOL", "SimSiam"
```
```bibtex
@article{yang2023self,
  title={Self-supervised EEG Representation Learning for Automatic Sleep Staging},
  author={Yang, Chaoqi and Xiao, Danica and Westover, M Brandon and Sun, Jimeng},
  journal={JMIR AI},
  year={2023}
}

@article{yang2023self,
  title={Self-supervised EEG Representation Learning for Automatic Sleep Staging},
  author={Yang, Chaoqi and Xiao, Danica and Westover, M Brandon and Sun, Jimeng},
  journal={arXiv preprint arXiv:2110.15278},
  year={2023}
}
```
If you find this repo useful, please cite our paper. Feel free to contact me at chaoqiy2@illinois.edu, or open an issue for any problem.
The intuition is that both the low-pass and the high-pass components of the signal can be informative. A broader idea is therefore to preserve the low-frequency information, the high-frequency information, or both, as a form of data augmentation. My initial design is a low-pass filter (a, b) and a high-pass filter (c, d) for each dataset, where a < b < c < d.

Theoretically, these four values are hyperparameters that should be tuned on the validation set. In our paper, however, they were set in a more ad-hoc way: the datasets are fairly large, so a full grid search over (a, b, c, d) combinations was infeasible. Instead, I first chose a combination and measured the validation results; based on those results and some intuition, we refined the combination, evaluated again, and repeated this until converging to the current values.
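The augmentation described above can be sketched with SciPy Butterworth filters. This is an illustration, not the paper's implementation: the cutoff values (a, b, c, d) = (1, 8, 12, 30) Hz and the 100 Hz sampling rate are hypothetical placeholders, and the real values must be tuned per dataset as discussed.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, low, high, fs=100, order=4):
    """Zero-phase Butterworth band-pass filter on a 1-D signal."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

def frequency_augment(signal, fs=100, rng=None):
    """Randomly keep the low band (a, b), the high band (c, d), or both.
    The cutoffs below are illustrative placeholders, with a < b < c < d."""
    rng = rng or np.random.default_rng()
    a_, b_, c_, d_ = 1.0, 8.0, 12.0, 30.0  # hypothetical values in Hz
    choice = rng.integers(3)
    if choice == 0:                          # keep low-frequency band only
        return bandpass(signal, a_, b_, fs)
    elif choice == 1:                        # keep high-frequency band only
        return bandpass(signal, c_, d_, fs)
    return bandpass(signal, a_, d_, fs)      # keep both bands

# augment one 30-second epoch sampled at 100 Hz
x = np.random.randn(3000)
aug = frequency_augment(x, fs=100)
```

Zero-phase filtering (`filtfilt`) is used so the augmented epoch stays time-aligned with the original, which matters when the downstream labels are per-epoch sleep stages.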