Closed · b-brebion closed this issue 1 year ago
Thank you for noticing the missing info! Some of the preprocessing, starting from the raw online data, was performed in our previous work and was missing from this repo. You can preprocess each dataset following our previous work at https://github.com/ilkyyldz95/EEG_VAE: the read_eeg_xxx files perform the preprocessing for each dataset xxx out of MIT, UPenn, and TUH. These details have been added to the README.
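For readers following along, here is a minimal sketch of the kind of per-recording preprocessing those scripts perform, assuming EDF inputs and using MNE; the filter band, window length, and output filename are illustrative assumptions, not the repo's actual settings:

```python
# Minimal sketch of EDF-based EEG preprocessing (illustrative, not the repo's code).
# Assumes MNE is installed and `edf_path` points to one recording.
import pickle

import mne
import numpy as np

def preprocess_recording(edf_path, l_freq=0.5, h_freq=40.0, window_sec=1.0):
    """Load one EDF file, band-pass filter it, and cut it into fixed windows."""
    raw = mne.io.read_raw_edf(edf_path, preload=True, verbose="error")
    raw.filter(l_freq=l_freq, h_freq=h_freq, verbose="error")  # band-pass filter
    data = raw.get_data()                      # shape: (n_channels, n_samples)
    sfreq = int(raw.info["sfreq"])             # sampling frequency in Hz
    win = int(window_sec * sfreq)              # samples per window
    n_windows = data.shape[1] // win
    # Drop the trailing partial window and stack: (n_windows, n_channels, win)
    windows = np.stack([data[:, i * win:(i + 1) * win] for i in range(n_windows)])
    return windows

# Example: pickle the windows for later loading (output name is hypothetical).
windows = preprocess_recording("recording.edf")
with open("recording_windows.pkl", "wb") as f:
    pickle.dump(windows, f)
```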
Thanks, I was indeed able to preprocess the data with this code. By the way, the format of the TUH dataset files has been completely modified since then (new file hierarchy, seizure annotations are now per channel instead of per patient, etc.), so I had to make several changes. I can open a pull request with my own version of the read_eeg_tuh.py file if you want.
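For context, here is a minimal sketch of reading the newer per-channel seizure annotations, assuming the current TUSZ CSV layout (channel,start_time,stop_time,label,confidence with #-prefixed metadata lines); this is only an illustration, not the proposed read_eeg_tuh.py patch itself:

```python
# Minimal sketch of reading the newer per-channel TUSZ annotations (illustrative).
# Assumes the CSV layout `channel,start_time,stop_time,label,confidence`
# with `#`-prefixed metadata lines; verify against your TUSZ version.
import csv
from collections import defaultdict

def read_seizure_intervals(csv_path):
    """Return {channel: [(start_sec, stop_sec), ...]} for seizure rows."""
    intervals = defaultdict(list)
    with open(csv_path) as f:
        rows = csv.DictReader(line for line in f if not line.startswith("#"))
        for row in rows:
            if row["label"].lower() != "bckg":   # keep all non-background labels
                intervals[row["channel"]].append(
                    (float(row["start_time"]), float(row["stop_time"]))
                )
    return dict(intervals)
```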
Another thing: I wanted to confirm that, in an unsupervised training run (for example, with the first command line proposed in the README), the validation loss always remains at 0, since it concerns the supervised part (why is there a complete evaluate() function in the AnomalyRunner class, then?), and that the model is evaluated after a complete training run, using the data that includes seizures?
Thank you, it would be great if you could open a pull request against the EEG_VAE repo with the updates.
It is correct that during unsupervised training the validation loss is 0, as the validation data is held out from the training data (--val_ratio 0.2) and contains only class 0. The testing command then calls the evaluate() function for supervised evaluation.
If you prefer, you can provide a separate validation set with both classes by passing --val_pattern _your_validation_data_filename instead of --val_ratio.
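As a minimal sketch of how these two mutually exclusive validation modes could be wired (illustrative argument parsing, not the repo's actual code; flag names mirror the discussion above):

```python
# Illustrative sketch of the two validation modes described above.
import argparse

parser = argparse.ArgumentParser(description="Unsupervised EEG anomaly training")
group = parser.add_mutually_exclusive_group()
group.add_argument("--val_ratio", type=float, default=None,
                   help="hold out this fraction of the (all class-0) training data")
group.add_argument("--val_pattern", type=str, default=None,
                   help="filename pattern of a separate validation set with both classes")
args = parser.parse_args()

if args.val_pattern is not None:
    # Separate validation set: may contain seizure windows, so supervised
    # validation metrics are meaningful during training.
    print(f"Loading validation files matching {args.val_pattern}")
else:
    ratio = args.val_ratio if args.val_ratio is not None else 0.2
    # Held-out split of the training data: all class 0, so the supervised
    # validation loss stays at 0 during unsupervised training.
    print(f"Holding out {ratio:.0%} of training data for validation")
```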
Hello,
Thanks for your work! I wanted to know how the pickle files in the load_filtered_eeg.py file (L74, L76, L92, L94) were obtained. Also, my first objective is to run your unsupervised pipeline on the TUH dataset, but in the TUH EEG Seizure Corpus (TUSZ) I notice that the statistics on the number of recordings with seizures, and on the duration of those recordings, are much larger than those reported in your publication (even when taking the Train, Dev, and Eval sets of the corpus separately). Does the preprocessing of the data lead you to these numbers? Thanks!
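For anyone with the same question, a minimal sketch of inspecting one of those pickle files, assuming it stores a NumPy array of filtered EEG windows (the filename here is hypothetical, and the stored structure is not confirmed by the repo):

```python
# Quick inspection of a preprocessed pickle file (illustrative assumptions only).
import pickle

import numpy as np

with open("filtered_eeg_windows.pkl", "rb") as f:
    data = pickle.load(f)

arr = np.asarray(data)
print(arr.shape, arr.dtype)  # e.g. (n_windows, n_channels, n_samples)
```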