MMM data loading and preprocessing

JoBurchert commented 4 weeks ago

Hi, thanks for the interesting paper! I am currently trying to reproduce the results from the paper and have some questions regarding the preprocessing. Following the documentation, I have obtained the SEED dataset, which has the following structure:

SEED_EEG -- ExtractedFeatures_1s -- ExtractedFeatures_4s -- Preprocessed_EEG -- SEED_RAW_EEG

Are you using the extracted features for the 1s or 4s versions? Furthermore, there are several issues with the script SEED_DE.py. In the current version, each filename already includes the data path, which causes an error when the file is opened in line 24. Additionally, the filename sorting in line 12 returns the patients in the wrong order. The files come back as ['10_xxx.mat', '11_xxx.mat', ..., '15_xxx.mat', '1_xxx.mat', '2_xxx.mat', ..., '9_xxx.mat'], which will lead to a misalignment with labels.mat.
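To illustrate the ordering problem, here is a minimal sketch of a numeric sort key. It assumes the files follow a '<subject>_<rest>.mat' naming pattern; the helper name subject_sort_key and all file names other than 10_20131130.mat are made up for illustration and are not taken from SEED_DE.py.

```python
import re

def subject_sort_key(filename):
    """Sort key (hypothetical helper): numeric subject ID first, then the rest of the name."""
    match = re.match(r"(\d+)_(.*)", filename)
    if match is None:
        # Names without a leading numeric ID sort after all numbered files.
        return (float("inf"), filename)
    return (int(match.group(1)), match.group(2))

# Illustrative file names; only the naming pattern is assumed to match the dataset.
filenames = ["10_20131130.mat", "1_20131027.mat", "2_20140404.mat", "11_20140618.mat"]

lexicographic = sorted(filenames)                   # plain string sort: '10_' sorts before '1_'
numeric = sorted(filenames, key=subject_sort_key)   # subjects 1, 2, 10, 11

print(lexicographic)
print(numeric)
```

A plain string sort places '10_' before '1_' because the character '0' compares lower than '_', which is why the two-digit IDs come first.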

I would also be interested in the preprocessing for the SEED datasets as well as the TUEG, since those follow a different schema. Could you be so kind as to also include those in the repo?

As a last point, you describe how you perform the DE feature extraction in Eq. 1-5 in the appendix of your paper; however, I was unable to locate these steps in your code. Could you help me out in this regard and point me in the right direction?
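For reference, this is the kind of step I was expecting to find. It is only a sketch of the standard closed-form differential entropy of a Gaussian signal, 0.5 * ln(2 * pi * e * variance), applied to one already band-filtered segment. It is not necessarily identical to Eq. 1-5 in the paper, the function name is hypothetical, and the input samples are synthetic.

```python
import math
import statistics

def differential_entropy(segment):
    """DE of one segment under a Gaussian assumption: 0.5 * ln(2*pi*e*variance)."""
    variance = statistics.pvariance(segment)
    return 0.5 * math.log(2 * math.pi * math.e * variance)

# Synthetic samples standing in for one band-filtered EEG segment (not real data).
segment = [0.0, 1.0, 0.0, -1.0] * 50

de_value = differential_entropy(segment)
print(de_value)
```

In a full pipeline this would be computed per channel and per frequency band after band-pass filtering, which is omitted here.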

Thanks a lot in advance!

victorywys commented 3 weeks ago

Hi,

Thank you for your interest in our paper and for bringing up these issues!

Data Preprocessing: In our experiments, we used both the 1s and 4s versions of the data. The 4s data requires DE processing code that we have not released, because its copyright is held by the authors of the SEED datasets. However, the 1s data can be used directly with the extracted features provided in the dataset, i.e., ExtractedFeatures_1s.
Issues in SEED_DE.py:

Line 24. I'm not sure why it causes an error. Did you set the data_path in line 9 to your local path? Or can you share with us more details about the error you are encountering?
For the sorting problem: each data_file.mat records a single experiment for one person, and the order of stimuli in an experiment is fixed, so every data_file.mat shares the same set of labels. Therefore, there shouldn’t be a misalignment no matter how the filenames are sorted.

Preprocessing for SEED: The EEG data in SEED goes through the same process as in the original dataset, detailed here (Dataset Summary -> SEED_EEG -> B. "Extracted_features"), using the DE processing code mentioned under Data Preprocessing above. We are sincerely sorry that, due to the copyright issue, we cannot publish this part of the code. However, the 1s extracted features provided by the authors of SEED are the same as what we use and are directly available for the experiments.

Thank you for your understanding. If you have any more questions or need further assistance, please feel free to ask.

JoBurchert commented 3 weeks ago

Thanks for your reply,

Regarding SEED_DE.py, the combination of lines 11 and 24 is causing the issue: the entries in 'filenames' already contain the full path to the data and are then joined with 'data_path' again, producing the following error:

Traceback (most recent call last):
  File "/home/burchert/.local/lib/python3.10/site-packages/scipy/io/matlab/_mio.py", line 39, in _open_file
    return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: '../data/SEED/SEED_EEG/ExtractedFeatures_1s/../data/SEED/SEED_EEG/ExtractedFeatures_1s/10_20131130.mat'
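The duplicated path can be reproduced, and avoided, with a small sketch like the one below. It assumes the listing step returns entries that already include data_path; the helper name resolve is hypothetical, and this is not necessarily how the script was ultimately fixed.

```python
import os

data_path = "../data/SEED/SEED_EEG/ExtractedFeatures_1s"

# Assumption: the file listing already returned full paths rather than bare names.
filenames = [os.path.join(data_path, "10_20131130.mat")]

def resolve(base_dir, name):
    """Hypothetical helper: build a path under base_dir whether or not name already has a directory part."""
    return os.path.join(base_dir, os.path.basename(name))

broken = os.path.join(data_path, filenames[0])  # data_path appears twice, as in the error above
fixed = resolve(data_path, filenames[0])        # data_path appears once

print(broken)
print(fixed)
```

Stripping the directory part with os.path.basename before joining makes the join safe for both bare file names and full paths.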

victorywys commented 2 weeks ago

Apologies for the bugs. We found that we had incorrectly modified the script while cleaning up comments and unnecessary code lines. We have now fixed it in pull request #24. Thanks for your contribution!

microsoft / PhysioPro

MMM data loading and preprocessing #23