tiehangd / MUPS-EEG

Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification

Maybe a bug in dataloader/data_preprocessing.py? the train data and test data you use are the same. #4

Closed DLily0129 closed 3 years ago

DLily0129 commented 3 years ago

Hi, I am very interested in your research direction and have done some related research.

However, I found a subtle bug in dataloader/data_preprocessing.py. The logic itself is sound (it would work as intended in Matlab, and I have seen similar Matlab code), but in Python it causes a problem.

The code handles two types of data, "T.mat" for training and "E.mat" for testing, but uses the same variable "data_return" for both. You first assign the training data (variable "a_X", transposed and sliced) to "data_return" and append it to the "data" list. Then you assign the testing data (variable "b_X", transposed and sliced) to "data_return" and append it to the "data" list as well.

But please note that a Python list stores references to objects, not copies of the data. Therefore, when the testing data (variable "b_X", transposed and sliced) is written into "data_return", the training data previously appended to the "data" list changes at the same time, because the list entry is only a reference to the same object that "data_return" refers to.
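To make the reference behavior concrete, here is a minimal sketch. It is not the repository's code: the short lists `train_trials` and `test_trials` are hypothetical stand-ins for the transposed, sliced "a_X" and "b_X" data, and plain lists are used instead of arrays (NumPy arrays follow the same reference semantics when a buffer is overwritten in place). It assumes the buffer is written in place; if "data_return" were simply rebound to a new object, the earlier list entry would not change, as the third case shows.

```python
# Hypothetical stand-ins for the transposed, sliced "a_X" and "b_X" data.
train_trials = [1.0, 1.0, 1.0]
test_trials = [7.0, 7.0, 7.0]

# Case 1: one buffer is reused and overwritten in place.
data = []
data_return = [0.0, 0.0, 0.0]
data_return[:] = train_trials   # in-place write into the buffer
data.append(data_return)        # the list stores a reference, not a copy
data_return[:] = test_trials    # overwrites the object that data[0] refers to
data.append(data_return)
aliased = data[0] is data[1]          # both entries are the same object
aliased_equal = data[0] == data[1]    # so "train" now equals "test"

# Case 2: append a copy, so each entry owns its own data.
safe = []
buffer = [0.0, 0.0, 0.0]
buffer[:] = train_trials
safe.append(list(buffer))       # list(...) makes a shallow copy
buffer[:] = test_trials
safe.append(list(buffer))
copies_equal = safe[0] == safe[1]

# Case 3: rebinding the name creates no aliasing.
rebound = []
data_return = list(train_trials)
rebound.append(data_return)
data_return = list(test_trials)  # new object; rebound[0] is untouched
rebound.append(data_return)
rebound_equal = rebound[0] == rebound[1]

print(aliased, aliased_equal, copies_equal, rebound_equal)
```

If the array in question is a NumPy array, appending `data_return.copy()` (or allocating a fresh array for each file) would be the analogous fix.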

Below are the relevant variables as seen in PyCharm's debug mode. The identical entries in the "data" list confirm what I described above.

This is before the training data (variable "a_X", transposed and sliced, stored in "data_return") is appended to the "data" list. [Image: Capture]

This is after the training data (variable "a_X", transposed and sliced, stored in "data_return") is appended to the "data" list. The size of the 0th dimension differs because the array is sliced using the variable "NO_valid_trial". [Image: Capture 2]

This is after the testing data (variable "b_X", transposed and sliced) is assigned to "data_return". You can see that element 0 of the "data" list has changed as well. [Image: Capture 3]

This is after the testing data (variable "b_X", transposed and sliced, stored in "data_return") is appended to the "data" list. You can see that the training data is now the same as the test data. [Image: Capture 4]

And this is the result after several loop iterations. [Image: Capture 5]

Therefore, the training data and test data you use are exactly the same. I think this may undermine the credibility of the results in your paper.

tiehangd commented 3 years ago

Hi DLily0129, thank you for the comment. Please note that we are doing cross-subject EEG classification here: we combine some subjects' "T." and "E." data for training and use the other subjects for testing. This means we do not adhere to the original dataset's convention of "T." for training and "E." for testing.
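As a rough illustration of the split described in this reply, the sketch below pools each subject's "T." and "E." sessions and then divides the data by subject. It is not taken from the repository: the `recordings` dictionary, the trial labels, the subject count, and the `cross_subject_split` helper are all hypothetical.

```python
# Hypothetical per-subject recordings: subject id -> {"T": trials, "E": trials}.
recordings = {
    subject: {
        "T": [f"s{subject}_T_trial{i}" for i in range(2)],
        "E": [f"s{subject}_E_trial{i}" for i in range(2)],
    }
    for subject in range(1, 5)
}


def cross_subject_split(recordings, test_subjects):
    """Pool each subject's "T" and "E" sessions, then split by subject."""
    train, test = [], []
    for subject, sessions in recordings.items():
        pooled = sessions["T"] + sessions["E"]
        if subject in test_subjects:
            test.extend(pooled)
        else:
            train.extend(pooled)
    return train, test


# Hold out subject 4; subjects 1 to 3 form the training set.
train, test = cross_subject_split(recordings, test_subjects={4})
print(len(train), len(test))
```

With a split like this, no trial from a held-out subject appears in the training set, regardless of whether it came from a "T." or an "E." file.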