nbara / python-meegkit

🔧🧠 MEEGkit: MEG & EEG processing toolkit in Python
https://nbara.github.io/python-meegkit/
BSD 3-Clause "New" or "Revised" License

ComplexWarning when using ASR? #77

Closed jeremylinwx closed 1 week ago

jeremylinwx commented 5 months ago

I have two different calibration datasets. One of them causes a ComplexWarning to appear, but the other doesn't. No error occurs either way, but the calibration data that triggers the ComplexWarning results in a weird-looking reconstructed signal.
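
For context, NumPy emits ComplexWarning when an array holding complex values is cast to a real dtype, which silently drops the imaginary part. The minimal sketch below is not meegkit code. It reproduces the warning with a rotation matrix (chosen only because its eigenvalues are complex), then promotes the warning to an exception so that the traceback points at the line doing the cast. The warning class lives in `numpy.exceptions` on recent NumPy and was `np.ComplexWarning` on older versions, so the sketch matches on the message text rather than the class.

```python
import warnings
import numpy as np

# A 90-degree rotation matrix: its eigenvalues are +1j and -1j.
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
evals = np.linalg.eig(rot)[0]

# 1) Reproduce: casting complex -> float warns and keeps only the real part.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    real_part = evals.astype(float)  # the imaginary parts (+/-1) are discarded


# 2) Debug: turn this specific warning into an exception to get a traceback.
def cast_or_raise(values):
    with warnings.catch_warnings():
        warnings.filterwarnings("error", message="Casting complex values to real")
        return np.asarray(values).astype(float)


try:
    cast_or_raise(evals)
    raised = False
except Warning as exc:  # ComplexWarning is a RuntimeWarning subclass
    raised = True
    print(type(exc).__name__, exc)
```

In a real script, the same `filterwarnings("error", ...)` context could wrap the `asr.fit(...)` call to find where the cast happens. That it happens inside `asr.fit` is an assumption, not something the thread establishes.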

nbara commented 5 months ago

Hi @jeremylinwx , thanks for flagging this.

I will need more context in order to resolve this. Can you share a small chunk of data that causes the problem so I can replicate the issue?

jeremylinwx commented 5 months ago

Hi, thanks for responding so quickly. I can't seem to share the .mat file that causes this warning; is there any other way for me to share it with you? Also, I have recreated the issue, so maybe you could try it as well: it seems that data that has been preprocessed with ICA in EEGLAB and then used as calibration data in meegkit messes up the reconstructed signal.

nbara commented 5 months ago

GitHub won't let you send .mat files.

You need to either zip it or host it somewhere like WeTransfer.

jeremylinwx commented 5 months ago

calibration.mat.zip

Try this?

nbara commented 5 months ago

Thanks that works, I'll look into it ASAP

jeremylinwx commented 5 months ago

Thank you. Also, we aren't on the latest version of meegkit. I'm not sure which version we are actually using; it just says 0.1. Let me know if you need the versions of any of the other packages in our environment.
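
One way to see which meegkit is actually installed, and where Python imports it from, is `importlib.metadata`, which reads the metadata pip writes at install time. The helper below is hypothetical. If the package was copied onto the path by hand rather than installed, there may be no metadata, so `version` comes back as `None` while `location` still shows the folder being imported.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Report the installed version and import location of a package.

    version is None when the package has no install metadata
    (e.g. a source folder copied onto sys.path by hand).
    location is None when the package cannot be found at all.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


print(describe_package("meegkit"))
```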

jeremylinwx commented 5 months ago

Hello again. Just to clarify, and to avoid any confusion: the original ComplexWarning I mentioned occurred in an older version of meegkit (0.1). After updating to the latest version (0.1.7), I get a different error when using the 'euclid' method. I have attached all the files used to replicate the error.

The error seems to be in the geometric_median() function, which was not present in the first version of meegkit (0.1) that I was using. I tried replacing lines 578 and 579 in asr.py with lines 575 and 576, and TEST-meegkit.py then ran without error. The data not processed with ICA resulted in a good reconstructed signal; however, the ICA-processed data resulted in a bad reconstructed signal.
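
Assuming geometric_median() computes the usual geometric median (the point minimising the sum of Euclidean distances to a set of points), a textbook way to get it is the Weiszfeld iteration. This is a sketch, not meegkit's code, and I don't know what exactly fails inside meegkit's version. The plain update divides by the distance to each point, so it breaks when the estimate lands exactly on an input point. The `eps` floor below is one common guard against that.

```python
import numpy as np


def geometric_median(X, tol=1e-7, max_iter=500, eps=1e-12):
    """Weiszfeld iteration for the geometric median of the rows of X.

    X : array of shape (n_points, n_dims).
    eps floors the distances so the weights 1/d stay finite when the
    current estimate coincides with one of the data points.
    """
    y = X.mean(axis=0)  # start from the (non-robust) mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - y, axis=1)
        w = 1.0 / np.maximum(d, eps)  # inverse-distance weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y


# Three points at the origin plus one outlier: the mean is pulled to
# (25, 25), but the geometric median stays at the origin.
pts = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [100.0, 100.0]])
print(pts.mean(axis=0), geometric_median(pts))
```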

File details:
- TEST-meegkit.py: script that loads the .mat files and recreates the error.
- calibration_noblink.mat: data without any ICA preprocessing.
- calibration_ICA.mat: data with ICA preprocessing.
- matlabn3.mat: just some random data to perform ASR on, using the calibration signal.

testing_files.zip

nbara commented 5 months ago

Hi @jeremylinwx, I'm trying to make sense of the data you sent over.

Can you say a bit more about the shapes of the data I should be expecting (number of channels, number of trials, time points, sampling frequency)?

nbara commented 5 months ago

Why do you only apply a bandpass filter to the trial array, and not to calibration_signal?

nbara commented 5 months ago

OK, last question: why do you go through the trouble of applying a bandpass filter to the trial data, and then apply ASR to the unfiltered data?

Replacing trial with sig_sampled in the sliding_window() call seems to give an output/plot:

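```python
import numpy as np
from meegkit.utils.matrix import sliding_window

# asr, calibration_signal, trial and sig_sampled are defined earlier in TEST-meegkit.py
asr.fit(calibration_signal)

# should be sig_sampled (filtered + downsampled), not the raw trial:
# window = sliding_window(trial, window=250, step=250)
window = sliding_window(sig_sampled, window=250, step=250)

out = np.zeros_like(window)

# clean each window independently; window is (n_chans, n_windows, n_samples)
for i in range(window.shape[1]):
    print(i)
    out[:, i, :] = asr.transform(window[:, i, :])
```
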
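
It helps to see concretely what the windowing does to each array. The window length is given in samples, so `window=250` is 1 s of the 250 Hz `sig_sampled` but only 0.5 s of the raw 500 Hz `trial`, and the raw trial also hasn't had the 1-40 Hz filtering that the calibration data had. The NumPy-only sketch below mimics non-overlapping epoching. `non_overlapping_windows` is a hypothetical stand-in for meegkit's `sliding_window(..., step=window)`, it needs NumPy 1.20+ for `sliding_window_view`, and it drops trailing samples that don't fill a whole window, which may differ from meegkit's behaviour.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def non_overlapping_windows(x, window):
    """Split (n_chans, n_times) into (n_chans, n_windows, window) epochs."""
    return sliding_window_view(x, window, axis=-1)[:, ::window, :]


n_chans = 19
trial = np.zeros((n_chans, 4000))        # 8 s at 500 Hz (raw trial)
sig_sampled = np.zeros((n_chans, 2000))  # same 8 s after downsampling to 250 Hz

for name, x, fs in [("trial", trial, 500), ("sig_sampled", sig_sampled, 250)]:
    w = non_overlapping_windows(x, window=250)
    print(name, w.shape, f"{250 / fs:.2f} s per window")
# trial (19, 16, 250) 0.50 s per window
# sig_sampled (19, 8, 250) 1.00 s per window
```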
jeremylinwx commented 5 months ago

> Can you say a bit more about the shapes of the data I should be expecting (number of channels, number of trials, time points, sampling frequency)?

Yes, sorry. The matlab3.mat file is a single raw 8 s trial with 19 channels sampled at 500 Hz, so it is 19 x 4000. The calibration files have the same number of channels. Both are 1 minute long and were preprocessed with 1-40 Hz bandpass filtering and downsampled to 250 Hz, so they are 19 x 15000. One of them (calibration_ICA.mat) had ICA applied and the blink components removed.

jeremylinwx commented 5 months ago

> Ok, last question: why do you go through the trouble of applying a bandpass filter on the trial data, and then you apply ASR on the unfiltered data?


Right, sorry, this was my mistake. The preprocessing pipeline should be filtering --> downsampling --> ASR, so sliding_window() should indeed have been called on sig_sampled rather than on the raw trial. Making that change fixes the issue, and there is no error anymore.

But I am still curious why running ICA and removing components from the calibration data results in distortions in the data reconstructed by ASR. As far as I can tell, the supplied calibration data files contain no information about ICA, only numerical values.

Also, I have been comparing the version of meegkit we have implemented with the latest version (0.1.7). Using the same calibration files, the latest version seems more lenient in handling artifacts and does not correct the signal as aggressively. Was this by design? (I will attach some pictures comparing the two versions once I have organised them.)
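
To put a number on "more lenient", one option is to compare how much signal power each version removes from the same input. The metric below is a hypothetical, rough measure, not something meegkit provides. `out_v1` and `out_v7` in the usage line are placeholder names for the two versions' outputs. A channel with zero power in `original` would divide by zero.

```python
import numpy as np


def correction_strength(original, cleaned):
    """Per-channel fraction of signal power removed by the cleaning step.

    0 means the channel was left untouched; values near 1 mean it was
    almost entirely replaced. Arrays are (n_chans, n_times).
    """
    removed = original - cleaned
    return (removed ** 2).sum(axis=-1) / (original ** 2).sum(axis=-1)


# usage (hypothetical names): same input, outputs from the two versions
# print(correction_strength(x, out_v1), correction_strength(x, out_v7))
```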

jeremylinwx commented 5 months ago

ASR_comparison.zip

These are just some examples I have run using the same calibration data file. Blue is the original and orange is the reconstructed data. version1 refers to meegkit 0.1 and version7 refers to the latest meegkit 0.1.7, available here.

nbara commented 5 months ago

> But I am still curious why running ICA and removing components from the calibration data results in distortions in the reconstructed data with ASR.

So you are applying ICA before fitting the ASR model?

And then you are applying ASR on the non-ICA'd data? That sounds fishy to me.

Either way, I don't think ASR is intended to work with rank-deficient data (TBC). I'm curious, have you encountered a paper where they do this (ICA then ASR)? My instinct would be to do the opposite (ASR then ICA).
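
A quick way to check whether a calibration file is rank-deficient is to compare `np.linalg.matrix_rank` against the channel count. The sketch below fakes the ICA step by projecting one spatial direction out of random data. This orthogonal projection is not what EEGLAB's ICA back-projection computes, but removing one component lowers the rank by one in the same way. The sketch also shows a real-valued square root of a positive semi-definite covariance via `eigh`, clipping the tiny negative eigenvalues that rounding leaves on a singular matrix. A general-purpose routine such as `scipy.linalg.sqrtm` can return complex output when a matrix is not numerically positive semi-definite. Whether that is what happens inside meegkit's ASR is an assumption. `project_out` and `psd_sqrtm` are hypothetical helpers.

```python
import numpy as np


def project_out(data, n_remove=1, seed=0):
    """Remove n_remove random spatial directions from (n_chans, n_times) data.

    Stand-in for 'remove ICs and back-project': the result has rank
    n_chans - n_remove, like data cleaned with ICA.
    """
    rng = np.random.default_rng(seed)
    n_chans = data.shape[0]
    q, _ = np.linalg.qr(rng.standard_normal((n_chans, n_remove)))
    return data - q @ (q.T @ data)  # orthogonal projection onto the complement


def psd_sqrtm(cov):
    """Real symmetric square root of a covariance matrix."""
    evals, evecs = np.linalg.eigh(cov)      # real eigenpairs for symmetric input
    evals = np.clip(evals, 0.0, None)       # drop rounding-level negatives
    return (evecs * np.sqrt(evals)) @ evecs.T


rng = np.random.default_rng(1)
calib = rng.standard_normal((19, 15000))  # synthetic 19-channel calibration data
cleaned = project_out(calib, n_remove=1)

print(np.linalg.matrix_rank(calib), np.linalg.matrix_rank(cleaned))  # 19 18
cov = cleaned @ cleaned.T / cleaned.shape[1]
print(np.linalg.eigvalsh(cov).min())  # ~0: one direction carries no variance
root = psd_sqrtm(cov)
print(np.allclose(root @ root, cov))  # True
```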

> Also, I have been testing the version of meegkit I have implemented with the latest version (0.1.7); using the same calibration files, it seems that the latest version is more lenient in handling artifacts and does not correct the signal as aggressively; was this implemented by design?

I have not touched the ASR code in a while, so if this is really the case, it is not intended behaviour. I will wait for your confirmation before digging into this further.

nbara commented 5 months ago

> These are just some examples I have run using the same calibration data file. Blue is the original and orange is the reconstructed data. version1 refers to the 0.1 version of meegkit and version7 refers to the latest 0.1.7 version of meegkit.

This is not intended. Do you get any error or warning in 0.1.7 that you don't get in 0.1?

jeremylinwx commented 5 months ago

> So you are applying ICA before fitting the ASR model?
>
> And then you are applying ASR on the non-ICA'd data? That sounds fishy to me.

Not exactly. We usually do ASR on our data before ICA, all offline in MATLAB. However, our application is implemented in Python, and its preprocessing pipeline is in the same order. It's just that I noticed this ComplexWarning while testing our application, and then noticed that the reconstructed data was distorted. After digging a little more, I found that I could recreate the ComplexWarning and the distorted data by using a calibration signal that had been through ICA. I can't trace the source of the calibration signal that initially caused this issue, since I wasn't the one who recorded it, but a quick look shows there aren't any blink artifacts in its 1 minute of data, which probably means it was processed with ICA.

jeremylinwx commented 5 months ago

> This is not intended. Do you get any error / warning in 0.1.7 that you don't get in 0.1?

So far, no. Both versions run fine with the default values of the ASR class unchanged; only the reconstructed signals differ, and quite drastically.

nbara commented 4 months ago

> So far, no. Both versions run fine, no changes to the default values of the ASR class, just the reconstructed signals that differ quite drastically.

OK, I'll look into it.

nbara commented 4 months ago

@jeremylinwx, can you tell me which version you are referring to when you say 0.1?

The oldest version I uploaded to PyPI is 0.1.3, and it gives me the same result as 0.1.7:

0.1.7:

[figure: meegkit 0.1.7 output]

0.1.3:

[figure: meegkit 0.1.3 output]

jeremylinwx commented 4 months ago

I am not exactly sure which version of meegkit this is; it just says 0.1. It was installed by someone who left the lab a while ago, and we aren't sure how he got hold of this package. I have attached what we have: meegkit_0.1.zip

nbara commented 1 week ago

I'm closing this as I can't reproduce it. If we find another working implementation of ASR that gives different results, I will reopen it.