sappelhoff / pyprep

A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License

RANSAC correlations (and logic) differ between PyPREP and MATLAB PREP #65

Closed a-hurst closed 3 years ago

a-hurst commented 3 years ago

This bit has been harder to compare than earlier parts, but I think I can say with a high degree of confidence now that RANSAC correlations differ between PyPREP and MATLAB PREP.

Before I get into the details, some other relevant stuff I've learned:

Anyway, the gist of the difference (and why it's been so hard to dig into): MatPREP and PyPREP take two very different approaches to computing what should be the same RANSAC correlations. Here's a quick overview:

MATLAB PREP's RANSAC:

  1. Choose random channels for each RANSAC sample.
  2. Generate interpolation matrix for all channels for each RANSAC sample.
  3. For each 5 second RANSAC window in the data, calculate the predicted signal for each channel across each of the 50 RANSAC samples, and then correlate the predicted values with the actual values to get a correlation for each channel in that window.
  4. After calculating a matrix of correlations of size [number_of_ransac_windows, number_of_channels], apply thresholding and identify channels that fall below target.
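For illustration, here's a rough numpy sketch of that ordering: the interpolation matrices are built once up front, and all prediction and correlation happens window by window. Everything here is a toy stand-in (random matrices instead of real spherical-spline weights, and np.median / np.corrcoef in place of MatPREP's exact aggregation and correlation, which differ in detail as discussed later in this thread):

    import numpy as np

    # Toy shapes, purely illustrative (not taken from either codebase)
    n_chans, n_times, n_samples = 8, 1500, 50
    sfreq, win_secs = 100, 5.0
    win_len = int(sfreq * win_secs)   # samples per 5-second correlation window
    n_wins = n_times // win_len

    rng = np.random.default_rng(42)
    data = rng.standard_normal((n_chans, n_times))
    # One interpolation matrix per RANSAC sample, built once up front (step 2)
    interp = rng.standard_normal((n_samples, n_chans, n_chans))

    corrs = np.zeros((n_wins, n_chans))
    for w in range(n_wins):
        actual = data[:, w * win_len:(w + 1) * win_len]    # (n_chans, win_len)
        # Predict just this window, once per RANSAC sample, then aggregate
        preds = np.stack([M @ actual for M in interp])     # (n_samples, n_chans, win_len)
        pred = np.median(preds, axis=0)
        for ch in range(n_chans):
            corrs[w, ch] = np.corrcoef(actual[ch], pred[ch])[0, 1]
    # corrs has shape (n_wins, n_chans), ready for thresholding (step 4)

Note that only one window's worth of predictions ever exists in memory at a time.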

PyPREP's RANSAC:

  1. Choose random channels for each RANSAC sample.
  2. Determine the amount of available RAM and split the channels-to-predict into chunks accordingly.
  3. For each chunk of channels to predict:
    • For each of the 50 RANSAC samples, calculate the interpolation matrix and predict the full signal for those channels, saving the output into one large eeg_predictions matrix.
    • After interpolating the signal for all channels in the chunk for all 50 samples, calculate the median EEG signal for each channel from those 50 individual predictions.
    • For each 5 second RANSAC window in the data, calculate the correlation between the median predicted signal and the actual signal for the channels in that chunk.
  4. After calculating a matrix of correlations of size [number_of_ransac_windows, number_of_channels], apply thresholding and identify channels that fall below target.
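By contrast, here's the same toy setup rearranged in the PyPREP ordering described above: full-length predictions for every RANSAC sample are held at once, reduced to a median signal per channel, and only then windowed and correlated. The RAM-based channel chunking is omitted for brevity, and again np.median / np.corrcoef are stand-ins:

    import numpy as np

    n_chans, n_times, n_samples = 8, 1500, 50
    sfreq, win_secs = 100, 5.0
    win_len = int(sfreq * win_secs)
    n_wins = n_times // win_len

    rng = np.random.default_rng(42)
    data = rng.standard_normal((n_chans, n_times))
    interp = rng.standard_normal((n_samples, n_chans, n_chans))  # stand-in matrices

    # Predict the whole recording once per RANSAC sample (the RAM-hungry step)...
    eeg_predictions = np.stack([M @ data for M in interp])   # (n_samples, n_chans, n_times)
    # ...then collapse to a single median prediction per channel
    eeg_pred_median = np.median(eeg_predictions, axis=0)     # (n_chans, n_times)

    # Only now split into 5-second windows and correlate per channel
    corrs = np.zeros((n_wins, n_chans))
    for w in range(n_wins):
        sl = slice(w * win_len, (w + 1) * win_len)
        for ch in range(n_chans):
            corrs[w, ch] = np.corrcoef(data[ch, sl], eeg_pred_median[ch, sl])[0, 1]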

Anyway, the output correlations of the two implementations differ, even with the same input. I know there's a deliberate difference somewhere regarding a median calculation, but I couldn't remember what it was, and given how different the code layouts are, it hasn't jumped out at me. @sappelhoff @yjmantilla @jasmainak do any of you remember what this difference was, so I can try reversing it and see if it makes things match up?

Also regardless of other factors, the MatPREP approach seems to have much lower RAM demands since at no point are they estimating signals across the whole duration of the file. Should we consider changing PyPREP to work in a similar manner?

sappelhoff commented 3 years ago

This reminds me a lot of what @yjmantilla found in https://github.com/sappelhoff/pyprep/pull/53#issuecomment-784741114 - have you taken a look at that?

However, if I re-read the standard-10-5-cap385.elp montage into MNE and apply it to the file, the coordinates and interpolation matrices both differ quite a bit. Not sure what this is all about, but regardless I don't think it's a PyPREP bug.

could be an MNE bug --> if you have an MWE and can confirm that there is some kind of bug, opening as an MNE-Python issue might be worthwhile

I know there's a deliberate difference somewhere regarding a median calculation, but I couldn't remember what it was, and given how different the code layouts are, it hasn't jumped out at me. @sappelhoff @yjmantilla @jasmainak do any of you remember what this difference was, so I can try reversing it and see if it makes things match up?

I think you should start digging here about this: https://github.com/sappelhoff/pyprep/issues/21#issuecomment-610481083

Also regardless of other factors, the MatPREP approach seems to have much lower RAM demands since at no point are they estimating signals across the whole duration of the file. Should we consider changing PyPREP to work in a similar manner?

we had some discussion on that in #53 🤔 in general I think that my RANSAC implementation back then did not optimally deal with RAM and we should fix that one way or the other.

a-hurst commented 3 years ago

This reminds me a lot of what @yjmantilla found in #53 (comment) - have you taken a look at that?

I hadn't, thanks! It looks like AutoReject's RANSAC behaves similarly to MATLAB PREP's.

could be an MNE bug --> if you have an MWE and can confirm that there is some kind of bug, opening as an MNE-Python issue might be worthwhile

Not sure how to determine whether the difference is due to a bug in MNE vs a bug in EEGLAB, but I'll look into it regardless! If I trim the .set/.fdt data so that it's small enough to share, an MWE would only be 2 or 3 lines of code.

I think you should start digging here about this: #21 (comment)

Ah, those two lines @jasmainak pointed out are the same two where I stop being able to match up PyPREP's numbers with MatPREP's. Oddly enough, the matrix of EEG data that gets passed to that function as the first argument (squeeze(Xwin(:, :, k))', see here) matches up nicely with the numbers in PyPREP's data, but inside the function itself (just using breakpoints to stop and inspect), the values of XX seem totally different from those in the input matrix. No clue why, assuming there's some sort of MATLAB quirk I'm unaware of?

we had some discussion on that in #53 🤔 in general I think that my RANSAC implementation back then did not optimally deal with RAM and we should fix that one way or the other.

For the sake of debugging and getting RANSAC results to match, should I try reorganizing PyPREP's RANSAC to better match MatPREP's & AutoReject's? I'm happy to give it a shot, I just want to be sure it's something you'd be interested in first. I'd make sure to retain the proper median calculation, but would also add a matlab_strict argument or something to do the same calculation as MatPREP for comparison purposes.

sappelhoff commented 3 years ago

Not sure how to determine whether the difference is due to a bug in MNE vs a bug in EEGLAB, but I'll look into it regardless!

awesome, thanks.

No clue why, assuming there's some sort of MATLAB quirk I'm unaware of?

🤷 😕

For the sake of debugging and getting RANSAC results to match, should I try reorganizing PyPREP's RANSAC to better match MatPREP's & AutoReject's? I'm happy to give it a shot, I just want to be sure it's something you'd be interested in first.

thanks - it's always good practice to clarify before working on a PR :-)

I think in #53 @yjmantilla and I agreed that we (well, @yjmantilla would do the work 😉) would make autoreject a dependency of pyprep, and use AR's ransac within pyprep.

I think with you (@a-hurst) currently being on a streak and debugging all sorts of issues along the pipeline, it may be a viable option to NOT make AR a dependency, and instead fix our RANSAC from the ground up (as you suggest). I would in general be in favor of that, because (i) adding fewer dependencies is nice, and we already have RANSAC here that we could "just fix" (if you can make it happen), (ii) the AR package is really more about autoreject ... RANSAC there is just a small feature that was used to benchmark autoreject against, when Mainak wrote up the paper --> whereas in pyprep, RANSAC is a core feature of the pipeline.

what do you think @yjmantilla ?

I'd make sure to retain the proper median calculation, but would also add a matlab_strict argument or something to do the same calculation as MatPREP for comparison purposes.

👍

PS: you recently already opened an issue about the number of iterations for robust re-referencing on https://github.com/VisLab/EEG-Clean-Tools ... could you perhaps open another one on the median calculation and how it seems to not be a real median in some cases? I think it's only fair to inform those devs about mistakes we find along the way (even if the response rate over there seems to be low) :)

yjmantilla commented 3 years ago

I agree. Mainly the advantages of AR's RANSAC are:

I would be really happy to see pyprep's RANSAC working and compared against MATLAB. That's why the effort of converting it into functions and setting the correlations as an output was done. I'm really happy to see @a-hurst making use of it; that was the whole idea. Sadly, I can't really code much in pyprep right now because I have a ton of stuff going on, so that's why my PR is basically frozen.

So, if we can manage to validate against MATLAB, we solve the main problem. Afterwards we can learn the nice bits of AR's RANSAC and incorporate them as we see fit. I think my PR could become just a particular comparison of our RANSAC to AR's RANSAC, rather than making AR's RANSAC the one used in pyprep.

a-hurst commented 3 years ago

it may be a viable option to NOT make AR a dependency, and instead fix our RANSAC from the ground up (as you suggest). I would in general be in favor of that

Cool, I'll give it a shot! We're so close now to having NoisyChannels be functionally identical to the MatPREP implementation; given how long ago I opened #41, I'm excited to finally be this close to mostly solving it. Once we've established numeric equivalence with the original PREP, I'm also excited about being able to build off that as a baseline for future improvements (e.g. better handling of channels near the bottom of the cap to reduce false positives).

PS: you recently already opened an issue about the number of iterations for robust re-referencing on https://github.com/VisLab/EEG-Clean-Tools ... could you perhaps open another one on the median calculation and how it seems to not be a real median in some cases?

For sure, though they don't seem to check the issues page too often. Still waiting on a response for my other inquiry.

I would be really happy to see pyprep's RANSAC working and compared against MATLAB. That's why the effort of converting it into functions and setting the correlations as an output was done. I'm really happy to see @a-hurst making use of it; that was the whole idea.

The functional RANSAC has definitely helped a lot, so thanks for all your efforts! I'm guessing my reimplementation would look sort of similar to AR's, in that I'm going to modify it to work window-wise instead of channel-wise (since that makes MATLAB comparison easiest, and because of the reduced RAM demands for most datasets).

jasmainak commented 3 years ago

I read quickly through the messages. Just a thought -- if you guys want to take the Ransac implementation in autoreject and make algorithmic improvements and then maintain it in autoreject, it just might fit in there. For example, you could do cross-validation to fit some of the parameters in Ransac. Then you could get channel-wise bad segments using RANSAC but with an autoreject-like framework. Might get some speed improvements in autoreject etc. that way. That way, we can consolidate our efforts, learn from the past and also get academic credits in the process.

I'm not exactly sure what's the purpose of hunting down the tiny differences. IMO what we really care about at the end is whether the signal is clean or not. The tiny differences probably don't matter ultimately?

a-hurst commented 3 years ago

I'm not exactly sure what's the purpose of hunting down the tiny differences. IMO what we really care about at the end is whether the signal is clean or not. The tiny differences probably don't matter ultimately?

For me, the idea of getting full numeric parity with MATLAB PREP is so that we have a solid baseline to build off of, such that any deviations from it (like your median calculation improvement) are due to a deliberate (and recorded) choice instead of unintentional differences in implementation (like the ones the last 4 PyPREP PRs addressed).

Right now (well, ignoring the latest round of PRs, I haven't re-tested), PyPREP ends up being a lot more aggressive than MATLAB PREP in terms of flagging channels as bad (see my post with visualizations here), and likewise a lot more variable between different random seeds. By getting exact parity with MATLAB PREP, that issue is addressed and we can finally move on to improving over the original.

As for your suggestion re: AutoReject, I think you're right that a unified RANSAC implementation is probably an ideal approach. For the sake of hacking this to completion for the first half of #58 I'm probably going to keep PyPREP and AutoReject separate, but if my code ends up being any good I'm definitely open to upstreaming changes to AutoReject and keeping the RANSAC implementations together in one codebase!

yjmantilla commented 3 years ago

What if we tested the RANSAC correlations of MATLAB PREP against AR's RANSAC? That should be a match, since @jasmainak already tested this.

We would need to copy AR's RANSAC code though, because of two things:

I think those two things may already be incorporated in this code he shared with us:

https://gist.github.com/jasmainak/d459dd128ded1e93b54050c7fab08e79

(Correlations at self.corr_)

So if we can replicate that test and have a working example of AR's RANSAC matching MatPREP's RANSAC, then we could contribute directly to AR's RANSAC and start the unified RANSAC there.

I forgot to mention that AR's RANSAC already has some joblib parallelism implemented, so that's of value too.

If you want to see what would happen if the whole of pyprep ran with AR's RANSAC, the code for that is:

        if mode == 'autoreject':
            # Requires: import mne; from autoreject import Ransac
            ransac = Ransac(
                picks=good_idx,
                n_resample=n_samples,
                min_channels=fraction_good,
                min_corr=corr_thresh,
                unbroken_time=fraction_bad,
                n_jobs=njobs,
                random_state=435656,
                verbose='tqdm',
            )
            info = mne.create_info(self.ch_names_new.tolist(), self.sample_rate,
                                   ch_types='eeg', verbose=None)
            # Option 1: Single epoch spanning the whole recording
            # EEGData_epoch = self.EEGData[np.newaxis, :, :]
            # epochs = mne.EpochsArray(EEGData_epoch, info)

            # Option 2: Epochs of corr_window_secs seconds as windows
            raw = mne.io.RawArray(self.EEGData, info)
            epochs = mne.make_fixed_length_epochs(raw, duration=corr_window_secs,
                                                  preload=True,
                                                  reject_by_annotation=False,
                                                  verbose=None)

            # Either way...
            epochs.set_montage(self.montage)
            epochs_clean = ransac.fit(epochs)
            # print(epochs_clean.bad_chs_)
            self.bad_by_ransac = [i for i in epochs_clean.bad_chs_]
            return None
        ############AUTO REJECT####################

(whole code with context at https://github.com/sappelhoff/pyprep/issues/36#issuecomment-737662816 )

a-hurst commented 3 years ago

Just a quick update: I found some time yesterday to tackle this, and now I've got a hacked-together version producing correlations, thresholding results, and bad-channel predictions identical to MATLAB PREP's!

The main bits have been broken down into reasonably-testable functions, and I've added a matlab_strict argument to toggle between MatPREP's wonky method of calculating medians and PyPREP/AutoReject's improved method. I'm cleaning up a PR now (in between other tasks), so expect that shortly!

yjmantilla commented 3 years ago

Amazing! This will be almost the single most important PR in pyprep!

a-hurst commented 3 years ago

However, if I re-read the standard-10-5-cap385.elp montage into MNE and apply it to the file, the coordinates and interpolation matrices both differ quite a bit. Not sure what this is all about, but regardless I don't think it's a PyPREP bug.

could be an MNE bug --> if you have an MWE and can confirm that there is some kind of bug, opening as an MNE-Python issue might be worthwhile

@sappelhoff Just wanted to note that I figured this one out too: apparently MNE imports the montage slightly off-center (?). When I correct the (x, y, z) origin to be (0, 0, 0) using the code below, the raw RANSAC correlations match up extremely well (differences < 0.000001) with the ones you get when using the EEGLAB-imported coordinates from the .set/.fdt files:

from mne.bem import fit_sphere_to_headshape

ch_pos = self.raw_mne._get_channel_positions()
# Returns (radius, origin_head, origin_device); the head-frame origin is index 1
montage_origin = fit_sphere_to_headshape(
    self.raw_mne.info,
    verbose=False,
    units="m",
)
ch_pos_centered = ch_pos - montage_origin[1]

Without the correction, the RANSAC correlation differences are much larger (though still under 0.01). Note that MNE's public interpolation functions apply this same correction before interpolating signals.
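In case anyone wants to reproduce the comparison, here's a rough sketch of how one could check the effect of the centering on the spherical-spline interpolation weights. It leans on private MNE helpers (_get_channel_positions and _make_interpolation_matrix), so treat the exact imports and signatures as assumptions rather than stable API:

    import numpy as np
    from mne.bem import fit_sphere_to_headshape
    from mne.channels.interpolation import _make_interpolation_matrix  # private helper


    def centered_positions(raw):
        """Channel positions re-centered on the fitted sphere origin."""
        ch_pos = raw._get_channel_positions()  # private, as in the snippet above
        _, origin_head, _ = fit_sphere_to_headshape(raw.info, units="m", verbose=False)
        return ch_pos - origin_head


    # Hypothetical usage, given some `raw` with a montage applied:
    # pos_raw = raw._get_channel_positions()
    # pos_cen = centered_positions(raw)
    # print(np.abs(_make_interpolation_matrix(pos_raw, pos_raw)
    #              - _make_interpolation_matrix(pos_cen, pos_cen)).max())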

Anyway, not sure if that counts as an MNE bug or not. I'm going to test out MatPREP with an MNI-space montage and see how it behaves: if it corrects-to-center as well, I'll open a new PR to always apply the above correction before RANSAC.

sappelhoff commented 3 years ago

Okay! Great to know, thanks for the update.

Anyway, not sure if that counts as an MNE bug or not.

not sure either 😕

I'm going to test out MatPREP with an MNI-space montage and see how it behaves: if it corrects-to-center as well, I'll open a new PR to always apply the above correction before RANSAC.

ok, sounds good!

a-hurst commented 3 years ago

@sappelhoff @yjmantilla Okay, so after some comparison testing between the window-wise RANSAC in my PR and the original PyPREP RANSAC, I think I've finally figured out the reason for the huge variability in PyPREP's RANSAC between seeds and relative to Matlab PREP. It had little to do with the correlation method or the median calculation method, and almost everything to do with a minor mistake in some array-reshaping code. See the lines below, which are supposed to chunk the actual and predicted data into equally-sized windows for correlation:

https://github.com/sappelhoff/pyprep/blob/032cad16793a5cbd00173d9960d19e365feb654c/pyprep/ransac.py#L274-L280

What we want it to do is take data like this (where A, B, and C are different channels):

[['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8'],
 ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8'],
 ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8']]

... and reshape it into this (where the number of correlation windows is 4, and there are 2 samples per window):

[[['A1', 'A2'],
  ['B1', 'B2'],
  ['C1', 'C2']],

 [['A3', 'A4'],
  ['B3', 'B4'],
  ['C3', 'C4']],

 [['A5', 'A6'],
  ['B5', 'B6'],
  ['C5', 'C6']],

 [['A7', 'A8'],
  ['B7', 'B8'],
  ['C7', 'C8']]]

...such that when you iterate over the first axis, the windows of data look like the above chunks.

Instead, the PyPREP code above generates this:

[[['A1', 'A2', 'A3', 'A4'],
  ['A5', 'A6', 'A7', 'A8']],

 [['B1', 'B2', 'B3', 'B4'],
  ['B5', 'B6', 'B7', 'B8']],

 [['C1', 'C2', 'C3', 'C4'],
  ['C5', 'C6', 'C7', 'C8']]]

... which is then sliced along the last axis, resulting in windows of data like this:

[[['A1', 'A5'],
  ['B1', 'B5'],
  ['C1', 'C5']],

 [['A2', 'A6'],
  ['B2', 'B6'],
  ['C2', 'C6']],

 [['A3', 'A7'],
  ['B3', 'B7'],
  ['C3', 'C7']],

 [['A4', 'A8'],
  ['B4', 'B8'],
  ['C4', 'C8']]]

Obviously, that's going to throw off the correlations quite a bit! By adjusting the above code to work like this:

    # For the actual data
    data_window = data[chans_to_predict, : n * w_correlation]
    data_window = data_window.reshape(len(chans_to_predict), w_correlation, n)
    data_window = data_window.swapaxes(1, 0)

    # For the ransac predicted eeg
    pred_window = ransac_eeg[: len(chans_to_predict), : n * w_correlation]
    pred_window = pred_window.reshape(len(chans_to_predict), w_correlation, n)
    pred_window = pred_window.swapaxes(1, 0)

and then iterating over the first axis for calculating correlations, I get identical values to window-wise RANSAC when matlab_strict is False and I sub out the custom correlation function with np.corrcoef.
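To make the fix concrete, here's a tiny self-contained check using the A/B/C toy data from above. The "broken" version at the end just reproduces the intermediate shape described earlier (window axis last, then sliced), not PyPREP's literal code:

    import numpy as np

    # 3 channels, 8 samples, 4 correlation windows of 2 samples each
    data = np.array([[f"{ch}{i}" for i in range(1, 9)] for ch in "ABC"])
    n_chans, w_correlation, n = 3, 4, 2

    # Fixed reshape: window axis first after the swap
    windows = data[:, : n * w_correlation].reshape(n_chans, w_correlation, n).swapaxes(1, 0)
    print(windows[0])    # [['A1' 'A2'] ['B1' 'B2'] ['C1' 'C2']]  -> intended window

    # Broken pattern: window axis last, sliced along the last axis
    bad = data.reshape(n_chans, n, w_correlation)
    print(bad[:, :, 0])  # [['A1' 'A5'] ['B1' 'B5'] ['C1' 'C5']]  -> interleaved samples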

@sappelhoff now that I've finally tracked down the source, would you prefer if I made a new pull request including just the changes required for MATLAB equivalence and saved the window-wise stuff for a separate PR? I'd imagine that would be easier to review.

sappelhoff commented 3 years ago

I think I've finally figured out the reason for the huge variability in PyPREP's RANSAC between seeds and relative to Matlab PREP.

!

It had little to do with the correlation method or the median calculation method, and almost everything to do with a minor mistake in some array-reshaping code.

😨 my god ... thanks for catching this.

I get identical values to window-wise RANSAC when matlab_strict is False and I sub out the custom correlation function with np.corrcoef.

okay, I don't really get that 🤔

with "custom correlation function", do you mean MatPrep's sum(x ** 2) instead of np.corrcoef's sum((x - mean(x)) ** 2) (Pearson)?

So when doing it the full pyprep way (matlab_strict=False, and np.corrcoef instead of sum(x ** 2)) you get identical results to MatPREP? Only for a subset of results, or for all results (that seems impossible given the few differences we already found, such as the median calculation)?
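For reference, a minimal numpy illustration of the two formulas being contrasted here (names are purely illustrative, not from either codebase): skipping the demeaning changes the value whenever the signals have a nonzero mean.

    import numpy as np

    def corr_no_demean(x, y):
        # MatPREP-style: no mean subtraction
        return np.sum(x * y) / (np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2)))

    def corr_pearson(x, y):
        # Standard Pearson correlation, as computed by np.corrcoef
        return np.corrcoef(x, y)[0, 1]

    rng = np.random.default_rng(0)
    x = rng.standard_normal(500) + 2.0   # nonzero mean -> the formulas diverge
    y = x + 0.5 * rng.standard_normal(500)
    print(corr_no_demean(x, y), corr_pearson(x, y))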

now that I've finally tracked down the source, would you prefer if I made a new pull request including just the changes required for MATLAB equivalence and saved the window-wise stuff for a separate PR?

I am definitely all for separate PRs, with good (understandable) titles, nice what's new entries, and so on --> just like you've been doing all this time. That will make it much easier to understand the changes in the future.

a-hurst commented 3 years ago

with "custom correlation function", do you mean MatPrep's sum(x ** 2) instead of np.corrcoef's sum((x - mean(x)) ** 2) (Pearson)?

So when doing it the full pyprep way (matlab_strict=False, and np.corrcoef instead of sum(x ** 2)) you get identical results to MatPREP? Only for a subset of results, or for all results (that seems impossible given the few differences we already found, such as the median calculation)?

Whoops, sorry, I should have been more clear: with the above changes, the current PyPREP method (using proper median calculation and np.corrcoef) exactly matches the window-wise PyPREP method in my PR (modified to also use proper median calculation and np.corrcoef). Since the window-wise output matches MatPREP exactly when using its special median and correlation logic, we know that the only three mathematical differences between current PyPREP and MatPREP are a) this reshaping bug, b) the median calculation method, and c) the correlation calculation method.

I am definitely all for separate PRs, with good (understandable) titles, nice what's new entries, and so on --> just like you've been doing all this time. That will make it much easier to understand the changes in the future.

Cool, will get right on it. So satisfying to have finally hunted down this bug after months of off-and-on attempts to find it. If I ever had a white whale of code, this would be it!

sappelhoff commented 3 years ago

we know that the only three mathematical differences between current PyPREP and MatPREP are a) this reshaping bug, b) the median calculation method, and c) the correlation calculation method.

ok thanks, got it.

So satisfying to have finally hunted down this bug after months of off-and-on attempts to find it. If I ever had a white whale of code, this would be it!

yes, great job! :) Looking forward to getting all of this work merged and finishing up with adding CI tests ... and then probably a 0.4 release!