[ENH, WIP] Add multivariate connectivity methods

tsbinns commented 1 year ago

@adam2392 Sorry for the delay, but I finally have something I think we can move forward with.

What have I implemented?

Added multivariate measures of the imaginary part of coherency (maximised imaginary coherence, MIC, and multivariate interaction measure, MIM; this includes spatial patterns of connectivity for MIC).
Added multivariate spectral Granger causality (GC). Also made a slight change to the implementation which keeps the final results the same, but significantly cleans up the code.
These methods have been implemented in the existing spectral_connectivity_epochs function, with the original structure of e.g. the indices parameter (i.e. no support for ragged indices added yet, as we discussed).
Updated the documentation of spectral_connectivity_epochs to reflect these new measures.
Added new attributes to the connectivity classes. These are used to store information associated with the multivariate measures (rank, n_lags, and patterns).
Support for parallelisation of these methods using the MNE wrapper for joblib is present.
Added a (basic) set of unit tests for the new measures.
Added extensive examples for the newly-added measures. These include background information/explanations of the methods, demonstrations of how to use the various settings in the implementations, how to interpret the results, what the limitations of these methods are, etc.
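For readers unfamiliar with MIC and MIM, the following is a minimal NumPy sketch of one common formulation: whiten the imaginary part of the seed-target cross-spectrum by the real parts of the seed and target auto-spectra, then take the largest singular value (an unsigned MIC) and the sum of squared singular values (MIM). It is not the code from this PR; the helper names `inv_sqrt` and `mic_mim` are hypothetical, the data are random toy values, and full-rank auto-spectra are assumed.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def mic_mim(csd, n_seeds):
    """Return (MIC, MIM) for one frequency bin of a seeds-then-targets CSD."""
    c_aa = csd[:n_seeds, :n_seeds].real  # real part, seeds x seeds
    c_bb = csd[n_seeds:, n_seeds:].real  # real part, targets x targets
    c_ab = csd[:n_seeds, n_seeds:].imag  # imaginary part, seeds x targets
    # Whiten the imaginary cross-block by the real auto-blocks.
    e = inv_sqrt(c_aa) @ c_ab @ inv_sqrt(c_bb)
    singular_values = np.linalg.svd(e, compute_uv=False)
    mic = singular_values[0]  # largest singular value (sign is not recovered here)
    mim = np.sum(singular_values ** 2)  # equals trace(e @ e.T)
    return mic, mim


rng = np.random.default_rng(0)
n_seeds, n_targets, n_obs = 3, 3, 200
n_channels = n_seeds + n_targets
# Toy complex "Fourier coefficients" for one frequency bin (random, not real data).
coeffs = rng.standard_normal((n_channels, n_obs)) + 1j * rng.standard_normal(
    (n_channels, n_obs)
)
csd = coeffs @ coeffs.conj().T / n_obs  # Hermitian cross-spectral density

mic, mim = mic_mim(csd, n_seeds)
# With the imaginary part removed (zero-lag interactions only), both vanish.
mic_real, mim_real = mic_mim(csd.real.astype(complex), n_seeds)
```

Because both measures are built only from the imaginary part of the cross-spectrum, a purely real CSD gives zero for each, which is the property that makes them insensitive to zero-lag mixing.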

What are the limitations?

Only bivariate or multivariate methods can be requested at any one time. This is because the existing spectral_connectivity_epochs function assumes that the number of signals in the seeds/targets matches the number of connections, which is not the case for the multivariate methods at this time. It is quite easy (and clean) to adapt the functions to handle this differently when only multivariate methods are called; however, things become much trickier when trying to work with bivariate and multivariate methods simultaneously. If you would like the ability to compute bivariate and multivariate methods simultaneously to be included, perhaps we can adapt some of the functions to make this easier; otherwise, we could stick with the current limitation I have imposed.
Only one multivariate connection can be computed at any one time. Computing more than one is not possible without changing the structure of indices.
When computing multivariate connectivity, the number of seeds and targets for this single connection must be identical. Again, lifting this restriction is not possible without allowing indices to be ragged.
Computing GC on frequency bands is not yet supported. This relates to the cross-frequency nature of GC measures. I had a solution, which I implemented in the code for my previous PR; however, I think it would be good to discuss this particular issue before proceeding.
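The two readings of indices described in the limitations above can be sketched as follows. This is only an illustration of the stated behaviour, not code from the PR: `count_connections` is a hypothetical helper, and the channel numbers are arbitrary.

```python
import numpy as np


def count_connections(indices, multivariate):
    """Count the connections implied by (seeds, targets) under each reading."""
    seeds, targets = indices
    if len(seeds) != len(targets):
        # Both readings currently need equal-length seeds and targets.
        raise ValueError("seeds and targets must contain the same number of channels")
    # Bivariate: one connection per (seed, target) pair.
    # Multivariate: all seeds and all targets together form a single connection.
    return 1 if multivariate else len(seeds)


indices = (np.array([0, 1, 2]), np.array([3, 4, 5]))
n_bivariate = count_connections(indices, multivariate=False)
n_multivariate = count_connections(indices, multivariate=True)
print(n_bivariate, n_multivariate)
```

The same pair of arrays therefore describes three connections for a bivariate method but only one for a multivariate method, which is why mixing the two kinds of method in a single call is awkward.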

What have I not added?

No unit tests for checking whether the methods are "correct". I remember we discussed this in terms of adding some simulated data with a known connectivity distribution and testing the methods using this data. This is still on my to-do list; I just need to figure out how I could best do this with the smallest amount of data possible (so as not to bloat the package).
No version of the methods in spectral_connectivity_time. Again, this is still on my to-do list. I have structured the MIC/MIM and GC computation code in a way that we could very easily re-use the code I have added here and just switch out the time dimension (used in case of time-frequency modes) to store epoch information, so I do not consider this a major hurdle.

There is, however, a bigger issue: the assertion in test_spectral_connectivity_parallel that the connectivity object matches the one that has been saved and then reloaded now fails, because there is a ~1 kB size difference between the objects. As far as I can tell, all of the critical information in the objects is identical (e.g. the results are the same, all entries in attrs are the same). Interestingly, this only occurs for the multitaper and fourier modes. I checked the original test, and there the sizes of the saved and reloaded connectivity objects are also not a perfect match; however, they seem to be rounded to the same "~XX kB". My hunch is that the attrs I added to the connectivity classes (rank, n_lags, and patterns), even when unfilled, are pushing the estimated size in the repr to be rounded up when combined with the pre-existing small difference in size that occurs when saving and reloading a connectivity object. Even though I think everything is in order, I am clearly not happy with this test suddenly failing, so if you have any ideas as to why this is happening, I would really appreciate it!
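The rounding hunch can be illustrated with a toy calculation. Everything here is an assumption for illustration: `size_label` is a hypothetical formatter (not the actual repr code), and the byte counts are invented to show the effect, not measured from the test.

```python
def size_label(n_bytes):
    """Hypothetical repr helper: round a byte count to whole kB."""
    return f"~{n_bytes / 1024:.0f} kB"


# Invented sizes: the saved and reloaded objects differ by 100 bytes.
saved, reloaded = 20_700, 20_800
# Invented overhead from a few extra (even empty) attrs on both objects.
extra = 200

before = (size_label(saved), size_label(reloaded))
after = (size_label(saved + extra), size_label(reloaded + extra))
print(before)  # both labels fall in the same rounding bin
print(after)   # the same 100-byte gap now straddles a rounding boundary
```

If this is what is happening, comparing the underlying data and attrs directly, instead of the rounded size in the repr, would make the test insensitive to the effect.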

The test in question: https://github.com/tsbinns/mne-connectivity/blob/100e235c8520664510401d73394df1c953a6721f/mne_connectivity/spectral/tests/test_spectral.py#L134-L145

An example of the reprs from the test with the multitaper mode:

What are the next steps?

Adding further unit tests and implementing these measures in spectral_connectivity_time are my two biggest targets. Once I have your input, I am happy to also start addressing the limitations I listed (e.g. by adding support for ragged indices). Making sure all tests are passing would also be nice!

Once again, I am very sorry for the delay (a lot of other work got in the way), but I am still very excited to move forward with this! If there is anything you think would be best discussed over a call, I am very happy to do that again. I will also answer anything here as soon as I can.

Cheers, Thomas

adam2392 commented 1 year ago

Sorry for the delay. We fixed the unrelated CI issues in #139, but the remaining failures seem to stem from changes in this PR. @tsbinns

tsbinns commented 1 year ago

@adam2392 I have been continuing to add the changes/fixes discussed above.

The main thing now would be how to handle the ragged arrays that could occur when working with multivariate methods (e.g. indices, results, patterns). We discussed some ideas for this, but never came to a decision. Would you prefer it if I add some MultivariateMixIn class, or would you want something more general like an N-dimensional results class of its own that could be used for other purposes as well? I think that would be the last big step before this can be finalised.

Apologies for the delays in sorting this; unfortunately I have not been able to dedicate as much time as I would have liked to the PR, but it is still something I am very much interested in!

adam2392 commented 1 year ago

> The main thing now would be how to handle the ragged arrays that could occur when working with multivariate methods (e.g. indices, results, patterns). We discussed some ideas for this, but never came to a decision. Would you prefer it if I add some MultivariateMixIn class, or would you want something more general like an N-dimensional results class of its own that could be used for other purposes as well? I think that would be the last big step before this can be finalised.

Let's revisit this when this PR is merged if that's okay with you? IIUC, we don't need the ragged part for all the methods, and thus, per https://github.com/mne-tools/mne-connectivity/pull/125#issuecomment-1435723509, we can implement this first?

I think the ragged part would take some discussion and ideally some thought to make it as lightweight as possible and compatible with where scientific python is heading: https://discuss.scientific-python.org/t/ragged-array-summit/465/4. We might consider adding an optional dependency, implementing the soln. ourselves, or a combination.
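As one point of reference for that discussion, a self-implemented lightweight option could pad ragged index lists into a rectangular NumPy masked array. This is only a sketch of one possible approach, not a decision or an existing API in the package; `pad_ragged` and `unpad` are hypothetical helper names.

```python
import numpy as np


def pad_ragged(rows):
    """Pack variable-length integer rows into a rectangular masked array."""
    max_len = max(len(row) for row in rows)
    data = np.zeros((len(rows), max_len), dtype=int)
    mask = np.ones((len(rows), max_len), dtype=bool)  # True marks padding
    for i, row in enumerate(rows):
        data[i, : len(row)] = row
        mask[i, : len(row)] = False
    return np.ma.array(data, mask=mask)


def unpad(padded):
    """Recover the original variable-length rows by dropping masked entries."""
    return [padded[i].compressed().tolist() for i in range(padded.shape[0])]


# Two hypothetical multivariate connections with different numbers of seeds.
ragged_seeds = [[0, 1], [2, 3, 4]]
padded = pad_ragged(ragged_seeds)
print(padded.shape)
print(unpad(padded) == ragged_seeds)
```

The trade-off is that the array stays rectangular and needs no extra dependency, at the cost of every consumer having to respect the mask.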

> Apologies for the delays in sorting this; unfortunately I have not been able to dedicate as much time as I would have liked to the PR, but it is still something I am very much interested in!

No problem! I've been quite swamped. Ping me whenever you have the changes/fixes up.

tsbinns commented 1 year ago

DONE:

removed the explicit attributes for multivariate info, now just stored in attrs
general improvements to make the examples clearer
general documentation updates
fixed the bugs that were causing tests to fail

IN PROGRESS:

adding the methods to spectral_connectivity_time

adam2392 commented 1 year ago

> adding the methods to spectral_connectivity_time

Hi @tsbinns how is this part going? Would you like me to review anything? Thanks!

tsbinns commented 1 year ago

> > adding the methods to spectral_connectivity_time
>
> Hi @tsbinns how is this part going?

Hi @adam2392, this is still a WIP, however from Monday I have 2 full weeks to dedicate to this, so it will be finished then. I will tag you once I have it and all tests are passing.

> Would you like me to review anything?

I would be interested to hear your thoughts on the examples I added (mic_mim.py and granger_causality.py) and documentation changes I made for spectral_connectivity_epochs. Do you think there are any improvements I could make? Cheers!

tsbinns commented 1 year ago

Hi @adam2392, I have now:

added support for the methods in spectral_connectivity_time
added tests for the methods in both spectral_connectivity_epochs and spectral_connectivity_time (including a regression test for some example data for comparing results against the established MATLAB implementation of the methods)
fixed some bugs which I had previously missed

In terms of the goals of this PR I believe that is everything finished, assuming it is up to standard. I would be very grateful for any feedback!

tsbinns commented 11 months ago

Thanks very much for the feedback @adam2392! I believe I have addressed everything, but please unresolve any comments you think I should work on further.

Also please let me know when you think it's appropriate for me to update the whats_new.rst.

Cheers!

adam2392 commented 11 months ago

@tsbinns feel free to add a changelog entry rn that summarizes this major feature effort! Follow the pattern there to link your name/github page.

I will take an in-depth look at the code later.

adam2392 commented 11 months ago

LGTM. Thanks @tsbinns and team!

drammock commented 11 months ago

congrats on getting this one merged in! great effort.
