[MRG] Adding connectivity data structure implemented with xarray backbone

adam2392 commented 3 years ago

Closes: #13

Begin development of a connectivity data structure that holds connectivity data.

Also updated the circleCI for attempts at building docs for dev/ and building docs for stable/.

Also removed GH actions for style checking for all python versions and now doing only for Python3.6.

TODO:

[x] Refactor existing methods to use this
[ ] ~~Add an example to show-case it~~
[x] Adjust the unit tests to work with the api change
[x] Adjust the docs to work with the api change

adam2392 commented 3 years ago

My opinion is perhaps I will probably keep the connectivity containers rather lightweight for now. They are literally a structured lumpy array for all intents and purposes (which makes me think... should we be using xarray?).

We can tack on the plotting and saving next, which will greatly simplify user life. Then simultaneously add the io to load in a connectivity dataset, so its full loop.

adam2392 commented 3 years ago

@larsoner can you take a look at the CircleCI changes here and lmk if that's what you had in mind?

The only question is once there is a new maintenance build (e.g. maint/0.2), then how do I take the existing stable/ docs folder in gh-pages and rename them to maint/0.1? Do I do that manually?

drammock commented 3 years ago

+1 for trying out xarray here. It's something I've often wished for in MNE-Python and I would be much more willing to take the plunge if some of the pitfalls could be worked out in advance on a smaller subset of our codebase. This seems like a good test case if @adam2392 is up for it. That said, if it looks like too big of a rabbit hole, I wouldn't want it to derail getting the actual migration done.

larsoner commented 3 years ago

Yes during release of 0.2 you manually copy stable/ to 0.1/, you should see a step like this in the MNE-Python release wiki instructions

larsoner commented 3 years ago

And yes that's about what I'd expect for CircleCI

adam2392 commented 3 years ago

+1 for trying out xarray here. It's something I've often wished for in MNE-Python and I would be much more willing to take the plunge if some of the pitfalls could be worked out in advance on a smaller subset of our codebase. This seems like a good test case if @adam2392 is up for it. That said, if it looks like too big of a rabbit hole, I wouldn't want it to derail getting the actual migration done.

I'll file this as a new issue then to see if its possible. I'm not as familiar with Xarray, so I will try keeping the containers pretty lightweight for now.

adam2392 commented 3 years ago

I've added some changes to incorporate xarray-like api now. Unit tests/docs still need work, but this is the general direction and I really like it! xarray gives a nice structured API.

Any thoughts @larsoner @britta-wstnr ?

adam2392 commented 3 years ago

Okay now everything works internally, imo is backwards compatible but now the sphinx documentation building now breaks:

sphinx-build -b html -d _build/doctrees  -nWT --keep-going . _build/html
Running Sphinx v4.0.2
making output directory... done
checking for /home/circleci/project/doc/references.bib in bibtex cache... not found
parsing bibtex file /home/circleci/project/doc/references.bib... parsed 9 entries
[autosummary] generating autosummary for: api.rst, bibliography.rst, index.rst, install.rst, whats_new.rst
[autosummary] generating autosummary for: /home/circleci/project/doc/generated/mne_connectivity.envelope_correlation.rst, /home/circleci/project/doc/generated/mne_connectivity.phase_slope_index.rst, /home/circleci/project/doc/generated/mne_connectivity.spectral_connectivity.rst
loading intersphinx inventory from https://docs.python.org/3/objects.inv...
loading intersphinx inventory from https://mne.tools/dev/objects.inv...
loading intersphinx inventory from https://mne.tools/mne-bids/dev/objects.inv...
loading intersphinx inventory from https://numpy.org/devdocs/objects.inv...
loading intersphinx inventory from https://scipy.github.io/devdocs/objects.inv...
loading intersphinx inventory from https://matplotlib.org/objects.inv...
loading intersphinx inventory from https://pandas.pydata.org/pandas-docs/dev/objects.inv...
loading intersphinx inventory from https://scikit-learn.org/stable/objects.inv...
loading intersphinx inventory from https://docs.pyvista.org/objects.inv...
loading intersphinx inventory from https://joblib.readthedocs.io/en/latest/objects.inv...
loading intersphinx inventory from https://nipy.org/nibabel/objects.inv...
loading intersphinx inventory from http://nilearn.github.io/objects.inv...
generating gallery...
Using Sphinx-Gallery to convert rst text blocks to markdown for .ipynb files.
generating gallery for auto_examples... [ 11%] mne_inverse_psi_visual.py
make: *** [Makefile:56: html] Killed

larsoner commented 3 years ago

CircleCI kills things when memory usage is too high. To me it suggests that some of the memory optimizations / list / generator support from MNE-Python's code and example were removed

larsoner commented 3 years ago

... and locally if you run the example with mprof run ...; mprof plot or just look at a process manager while it builds, you can see if this is the case or not

larsoner commented 3 years ago

I don't think we need to insert Nan if it's a sparse representation. Instead, if the element is not in the matrix we assume it's not computed

adam2392 commented 3 years ago

Unfortunately, scipy sparse would not work I think, since all these connectivity methods place a meaning on the value "0".

So I'm thinking of making this work at the "container" level. I'm thinking that if indices is defined, then that means there is one less "axis" defined in the data array. e.g.

dense all-to-all: (N, N, freqs, times), with names (node_in, node_out, freqs, times) would become (len(indices), freqs, times), and then the names would be (node_in to node_out, freqs, times) and other metadata that depends on axes would need to also be re-defined appropriately.

larsoner commented 3 years ago

Unfortunately, scipy sparse would not work I think, since all these connectivity methods place a meaning on the value "0".

I don't think this is necessarily a problem because you can have sparse matrices with explicit zeros. Also for ndarray-sparse there is https://github.com/pydata/sparse. But I agree it's not ideal.

dense all-to-all: (N, N, freqs, times), with names (node_in, node_out, freqs, times) would become (len(indices), freqs, times), and then the names would be (node_in to node_out, freqs, times) and other metadata that depends on axes would need to also be re-defined appropriately.

One unified dense way would be to always have it be:

Data of shape (n_estimated, freqs, times)
attribute indices, which is an ndarray of shape (2, n_estimated)
attribute node_names, which is of length n_nodes

All index pairs in indices must be between 0 and n_nodes - 1 (inclusive). The all-to-all case can have indices='all', which would be functionally equivalent to indices=(0, 0), (0, 1), ..., (0, n_nodes-1), (1, 0), (1, 1), ..., (n_nodes-1, n_nodes-2), (n_nodes-1, n_nodes-1)) (but why bother storing it). We could have indices='symmetric' which would just store the upper-right triangle (and not bother with the indices, for compactness).

adam2392 commented 3 years ago

One unified dense way would be to always have it be:

Data of shape (n_estimated, freqs, times)

attribute indices, which is an ndarray of shape (2, n_estimated)

attribute node_names, which is of length n_nodes

All index pairs in indices must be between 0 and n_nodes - 1 (inclusive). The all-to-all case can have indices='all', which would be functionally equivalent to indices=(0, 0), (0, 1), ..., (0, n_nodes-1), (1, 0), (1, 1), ..., (n_nodes-1, n_nodes-2), (n_nodes-1, n_nodes-1)) (but why bother storing it). We could have indices='symmetric' which would just store the upper-right triangle (and not bother with the indices, for compactness).

I like this API sketch, because it can also account for the 'symmetric' case. I suppose then, indices is either all, symmetric or a list of tuples basically with the default being all.

However, in terms of array operations, like shape, dims, etc. should I assume them to be like a "node X node" array? I'm a little hesitant on exposing it as a (n_nodes*n_nodes, freqs, times) shaped array, but I could imagine we provide optionality for representing it also as a connectivity matrix in the get_data() and other functions.

larsoner commented 3 years ago

However, in terms of array operations, like shape, dims, etc. should I assume them to be like a "node X node" array? I'm a little hesitant on exposing it as a (n_nodes*n_nodes, freqs, times) shaped array, but I could imagine we provide optionality for representing it also as a connectivity matrix in the get_data() and other functions.

Yes internally I would store everything as (n_estimated, ...), even full/all matrices (so (n_nodes * n_nodes, ...). We can provide options in get_data(..., output='square') (default 'condensed' or 'raveled' or something?) to output the dense/square shaped (n_nodes, n_nodes, ...) versions regardless of what indices is. (It's just for some examples, like the PSI visual one you're working on, doing output='full' might lead to a MemoryError, so it's not recommended in those cases.) We can cross this bridge later, though.

britta-wstnr commented 3 years ago

I like the idea of having an option to get either the square or condensed output, that would make individual plotting solutions way easier for the user (given that there are probably more inputs to compute connectivity for than we can account for).

adam2392 commented 3 years ago

My only qualm that is coming up rn with my changes is how to make things play nicely with xarray. E.g. I'm wondering if xarray is still worth it? And also how should the collapsed "list of node_in -> node_out" be named?

My intuition is saying keep xarray, but come up some better naming, and then just try to wrangle the API to make it work, since xarray brings some niceties to the dimensions + allows storage of "metadata".

Perhaps I'll show you all through Zoom later also. Accounting for the indices makes the code more complex.

larsoner commented 3 years ago

Short version of what we discussed off-GH:

Store indices, node names, and compact version of xarray (with 0->N-1 labeled "estimated" axis)
Provide convenience function to return square-ized xarray with labeled axes
Start from basic implementation and start adapting all examples
See what methods need to be added
Ensure all new methods are unit tested and documented, etc.

adam2392 commented 3 years ago

Yay finally everything passed: https://99-369628653-gh.circle-artifacts.com/0/dev/index.html

britta-wstnr commented 3 years ago

I looked at the rendered examples and compared the output to the MNE homepage (stable) and the plots all match. Well done on getting it all up and running here!

adam2392 commented 3 years ago

Lmk if you have any qualms or issues with the way things look and I can go ahead and merge this in!

larsoner commented 3 years ago

CIs are failing, let me know when they're green and I'll take a look!

adam2392 commented 3 years ago

CIs are failing, let me know when they're green and I'll take a look!

Oops there were some import errors. Kay its good now!

adam2392 commented 3 years ago

https://109-369628653-gh.circle-artifacts.com/0/dev/index.html

larsoner commented 3 years ago

@adam2392 something must be wrong with your local git configuration, the commits you're pushing are not properly linking to your GitHub account (see the icons to the left of your commits toward the end of this page, they should link to your GitHub avatar but don't). Looking at the git log locally I see:

Author: Adam Li <adam2392@Adams-MacBook-Pro-2.local>

so you might need to edit ~/.gitconfig to have:

[user]
        name = ...
        email = ...

set properly. git config --global can probably be used to do the same...

adam2392 commented 3 years ago

https://118-369628653-gh.circle-artifacts.com/0/dev/index.html

adam2392 commented 3 years ago

@adam2392 something must be wrong with your local git configuration, the commits you're pushing are not properly linking to your GitHub account (see the icons to the left of your commits toward the end of this page, they should link to your GitHub avatar but don't). Looking at the git log locally I see:
Author: Adam Li <adam2392@Adams-MacBook-Pro-2.local>
so you might need to edit ~/.gitconfig to have:
[user]
        name = ...
        email = ...
set properly. git config --global can probably be used to do the same...

Yay it's fixed!

adam2392 commented 3 years ago

Cool, I think this is good to go now!

I'll address

symmetric, example of Connectivity, refactoring envelope_correlation in the next PR.

mne-tools / mne-connectivity

[MRG] Adding connectivity data structure implemented with xarray backbone #6