adam2392 closed this 3 years ago
My opinion is that I will probably keep the connectivity containers rather lightweight for now. They are essentially a structured numpy array for all intents and purposes (which makes me think... should we be using xarray?).
We can tack on the plotting and saving next, which will greatly simplify users' lives. Then simultaneously add the I/O to load in a connectivity dataset, so it's a full loop.
@larsoner can you take a look at the CircleCI changes here and lmk if that's what you had in mind?
The only question is: once there is a new maintenance build (e.g. `maint/0.2`), how do I take the existing `stable/` docs folder in `gh-pages` and rename it to `maint/0.1`? Do I do that manually?
+1 for trying out `xarray` here. It's something I've often wished for in MNE-Python and I would be much more willing to take the plunge if some of the pitfalls could be worked out in advance on a smaller subset of our codebase. This seems like a good test case if @adam2392 is up for it. That said, if it looks like too big of a rabbit hole, I wouldn't want it to derail getting the actual migration done.
Yes, during the release of 0.2 you manually copy `stable/` to `0.1/`; you should see a step like this in the MNE-Python release wiki instructions.

And yes, that's about what I'd expect for CircleCI.
I'll file this as a new issue then to see if it's possible. I'm not as familiar with xarray, so I will try keeping the containers pretty lightweight for now.
I've added some changes to incorporate an xarray-like API now. Unit tests/docs still need work, but this is the general direction and I really like it! xarray gives a nice structured API.
Any thoughts @larsoner @britta-wstnr ?
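To make the discussion concrete, here is a minimal sketch of what an xarray-backed connectivity container could look like. All dimension names, coordinate values, and attributes below are illustrative assumptions, not the actual mne-connectivity API.

```python
import numpy as np
import xarray as xr

# Hypothetical spectral connectivity result for 4 nodes and 5 frequencies,
# stored as a labeled xarray.DataArray. The dims/attrs names are made up
# for illustration only.
n_nodes, n_freqs = 4, 5
data = np.random.RandomState(0).rand(n_nodes, n_nodes, n_freqs)

conn = xr.DataArray(
    data,
    dims=("node_in", "node_out", "freqs"),
    coords={"freqs": np.linspace(8.0, 12.0, n_freqs)},
    attrs={"method": "coh", "node_names": ["A", "B", "C", "D"]},
)

# Labeled indexing and metadata storage come for free:
alpha = conn.sel(freqs=10.0)   # slice at 10 Hz by label, not position
print(alpha.shape)             # (4, 4)
print(conn.attrs["method"])    # coh
```

The appeal is that named dimensions and an `attrs` dict remove the need to hand-roll bookkeeping for axis meanings and metadata.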
Okay, now everything works internally and, IMO, it is backwards compatible, but the Sphinx documentation build now breaks:
```
sphinx-build -b html -d _build/doctrees -nWT --keep-going . _build/html
Running Sphinx v4.0.2
making output directory... done
checking for /home/circleci/project/doc/references.bib in bibtex cache... not found
parsing bibtex file /home/circleci/project/doc/references.bib... parsed 9 entries
[autosummary] generating autosummary for: api.rst, bibliography.rst, index.rst, install.rst, whats_new.rst
[autosummary] generating autosummary for: /home/circleci/project/doc/generated/mne_connectivity.envelope_correlation.rst, /home/circleci/project/doc/generated/mne_connectivity.phase_slope_index.rst, /home/circleci/project/doc/generated/mne_connectivity.spectral_connectivity.rst
loading intersphinx inventory from https://docs.python.org/3/objects.inv...
loading intersphinx inventory from https://mne.tools/dev/objects.inv...
loading intersphinx inventory from https://mne.tools/mne-bids/dev/objects.inv...
loading intersphinx inventory from https://numpy.org/devdocs/objects.inv...
loading intersphinx inventory from https://scipy.github.io/devdocs/objects.inv...
loading intersphinx inventory from https://matplotlib.org/objects.inv...
loading intersphinx inventory from https://pandas.pydata.org/pandas-docs/dev/objects.inv...
loading intersphinx inventory from https://scikit-learn.org/stable/objects.inv...
loading intersphinx inventory from https://docs.pyvista.org/objects.inv...
loading intersphinx inventory from https://joblib.readthedocs.io/en/latest/objects.inv...
loading intersphinx inventory from https://nipy.org/nibabel/objects.inv...
loading intersphinx inventory from http://nilearn.github.io/objects.inv...
generating gallery...
Using Sphinx-Gallery to convert rst text blocks to markdown for .ipynb files.
generating gallery for auto_examples... [ 11%] mne_inverse_psi_visual.py
make: *** [Makefile:56: html] Killed
```
CircleCI kills things when memory usage is too high. To me this suggests that some of the memory optimizations / list / generator support from MNE-Python's code and examples were removed... Locally, if you run the example with `mprof run ...; mprof plot`, or just watch a process manager while it builds, you can see whether this is the case.
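The list-vs-generator point above can be illustrated with a stdlib-only toy example (this is not MNE's actual code): materializing all items at once costs memory proportional to their count, while a generator stays constant-size.

```python
import sys

# Toy illustration of why replacing a generator with a list can blow up
# memory during a docs build: the list holds every item at once, while the
# generator yields them one at a time.
n_epochs = 100_000

as_list = [i for i in range(n_epochs)]   # everything in memory at once
as_gen = (i for i in range(n_epochs))    # constant-size object

print(sys.getsizeof(as_list))  # hundreds of kilobytes, grows with n_epochs
print(sys.getsizeof(as_gen))   # a few hundred bytes, regardless of n_epochs
```

Note `sys.getsizeof` only measures the container itself, but that is exactly the difference that matters here: the generator never holds the full sequence.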
I don't think we need to insert NaN if it's a sparse representation. Instead, if the element is not in the matrix, we assume it's not computed.
Unfortunately, I think scipy sparse would not work, since all of these connectivity methods place a meaning on the value "0".
So I'm thinking of making this work at the "container" level: if `indices` is defined, then there is one less "axis" in the data array. E.g. dense all-to-all data of shape `(N, N, freqs, times)` with names `(node_in, node_out, freqs, times)` would become `(len(indices), freqs, times)` with names `(node_in to node_out, freqs, times)`, and other metadata that depends on axes would also need to be re-defined appropriately.
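In numpy terms, the axis reduction described above is just fancy indexing with the source/target pairs. A minimal sketch (array sizes and the `indices` pairs are made up for illustration):

```python
import numpy as np

# A dense all-to-all connectivity array of shape (N, N, n_freqs, n_times)...
N, n_freqs, n_times = 4, 3, 2
dense = np.arange(N * N * n_freqs * n_times, dtype=float).reshape(
    N, N, n_freqs, n_times)

# ...with `indices` given as (sources, targets) pairs collapses the two node
# axes into a single "node_in to node_out" axis via fancy indexing:
indices = (np.array([0, 0, 1]), np.array([2, 3, 3]))  # 3 connections
indexed = dense[indices[0], indices[1]]

print(dense.shape)    # (4, 4, 3, 2)
print(indexed.shape)  # (3, 3, 2) == (len(indices[0]), n_freqs, n_times)
```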
> Unfortunately, scipy sparse would not work I think, since all these connectivity methods place a meaning on the value "0".
I don't think this is necessarily a problem because you can have sparse matrices with explicit zeros. Also for ndarray-sparse there is https://github.com/pydata/sparse. But I agree it's not ideal.
> dense all-to-all: (N, N, freqs, times), with names (node_in, node_out, freqs, times) would become (len(indices), freqs, times), and then the names would be (node_in to node_out, freqs, times) and other metadata that depends on axes would need to also be re-defined appropriately.
One unified dense way would be to always have it be:

- Data of shape `(n_estimated, freqs, times)`
- attribute `indices`, which is an ndarray of shape `(2, n_estimated)`
- attribute `node_names`, which is of length `n_nodes`

All index pairs in `indices` must be between `0` and `n_nodes - 1` (inclusive). The all-to-all case can have `indices='all'`, which would be functionally equivalent to `indices=((0, 0), (0, 1), ..., (0, n_nodes-1), (1, 0), (1, 1), ..., (n_nodes-1, n_nodes-2), (n_nodes-1, n_nodes-1))` (but why bother storing it). We could have `indices='symmetric'`, which would just store the upper-right triangle (and not bother with the indices, for compactness).
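As a sketch of what the `'all'` and `'symmetric'` shorthands above could expand to, here is a hypothetical helper (not an actual mne-connectivity function; whether `'symmetric'` includes the diagonal is an assumption made here):

```python
import numpy as np

def expand_indices(indices, n_nodes):
    """Return index pairs as an ndarray of shape (2, n_estimated)."""
    if indices == "all":
        # every ordered pair, row-major: (0, 0), (0, 1), ..., (n-1, n-1)
        rows, cols = np.meshgrid(np.arange(n_nodes), np.arange(n_nodes),
                                 indexing="ij")
        return np.array([rows.ravel(), cols.ravel()])
    if indices == "symmetric":
        # upper triangle only (diagonal included, by assumption)
        rows, cols = np.triu_indices(n_nodes)
        return np.array([rows, cols])
    return np.asarray(indices)  # explicit (2, n_estimated) pairs

print(expand_indices("all", 3).shape)        # (2, 9)
print(expand_indices("symmetric", 3).shape)  # (2, 6)
```

This also shows why `'all'` need not be stored explicitly: it is fully determined by `n_nodes`.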
I like this API sketch, because it can also account for the 'symmetric' case. I suppose then `indices` is either `'all'`, `'symmetric'`, or a list of tuples, with the default being `'all'`.

However, in terms of array operations like shape, dims, etc., should I assume them to be like a "node X node" array? I'm a little hesitant about exposing it as a `(n_nodes*n_nodes, freqs, times)`-shaped array, but I could imagine we provide the option of also representing it as a connectivity matrix in `get_data()` and other functions.
Yes, internally I would store everything as `(n_estimated, ...)`, even full/all matrices (so `(n_nodes * n_nodes, ...)`). We can provide options in `get_data(..., output='square')` (default `'condensed'` or `'raveled'` or something?) to output the dense/square-shaped `(n_nodes, n_nodes, ...)` versions regardless of what `indices` is. (It's just that for some examples, like the PSI visual one you're working on, doing `output='full'` might lead to a MemoryError, so it's not recommended in those cases.) We can cross this bridge later, though.
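A minimal sketch of the conversion such a `get_data(output=...)` option might perform when going from the internal raveled storage back to a square matrix (the function name and NaN fill value are assumptions for illustration):

```python
import numpy as np

def raveled_to_dense(data, indices, n_nodes):
    """Scatter raveled (n_estimated, n_freqs) data into a square
    (n_nodes, n_nodes, n_freqs) matrix; unestimated entries become NaN."""
    dense = np.full((n_nodes, n_nodes) + data.shape[1:], np.nan)
    dense[indices[0], indices[1]] = data
    return dense

data = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 connections, 2 freqs
indices = np.array([[0, 1], [2, 0]])       # pairs 0 -> 2 and 1 -> 0
dense = raveled_to_dense(data, indices, n_nodes=3)

print(dense.shape)            # (3, 3, 2)
print(dense[0, 2])            # [1. 2.]
print(np.isnan(dense).sum())  # 14: entries that were never estimated
```

The square output is also where the memory concern above bites: for all-to-all data the dense array is `n_nodes**2` times the per-pair cost, so it should be opt-in.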
I like the idea of having an option to get either the square or condensed output, that would make individual plotting solutions way easier for the user (given that there are probably more inputs to compute connectivity for than we can account for).
My only qualm coming up right now with my changes is how to make things play nicely with xarray. E.g. I'm wondering if xarray is still worth it? And also, how should the collapsed "list of node_in -> node_out" be named?
My intuition says to keep xarray, come up with some better naming, and then just wrangle the API to make it work, since xarray brings some niceties to the dimensions and allows storage of "metadata".
Perhaps I'll show you all over Zoom later as well. Accounting for the indices makes the code more complex.
Short version of what we discussed off-GH:
Yay finally everything passed: https://99-369628653-gh.circle-artifacts.com/0/dev/index.html
I looked at the rendered examples and compared the output to the MNE homepage (stable) and the plots all match. Well done on getting it all up and running here!
Lmk if you have any qualms or issues with the way things look and I can go ahead and merge this in!
CIs are failing, let me know when they're green and I'll take a look!
Oops, there were some import errors. Okay, it's good now!
@adam2392 something must be wrong with your local `git` configuration; the commits you're pushing are not properly linked to your GitHub account (see the icons to the left of your commits toward the end of this page: they should link to your GitHub avatar but don't). Looking at the `git log` locally I see:

```
Author: Adam Li <adam2392@Adams-MacBook-Pro-2.local>
```

so you might need to edit `~/.gitconfig` to have:

```
[user]
    name = ...
    email = ...
```

set properly. `git config --global` can probably be used to do the same...
Yay it's fixed!
Cool, I think this is good to go now!
I'll address `symmetric`, an example of Connectivity, and refactoring `envelope_correlation` in the next PR.
Closes: #13
Begin development of a connectivity data structure that holds connectivity data.
Also updated the CircleCI config to attempt building docs for `dev/` and for `stable/`. Also removed the GH Actions style check for all Python versions; it now runs only for Python 3.6.
TODO:
Add an example to show-case it