greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License

Consider xarray for storing hetnets as multidimensional arrays #12

Closed: dhimmel closed this issue 6 years ago

dhimmel commented 7 years ago

Check out the xarray package (https://github.com/pydata/xarray) to store hetnets as a multidimensional array.

dhimmel commented 7 years ago

@kkloste, the lack of row / column names in numpy.ndarray and scipy.sparse is becoming a real impediment.

I'm thinking that all user-facing functions should return xarray.DataArrays for this reason. We'll have to achieve the right balance where internal functions do what's efficient, while external functions return something useful (i.e. with names).
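To make the idea concrete, here is a minimal sketch of how an internal function's plain array could be wrapped with names before being returned to the user. The helper name label_matrix, the dimension names, and the node identifiers are all hypothetical, not part of the existing codebase:

```python
import numpy as np
import xarray as xr


def label_matrix(matrix, row_names, col_names, row_dim="source", col_dim="target"):
    """Wrap a dense 2-D array in a DataArray with named rows and columns.

    Hypothetical helper: internal code keeps working on bare arrays,
    and only the user-facing return value carries the labels.
    """
    return xr.DataArray(
        np.asarray(matrix),
        dims=(row_dim, col_dim),
        coords={row_dim: list(row_names), col_dim: list(col_names)},
    )


# Made-up node identifiers, for illustration only.
genes = ["GENE_A", "GENE_B"]
diseases = ["DISEASE_X", "DISEASE_Y", "DISEASE_Z"]
raw = np.array([[1, 0, 1], [0, 1, 0]])

labeled = label_matrix(raw, genes, diseases, row_dim="gene", col_dim="disease")

# Look up a cell by name instead of by integer position.
value = labeled.sel(gene="GENE_A", disease="DISEASE_Z").item()
```

With this, callers can select by node name rather than tracking row and column offsets themselves.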

Currently, I think diffusion.diffuse and degree_weight.dwwc are the two user-facing functions.

kkloste commented 7 years ago

That all makes sense to me.

dhimmel commented 7 years ago

Here's a diagram (Figure 1A from Discovering disease-disease associations by fusing systems-level molecular data) that illustrates how a hetnet can be represented as an adjacency matrix for each edge type (metaedge):

Note that the Gene × Gene position has multiple adjacency matrices. This is because there are multiple edge types that go from gene to gene. Note that our adjacency matrices are binary (rather than weighted as in the diagram).
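As a rough sketch of that representation, one binary adjacency matrix can be stored per metaedge, keyed by (source type, edge kind, target type). The node names and edge kinds below are invented for illustration and are not taken from an actual hetnet:

```python
import numpy as np

# Made-up nodes; rows and columns follow these orderings.
genes = ["GENE_A", "GENE_B", "GENE_C"]
diseases = ["DISEASE_X", "DISEASE_Y"]

# One binary adjacency matrix per metaedge.
# Keys are (source node type, edge kind, target node type); all hypothetical.
adjacency = {
    ("Gene", "interacts", "Gene"): np.array(
        [[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=bool
    ),
    ("Gene", "regulates", "Gene"): np.array(
        [[0, 0, 1], [0, 0, 0], [0, 0, 0]], dtype=bool
    ),
    ("Gene", "associates", "Disease"): np.array(
        [[1, 0], [0, 1], [0, 0]], dtype=bool
    ),
}

# The Gene x Gene position holds more than one matrix,
# because several edge types connect genes to genes.
gene_gene = [key for key in adjacency if key[0] == "Gene" and key[2] == "Gene"]
```

Each matrix has one row per source node and one column per target node, so matrices for different node-type pairs have different shapes.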

dhimmel commented 6 years ago

We've exported hetnets to xarrays in 2.xarray.ipynb. However, due to its lack of sparse matrix support, we haven't proceeded with using xarray. See #95 for discussion of various graph data structures. We're proceeding with our custom HetMat solution.
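For a sense of why sparse support matters, here is a small sketch comparing the storage of a sparse binary adjacency matrix with its dense equivalent. The matrix is synthetic (one edge per node), and the sketch assumes a labeled container would need the dense form:

```python
import numpy as np
from scipy import sparse

# Synthetic binary adjacency matrix: n nodes, each with a single outgoing edge.
n = 1000
rows = np.arange(n)
cols = (rows + 1) % n
adj = sparse.csr_matrix((np.ones(n, dtype=bool), (rows, cols)), shape=(n, n))

# CSR storage is the three underlying arrays: values, column indices, row pointers.
sparse_bytes = adj.data.nbytes + adj.indices.nbytes + adj.indptr.nbytes

# Densifying allocates every cell, including all the zeros.
dense = adj.toarray()
dense_bytes = dense.nbytes
```

The dense copy grows with the square of the node count, while the sparse form grows with the number of edges.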