Closed ivirshup closed 3 years ago
I think this has something to do with treating the graph as undirected, since you get the same result if you only provide the upper triangular matrix:
You're right, it's using an undirected graph under the hood.
Pinging @giovp regarding whether this was done by design or by mistake. Also, I am in support of the weights, but maybe make it optional for backwards compatibility.
I think that's true, and I think this is a bug. I just checked again and the problem lies here: https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.from_scipy_sparse_matrix.html
We should explicitly call `nx.from_scipy_sparse_matrix(A, create_using=nx.DiGraph)`, otherwise it falls back to the default, `nx.Graph`. Great catch @ivirshup!
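A minimal sketch of the difference described above, on a hypothetical toy matrix `A` (not squidpy code). It assumes networkx is installed. Newer networkx releases expose `from_scipy_sparse_array` in place of `from_scipy_sparse_matrix`, so the sketch picks whichever converter is available:

```python
import networkx as nx
import numpy as np
from scipy import sparse

# The thread names from_scipy_sparse_matrix; newer networkx releases provide
# from_scipy_sparse_array instead, so fall back to whichever one exists.
from_sparse = getattr(nx, "from_scipy_sparse_array", None) or nx.from_scipy_sparse_matrix

# Directed toy graph with one reciprocal pair: 0->1, 1->0, 0->2
A = sparse.csr_matrix(
    (np.ones(3, dtype=np.int64), (np.array([0, 1, 0]), np.array([1, 0, 2]))),
    shape=(3, 3),
)

undirected = from_sparse(A)  # default create_using is nx.Graph
directed = from_sparse(A, create_using=nx.DiGraph)

# The undirected graph merges 0->1 and 1->0 into a single edge.
print(undirected.number_of_edges(), directed.number_of_edges())
```

With the default graph type, the reciprocal pair collapses into one edge, so any edge count derived from it no longer matches the number of stored entries in `A`.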
This should also be changed here: https://github.com/theislab/squidpy/blob/b3b5f8a4784250c6792137853092a3d5c9dbf1c7/squidpy/gr/_nhood.py#L245
Also agree on the weights, and would also prefer your numba implementations over wrapping networkx! Would be great if you can open a PR @ivirshup , thank you!
closed via #302
Description
I have a few questions about the `interaction_matrix` function. First, am I using the same definition as you?
To me, for the interaction matrix `M` of graph `G` and labelling `c`, `M[i, j]` should be the count of edges between the `i` nodes and the `j` nodes.
If this is the case, shouldn't `M.sum()` be equal to the edge count of the graph?
I think this has something to do with treating the graph as undirected, since you get the same result if you only provide the upper triangular matrix:
But these graphs aren't always going to be symmetric, so should this be the case?
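To make the definition above concrete, here is a minimal sketch of a directed count on a toy graph. The helper `count_edges_by_label` and the toy data are hypothetical illustrations, not squidpy's implementation. Under this definition the total equals the number of stored edges, and passing only the upper triangle gives a different result:

```python
import numpy as np
from scipy import sparse


def count_edges_by_label(g, codes, n_labels):
    """M[i, j] = number of stored edges from a node labelled i to a node labelled j."""
    coo = sparse.coo_matrix(g)
    m = np.zeros((n_labels, n_labels), dtype=np.intp)
    # np.add.at accumulates correctly when the same (i, j) pair occurs repeatedly
    np.add.at(m, (codes[coo.row], codes[coo.col]), 1)
    return m


# Directed toy graph on 4 nodes: 0->1, 0->2, 1->0, 3->2
rows = np.array([0, 0, 1, 3])
cols = np.array([1, 2, 0, 2])
g = sparse.csr_matrix((np.ones(4, dtype=bool), (rows, cols)), shape=(4, 4))
codes = np.array([0, 0, 1, 1])  # nodes 0 and 1 have label 0; nodes 2 and 3 have label 1

full = count_edges_by_label(g, codes, n_labels=2)
upper = count_edges_by_label(sparse.triu(g), codes, n_labels=2)

print(full.sum(), g.nnz)  # directed count: total equals the number of stored edges
print(upper.sum())        # the upper triangle alone drops 1->0 and 3->2
```

Since the full matrix and its upper triangle disagree under a directed count, getting identical results for both is a hint that edges are being treated as undirected somewhere.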
Side note, wouldn't it be nice to be able to use the weights of the edges?
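As a rough illustration of what weights would change, here is a sketch on a hypothetical weighted toy graph. It builds a one-hot label matrix by hand (the names `onehot`, `weighted`, and `counts` are made up for this example) and compares summed weights with plain edge counts:

```python
import numpy as np
from scipy import sparse

# Toy weighted directed graph: 0->1 (0.5), 0->2 (2.0), 3->2 (1.5)
g = sparse.csr_matrix(
    (np.array([0.5, 2.0, 1.5]), (np.array([0, 0, 3]), np.array([1, 2, 2]))),
    shape=(4, 4),
)
codes = np.array([0, 0, 1, 1])  # nodes 0 and 1 have label 0; nodes 2 and 3 have label 1

# One-hot indicator matrix: onehot[n, k] == 1 iff node n has label k
onehot = sparse.csr_matrix((np.ones(4), (np.arange(4), codes)), shape=(4, 2))

# Weighted: sum of edge weights between each pair of labels
weighted = (onehot.T @ g @ onehot).toarray()

# Unweighted: every stored edge counts as 1, regardless of its weight
counts = (onehot.T @ g.astype(bool).astype(np.intp) @ onehot).toarray()

print(weighted)
print(counts)
```

Keeping the unweighted count as the default and making weights opt-in would leave existing results unchanged.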
I've also got a little implementation that has these properties, and is a bit faster. Would be happy to make the PR.
Impl
```python
import pandas as pd
import numpy as np
from numba import njit
from scipy import sparse


def interaction_matrix(g: sparse.spmatrix, labels, *, dtype=None, weights=False) -> pd.DataFrame:
    labels = pd.Series(labels).astype("category", copy=False)
    g = sparse.csr_matrix(g, copy=False)

    if weights:
        g_data = g.data
    else:
        g_data = np.ones(len(g.data), dtype=bool)

    if dtype is None:
        if pd.api.types.is_bool_dtype(g.dtype) or pd.api.types.is_integer_dtype(g.dtype):
            dtype = np.intp
        else:
            dtype = np.float64

    n_cats = len(labels.cat.categories)
    output = np.zeros((n_cats, n_cats), dtype=dtype)

    return pd.DataFrame(
        _interaction_matrix(g_data, g.indices, g.indptr, np.asarray(labels.cat.codes), output=output),
        index=labels.cat.categories,
        columns=labels.cat.categories,
    )


@njit
def _interaction_matrix(data, indices, indptr, cats, output):
    indices_list = np.split(indices, indptr[1:-1])
    data_list = np.split(data, indptr[1:-1])
    for i in range(len(data_list)):
        cur_row = cats[i]
        cur_indices = indices_list[i]
        cur_data = data_list[i]
        for j, val in zip(cur_indices, cur_data):
            cur_col = cats[j]
            output[cur_row, cur_col] += val
    return output


# Also good, a bit slower
def interaction_matrix(g, labels, *, weights=False):
    from sklearn.preprocessing import LabelBinarizer

    binarizer = LabelBinarizer(sparse_output=True)
    L = binarizer.fit_transform(labels)
    if not weights:
        g = g.astype(bool)
    return pd.DataFrame(
        L.T @ g @ L,
        index=binarizer.classes_,
        columns=binarizer.classes_,
    )
```
Version