Closed almaan closed 1 year ago
Hi Alma,
thanks for raising your thoughts here!
I’ll try to clarify the output a bit and tag @ivirshup here.
sc.pp.neighbors produces two main results, which it indeed stores in adata.obsp:
A distance matrix in adata.obsp['distances']. This matrix has shape (n_obs, n_obs): for each observation, only n_neighbors-1 entries will be non-zero. The nearest neighbor of an observation, itself with distance 0, is discarded, hence the -1. It is probably what you have been thinking of in your description.
A connectivity graph in adata.obsp['connectivities']. This graph is also stored as an (n_obs, n_obs) matrix, but the number of non-zero connections for each observation is flexible and is determined during the UMAP algorithm.
Hence if you’re interested in the distance matrix, adata.obsp['distances'] would be what you’re looking for! Coming back to your code example, here the test should pass:
# Import packages
import scanpy as sc
import anndata as ad
import numpy as np
# set random seed
np.random.seed(42)
# create dummy data
adata = ad.AnnData(shape=(1000,1))
adata.obsm['rep'] = np.random.random(size = (1000,2))
# compute the neighbor graph on the 2D representation
k = 10
sc.pp.neighbors(adata, n_neighbors=k, use_rep = 'rep', knn = True)
# get the distance matrix and count its non-zero entries (neighbors) for each cell
gr = adata.obsp['distances']
nn = (np.array(gr.todense()) > 0).sum(axis=1).flatten()
# check if neighbors are equal to k-1
np.testing.assert_equal(nn, k-1)
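To illustrate where the n_neighbors-1 count comes from, and why a symmetrised graph no longer has a fixed count per row, here is a minimal NumPy-only sketch. It does not call scanpy or UMAP: the brute-force kNN search and the plain "either direction" symmetrisation are simplifying assumptions standing in for what happens internally, and the names points, knn_dist and sym are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.random((200, 2))  # hypothetical 2-D representation
k = 10                         # plays the role of n_neighbors

# Dense pairwise Euclidean distances, shape (n_obs, n_obs).
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
n_obs = dist.shape[0]

# Indices of the k nearest points per row; each point is its own
# nearest neighbor with distance 0, so it is always among them.
knn_idx = np.argsort(dist, axis=1)[:, :k]

# Keep only the kNN distances. The self-distance is 0, so it does not
# show up as a non-zero entry, which leaves k - 1 entries per row.
rows = np.repeat(np.arange(n_obs), k)
cols = knn_idx.ravel()
knn_dist = np.zeros_like(dist)
knn_dist[rows, cols] = dist[rows, cols]
nnz_dist = (knn_dist > 0).sum(axis=1)

# Simplified stand-in for a symmetrised connectivity graph (assumption):
# keep an edge if it exists in either direction.
adj = knn_dist > 0
sym = adj | adj.T
nnz_sym = sym.sum(axis=1)

print("distances, non-zeros per row:", np.unique(nnz_dist))
print("symmetrised, min/max non-zeros per row:", nnz_sym.min(), nnz_sym.max())
```

In this sketch every row of knn_dist has exactly k-1 non-zero entries, while the symmetrised graph has a varying number per row, because a point can be among the nearest neighbors of another point without the reverse being true.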
I might actually try to clarify this in the documentation; a small PR addressing this will follow soon.
How does that sound to you? Please persist if you think I miss the point!
That being said, I think that the computation of the distance matrix and the connectivity graph are both correct.
At the moment I believe this might be more of a documentation issue than a bug, so I changed the label. If you have a follow-up or related issue, kindly let me know :)
We will close the issue for now, hopefully this has been addressed helpfully :)
However, please don't hesitate to reopen this issue or create a new one if you have any more questions or run into any related problems in the future.
Thanks for being a part of our community! :)
Hej,
first of all, sorry for a slow reply. Thank you @eroell for the explanation, that fully makes sense - but as you're saying, perhaps a clarification in the documentation would make sense. I wasn't the only one in my team confused by this.
Thank you for maintaining this package and all the great work you guys are doing!
Hi,
thanks a lot for the follow-up here! Happy to hear the explanation made sense - Hopefully the change in the doc adds some clarification from the start.
Thanks for being a part of this community!
What happened?
Hej!
Thanks for maintaining such a great package! This issue relates to another issue posted (by me) in the squidpy repo, but I think it might be worth bringing up here as well.
The issue in question is how the sc.pp.neighbors function returns an inconsistent number of neighbors even when knn=True. In the documentation of sc.pp.neighbors it's stated that:
and
as well as
Hence I would expect that the number of non-zero elements in adata.obsp['connectivities'], in an object to which sc.pp.neighbors(adata, n_neighbors = k, knn = True) has been applied, would sum to k for each row. However, when inspecting these results, this is not true. The number of non-zero elements in a row varies, taking values both higher and lower than the specified n_neighbors (obviously, sometimes it's also the expected n_neighbors value).
Perhaps I'm misunderstanding something, but this behavior is somewhat counterintuitive to me and not what I expect; happy to be corrected though!
/Alma
Minimal code sample