scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Ingest error when neighbors from bbknn #1201

Closed by fidelram 4 years ago

fidelram commented 4 years ago

sc.tl.ingest looks up the distance metric that was used when the neighbors graph was computed. When that information is not stored, as happens after sc.external.pp.bbknn, it fails with a KeyError. Is there a workaround for this?

import numpy as np
import pandas as pd
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()
adata_ref = sc.datasets.pbmc3k_processed()

# restrict both datasets to their shared genes
var_names = adata_ref.var_names.intersection(adata.var_names)
adata_ref = adata_ref[:, var_names].copy()
adata = adata[:, var_names].copy()

# add fake batch
adata_ref.obs['batch'] = pd.Categorical(np.random.choice(a=[0, 1, 2], size=adata_ref.shape[0]))

sc.pp.pca(adata_ref)
sc.external.pp.bbknn(adata_ref, batch_key='batch')  # requires the bbknn package
sc.tl.umap(adata_ref)
sc.tl.ingest(adata, adata_ref, obs='louvain', embedding_method='umap')  # fails with KeyError: 'metric'

scanpy/scanpy/tools/_ingest.py in _init_neighbors(self, adata, neighbors_key)
    283             dist_args = ()
    284 
--> 285         self._metric = neighbors['params']['metric']
    286         dist_func = named_distances[self._metric]
    287 

KeyError: 'metric'

Versions:

scanpy==1.4.7.dev83+g5345a50.d20200506

Koncopd commented 4 years ago

The workaround is to set the metric manually in .uns['neighbors']['params']['metric']. However, I'm not sure it is conceptually right to use ingest with bbknn.
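A minimal sketch of that workaround, written as a hypothetical helper (`ensure_neighbors_metric` is not part of scanpy). It only fills in the metric when none is recorded, so a graph built by sc.pp.neighbors is left untouched. The default of "euclidean" is an assumption (it is the default metric of sc.pp.neighbors); whether it matches the distance bbknn actually used is exactly the open question raised above.

```python
from typing import Any, MutableMapping


def ensure_neighbors_metric(
    uns: MutableMapping[str, Any],
    neighbors_key: str = "neighbors",
    metric: str = "euclidean",
) -> str:
    """Make sure uns[neighbors_key]['params'] records a distance metric.

    Hypothetical helper, not a scanpy API. If a metric is already stored
    it is kept; otherwise `metric` (an assumed value) is written in so that
    sc.tl.ingest can find it. Returns the metric that ends up stored.
    """
    # create the 'params' dict if the neighbors entry has none at all
    params = uns[neighbors_key].setdefault("params", {})
    # setdefault writes only when 'metric' is missing and returns the stored value
    return params.setdefault("metric", metric)


# Stand-in for adata_ref.uns after bbknn: a neighbors entry without a metric
uns = {"neighbors": {"params": {}}}
print(ensure_neighbors_metric(uns))
```

In the reproducer above, the assumed usage would be `ensure_neighbors_metric(adata_ref.uns)` right before the `sc.tl.ingest(...)` call.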

fidelram commented 4 years ago

Thanks.

jenzopr commented 4 years ago

Hi @Koncopd, can you elaborate a bit on what .uns['neighbors']['params']['metric'] should be set to?

Thanks!

LuckyMD commented 4 years ago

I think it's normally just the string "euclidean", but you can check what is stored in .uns['neighbors']['params']['metric'] after running sc.pp.neighbors() on some test data.