theislab / scib

Benchmarking analysis of data integration tools
MIT License

kBET error: Please increase max_iterations or reduce k. #183

Closed: Hrovatin closed this issue 3 years ago

Hrovatin commented 3 years ago

I am trying to run scIB on different integration results of a single dataset. For some integration results I get the error below from kBET. The error message says how to get rid of it, but scIB offers no parameters to adjust the variables in question.


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-69-776881c12051> in <module>
----> 1 compute_metrics(adata_full=adata_full,latent_adata=latent_adata,metrics=metrics,cell_type_col='cell_subtype')

<ipython-input-65-b5d9db1547fc> in compute_metrics(adata_full, latent_adata, metrics, cell_type_col, batch_col, PC_regression, ASW_batch, kBET, graph_connectivity, graph_LISI, NMI_ARI, ASW_cell_type, isolated_label_F1, isolated_label_ASW)
     16     # kBET
     17     if kBET:
---> 18         metrics['kBET']=1-np.nanmean(sm.kBET(adata=latent_adata, batch_key=batch_col, label_key=cell_type_col, 
     19                                              embed='X_emb',
     20                                          type_ = 'embed',

~/scib/scIB/metrics.py in kBET(adata, batch_key, label_key, embed, type_, hvg, subsample, heuristic, verbose)
   1649             else: #a single component to compute kBET on
   1650                 #need to check neighbors (k0 or k0-1) as input?
-> 1651                 nn_index_tmp = diffusion_nn(adata_sub, k=k0).astype('float')
   1652                 score = kBET_single(
   1653                             matrix=matrix,

~/scib/scIB/metrics.py in diffusion_nn(adata, k, max_iterations)
    852 
    853     if (M>0).sum(1).min() < (k+1):
--> 854         raise ValueError(f'could not find {k} nearest neighbors in {max_iterations}'
    855                          'diffusion steps.\n Please increase max_iterations or reduce'
    856                          ' k.\n')

ValueError: could not find 10 nearest neighbors in 16diffusion steps.
 Please increase max_iterations or reduce k.
danielStrobl commented 3 years ago

Which version of scIB are you using? We also ran into this error and fixed it with this PR: https://github.com/theislab/scib/pull/161. I think we also raised the default value of max_iterations to 26 in more recent versions. This generally happens when the integration method produces a highly disconnected graph.
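Whether an embedding produces a highly disconnected graph can be checked directly. The sketch below is an illustration under stated assumptions, not scIB code: the helper names `knn_graph` and `components_per_label` are hypothetical, the kNN graph is built by brute force from the embedding rather than taken from the neighbor graph scIB uses, and the graph is then subset to each cell identity label before counting connected components with `scipy.sparse.csgraph.connected_components`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def knn_graph(X, k):
    """Hypothetical brute-force symmetric kNN graph (O(n^2) memory; toy sizes only)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a cell is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    A = csr_matrix((np.ones(rows.size), (rows, nn.ravel())), shape=(n, n))
    # Symmetrize and binarize: an edge exists if either cell picked the other.
    G = (A + A.T).tocsr()
    G.data = np.ones_like(G.data)
    return G


def components_per_label(X, labels, k=15):
    """Hypothetical: components of the global kNN graph after subsetting to each label."""
    labels = np.asarray(labels)
    G = knn_graph(X, k)
    out = {}
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        n_comp, _ = connected_components(G[idx][:, idx], directed=False)
        out[str(lab)] = int(n_comp)
    return out
```

A label that falls apart into many small components in one embedding but not in another is a plausible place to look when kBET fails only for some integrations.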

Hrovatin commented 3 years ago

I am using a slightly older version and will try this one out. If I update, do I need to re-run all evaluations that passed on the older version, or are the results comparable between the older version and this fix?

danielStrobl commented 3 years ago

This fix just sets the kBET score to 0 if it can't find enough neighbors, so it only affects the cases that failed before. The same goes for raising max_iterations, so results are comparable before and after the fix.
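The fallback described above can be sketched as follows. This is a hypothetical illustration, not the code from the PR: `mean_score_with_fallback` and `score_fn` are made-up names, and per-label scores are assumed to lie between 0 (worst) and 1 (best). Labels whose neighbor search succeeds keep their score unchanged, which is why only previously failing cases are affected.

```python
import numpy as np


def mean_score_with_fallback(labels, score_fn):
    """Hypothetical: average per-label scores, using 0 for labels whose
    neighbor search raises ValueError (assumed 0 = worst, 1 = best)."""
    scores = {}
    for label in labels:
        try:
            scores[label] = float(score_fn(label))
        except ValueError:
            # Not enough neighbors found: record the worst possible score.
            scores[label] = 0.0
    return float(np.mean(list(scores.values()))), scores
```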

Hrovatin commented 3 years ago

Thanks.

This generally happens when the integration method produces a highly disconnected graph

All my embeddings look very similar on UMAP (the output of the integration method is a latent space), but kBET fails only for some of them, even after increasing max_iterations to 26.

LuckyMD commented 3 years ago

The "kBET fail" here is not really a failure of the kBET algorithm itself, but of the preprocessing of the data that goes into kBET. We require a minimum of, I think, 45 neighbors per node. If a node has fewer, we perform diffusion over the graph (subset to a cell identity label, and only on sufficiently large connected components) to find more. If even after 26 iterations we don't reach 45 neighbors, the subset kNN graph is so sparse and tree-like that we regard this as a bad kBET result anyway. So basically, there must be differences between your embeddings for particular cell identity labels.
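The neighbor-expansion step can be sketched with dense NumPy matrices. This is an assumption-laden illustration, not scIB's `diffusion_nn`: the function name `diffusion_nn_sketch` is hypothetical, and the self-loops and row normalization are assumptions. It multiplies a random-walk transition matrix with itself until every cell reaches at least `k` other cells, and raises a `ValueError` when `max_iterations` is exhausted. The failure check mirrors the `(M>0).sum(1).min() < (k+1)` condition visible in the traceback; the per-label subsetting and the connected-component filter are left out.

```python
import numpy as np


def diffusion_nn_sketch(adj, k, max_iterations=26):
    """Hypothetical sketch of kNN expansion by graph diffusion (not scIB code).

    adj: dense, symmetric (n, n) connectivity matrix of a kNN graph.
    Returns an (n, k) array with the k highest-weight neighbors of each node.
    Raises ValueError if some node still reaches fewer than k other nodes
    after max_iterations diffusion steps.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Assumption: self-loops keep already-reached nodes reachable at every step.
    T = adj + np.eye(n)
    # Row-normalize into a random-walk transition matrix.
    T = T / T.sum(axis=1, keepdims=True)

    M = T.copy()
    steps = 1
    # Same criterion as in the traceback: each row (node itself included)
    # needs at least k + 1 non-zero entries.
    while (M > 0).sum(axis=1).min() < k + 1 and steps < max_iterations:
        M = M @ T
        steps += 1

    if (M > 0).sum(axis=1).min() < k + 1:
        raise ValueError(
            f"could not find {k} nearest neighbors in {max_iterations} "
            "diffusion steps"
        )

    # Exclude each node itself, then rank the others by diffusion weight.
    np.fill_diagonal(M, -np.inf)
    return np.argsort(-M, axis=1)[:, :k]
```

On a chain of cells (the tree-like case), an endpoint gains only one new reachable cell per diffusion step, so a sparse, path-like subgraph can exhaust the iteration budget long before every cell has enough neighbors.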