welch-lab / VeloVAE

Deep Generative Modeling of RNA Velocity
BSD 3-Clause "New" or "Revised" License
35 stars 6 forks source link

Fixed anndata neighbors key issue #7

Closed jjia1 closed 3 months ago

jjia1 commented 3 months ago

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

Issue Number: #6 PR Number on Fork: fixed neighbors key bug #1

Current error status

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[46], line 4
      2 keys = ['fullvb']
      3 grid_size = (1,2)
----> 4 res, res_type = vv.post_analysis(adata,
      5                                  'continuous',
      6                                  methods,
      7                                  keys,
      8                                  compute_metrics=True,
      9                                  raw_count=False,
     10                                  #genes=gene_plot,
     11                                  grid_size=(1,2),
     12                                  #figure_path=figure_path,
     13                                  cluster_key='celltype')

File ~/miniconda3/envs/velovae/lib/python3.10/site-packages/velovae/analysis/evaluation.py:455, in post_analysis(adata, test_id, methods, keys, gene_key, compute_metrics, raw_count, genes, plot_type, cluster_key, cluster_edges, nplot, frac, embed, grid_size, sparsity_correction, figure_path, save, **kwargs)
    453 for i, method in enumerate(methods):
    454     if compute_metrics:
--> 455         stats_i, stats_type_i = get_metric(adata,
    456                                            method,
    457                                            keys[i],
    458                                            vkeys[i],
    459                                            cluster_key,
    460                                            gene_key,
    461                                            cluster_edges,
    462                                            embed,
    463                                            n_jobs=(kwargs['n_jobs']
    464                                                    if 'n_jobs' in kwargs
    465                                                    else None))
    466         stats_type_list.append(stats_type_i)
    467         # avoid duplicate methods with different keys

File ~/miniconda3/envs/velovae/lib/python3.10/site-packages/velovae/analysis/evaluation.py:273, in get_metric(adata, method, key, vkey, cluster_key, gene_key, cluster_edges, embed, n_jobs)
    266 stats['Vel Consistency (Velocity Genes)'] = mean_vel_consistency_sub
    268 # Compute velocity metrics on all genes
    269 (iccoh, mean_iccoh,
    270  cbdir_embed, mean_cbdir_embed,
    271  cbdir, mean_cbdir,
    272  tscore, mean_tscore,
--> 273  mean_vel_consistency) = get_velocity_metric(adata,
    274                                              key,
    275                                              vkey,
    276                                              cluster_key,
    277                                              cluster_edges,
    278                                              None,
    279                                              embed,
    280                                              n_jobs)
    281 stats['CBDir (Embed)'] = mean_cbdir_embed
    282 stats['CBDir'] = mean_cbdir

File ~/miniconda3/envs/velovae/lib/python3.10/site-packages/velovae/analysis/evaluation.py:80, in get_velocity_metric(adata, key, vkey, cluster_key, cluster_edges, gene_mask, embed, n_jobs)
     35 def get_velocity_metric(adata,
     36                         key,
     37                         vkey,
   (...)
     41                         embed='umap',
     42                         n_jobs=None):
     43     """
     44     Computes Cross-Boundary Direction Correctness and In-Cluster Coherence.
     45     The function calls scvelo.tl.velocity_graph.
   (...)
     78             - float: Velocity Consistency
     79     """
---> 80     mean_constcy_score = velocity_consistency(adata, vkey, gene_mask)
     81     if cluster_edges is not None:
     82         try:

File ~/miniconda3/envs/velovae/lib/python3.10/site-packages/velovae/analysis/evaluation_util.py:1621, in velocity_consistency(adata, vkey, gene_mask)
   1607 def velocity_consistency(adata, vkey, gene_mask=None):
   1608     """Velocity Consistency as reported in scVelo paper
   1609 
   1610     Args:
   (...)
   1619         float: Average score over all cells.
   1620     """
-> 1621     nbs = adata.uns['neighbors']['indices']
   1623     velocities = adata.layers[vkey]
   1624     nan_mask = ~np.isnan(velocities[0]) if gene_mask is None else gene_mask

KeyError: 'indices'

What is the new behavior?

Current version evaluation_utils.py calls the neighbor indices as adata.uns['neighbors']['indices']. This doesn't work with newer versions of anndata and/or Scanpy where adata.uns['neighbors'].keys() will print out:

dict_keys(['connectivities_key', 'distances_key', 'params'])

where the indices are inside of the connectivities_key. I found the offending behavior in the following functions:

  1. velocity_consistency
  2. inner_cluster_coh
  3. cross_boundary_correctness
  4. gen_cross_boundary_correctness
  5. gen_cross_boundary_correctness_test
  6. calibrated_cross_boundary_correctness
  7. also added import to include csr_matrix from scipy.sparse

Other information

Output from

# Get all module names in the current namespace
module_names = set(sys.modules) & set(globals())

# Print the name and version of each module
for name in module_names:
    module = sys.modules[name]
    if hasattr(module, '__version__'):
        print(f"{name}=={module.__version__}")
    elif hasattr(module, 'VERSION'):
        print(f"{name}=={module.VERSION}")
    else:
        print(f"{name}")
logging==0.5.1.2
torch==2.3.1
math
random
warnings
textwrap
sys
anndata==0.10.7
os
cellrank==2.0.4
gc

Please let me know if you need anything else.

g-yichen commented 3 months ago

Hi,

First, thank you for fixing this issue. I think the fix should work, but noticed that you deleted lines 1115-1130 in gen_cross_boundary_correctness and 1257-1272 in gen_cross_boundary_correctness_test. These lines obtain the embedding coordinates and velocity arrays from the AnnData object. My apologies if the input argument with the same name, x_emb, confuses you. Yeah, that was a bad naming and as an input argument it just specifies the key string to obtain the actual embedding coordinates. Please let me know if this makes sense to you.

Thank you!

jjia1 commented 3 months ago

Hi,

First, thank you for fixing this issue. I think the fix should work, but noticed that you deleted lines 1115-1130 in gen_cross_boundary_correctness and 1257-1272 in gen_cross_boundary_correctness_test. These lines obtain the embedding coordinates and velocity arrays from the AnnData object. My apologies if the input argument with the same name, x_emb, confuses you. Yeah, that was a bad naming and as an input argument it just specifies the key string to obtain the actual embedding coordinates. Please let me know if this makes sense to you.

Thank you!

Thanks @g-yichen. I'll re-add those lines back into the PR to avoid any issues down the line. Please take a look once I've finished. J

jjia1 commented 3 months ago

Hi @g-yichen I restored the missing lines of code. You can ignore the readme file I just left that for my personal fork. Please let me know if there's any other changes I should make for the PR.

Best, J