theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

Velocity scores clarification and possible enhancement #1176

Status: open. Opened by jwalewski 8 months ago.

jwalewski commented 8 months ago

Hello,

I would like some clarification on the velocity scores reported by scv.tl.rank_velocity_genes() and the data stored in adata.var["velocity_score"].

First, I stored the values of this column for my AnnData object in a pandas DataFrame, which I then wrote to a CSV file. Given the description at https://scvelo.readthedocs.io/en/stable/scvelo.tl.rank_velocity_genes.html?highlight=rank_velocity_genes, I expected a separate gene list for each cluster. However, the DataFrame has only two columns, "gene" and "score". Is this meant to cover all of the clusters? If so, why does each gene appear only once?
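The difference between a per-cluster ranking and a single per-gene column can be illustrated with a minimal numpy/pandas sketch. It assumes the score is the one the rank_velocity_genes docstring describes (a Welch t-test on velocities, one cluster against all other cells, with a conservatively overestimated variance). The helper names and toy data are hypothetical, and this is not scvelo's own code. In scvelo itself, the per-cluster tables appear to be stored in adata.uns["rank_velocity_genes"]["names"] and adata.uns["rank_velocity_genes"]["scores"] (the tutorials view them with something like pd.DataFrame(adata.uns["rank_velocity_genes"]["names"])), while adata.var["velocity_score"] holds one value per gene.

```python
import numpy as np
import pandas as pd


def cluster_vs_rest_scores(velocity, labels, group):
    """Hypothetical helper: score every gene for one cluster against all other cells.

    Assumes a Welch-style t statistic in which the rest's variance is divided by the
    cluster's own size (a conservative, overestimated variance), following the
    wording of the rank_velocity_genes docstring.
    """
    in_group = labels == group
    n_group = in_group.sum()
    mean_g, var_g = velocity[in_group].mean(axis=0), velocity[in_group].var(axis=0)
    mean_r, var_r = velocity[~in_group].mean(axis=0), velocity[~in_group].var(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = (mean_g - mean_r) / np.sqrt(var_g / n_group + var_r / n_group)
    return np.nan_to_num(scores)  # NaN (e.g. 0/0 for constant genes) becomes 0


def rank_per_cluster(velocity, labels, gene_names, n_genes=3):
    """Return two DataFrames with one column per cluster: ranked gene names and scores."""
    names, scores = {}, {}
    for group in np.unique(labels):
        s = cluster_vs_rest_scores(velocity, labels, group)
        top = np.argsort(s)[::-1][:n_genes]  # highest (most positive) scores first
        names[group] = gene_names[top]
        scores[group] = s[top]
    return pd.DataFrame(names), pd.DataFrame(scores)


# Toy data (hypothetical): 60 cells x 5 genes, three clusters of 20 cells each.
rng = np.random.default_rng(0)
velocity = rng.normal(size=(60, 5))
labels = np.repeat(["A", "B", "C"], 20)
gene_names = np.array([f"g{i}" for i in range(5)])
for gene, group in enumerate(["A", "B", "C"]):
    velocity[labels == group, gene] += 3.0  # gene i is strongly "induced" in cluster i

names_df, scores_df = rank_per_cluster(velocity, labels, gene_names)
print(names_df)  # one column per cluster; a gene may appear in several columns
```

Because each column is ranked independently, a gene can rank highly in one cluster and low in another; a single per-gene column cannot represent that, which is why per-cluster lists need a table with one column per cluster.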

Additionally, why are the scores all non-negative integers (my values range from 0 to 14)? The documentation suggests that genes with either a very high or a very low unspliced/spliced ratio contribute to the score, so I would expect fractional or negative values to distinguish ratios above and below 1.
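One possible explanation for the integer values can be sketched as follows. This assumes adata.var["velocity_score"] is filled the way the scvelo source appeared to do it: an integer array, initialised to 0, that receives each gene's score from its best-ranked appearance in any cluster's top list. That assumption is worth confirming against the installed version. The helper name and the numbers are hypothetical. Under that assumption, fractional parts are truncated, genes outside every top list get 0, and negative values seldom appear because the top-ranked genes of each cluster usually have positive scores.

```python
import numpy as np


def collapse_to_int_column(var_names, top_names, top_scores):
    """Hypothetical helper: fold per-cluster rankings into one integer score per gene.

    top_names / top_scores have shape (n_ranks, n_clusters). Flattening row by row
    visits rank 0 of every cluster, then rank 1, and so on, so each gene keeps the
    score from its best-ranked appearance in any cluster.
    """
    all_names = top_names.flatten()
    all_scores = top_scores.flatten()
    vscore = np.zeros(len(var_names), dtype=int)  # integer dtype, default 0
    for i, name in enumerate(var_names):
        hits = np.flatnonzero(all_names == name)
        if hits.size:
            vscore[i] = all_scores[hits[0]]  # float -> int assignment truncates toward zero
    return vscore


var_names = np.array(["g0", "g1", "g2", "g3"])
# Hypothetical top-2 rankings for two clusters (rows = rank, columns = clusters).
top_names = np.array([["g0", "g1"],
                      ["g2", "g0"]])
top_scores = np.array([[14.7, 9.3],
                       [2.6, 1.9]])

scores = collapse_to_int_column(var_names, top_names, top_scores)
print(scores.tolist())  # [14, 9, 2, 0]
```

Here g0 appears in both clusters but keeps only its rank-0 score (14.7, truncated to 14), and g3, which is in no top list, stays at 0.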

Please let me know if I'm missing something or if these features do not currently exist in scvelo. I assume the workaround would be to subset the AnnData object and compute velocity scores for each subset separately, which seems odd since rank_velocity_genes(adata, groupby="clusters") already works for me.