theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

Interpretation of phase portraits that occur with inverted velocity vectors #1122

Closed: JahnaviBhaskaran closed this issue 9 months ago

JahnaviBhaskaran commented 11 months ago

Hi authors, thank you for this amazing package. I would like to clarify my interpretation of the phase portraits that I obtain in my analysis. I have snRNA-seq + snATAC-seq multiome data of stem cells undergoing differentiation. I ran both MultiVelo (on RNA and ATAC) and scVelo (on RNA only) on this dataset and obtained very similar results. I am posting the question in the scVelo issues channel because I would like your opinion on the unspliced-vs-spliced dynamics of the top genes that contribute to parameter estimation.

The velocity vectors that I obtain from the MultiVelo/scVelo analysis point in the opposite direction to the known biology, with the root and end cells swapped. To explore the reasons for this inversion, I examined the ratio of unspliced to spliced counts for the top genes that contribute to parameter estimation of the model. I selected these genes with adata.var_names[adata.var.fit_likelihood.argsort()[::-1]][:8].
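For reference, the same selection can be written so that genes without a fitted likelihood (NaN in fit_likelihood) are excluded explicitly rather than left to the behaviour of argsort. This is a minimal sketch: the helper name top_likelihood_genes and the toy table are hypothetical, and the likelihood values are made up purely for illustration.

```python
import numpy as np
import pandas as pd


def top_likelihood_genes(var: pd.DataFrame, n: int = 8) -> list:
    """Return the n gene names with the highest fit_likelihood.

    `var` is expected to look like adata.var after the dynamical model
    has been fitted; genes that were not fitted (NaN) are dropped first.
    """
    ranked = var["fit_likelihood"].dropna().sort_values(ascending=False)
    return ranked.index[:n].tolist()


# Toy stand-in for adata.var (hypothetical gene names, made-up values).
toy_var = pd.DataFrame(
    {"fit_likelihood": [0.31, np.nan, 0.52, 0.08, 0.44]},
    index=["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"],
)

top_genes = top_likelihood_genes(toy_var, n=3)
print(top_genes)
```

With a real object the call would be top_likelihood_genes(adata.var), and the resulting list can be passed as basis to scv.pl.scatter to draw the phase portraits.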

Based on the different categories of non-ideal gene splicing dynamics elegantly explained in Bergen, Soldatov, et al., 2021, I think the top genes in my analysis are not informative enough for accurate parameter estimation due to the reasons listed below, and thus velocity analysis cannot be successfully applied to our dataset. Could you please confirm if I am interpreting the phase portraits correctly?

My interpretation of the phase portraits: In the figure below, the cells are coloured based on the cluster assignment. The cream cluster (red arrow in APP) and light teal cluster (yellow arrow in APP) represent the most undifferentiated and differentiated states respectively.

SCD5 exhibits multiple rates of change in the ratio of unspliced to spliced transcripts across clusters (I have used red and orange dashed lines to putatively assign the different fits based on visual inspection). Hence, a single model is not sufficient to explain the overall dynamics of the gene. For such cases, scVelo offers the option to perform a differential kinetics test and to compute velocities based on multiple kinetic regimes using scv.tl.velocity(adata, diff_kinetics=True) (see https://scvelo.readthedocs.io/en/stable/DifferentialKinetics/#Differential-Kinetic-Test). However, from my current understanding, MultiVelo does not include this yet.
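A rough way to back up visually assigned fits like these is to compute, for one gene, a separate least-squares slope of unspliced against spliced counts in each cluster (unspliced ≈ γ · spliced, with no intercept). This is only a sanity check under simplifying assumptions, not scVelo's differential kinetics test; the function names and the synthetic data below are hypothetical. With real data, the inputs would presumably be one gene's column from the smoothed layers (for example adata.layers["Ms"] and adata.layers["Mu"]) plus the cluster labels from adata.obs.

```python
import numpy as np


def slope_through_origin(spliced, unspliced):
    """Least-squares slope of unspliced ~ slope * spliced (no intercept)."""
    s = np.asarray(spliced, dtype=float)
    u = np.asarray(unspliced, dtype=float)
    denom = np.dot(s, s)
    return float(np.dot(u, s) / denom) if denom > 0 else float("nan")


def slopes_by_cluster(spliced, unspliced, clusters):
    """Return {cluster label: slope} for a single gene."""
    s = np.asarray(spliced, dtype=float)
    u = np.asarray(unspliced, dtype=float)
    labels = np.asarray(clusters)
    return {
        str(c): slope_through_origin(s[labels == c], u[labels == c])
        for c in np.unique(labels)
    }


# Synthetic gene with two kinetic regimes (true slopes 0.3 and 0.9).
rng = np.random.default_rng(0)
s_a = rng.uniform(0.5, 2.0, 200)
s_b = rng.uniform(0.5, 2.0, 200)
spliced = np.concatenate([s_a, s_b])
unspliced = np.concatenate([0.3 * s_a, 0.9 * s_b]) + rng.normal(0, 0.02, 400)
clusters = np.array(["A"] * 200 + ["B"] * 200)

slopes = slopes_by_cluster(spliced, unspliced, clusters)
print(slopes)
```

Clearly different slopes between clusters are consistent with more than one kinetic regime, which is the situation the differential kinetics test in scVelo is designed to detect.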

BUD23 is another example of a gene that exhibits differential kinetics. Here, however, cells from the same cluster exhibit different rates of change in the ratio of unspliced to spliced transcripts. For example, based on the black solid line representing the fit, cells within the blue dashed circle in BUD23 are inferred to show an increase in the abundance of unspliced transcripts, followed by an increase in the abundance of spliced transcripts. In the data, however, these cells show an increase in unspliced transcript abundance without much change in spliced transcript abundance.

HACD4 is an example of a noisy gene, where the change in ratio of unspliced and spliced transcripts does not necessarily correlate with any cell state transition.

GCC2 shows a linear increase in the ratio of unspliced to spliced transcripts and seems to be associated with steady-state dynamics. Thus, it shows none of the curvature associated with transcriptional induction or repression. Such cases could also lead to inverted velocity estimates, because the repression fit may be wrongly assigned to the induction phase of the gene and vice versa.

EIF4G2 is an example of a gene that shows a slight downward curvature representing transcriptional repression and could be informative for model learning.

Taken together, most of the top genes selected by the model for parameter inference are not truly informative (such as HACD4 and GCC2) or do not fit the current assumptions of the model (such as SCD5 and BUD23).

Could you please confirm if my interpretations are correct?

Please let me know if I can provide more figures or details to help interpret this better. Since this is unpublished data, it would be great if I could share them via email.

Thank you!

[Figure: Figure10-github-issues-scvelo (phase portraits of the genes discussed above)]

JahnaviBhaskaran commented 11 months ago

Dear authors, would you have any updates about this? Many thanks in advance!