theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks source link

Is the transition probability matrix in scVelo same as RNA velocity paper ? #30

Closed ahy1221 closed 5 years ago

ahy1221 commented 5 years ago

Hi, I realized that the transition probability matrix estimated using RNA velocity and some graph analysis on this transition matrix might be a good way to quantify cell state transition. The scvelo actually implement such idea according to the function names. Since I am not familiar with python, I was wondering that is the transition probability matrix calculated from scvelo same as the RNA velocity paper ? According to RNA velocity paper,

We therefore developed an approach that places the velocity arrow in the direction in which expression difference is best correlated with the estimated velocity vector, controlling for the cell density distribution. The direction was estimated as follows. We calculated a transition probability matrix P by applying an exponential kernel on the Pearson correlation coefficient between the velocity vector and cell state difference vectors: image where r_ij is the difference vector between the expression vectors s_i and s_j of cells i and j transformed with a variance stabilizing (elementwise) transformation , and d_i is the sigma-transformed velocity extrapolation vector of the cell i: image

VolkerBergen commented 5 years ago

You can reproduce the transition matrix from the paper, i.e. applying a Gaussian kernel on the correlations, with scv.tl.transition_matrix(adata, self_transitions=False).

Per default self_transitions=True, which means that it also allows self-loops for cells that have low confidence of transitioning towards other cells. That would, for instance, have an effect on end points or non-dynamic regions, where cells do not find another state they can confidently transition to, thus staying at its steady state.

ahy1221 commented 5 years ago

Thanks very much. Is the scale parameter of scv.tl.transition_matrix same as the sigma in the formula ? Does the P_ij scale to 1 ? In my dataset, the maximum value of P_ij is only 0.018. Do you think this value is too small and is there a way to evaluate significance ?

VolkerBergen commented 5 years ago

scale = 1 / sigma. A max of .018 is suspiciously low, no matter what scale is set to. How many samples & variables do you have? What are your maximum row values of adata.uns['velocity_graph']?

You can evaluate obtained velocities with scv.tl.velocity_confidence.

ahy1221 commented 5 years ago

Hi Volker, I have ~2700 and 827 velcocity_genes. The maximum row values of adata.uns['velocity_graph']is 0.248. One thing make me confuse is that the transion probability matrix should be a naturally graph to infer cell fate ( and that is what I really want to do actually ) , so what is the difference between adata.uns['velocity_graph'] and transition probability matrix ? Is this transition probability matrix sum up to 1 row by row ? Regarding the velocity_confidence, how do I interprete this value and how to use them ? It seems that it is not a hypothese test here to calculate p value or something from the source code

V -= V.mean(1)[:, None]
V_norm = norm(V)
R = np.zeros(adata.n_obs)

for i in range(adata.n_obs):
    Vi_neighs = V[indices[i]]
    Vi_neighs -= Vi_neighs.mean(1)[:, None]
    R[i] = np.mean(np.einsum('ij, j', Vi_neighs, V[i]) / (norm(Vi_neighs) * V_norm[i])[None, :])

adata.obs[vkey + '_length'] = V_norm.round(2)
adata.obs[vkey + '_confidence'] = R
VolkerBergen commented 5 years ago

That appears quite low to me as well. Might it be a dataset where you would not expect much dynamics maybe? How do the velocities look like?

The velocity graph just contains the correlations of potential cell state changes (xj-xi) with its velocity vector vi. The transition matrix transform those to probabilities by applying an exponential kernel on the velocity graph; normalized, such that the transition probabilities for each cell sum up to 1.

You can use tl.cell_fate to infer cell fates using the transition matrix.

velocity_confidence is not underlying a statistical test. velocity_confidence simply checks whether the velocities confirm with the neighboring velocities (some kind of smoothness score) while velocity_confidence_transition affirms whether the embedded velocities obtained from the transition probabilities truly reflect the velocities in high dimensional space.

ahy1221 commented 5 years ago

If I understand right, the velocity_graph is the correlation part of the formula below: image I guess you point the critical question out, I am quite not sure how much dynamics in my data set. Visualizing velocities arrow on the embedding is very tricky because we can tweak the arrow shape by setting up several plotting parameters such as density. I would expected to get a p value like metric to quantify is there any real dynamics we got, is there possible to got such p value by permutation or bootstrapping ?

VolkerBergen commented 5 years ago

Pardon for the late response, and happy new year! :)

For visualization purpose, you can use the attributes arrow_size and arrow_length to tweak around the embedded velocities.

Such a metric is given by tl.velocity_confidence. A low value would indicate that noise overshadows directionality, thus it yields no unambiguous dynamics given the velocities obtained.

It might help for comprehension to have one illustrative plot from you.