theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

transition_confidence and fatemap #541

Closed tkamth closed 3 years ago

tkamth commented 3 years ago

... I am walking through the CellRank tutorial (https://cellrank.readthedocs.io/en/latest/pancreas_basic.html) and have an interpretation/conceptual question that I hope you can quickly help clarify.

How does the directed transition confidence (using scv.get_df(adata,'paga/transition_confidence')) relate to the values shown on the fate heatmap (cr.pl.cluster_fates(adata, "heatmap"))? More specifically, if we trace from any cluster of cells to the terminal states (following the different paths and multiplying out the values of the transition table) and apply some normalization factor(s) along the way, should we be able to reconstruct the results from the fatemap table?

Marius1311 commented 3 years ago

Hi @tkamth, that's a great question, thank you for posting it. Directed PAGA transition confidences are computed directly on the cluster level by comparing the velocity in- and outflow for each cluster, see Section 2 ("Computing a directed PAGA graph") in the Online Methods section of the CellRank preprint.

On the other hand, cr.pl.cluster_fates is an aggregation method for fate probabilities, which are computed for individual cells on the basis of absorption probabilities; see Section 1.4 ("Computing fate probabilities") in the Online Methods section of the CellRank preprint. Conceptually, an absorption probability is the probability that a random walk initialized in cell i will reach absorbing cell j before reaching any other absorbing cell k, so it has an intuitive mathematical definition.
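
To make that definition concrete, here is a minimal NumPy sketch on a made-up four-state chain (two transient cells, two absorbing cells). All numbers are illustrative assumptions, not CellRank output. It computes the absorption probabilities in closed form as (I - Q)^{-1} R and checks that summing over paths of every length converges to the same values. Note that the paths run over the cell-level transition matrix, not over the cluster-level PAGA table.

```python
import numpy as np

# Toy chain (assumed values): states 0-1 are transient, states 2-3 are absorbing.
Q = np.array([[0.5, 0.2],
              [0.3, 0.4]])   # transient -> transient transition probabilities
R = np.array([[0.3, 0.0],
              [0.0, 0.3]])   # transient -> absorbing transition probabilities

# Closed form: A[i, j] is the probability that a walk started in transient
# state i is absorbed in absorbing state j.
A = np.linalg.solve(np.eye(2) - Q, R)

# The same quantity as an explicit sum over paths: k transient steps (Q^k)
# followed by one absorbing step (R), accumulated for k = 0, 1, 2, ...
A_paths = np.zeros_like(R)
Q_power = np.eye(2)
for _ in range(200):
    A_paths += Q_power @ R
    Q_power = Q_power @ Q

print(A)
print(np.allclose(A, A_paths))
```

Each row of A sums to 1, since every walk in this toy chain is eventually absorbed.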

tkamth commented 3 years ago

Thanks @Marius1311 for the clarification. Is there an existing CellRank function to extract the transition matrix, including the intermediate states? Most importantly, I reckon that several computational challenges were overcome to arrive at the fate probabilities, so even if this detailed transition matrix can be extracted, please let me know whether it can be interpreted at the cluster level and whether there is any value in seeing the details of the transition matrix (or at least the coarse-grained transition matrix over just the macrostates).

Marius1311 commented 3 years ago

Hi @tkamth, you can extract the constructed transition matrix either from the AnnData object (adata.obsp['T_fwd']) or, if you follow the low-level pipeline (see the CellRank advanced tutorial), directly from your kernel object via kernel.transition_matrix. You can also access the coarse-grained transition matrix from an estimator object via estimator.coarse_T. Going through the CellRank advanced tutorial will make this much clearer.

Marius1311 commented 3 years ago

Hi @tkamth, the CellRank advanced tutorial has been renamed to "kernels and estimators". I highly recommend going through it; I think that will help with your question. Closing this until further questions/comments arise.

tkamth commented 3 years ago

Thanks @Marius1311, I have gone through the kernels and estimators tutorial, thanks a lot for this. Is it correct to state that 1) g.compute_macrostates(n_states=Ns, cluster_key="clusters") followed by g.plot_coarse_T() will provide us with the transition matrix for the number of macrostates of our choice (Ns, say the number of clusters that we have annotated), and not just for the 3 most likely terminal states determined by the eigengap or the minChi criterion? If so, choosing Ns=6 in the tutorial yields negative values in the transition matrix.

Marius1311 commented 3 years ago

Hi @tkamth, yes, that is correct! Negative values can appear if macrostates overlap strongly; that's usually a sign that this number of macrostates is a poor choice. We comment on this in the Online Methods of our preprint (Section 1.3, "Coarse-graining the Markov Chain", paragraph "Positivity of the projected transition matrix"). See also Section 2.2 in https://pubs.acs.org/doi/abs/10.1021/acs.jctc.8b00079 for a discussion of this.
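
As a rough illustration of how overlap can produce negative entries, here is a toy NumPy sketch. It assumes the projection has the form T_c = (chi^T D chi)^{-1} (chi^T D P chi), with chi the soft membership matrix and D a diagonal weight matrix. The matrices P, chi and the uniform weights are hand-picked assumptions, and chi is not derived from Schur vectors as it would be in the real pipeline, so this shows the mechanism only and is not the CellRank computation.

```python
import numpy as np

# Toy inputs (assumed values, not produced by CellRank).
P = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])   # row-stochastic cell-level transition matrix
chi = np.array([[1.0, 0.0],
                [0.5, 0.5],        # cell 1 is split evenly between both macrostates
                [0.0, 1.0]])       # soft memberships, each row sums to 1
D = np.diag(np.full(3, 1.0 / 3.0))  # uniform weights over cells (assumption)

# Projected (coarse-grained) matrix: solve (chi^T D chi) T_c = chi^T D P chi.
T_c = np.linalg.solve(chi.T @ D @ chi, chi.T @ D @ P @ chi)

print(np.round(T_c, 3))
print("row sums:", T_c.sum(axis=1))
print("has negative entries:", bool((T_c < 0).any()))
```

In this toy case the rows still sum to 1, but the entry from macrostate 0 to macrostate 1 comes out at about -0.17, so the projected matrix cannot be read as a set of transition probabilities.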

tkamth commented 3 years ago

Thanks @Marius1311, how does this relate to the clusters that were annotated for this dataset? I reckon that annotated clusters are determined independently of how the macrostates, memberships and transition matrices are calculated, so it's not a given that we can generate a meaningful coarse-grained transition matrix using the annotated clusters as the cluster_key. Additional questions: 1) the default weight given to the velocity to create the combined transition matrix was determined empirically; what should we look for when assessing the noise level of the velocities to decide whether to down-weight or up-weight it relative to, say, the similarity kernel?
2) I am pretty confident now that we can retrace the fate matrix from the different macrostates to the terminal states by following the different paths, multiplying along each path and summing over paths, using the coarse-grained transition matrix with whichever combination of kernels we choose. Twofold question: i) is this calculated independently of the directed PAGA transition confidences, and ii) is there a way to represent this graphically, similar to the PAGA, based on the actual transition matrix that was used?

Marius1311 commented 3 years ago

> Thanks @Marius1311, how does this relate to the clusters that were annotated for this dataset? I reckon that annotated clusters are determined independently of how the macrostates, memberships and transition matrices are calculated, so it's not a given that we can generate a meaningful coarse-grained transition matrix using the annotated clusters as the cluster_key.

Please check our "beyond RNA velocity" tutorial for this question. There we show that the cluster_key has no effect on the actual macrostate computation and is only used for labeling.

> Additional questions: 1) the default weight given to the velocity to create the combined transition matrix was determined empirically; what should we look for when assessing the noise level of the velocities to decide whether to down-weight or up-weight it relative to, say, the similarity kernel?

CellRank is pretty robust to this parameter; the default weight is 0.2, and that should be fine.
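
For intuition, a combined transition matrix of this kind can be sketched as a weighted average of two row-stochastic matrices. The sketch below uses random toy matrices and a hypothetical helper, row_normalize; which kernel the 0.2 applies to is not stated above, so treating it as the weight on the similarity kernel is an assumption. It also shows one simple way to check how much the result moves when the weight is changed.

```python
import numpy as np

def row_normalize(M):
    """Hypothetical helper: scale each row so that it sums to 1."""
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T_velocity = row_normalize(rng.random((4, 4)))    # toy velocity-based matrix
T_similarity = row_normalize(rng.random((4, 4)))  # toy similarity-based matrix

def combine(w_similarity):
    # Convex combination: the result is again row-stochastic.
    return (1.0 - w_similarity) * T_velocity + w_similarity * T_similarity

T_default = combine(0.2)   # assumption: 0.2 is the similarity weight
T_shifted = combine(0.3)   # slightly different weight, for comparison

print("row sums:", T_default.sum(axis=1))
print("max entry change:", np.abs(T_default - T_shifted).max())
```

On real data, the same comparison could be run on the downstream fate probabilities rather than on the matrix entries.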

> 2) I am pretty confident now that we can retrace the fate matrix from the different macrostates to the terminal states by following the different paths, multiplying along each path and summing over paths, using the coarse-grained transition matrix with whichever combination of kernels we choose. Twofold question: i) is this calculated independently of the directed PAGA transition confidences

Yes, macrostates and the coarse-grained transition matrix are computed independently of the directed PAGA.

> and ii) is there a way to represent this graphically, similar to the PAGA, based on the actual transition matrix that was used?

Hmm, you can plot the coarse-grained transition matrix as a heatmap. If you want to plot it as a PAGA graph, you would have to look into this a bit. I don't have an easy-to-use recipe here, but you can probably find a hacky solution by putting the data in the right location in AnnData; for this, you would have to dive into the PAGA code a bit.
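
As a possible first step towards such a graph, a coarse-grained transition matrix can be turned into a thresholded list of directed edges, which any graph-drawing tool can then consume. The macrostate names, matrix values and threshold below are hypothetical, and where PAGA expects its data inside AnnData is not covered here.

```python
import numpy as np

# Hypothetical macrostate names and coarse-grained matrix (rows sum to 1).
names = ["A", "B", "C"]
T_coarse = np.array([[0.90, 0.08, 0.02],
                     [0.01, 0.85, 0.14],
                     [0.00, 0.03, 0.97]])

threshold = 0.05  # arbitrary cutoff: drop weak transitions to keep the graph readable

# Keep off-diagonal entries at or above the threshold as (source, target, weight).
edges = [
    (names[i], names[j], float(T_coarse[i, j]))
    for i in range(len(names))
    for j in range(len(names))
    if i != j and T_coarse[i, j] >= threshold
]

for source, target, weight in edges:
    print(f"{source} -> {target}: {weight:.2f}")
```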