theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

Injecting alternative connectivity matrix to paga #284

Closed dawe closed 3 years ago

dawe commented 3 years ago

Hello there, I'm using scvelo and I'm really happy with the functionality it provides (and with the results as well). I'm now playing with PAGA trees with velocity-directed edges, and I would like to tweak them to fit other observations I have for my model. In more detail, I'm using schist (I am actually one of the developers) to find cell groups and to understand the group-wise relations, based on stochastic block models. In particular, the tree estimated by PAGA is, in general, topologically different from the one I can estimate. As a consequence, paga/connectivities_tree and paga/transitions_confidence do not apply to my model. So I would like to substitute those matrices with two derived from my data and then let sc.pl.paga do the rest. This is easy for the MST, as I have the adjacency matrix, but I have difficulty understanding what transitions_confidence is and how it is calculated. Could you share some hints?

VolkerBergen commented 3 years ago

If you are applying it to different clusters, you can store them in .obs and pass them via scv.tl.paga(adata, groups='my_clusters'). If, on the other hand, you'd like to pass your own transition graph/matrix, you can store it under adata.uns['my_transitions_graph'] and run PAGA via scv.tl.paga(adata, vkey='my_transitions'). transitions_confidence is just an abstraction of the transitions to cluster level: simply put, it tests the single-cell transition probabilities against transitions under random assignment for significance.

dawe commented 3 years ago

@VolkerBergen thank you. I won't use my clusters to build the PAGA, as this is based on between-cluster connectivity, and the transition tree is estimated from that. I'll try to use my tree as the transition matrix, then.

dawe commented 3 years ago

Mmm, I believe this is not the appropriate solution. The transition graph you are referring to is, de facto, the velocity graph estimated by scvelo regardless of PAGA. I have clusters, but I want scv.pl.paga to use a different tree. I can replace adata.uns['paga']['connectivities'] with my adjacency matrix, and I can replace adata.uns['paga']['connectivities_tree'] with my minimum spanning tree, but I need to recalculate the adata.uns['paga']['transitions_confidence'] matrix (and possibly the thresholds), which is what I'm not able to do.

VolkerBergen commented 3 years ago

transitions_confidence is calculated completely independently of connectivities_tree. The former takes velocity_graph as input, the latter uses connectivities. Do you have a single-cell transition matrix that you would input, or an already abstracted graph? If the latter, would you then want to use your MST as a hard prior, i.e. prune some edges from the directed PAGA, or even pass a readily computed graph?
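To make the "hard prior" option concrete, here is a minimal NumPy sketch of pruning a directed group-level matrix with an undirected tree. It is not scvelo code: the function name `prune_with_tree`, the toy values, and the convention that rows are sources and columns are targets are all assumptions made for illustration, so check the orientation against your own matrices before using it.

```python
import numpy as np

def prune_with_tree(directed, tree):
    """Zero out directed edges whose endpoints are not linked in the tree.

    Hypothetical helper for illustration, not part of scvelo or scanpy.
    """
    directed = np.asarray(directed, dtype=float)
    tree = np.asarray(tree, dtype=float)
    # A spanning tree is often stored as a triangular matrix, so symmetrize
    # it before using it as an undirected mask.
    allowed = (tree + tree.T) > 0
    return np.where(allowed, directed, 0.0)

# Toy example with three groups (values are made up).
directed = np.array([
    [0.00, 0.19, 0.05],
    [0.02, 0.00, 0.30],
    [0.00, 0.00, 0.00],
])
tree = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])
pruned = prune_with_tree(directed, tree)
print(pruned)
```

The mask keeps both directions of every tree edge, so the direction information still comes entirely from the directed matrix; the tree only decides which pairs of groups may be connected at all.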

dawe commented 3 years ago

I have an already abstracted graph. I thought transitions_confidence was calculated on the abstracted graph. If this is not the case, I will just replace connectivities and connectivities_tree with my own group graphs and keep the transitions_confidence previously calculated on the same groups.

VolkerBergen commented 3 years ago

transitions_confidence is the directed graph/tree, so you can just replace it with yours.
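As a rough sketch of what "just replace it" could look like, the snippet below swaps the three matrices discussed in this thread inside a plain dict that stands in for `adata.uns['paga']`. The helper `replace_paga_graphs` is hypothetical, and the use of dense NumPy arrays is an assumption; if the existing entries in your object are sparse matrices, convert your replacements to the same type.

```python
import numpy as np

def replace_paga_graphs(paga, connectivities, tree, transitions):
    """Return a copy of `paga` with the three group-level graphs replaced.

    Hypothetical helper: `paga` is a plain dict standing in for
    adata.uns['paga']; all matrices must be square with the same shape.
    """
    new_entries = {
        "connectivities": np.asarray(connectivities, dtype=float),
        "connectivities_tree": np.asarray(tree, dtype=float),
        "transitions_confidence": np.asarray(transitions, dtype=float),
    }
    n = new_entries["connectivities"].shape[0]
    for key, mat in new_entries.items():
        if mat.shape != (n, n):
            raise ValueError(f"{key} has shape {mat.shape}, expected {(n, n)}")
    updated = dict(paga)  # shallow copy, the input dict is left untouched
    updated.update(new_entries)
    return updated

# Toy stand-in for adata.uns['paga'] with three groups (values are made up).
paga = {"connectivities": np.zeros((3, 3)), "groups": "my_clusters"}
conn = np.array([[0, 0.3, 0.1], [0.3, 0, 0.4], [0.1, 0.4, 0]])
tree = np.array([[0, 0.3, 0], [0, 0, 0.4], [0, 0, 0]])
trans = np.array([[0, 0.19, 0], [0, 0, 0.3], [0, 0, 0]])
new_paga = replace_paga_graphs(paga, conn, tree, trans)
print(sorted(new_paga))
```

The shape check matters because all three matrices are indexed by the same ordered list of groups; a replacement built on a different grouping or group order would silently mislabel the edges.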

dawe commented 3 years ago

Yep, but the weights of its edges are not derived from the abstracted graph (are they?); they are not connectivities. For example, in the pancreas dataset, the edge Ngn3 low EP => Ngn3 high EP has connectivity 0.3 and transition 0.19. As far as I understand, transitions are derived from the time prior (pseudotime). I just need to know how these weights are calculated, so that I can assign proper weights to my graph and threshold it to replace transitions_confidence. I'm sorry to bother you, also because I'm afraid that what I'm doing will be useless in the end.

VolkerBergen commented 3 years ago

Pseudotime is just used as a prior. The confidence values are derived from the single-cell transition probabilities (the same way as is done for connectivities / the undirected graph): they correspond to how significant the aggregated transition probabilities of cells pointing from one cluster to another are, compared to random assignment.
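The idea of comparing aggregated cell-level transitions against random assignment can be illustrated with a small NumPy sketch. This is not scvelo's actual formula: the function `cluster_transition_scores`, the null model (each group's outgoing weight spread in proportion to group size), and the toy transition matrix are all assumptions chosen to show the principle.

```python
import numpy as np

def cluster_transition_scores(T, labels):
    """Aggregate a cell-level transition matrix to group level.

    Illustrative sketch only. Returns the group names, the summed
    cell-to-cell weights between groups, and the ratio of those sums to
    what a size-proportional random assignment of targets would give.
    """
    T = np.asarray(T, dtype=float)
    groups, codes = np.unique(labels, return_inverse=True)
    membership = np.eye(len(groups))[codes]      # (n_cells, n_groups) one-hot
    observed = membership.T @ T @ membership     # group-to-group summed weight
    out_weight = membership.T @ T.sum(axis=1)    # total outgoing weight per group
    size_frac = membership.sum(axis=0) / len(codes)
    expected = np.outer(out_weight, size_frac)   # null: targets chosen at random
    score = np.divide(observed, expected,
                      out=np.zeros_like(observed), where=expected > 0)
    np.fill_diagonal(score, 0.0)                 # ignore within-group transitions
    return groups, observed, score

# Toy data: four cells in two groups; rows are source cells (made-up values).
T = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
labels = ["A", "A", "B", "B"]
groups, observed, score = cluster_transition_scores(T, labels)
print(groups, score)
```

In this toy case the score for A to B is above 1 (more flow than the random baseline) and the score for B to A is 0, which is the kind of asymmetry a directed group-level graph is meant to capture. A replacement matrix built from an already abstracted graph would need weights on a comparable scale before any threshold is applied.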