velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License
159 stars 84 forks source link

Adjusting for cell cycle? #154

Closed koelling closed 5 years ago

koelling commented 5 years ago

Is there a way to adjust for possible cell cycle effects in velocyto?

I am trying to make sense of a velocyto output for (what we think are) a set of cells in different stages of differentiation, from naive cells to two different types of more mature cells. Some of the cells are in G1 phase, some are in G2/M and some are in S.

We do get some patterns that match our expectations (two clusters that look terminal, with velocity vectors pointing towards them from one of the clusters). However, the cluster that seems to be the origin of these vectors (ie. what should be the naive cells) is actually not the cluster we would have expected based on prior knowledge.

In fact, the most distinguishing feature of the origin cluster is that the majority of its cells seem to be in G2/M phase.

Thus, we are concerned that the differences in cell cycle of our cells are overshadowing other differences between them. The expression of cells in G2/M phase may be so different from other cells that it will look like all other cells are derived from them.

We did adjust for the cell cycle when clustering the cells in Seurat but I did not find a similar feature in velocity.

Has anybody dealt with a similar issue and can point me in the right direction?

gioelelm commented 5 years ago

I am afraid that there is no standard function that velocyto can offer at this point, I think removing the effect of the cell cycle needs to be evaluated for each specific cases and it is a challenging task. In particular with a big M phase cluster there could be a strong divergence from the assumption of the model because cells will be changing their volume abruptly and not transcribing new RNA while the chromosomes condense. A naive way to go about it is to try to filter out annotated cell cycle genes together with genes highly correlated with a good fraction of the annotated genes. However this is not guaranteed to work and it might bring different bias (e.g. cycling cells or cells in particular phases of the cell cycle might have now few detected transcripts in comparison to others) What I would do is to look at individual phase portraits and to evaluate how they look for genes that are well known markers. If they look reasonable in shape and robustly showing a trend consistent with the direction that you get, then there is good reason to believe that what you see trajectory is supported by that. On a side note, it might also be that some fundamental assumption of our modelling such as constant degradation rate is definitely not true for your system, this might be a little more difficult to evaluate from phase portrait.

koelling commented 5 years ago

Thank you for your quick response Gioele!

Could you expand a bit on how you would look at the phase portraits? Do you mean it would be good to look whether known cell-cycle genes have very high unspliced over spliced ratios (ie. are above the diagonal) for the cycling cluster? With the idea that this would mean that these have a strong impact on the overall velocity?

One related thing I have been trying to do is to look at the vlm.velocity matrix, calculate the mean velocity for each gene in each cluster and then ask questions like "which gene has the highest average velocity in this cluster?" and "what proportion of the total (absolute) velocity of this cluster is in known cell cycle genes?"

Could that be a way of looking at this? Is vlm.velocity the right data to use for it? I'm not sure how meaningful it actually is to compare velocities between different genes or to take their mean.

gioelelm commented 5 years ago

I would simply look if individual genes are consistent with the direction of the velocity filed. For example, if a gene X is low in a population that “moves" towards a population with gene X-high I would expect a phase portrait that looks an arc that is convex and above the diagonal. With the cells moving in that direction forming the arc and the cells geneX-high at the tip of the arc.

I am not sure that doing what you are proposing will work well, I agree that it might not be meaningful…. It is a possible heuristic by the issue is that magnitude is not a reliable statistic since different genes will be expressed at different levels.