theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks source link

Hematopoiesis velocity directionality #173

Closed AMA111 closed 3 years ago

AMA111 commented 4 years ago

Hey Volker,

I have a question regarding an analysis of Bone Marrow stem and progenitors compartment from heathy human samples.

image

The Figure her shows that there is a developmental process directed from the Erythroid prog (green color) to the Hematopietic stem cells (HSCs in Red color). We already know from previous studies of the Hematopietic system that the development is rather the other way around (From the stem to the prog).

Code: Compute velocity and velocity graph

scv.tl.recover_dynamics(sub_data) scv.tl.velocity(sub_data, mode='dynamical') scv.tl.velocity_graph(sub_data) scv.tl.recover_latent_time(sub_data)

I tried also the steady-state model and gave the same conclusions. Also, I I tried the same to other samples from disease states and gave the same result!

Do you have an idea about the reverse directionality here? Possibly, there is dedifferentiation process is going on, but before we go to this conclusion, I would like to know your opinion.

Best, Abdelrahman Mahmoud DKFZ

VolkerBergen commented 4 years ago

Hi Abdelrahman,

great, that you examine this carefully before drawing any conclusions.

Looking into the phase potraits of top likelihood genes would certainly help.

top_genes = adata.var.sort_values('fit_likelihood', ascending=False).index[:10]
scv.pl.scatter(adata, basis=top_genes)

If you send me the output, I can help you with interpretations.

AMA111 commented 4 years ago

Hey Volker, Sure!

image

VolkerBergen commented 4 years ago

Looks like the velocities are mostly driven by the Hemoglobin Beta genes.

Can you also color it with pseudotime / latent time coord. to see the diffs. in the erythroid compartment? What do you expect for HBB towards maturation?

AMA111 commented 4 years ago

Latent time :

image

Pseudotime : image

UMAP: image

yes, in the hematopiotic system, we expect maturation steps from: HSCs >> Erythroid prog >> mature Erythroid cells (mostly express hemoglobin related genes)

VolkerBergen commented 4 years ago

We analysed one of our datasets today, which also includes erythroid cells. Interestingly, with RNA velocity we obtain directional flows in accordance with biol. expectations in all cell types except for erythroids. It turned out in our example that splicing efficiency seems to change dramatically in erythroids, decreasing the fractions of unspliced counts by half. That could be a reason for 'false' signal.

You could get an indication of such with the following code.

counts_s = scv.utils.sum_var(adata.layers['spliced'])
counts_u = scv.utils.sum_var(adata.layers['unspliced'])
fractions_u = counts_u / (counts_s + counts_u)
scv.pl.scatter(adata, color=fractions_u, smooth=True)

Would be very interesting to see whether the fractions are gradually decreasing in your erythroid progenitors.

AMA111 commented 4 years ago

Interesting!! I calculated the fractions from my data and here is how it looks like:

Screenshot 2020-03-23 at 18 59 03

Do you have possible explanation for such velocities specially in the Erthroid prog? would be great if you could send me the velocities figures of your dataset? I am interested to see how the overall velocities behaviour in the stem and progenitors other than my data; since I couldn't find any published example from the BM hematopiotic system with velocities analysis.

VolkerBergen commented 4 years ago

Indeed, there is now hundreds of papers making use of RNA velocity, yet (nearly) none of them on hematopoiesis. And I think the reason is because it usually does not comply with the biology. Conspicuously always observed in blood development, while it works just fine in almost all other systems (under some assumptions). Your example perfectly resembles what I also see in our data. The significant change in splicing efficiency eventually yields false signal, as illustrated by the dynamics of Hbb gene: It is up-regulated. However, due to a gradual increase in the splicing rate, the curvature in phase space may be turned such that it indicates down-regulation. That violates the fundamental assumption of any RNA velocity framework potentially making it inapplicable. I'll have some thoughts, on whether it can be adapted to changing splicing efficiency.

AMA111 commented 4 years ago

Great! I found that it's known in the HSCs biology that they retain more of the unspliced mRNA and get more spliced form under activation: https://stemcellsjournals.onlinelibrary.wiley.com/doi/full/10.1634/stemcells.2005-0552. Possibly, the Stem cells has undefined state of uncommitted cells for certain state/fate (reflected by high unspliced fraction), then further through the development state, start to have more spliced forms which would have a critical role for fate commitment. I would be happy to discuss this further and please let me know by any further updates regarding this issue. Thanks Volker!

Best wishes, Abdelrahman

ftheis commented 4 years ago

This is really interesting - so far my assumption always was that hematopoietic kinetics in particular in HSCs and further up are so much slower than the usual kinetics (more weeks than days) that we do not find much. Hence, as Volker said, no real RNA velocity appl in blood. Do you guys see this effect only in erythroids (where transcription could go down etc as well) or also in other downstream cell types?

AMA111 commented 4 years ago

I think specially in the erythroid prog, the system needs to be very fast to fit the Erythroid cells (RBCs) production rate (~2.4 Million cell/sec) which is very high demand per day. It make sense that the system will increase the splicing rate dramatically at this stage to commit to certain fate > RBCs. Also, I looked at the cell cycle and it seems that the they are high cycling cells (S and G2M) which interestingly fits this paper: https://www.ncbi.nlm.nih.gov/pubmed/30463007 . From my point of view, the velocities accurately detected that there is a development process (False direction as we think) going on from Eryth prog to MK prog. The issue here is the "direction", but from an optimistic point of view it capture the developmental process correctly :))

Screenshot 2020-03-25 at 12 24 42

Can you please explain for me from a method perspective how it capture the directionality? I see that HBB is possibly in a repression phase but this doesn't help me to understand the directionality here?

VolkerBergen commented 4 years ago

SCB_scVelo The directionality is already depicted in gene-wise dynamics by assignment of positive or negative velocities (induction/repression). The projected velocity vector summarizes all gene-wise velocities, whereas in your example we can presume that it is mainly driven by HBB. HBB is picked up as repression (even though it may be induction, thus potentially false signal), which implies that cells should move from the highly expressed to the lowly expressed, i.e. from yellow to blue (colored by velocity pseudotime).

Good that you mention the cycle effects - same in our data (for Erythroid prog.)! That could also be a cause for false signal (if the cycle cannot be resolved by embedding coord. due to compositional effects of cycle and dev. progression) and should be investigated. Perhaps, you could try regressing out cycle effects.

Can you also show the phase potraits of HBB on raw data with scv.pl.scatter(adata, basis='HBB', use_raw=True)?

AMA111 commented 4 years ago

I don't recommend regressing out cell cycle, since it's reflecting important underlaying process of the hematopoiesis differentiation. Have you tried to regress out the cell cycle in your data? and do you see change in directionality?

sure! Screenshot 2020-03-31 at 14 39 07

ogilliam commented 4 years ago

Hi Volker,

I am facing the same issues as @AMA111. We are using a mouse model of the intestinal crypt in order to investigate cell differentiation.

image

From our biological understanding the arrows in the paneth cells should be pointing the other way around. Differentiation starts in the stem cells and terminates in the different celltypes (Enterocytes, Gobblet cells and Paneth cells). There is some knowledge about dedifferentiation of Paneth cells, but we didn’t expect to observe the majority of arrows pointing in the inverse direction of differentiation.

Top 10 putative driver genes image

Fraction of unspliced RNA image

This left me with the assumption, that a change in splicing efficiency could be the reason for my observation. Does this makes sense or am I missing something here?
And if this is the reason, would you have any suggestion on how to solve this issue?

Also, we tested for the number of cells and velocity genes, which looked fine to me: print(adata.var['velocity_genes'].sum(), adata.n_vars, adata.n_obs) 401 3000 5412

Thanks a lot for your help!

Best wishes, Oliver

VolkerBergen commented 4 years ago

Your genes are undoubtedly supporting directionality from stem to enterocytes. However, for paneth cells you got some interesting effects. In Gm0104, the red-highlighted part imho does not really fit to the remaining kinetic, thus induces some signal for down-regulation towards stem, which seems to be artificial imho given that the change in unspliced counts is too sudden/rapid..

image

VolkerBergen commented 4 years ago

You seem to get this consistently for 'Gm*' genes. Any clue, whether there's anything specific to those genes?

ogilliam commented 4 years ago

Thank you for the quick response!

Indeed, all the Gm* genes show similar results. In intestine, these genes are exclusively expressed in paneth cells.

image

It seems, that with further differentiation of paneth cells, the expression of GM10104 is also increasing. I am just a bit confused why the scatter plot is indicating the opposite? Could the splicing efficiency have any effect in this?

(Gm10104 (DEFA25) belongs to the family of alpha defensins. The respective proteins play an important role in the regulation of the microbial flora and the innate immune response against pathogens)

ogilliam commented 4 years ago

Hi Volker,

I reanalyzed the dataset with the updated version of scVelo (0.2.1) and could observe a very low fraction of unspliced counts over the whole dataset, especially low for Paneth cells (Cluster 7, 8, 10, 17).

image

image

image

According to the scVelo website, between 25 - 10% unspliced counts is expected. Are the 10% a lower threshold or is it possible to analyze datasets with less than 10% unspliced RNA, like in my case?

Also, I was wondering what the reason for the low fraction of unspliced RNA might be. In this context I found this preprint:

image https://www.biorxiv.org/content/10.1101/2020.03.13.990069v1.full

According to the paper, different preprocessing methods create different overall results. Is this something you observed as well? For my analysis, I used velocyto.py with the run10X command.

Thanks a lot!

WeilerP commented 3 years ago

@ogilliam

According to the scVelo website, between 25 - 10% unspliced counts is expected. Are the 10% a lower threshold or is it possible to analyze datasets with less than 10% unspliced RNA, like in my case?

It seems rather low. This should result in a lot of genes being filtered out (based on shared counts). You would have to look at your phase portraits to see if enough information is given to infer dynamics.

Also, I was wondering what the reason for the low fraction of unspliced RNA might be. In this context I found this preprint:

image https://www.biorxiv.org/content/10.1101/2020.03.13.990069v1.full

According to the paper, different preprocessing methods create different overall results. Is this something you observed as well?

I did not read the paper but yes, preprocessing does matter!

cimbusch commented 2 years ago

In case somebody is interested or bumps into this:

Coordinated changes in gene expression kinetics underlie both mouse and human erythroid maturation https://pubmed.ncbi.nlm.nih.gov/34225769/

elifdogandar commented 7 months ago

Hi,

I wanted to check splicing rates too. But apparently 'scv.utils.sum_var' is not there anymore. Do you have an alternative way to check this? I would be grateful...