theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

subset based on terminal #462

Closed wangmhan closed 3 years ago

wangmhan commented 3 years ago

Hi,

I want to select the cells that have a probability of reaching one specific terminal state, e.g. Alpha in the tutorial. Is it possible to do this with one of the fields in adata.obs? I thought "clusters_gradients" gives the most likely terminal state of each cell; please correct me if I have misunderstood. But in my case, the categories of "clusters_gradients" are always very unevenly sized. For example, terminal state A has 4000 cells, while terminal state B has 80 cells. I tried several datasets and it is always like that: one state is extremely large and the other extremely small, which doesn't make much sense to me. Please let me know if you have any suggestions on how to subset; it would be really helpful.

Thank you!

Marius1311 commented 3 years ago

Hi @wangmhan, thanks for your questions. I don't quite understand the problem you're facing, maybe you can help me by posting a code snippet that illustrates your question? Thanks!

wangmhan commented 3 years ago

Hi Marius,

sure, so my code is like this:

To see the number of cells in each lineage (terminal states: cls1, cls2):

adata.obs['clusters_gradients'].value_counts()
cls1    7571
cls2      40

To subset the cells in the lineage of interest:

adata_sub = adata[adata.obs['clusters_gradients'] == 'cls2']

I am not sure if I understand "clusters_gradients" correctly. In general, I want to subset the cells belonging to a specific lineage, as shown in the plot from cr.pl.terminal_states.
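One way to see why hard per-cell labels can be so lopsided: if every cell is assigned to its single most likely terminal state, cells with intermediate fate probabilities all go to the dominant state. The sketch below compares that hard (argmax) assignment with a probability threshold. It assumes you have a cells × lineages matrix of fate probabilities; `select_lineage_cells` and the toy numbers are hypothetical, and nothing here claims that "clusters_gradients" is computed this way.

```python
import numpy as np


def select_lineage_cells(fate_probs, lineage_names, lineage, threshold=None):
    """Boolean mask of the cells associated with one lineage.

    fate_probs: (n_cells, n_lineages) fate probabilities, rows summing to 1.
    lineage_names: column names of fate_probs, e.g. ['cls1', 'cls2'].
    threshold: None for a hard assignment (each cell goes to its most likely
        lineage); otherwise keep every cell whose probability towards
        `lineage` is at least `threshold`.
    """
    fate_probs = np.asarray(fate_probs, dtype=float)
    j = list(lineage_names).index(lineage)
    if threshold is None:
        return fate_probs.argmax(axis=1) == j
    return fate_probs[:, j] >= threshold


# Toy data: three of four cells lean towards cls1, two of them only weakly.
probs = np.array([[0.90, 0.10],
                  [0.60, 0.40],
                  [0.55, 0.45],
                  [0.20, 0.80]])
hard = select_lineage_cells(probs, ['cls1', 'cls2'], 'cls2')
soft = select_lineage_cells(probs, ['cls1', 'cls2'], 'cls2', threshold=0.4)
print(hard.sum(), soft.sum())  # 1 3
```

With an AnnData object, `adata[mask]` would give the subset. Where the fate probabilities are stored depends on the CellRank version (in 1.x I believe they end up in `adata.obsm['to_terminal_states']`), so treat that key as an assumption and check your own object.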


Marius1311 commented 3 years ago

I think it would help me to understand your application. Why do you want to subset cells according to their lineage membership? What final plot would you like to generate?

wangmhan commented 3 years ago

Because we are only interested in one terminal state, we only want to explore the temporally expressed genes of that specific lineage. The final figure would have one panel showing the embedding with the cells of that lineage highlighted and all other cells in gray, and another panel with a heatmap of the expression dynamics along the trajectory, based on latent time / velocity_pseudotime.
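The heatmap described here orders the lineage's cells along pseudotime. A minimal sketch of the idea, using simple bin-averaging and per-gene min-max scaling; `pseudotime_heatmap_matrix` is a hypothetical helper, and this is not what CellRank does (its heatmaps plot fitted model trends such as GAMs).

```python
import numpy as np


def pseudotime_heatmap_matrix(X, pseudotime, n_bins=10):
    """Genes x bins matrix of min-max scaled mean expression along pseudotime.

    X: (n_cells, n_genes) expression of the cells in one lineage.
    pseudotime: (n_cells,) e.g. latent time or velocity pseudotime.
    """
    X = np.asarray(X, dtype=float)
    pseudotime = np.asarray(pseudotime, dtype=float)
    if not 1 <= n_bins <= X.shape[0]:
        raise ValueError("n_bins must be between 1 and the number of cells")
    order = np.argsort(pseudotime, kind="stable")  # cells from early to late
    bins = np.array_split(order, n_bins)  # roughly equal-sized groups of cells
    means = np.stack([X[idx].mean(axis=0) for idx in bins], axis=1)  # genes x bins
    lo = means.min(axis=1, keepdims=True)
    hi = means.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # flat genes become all zeros, not 0/0
    return (means - lo) / span
```

Each row of the result can be drawn directly as one heatmap row (e.g. with matplotlib's `imshow`).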


Marius1311 commented 3 years ago

ok, got it! You don't need to subset cells for this. Here's how I would do it: follow the steps of the pancreas advanced tutorial until line 23. At this point, you should have computed your set of terminal states and the fate (absorption) probabilities towards them. Next, you illustrate the lineage you're interested in via its associated fate probabilities like so:

g_fwd.plot_absorption_probabilities(same_plot=False, lineages=['YOUR_LINEAGE'], show_dp=False)

where the parameters are explained in the docstring. This will give you something like this:

[Image: output of the plot_absorption_probabilities call above]
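For intuition about the fate (absorption) probabilities being plotted: on a Markov chain whose terminal states are made absorbing, the probability of ending in each terminal state solves the linear system B = (I - Q)^-1 R, where Q is the transient-to-transient block and R the transient-to-terminal block of the transition matrix. This is a toy sketch with single-state terminals and a hypothetical helper; CellRank works on its own cell-cell transition matrix, treats each terminal state as a set of cells, and uses its own solvers.

```python
import numpy as np


def absorption_probabilities(P, terminal):
    """Probability that a random walk from each state ends in each terminal state.

    P: (n, n) row-stochastic transition matrix.
    terminal: indices of the terminal states, treated as absorbing.
    Assumes every transient state can reach some terminal state, so that
    I - Q is invertible and each row of the result sums to 1.
    """
    P = np.asarray(P, dtype=float)
    terminal = list(terminal)
    is_terminal = np.zeros(P.shape[0], dtype=bool)
    is_terminal[terminal] = True
    transient = np.flatnonzero(~is_terminal)
    Q = P[np.ix_(transient, transient)]  # transient -> transient
    R = P[np.ix_(transient, terminal)]  # transient -> terminal
    B = np.zeros((P.shape[0], len(terminal)))
    B[transient] = np.linalg.solve(np.eye(len(transient)) - Q, R)
    B[terminal, np.arange(len(terminal))] = 1.0  # a terminal state absorbs itself
    return B


# Two transient states (0, 1), two terminal states (2, 3).
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
B = absorption_probabilities(P, [2, 3])  # state 0 -> [2/3, 1/3], state 1 -> [1/3, 2/3]
```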

Next, when you want to select genes which are specific to this lineage, you again use the corresponding fate probabilities by calling

g_fwd.compute_lineage_drivers(lineages=['YOUR_LINEAGE'], cluster_key='clusters', clusters=['RESTRICT_TO_RELEVANT_CLUSTERS'], use_raw=True)

This will compute correlations of genes with that respective lineage, as well as their p-values and multiple-testing-adjusted q-values. This will give you a set of genes which show interesting dynamics along your lineage. To plot these in a heatmap, use

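import cellrank as cr

# adata_raw: the AnnData object from the steps above
n_knots = 6  # hypothetical value; number of spline knots for the GAM
model = cr.ul.models.GAM(adata_raw, n_knots=n_knots)
fig_kwargs = {'model': model,
              'genes': GENES_THAT_CORRELATE_WITH_THIS_LINEAGE,  # e.g. top hits of compute_lineage_drivers
              'lineages': ['YOUR_LINEAGE'],
              'cluster_key': 'clusters',
              'time_key': 'YOUR_PSEUDOTIME_KEY',  # obs key, e.g. 'latent_time' or 'velocity_pseudotime'
              'weight_threshold': CELLS_WITH_FATE_PROB_BELOW_THIS_THRESHOLD_WILL_NOT_CONTRIBUTE}
cr.pl.heatmap(adata_raw, **fig_kwargs)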
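To make the role of `weight_threshold` concrete: following the description in the snippet (cells with a fate probability below the threshold do not contribute), a gene's trend along pseudotime can be fitted with cells weighted by their fate probability towards the lineage. This sketch swaps CellRank's GAM for a weighted polynomial fit (`np.polyfit`) to stay dependency-light; `lineage_trend` and its parameters are hypothetical, and the exact `weight_threshold` semantics are documented in the `cr.pl.heatmap` API.

```python
import numpy as np


def lineage_trend(pseudotime, expression, fate_probs, weight_threshold=0.1,
                  degree=3, n_points=100):
    """Smoothed expression trend of one gene along pseudotime for one lineage.

    Cells whose fate probability is below `weight_threshold` are dropped; the
    remaining cells are weighted by their fate probability. A weighted
    polynomial stands in for the GAM used by CellRank.
    """
    t = np.asarray(pseudotime, dtype=float)
    y = np.asarray(expression, dtype=float)
    w = np.asarray(fate_probs, dtype=float)
    keep = w >= weight_threshold
    if keep.sum() <= degree:
        raise ValueError("too few cells above weight_threshold for this degree")
    # np.polyfit minimises sum((w * residual)**2), so pass sqrt(w) to weight
    # each squared residual by the fate probability itself.
    coefs = np.polyfit(t[keep], y[keep], deg=degree, w=np.sqrt(w[keep]))
    grid = np.linspace(t[keep].min(), t[keep].max(), n_points)
    return grid, np.polyval(coefs, grid)
```

A polynomial is far less flexible than a GAM; it is used here only because it makes the weighting explicit in a few lines.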

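Please check the precise meaning of the arguments of plot_absorption_probabilities, compute_lineage_drivers and cr.pl.heatmap in the API documentation; it's all there. If you have more questions, please check out our examples section (https://cellrank.readthedocs.io/en/latest/auto_examples/index.html), e.g. the example that shows how to plot a heatmap of gene expression trends.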
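The idea behind the correlation step of compute_lineage_drivers can be sketched from scratch: correlate each gene with the fate probabilities towards one lineage, turn the correlations into p-values, and adjust those for multiple testing. The specific choices here (Pearson correlation, a Fisher z-test, Benjamini-Hochberg q-values) are assumptions of this sketch and may differ from CellRank's exact statistics; `lineage_drivers` and `benjamini_hochberg` are hypothetical helpers, and compute_lineage_drivers remains the tool to use in practice.

```python
import math

import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, scanning from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def lineage_drivers(X, fate_probs, gene_names):
    """Rank genes by Pearson correlation with fate probabilities towards one lineage.

    X: (n_cells, n_genes) expression; fate_probs: (n_cells,), needs n_cells > 3.
    Returns (gene, correlation, p-value, q-value) tuples, most positive first.
    """
    X = np.asarray(X, dtype=float)
    f = np.asarray(fate_probs, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    fc = f - f.mean()
    num = Xc.T @ fc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (fc ** 2).sum())
    corr = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)  # constant genes -> 0
    # Fisher z-transform: atanh(r) * sqrt(n - 3) is roughly standard normal when r = 0
    z = np.arctanh(np.clip(corr, -1 + 1e-12, 1 - 1e-12)) * math.sqrt(n - 3)
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])  # two-sided
    qvals = benjamini_hochberg(pvals)
    order = np.argsort(-corr, kind="stable")
    return [(gene_names[i], float(corr[i]), float(pvals[i]), float(qvals[i]))
            for i in order]
```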

wangmhan commented 3 years ago

Thanks, I will check it out!


Marius1311 commented 3 years ago

Great, let me know whether this worked, then we can close the issue.

Marius1311 commented 3 years ago

Any updates on this?

wangmhan commented 3 years ago

Hi, it works fine for me. Thank you!

Marius1311 commented 3 years ago

Great, closing this.