Closed by reneemoerkens 1 year ago
Hi @reneemoerkens, this depends on the genes you selected to plot your heatmap. Please post the code snippet where you select genes and plot the actual heatmap.
As to your question of how to verify which cells are in each lineage: you can simply look at the fate probabilities you computed. These determine the weight given to each cell during lineage-specific expression trend plotting, e.g. in heatmaps. We don't threshold and explicitly assign cells to lineages; rather, we assume gradual commitment of cells to fates and use the fate probabilities as cell-level weights when fitting GAMs to visualize expression trends. You can read more about this in the methods section of the CellRank 1 paper: https://www.nature.com/articles/s41592-021-01346-6
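To make the weighting idea above concrete, here is a minimal sketch on toy data. It does not call CellRank: the fate probabilities are simulated, a weighted polynomial fit stands in for the GAM, and the names `cells_above` and `weighted_trend` are made up for illustration. It shows two things: how to count cells above a few probability cutoffs to get a feel for which cells dominate a lineage, and how every cell still contributes to the trend in proportion to its fate probability.

```python
import numpy as np

# Toy data only: 200 cells ordered along a made-up pseudotime.
rng = np.random.default_rng(0)
n_cells = 200
pseudotime = np.sort(rng.uniform(0.0, 1.0, n_cells))

# Hypothetical fate probabilities toward two terminal states (rows sum to 1).
p_a = 1.0 / (1.0 + np.exp(-10.0 * (pseudotime - 0.5)))
fate_probs = np.column_stack([p_a, 1.0 - p_a])


def cells_above(fate_probs, lineage_idx, thresholds=(0.1, 0.5, 0.9)):
    """Count cells whose fate probability toward one lineage exceeds each cutoff."""
    probs = fate_probs[:, lineage_idx]
    return {t: int((probs > t).sum()) for t in thresholds}


def weighted_trend(expression, pseudotime, weights, degree=3):
    """Weighted polynomial fit, used here as a simple stand-in for a GAM."""
    # np.polyfit applies w to the unsquared residuals, so pass sqrt(weights)
    # to weight each squared residual by the fate probability itself.
    coefs = np.polyfit(pseudotime, expression, deg=degree, w=np.sqrt(weights))
    return np.poly1d(coefs)


# A made-up gene whose expression rises and falls along pseudotime.
expression = np.sin(np.pi * pseudotime) + rng.normal(0.0, 0.1, n_cells)

counts = cells_above(fate_probs, lineage_idx=0)
trend_a = weighted_trend(expression, pseudotime, fate_probs[:, 0])
print(counts)
print(trend_a(np.array([0.25, 0.75])))
```

No cell is dropped from the fit here; cells with a low fate probability toward the lineage simply have little influence on the fitted trend.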
Hey @Marius1311, thank you for your answer. Here is the code snippet I used:
driver_genes = g.compute_lineage_drivers(
    lineages=["Enterocyte type 1"], cluster_key="annotation_res0.34_new"
)
model = cr.models.GAM(adata, n_knots=6)
cr.pl.heatmap(
    adata,
    model=model,  # use the model from before
    lineages="Enterocyte type 1",
    cluster_key="annotation_res0.34_new",
    show_fate_probabilities=True,
    genes=driver_genes.head(60).index,
    time_key="latent_time",
    figsize=(12, 10),
    show_all_genes=True,
    weight_threshold=(1e-3, 1e-3),
)
The genes I selected are the first 60 driver genes of the driver gene list (sorted on _corr column, highest first) found by the 'g.compute_lineage_drivers' function.
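One possible reason for the terminal-state enrichment can be sketched on toy data. The sketch assumes that the driver ranking is based on the correlation between gene expression and the fate probability toward the lineage, which is what the _corr column suggests. The genes, the sigmoid fate probability, and the helper `bump` are all made up. Under that assumption, a gene peaking near the terminal state correlates most strongly with the fate probability, so the top of the list tends to be dominated by late-peaking genes.

```python
import numpy as np

# Toy pseudotime grid and a hypothetical fate probability that rises toward
# the terminal state of the lineage of interest.
t = np.linspace(0.0, 1.0, 300)
fate_prob = 1.0 / (1.0 + np.exp(-10.0 * (t - 0.5)))


def bump(center, width=0.1):
    """Made-up expression profile peaking at `center` along pseudotime."""
    return np.exp(-((t - center) ** 2) / (2.0 * width**2))


genes = {"early": bump(0.2), "mid": bump(0.5), "late": bump(0.9)}

# Pearson correlation of each gene with the fate probability.
corr = {name: float(np.corrcoef(expr, fate_prob)[0, 1]) for name, expr in genes.items()}
# Pseudotime at which each gene peaks.
peak_time = {name: float(t[np.argmax(expr)]) for name, expr in genes.items()}

# Genes ordered from highest to lowest correlation, as in a driver list.
ranked = sorted(corr, key=corr.get, reverse=True)
print(ranked)
print(peak_time)
```

If this matches what happens in the real data, looking further down the driver list, or choosing genes by another criterion, may give a heatmap with peaks spread over more stages.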
Thanks for referring to the paper for more information on the lineages, I will have a look. If I understand correctly, the fate probabilities are eventually used to select a subset of cells that are displayed in the heatmap expression cascade plots, as depicted above. Right?
Hi @reneemoerkens, I'll go through this step by step
clusters in g.compute_lineage_drivers, see https://cellrank.readthedocs.io/en/latest/api/_autosummary/estimators/cellrank.estimators.GPCCA.html#cellrank.estimators.GPCCA.compute_lineage_drivers
Hey @Marius1311,
Thank you again for your quick response.
Hi @reneemoerkens, Re your second point:
If you already used moments imputation, this should be stored as Ms in adata.layers (this corresponds to imputed spliced counts, Mu stores imputed unspliced counts). You should be able to use this data by passing data_key='Ms'. However, I encourage you to inspect and validate this imputation beforehand, e.g. by plotting a few genes in the imputed modality in a low-dimensional representation and checking whether you see the expected patterns, or by looking at marker genes for your clusters in the imputed modality - basically, just make sure you're happy with your imputed data before you proceed! MAGIC does not require a Palantir pseudotime; the two are just run in the same notebook (a dependency also would not make sense, since the MAGIC tool was introduced much earlier than the Palantir tool!)
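The marker-gene check suggested above can be sketched without any single-cell library. Everything here is toy data: the two clusters, the two marker genes, and the brute-force neighbour averaging `knn_smooth`, which is only a crude stand-in for moments or MAGIC imputation. The helper `top_cluster` is likewise hypothetical. The idea is to confirm that each marker is still highest in its expected cluster after imputation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 cells in cluster "A", 100 in cluster "B".
clusters = np.array(["A"] * 100 + ["B"] * 100)
# Gene 0 is a made-up marker for A, gene 1 a made-up marker for B.
rates = np.where((clusters == "A")[:, None], [5.0, 0.5], [0.5, 5.0])
raw = rng.poisson(rates).astype(float)


def knn_smooth(X, k=10):
    """Average each cell over its k + 1 closest expression profiles (brute force)."""
    dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    neighbours = np.argsort(dists, axis=1)[:, : k + 1]
    return X[neighbours].mean(axis=1)


def top_cluster(X, clusters, gene_idx):
    """Return the cluster label with the highest mean expression of one gene."""
    labels = np.unique(clusters)
    means = [X[clusters == c, gene_idx].mean() for c in labels]
    return labels[int(np.argmax(means))]


imputed = knn_smooth(raw)

for gene_idx in (0, 1):
    print(gene_idx, top_cluster(raw, clusters, gene_idx), top_cluster(imputed, clusters, gene_idx))
```

With real data, the same comparison would be made between the raw layer and the Ms layer, using marker genes you already trust for your clusters.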
Re choosing between scVelo latent time & moments vs. Palantir pseudotime & MAGIC-imputed data: you can mix and match these, e.g. you could use the scVelo latent time with MAGIC-imputed data. Generally, I would recommend just comparing them, using whatever prior knowledge you have about your data, and then making a choice. They're all good methods and work well in general; however, they do of course make specific assumptions that may or may not be met in your data. I recommend checking e.g. the assumptions behind RNA velocity and carefully making sure that they are met in your data.
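One simple way to compare two cell orderings is a rank correlation between them. The sketch below uses simulated values in place of a real latent time and pseudotime, and `rank_correlation` is a hypothetical helper that ignores ties, which is fine for continuous values. A value close to 1 means the two methods order the cells similarly, a value near 0 means they disagree, and a strongly negative value would suggest the two orderings run in opposite directions.

```python
import numpy as np


def rank_correlation(a, b):
    """Spearman-style rank correlation for continuous values (no tie handling)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


rng = np.random.default_rng(2)
true_order = np.linspace(0.0, 1.0, 300)

# Two simulated orderings of the same cells, each a noisy, monotone
# transformation of a shared underlying progression.
latent_time = true_order + rng.normal(0.0, 0.05, 300)
pseudotime = true_order**2 + rng.normal(0.0, 0.05, 300)

# A shuffled ordering, for reference: this is what disagreement looks like.
shuffled = rng.permutation(latent_time)

print(rank_correlation(latent_time, pseudotime))
print(rank_correlation(latent_time, shuffled))
```

Agreement between orderings does not show that either is correct, so prior knowledge about the system (e.g. known early and late markers) remains the main check.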
Thank you so much for your extensive explanation, I will apply your suggestions and inspect the imputation in my dataset. This will definitely help me further in the analysis.
Great, thanks for the feedback, happy to help!
Hello and thanks for this great package and the very helpful tutorials!
I was generating heatmap expression cascade plots to visualize temporal activation of genes along trajectories. I noticed that for quite a few of my lineages, the plot is enriched for genes that peak in the terminal states of these lineages, instead of showing genes that peak at the different stages. Like the plot here:
Accidentally, I plotted a driver gene list belonging to Lineage A on Lineage B in this plot and it actually looked more informative than the correct plot above.
This made me wonder, are my lineages assigned in an incorrect manner? In short, I computed macrostates (g.compute_macrostates), then set terminal states (g.set_terminal_states), I did not specifically set initial states, and then computed fate probabilities (g.compute_fate_probabilities). I didn't use data_key="magic_imputed_data" in plotting the Heatmap, as was done in the tutorial. And how can I verify which cells are in each Lineage?
Perhaps you encountered something like this before and have some feedback on how to improve it.
Thank you!