set_terminal_states does not calculate terminal_states_probabilities

tt2190 commented 2 years ago

Thanks so much for developing this tool.

I am analysing my own dataset for which RNA velocity does not work well, so I have followed the "CellRank beyond RNA velocity" tutorial and it worked very well for fate probability inference.

Now I would like to infer the lineage drivers, but running tl.lineage_drivers raised the following error: KeyError: "Unable to find transition matrix in `adata.obsp['T_fwd']`."

Since this indicates that the transition matrix was not written to the adata object, I set it manually: adata.obsp['T_fwd'] = g_fwd.transition_matrix

Then tl.lineage_drivers raised a different error: RuntimeError: Compute absorption probabilities first as `cellrank.tl.lineages(..., backward=False)`.

In fact, adata does not have an .obs['terminal_states_probs'] column, and what is more, I found that g_fwd.terminal_states_probabilities is empty after manually setting the terminal states by set_terminal_states.

I reproduced this issue with the pancreas dataset and have attached a notebook containing my code and the resulting errors.

I would be grateful if you could enlighten me on how to populate adata with the information required (e.g., obsp['T_fwd'], obs['terminal_states_probs'], etc.) for the downstream lineage driver inference.

cellrank_pancreas_PseudotimeKernel.ipynb.txt

michalk8 commented 2 years ago

Hi @tt2190, sorry for the late reply.

> Now I would like to infer the lineage drivers, but running tl.lineage_drivers raised the following error: KeyError: "Unable to find transition matrix in adata.obsp['T_fwd']."
>
> Since this indicates that the transition matrix was not written to the adata object, I set it manually: adata.obsp['T_fwd'] = g_fwd.transition_matrix

Since I assume you have a GPCCA object g, why not just run g.compute_lineage_drivers? That way, you don't have to set the transition matrix in AnnData.

> In fact, adata does not have an .obs['terminal_states_probs'] column, and what is more, I found that g_fwd.terminal_states_probabilities is empty after manually setting the terminal states by set_terminal_states.

For lineage driver computation, you need to have data in, e.g., adata.obsm['to_terminal_states'], which contains the fate (absorption) probabilities. .obs['terminal_states_probs'] is just a vector that gives you the probability of each cell being a terminal state (no matter which one) and is not used in driver computation.

> I found that g_fwd.terminal_states_probabilities is empty after manually setting the terminal states by set_terminal_states

Yes, this is by design. Once you set your terminal states, you still need to compute the absorption probabilities towards them (this also needs the transition matrix) using g.compute_absorption_probabilities. Then you should be able to run g.compute_lineage_drivers. I am attaching a small code snippet below in case anything is not clear:

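import cellrank as cr

adata = cr.datasets.pancreas_preprocessed()

# Build the transition matrix (kernel) and wrap it in a GPCCA estimator.
k = cr.tl.transition_matrix(adata)
g = cr.tl.estimators.GPCCA(k)

# Manually set the terminal states: here, the first 30 cells of each cluster.
g.set_terminal_states({"Beta": adata[adata.obs['clusters'] == "Beta"].obs_names[:30],
                       "Alpha": adata[adata.obs['clusters'] == "Alpha"].obs_names[:30]})

# Setting terminal states does not compute fate probabilities; do it explicitly.
g.compute_absorption_probabilities()

# Lineage drivers are computed from the absorption probabilities.
df = g.compute_lineage_drivers()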
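
For intuition on why set_terminal_states alone leaves the fate probabilities empty, here is a minimal sketch on a toy four-state Markov chain in plain Python. This is not CellRank's implementation: the chain, the function name absorption_probabilities, and the "Alpha"/"Beta" state labels are made up for illustration. It only shows that fate (absorption) probabilities come from solving a linear system that involves both the transition matrix and the chosen terminal states, which is a separate step from choosing those states.

```python
from fractions import Fraction


def absorption_probabilities(T, terminal):
    """Solve (I - Q) X = R for a small Markov chain (hypothetical helper).

    T is a row-stochastic matrix of Fractions, `terminal` a list of state
    indices treated as absorbing. Returns {transient_state: [P(absorbed in t)
    for t in terminal]}.
    """
    n = len(T)
    transient = [s for s in range(n) if s not in terminal]
    m = len(transient)

    # Augmented matrix [I - Q | R], where Q holds transient-to-transient
    # transitions and R holds transient-to-terminal transitions.
    aug = []
    for i in transient:
        left = [Fraction(int(i == j)) - T[i][j] for j in transient]
        right = [T[i][t] for t in terminal]
        aug.append(left + right)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    return {s: aug[idx][m:] for idx, s in enumerate(transient)}


half = Fraction(1, 2)
# States 0 and 1 are transient; 2 ("Alpha") and 3 ("Beta") are terminal.
T = [
    [0, half, half, 0],
    [half, 0, 0, half],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
T = [[Fraction(x) for x in row] for row in T]

fate = absorption_probabilities(T, terminal=[2, 3])
for state, probs in fate.items():
    print(state, [str(p) for p in probs])
# state 0 -> Alpha 2/3, Beta 1/3; state 1 -> Alpha 1/3, Beta 2/3
```

Picking terminal=[2, 3] corresponds loosely to set_terminal_states, and the solve corresponds loosely to g.compute_absorption_probabilities; until the second step runs, there is nothing for the lineage driver computation to use.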

tt2190 commented 2 years ago

Many thanks @michalk8, I was not aware of the g.compute_lineage_drivers method and it now works with my data!

theislab / cellrank

set_terminal_states does not calculate terminal_states_probabilities #872