Fate probabilities towards aggregated terminal states

Marius1311 commented 1 year ago

Hi all, I have a conceptual comment/question regarding CellRank's current handling of aggregated terminal states. Say I compute the macrostates A, B, C, D, E, and I want to aggregate them into three terminal states: 1 (A and B), 2 (C and D), and 3 (E). This can be done conveniently with the set_terminal_states_from_macrostates method. I think that, under the hood, this method selects the 30 most confidently assigned cells for each aggregated terminal state and uses these to compute fate probabilities when I call g.compute_absorption_probabilities. I don't think this is really the intended behavior: if macrostate A is dominant, then aggregated terminal state 1 will consist almost exclusively of A cells and won't really represent the combination of A and B. The same holds for the fate probabilities: they won't be representative of the aggregated terminal state, but of whichever individual macrostate is dominant. I'm just observing this behavior in one data example and I find it a bit troubling.

Instead, it might be better to keep all 30 cells from both A and B, i.e. to use 60 cells to represent terminal state 1, and likewise for every other aggregated terminal state. What do you think, @michalk8 @WeilerP? An alternative would be to randomly sample from these 60 cells until we have 30, but I'm not sure that's what we want.
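To make the concern concrete, here is a minimal NumPy sketch on toy data. It is not CellRank's implementation: it assumes the aggregated state is ranked by the summed membership of its macrostates (my guess at the "under the hood" behavior described above), and the helper top_n_cells and all variable names are made up for illustration.

```python
import numpy as np


def top_n_cells(scores, n):
    """Return the indices of the n highest-scoring cells (hypothetical helper)."""
    return np.argsort(scores)[::-1][:n]


rng = np.random.default_rng(0)
n_obs, n_cells = 500, 30

# Toy soft memberships: macrostate A (cells 0-99) is assigned far more
# confidently than macrostate B (cells 100-199).
memb_a = np.zeros(n_obs)
memb_b = np.zeros(n_obs)
memb_a[:100] = rng.uniform(0.8, 1.0, size=100)
memb_b[100:200] = rng.uniform(0.3, 0.5, size=100)

# Assumed current behavior: rank cells by the summed membership, keep the top 30.
top_n = top_n_cells(memb_a + memb_b, n_cells)

# Proposed behavior: keep the top 30 cells of each macrostate separately.
union = np.union1d(top_n_cells(memb_a, n_cells), top_n_cells(memb_b, n_cells))

# Fraction of selected cells that come from macrostate A.
frac_a_top_n = np.mean(top_n < 100)
frac_a_union = np.mean(union < 100)
print(len(top_n), frac_a_top_n)
print(len(union), frac_a_union)
```

In this toy setting the summed-membership ranking selects only A cells, while the union keeps 30 cells from each macrostate, which is the imbalance described above.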

Marius1311 commented 1 year ago

Or is there some way to already achieve the behavior I would like in the current API?

Marius1311 commented 1 year ago

Here is an example. I have the macrostates below:

After aggregation using g.set_terminal_states_from_macrostates(names=['Excretory_gland, AMso', 'ASH, AWC', 'RIM, SIB, AVK']), I get the following terminal states:

I would argue that these cells are not fully representative of the terminal states I would like to use.

Marius1311 commented 1 year ago

Proposal: introduce a parameter called agg with the options {"union", "top_n"}; the default should be "top_n".
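A rough sketch of how such a switch could behave, written as a standalone function rather than as part of CellRank. The function name select_terminal_cells, its signature, and the reading of "top_n" as "rank by summed membership" and "union" as "top cells of each macrostate, merged" are all assumptions for illustration; only the parameter name agg and its two options come from the proposal.

```python
import numpy as np


def select_terminal_cells(memberships, n_cells=30, agg="top_n"):
    """Pick the cells representing one aggregated terminal state.

    memberships: (n_obs, k) array with one column per macrostate being merged.
    Hypothetical helper, not part of the CellRank API.
    """
    memberships = np.asarray(memberships, dtype=float)
    if agg == "top_n":
        # Assumed current behavior: rank by summed membership, keep n_cells.
        scores = memberships.sum(axis=1)
        return np.sort(np.argsort(scores)[::-1][:n_cells])
    if agg == "union":
        # Proposed behavior: keep the n_cells best cells of every macrostate.
        per_state = [np.argsort(col)[::-1][:n_cells] for col in memberships.T]
        return np.unique(np.concatenate(per_state))
    raise ValueError(f"agg must be 'union' or 'top_n', got {agg!r}")


# Two macrostates over six cells; the first is assigned more confidently.
memb = np.array(
    [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.0, 0.3], [0.0, 0.2], [0.0, 0.1]]
)
picked_top_n = select_terminal_cells(memb, n_cells=2, agg="top_n")
picked_union = select_terminal_cells(memb, n_cells=2, agg="union")
```

With agg="union" the number of returned cells grows with the number of merged macrostates (up to k * n_cells), so aggregated terminal states would no longer all have the same size.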

theislab / cellrank

Fate probabilities towards aggregated terminal states #1000