Different results from PseudotimeKernal vs PalantirKernel

theislab / cellrank

CellRank: dynamics from multi-view single-cell data

https://cellrank.org

BSD 3-Clause "New" or "Revised" License

341 stars 47 forks source link

Different results from PseudotimeKernal vs PalantirKernel #559

Closed mzhibo closed 3 years ago

mzhibo commented 3 years ago

I just updated from v1.2 (conda) to v1.3 (pip). When I rerun a notebook, I noticed the Schur decomposition results and macrostates computation results are very different. The older 1.2 version makes more sense in my case. I wonder if the PseudotimeKernal is the supposed to be the same as the old PlantireKernal. Do we need to specify any parameters if we want to match some previous analysis?

...

from cellrank.tl.kernels import PalantirKernel
pk=PalantirKernel(adata, time_key='monocle3_pseudotime').compute_transition_matrix()
print(pk)

from cellrank.tl.kernels import PseudotimeKernel
pk=PseudotimeKernel(adata, time_key='monocle3_pseudotime',).compute_transition_matrix()
print(pk)

The pseudotimeKernal result looks less robust.

Versions:

Running CellRank 1.2.0, on 2021-04-08 07:20. (installed from conda) Running CellRank 1.3.0, on 2021-04-08 07:20. (installed from pip)

michalk8 commented 3 years ago

Hi @mzhibo , I believe this is due to the change in the defaults we did in 1.3 (using a weighting scheme when pruning edges backwards in pseudotime instead of just removing them). The below code should fix your issue:

pk=PseudotimeKernel(adata, time_key='monocle3_pseudotime',).compute_transition_matrix(threshold_scheme='hard')

mzhibo commented 3 years ago

HI @michalk8 Thanks for the explaination. I tried with your suggestion on setting threshold_scheme to "hard". It helped a little but still much less robust as compared to the PalantirKernel. Further increasing the frac_to_keep and/or n_neighs helped. In my case, the result is comparable to that of PalantirKernal when the frac_to_keep is set from 0.3 to 1. Any suggestions on how to optimize these parameters on other dataset that I don't know much about? Will these parameter change affect find_lineage_driver results much, if a less mature state is merged into the more mature macrostate? Thanks, Zhibo

Marius1311 commented 3 years ago

Mhm, I actually though the results should stay the same if we use the PseudotimeKernel with a hard thresholding scheme - @michalk8, do you recall whether we had to change our tests, or did we get the same results for our test cases?

michalk8 commented 3 years ago

Mhm, I actually though the results should stay the same if we use the PseudotimeKernel with a hard thresholding scheme - @michalk8, do you recall whether we had to change our tests, or did we get the same results for our test cases?

Based on https://github.com/theislab/cellrank/blob/master/tests/test_kernels.py#L568 frac_to_keep = 1 / 3.0 in 1.3.1 should be equal to k=3 in 1.2.0. Apart from that, the other change was:

# from 1.2.0
k_thresh = np.min([int(np.floor(n_neighbors / k)) - 1, 30])
# to 1.3.{0,1}
k_thresh = max(0, min(30, int(np.floor(n_neighs * frac_to_keep)) - 1))

to make sure k_thresh is always non-negative.

Marius1311 commented 3 years ago

mhm, that's bit weird. In any case, @mzhibo, setting fract_to_keep to ` does not make such sense - you don't get a directed transition matrix in this way. Usually, the default setting here should work well.

Marius1311 commented 3 years ago

@michalk8, how to proceed here?

michalk8 commented 3 years ago

I will recheck once more and report what I've found.

michalk8 commented 3 years ago

I've checked it again right now, the values are correct. in #582, I'm just changing how k_thresh was determined (removing the useless -1 so that if better corresponds to the expection), but I doubt such small change would/should have such drastic effects on the transition matrix.