theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

RuntimeError: No coarse-grained stationary distribution found #1135

Open · xyxuq opened this issue 11 months ago

xyxuq commented 11 months ago

Hi there,

I am trying to compute the initial and terminal states of cells from time-series experiments using the code in the attached cellrank_check_script.txt.

It worked when I ran the script on a small subset of the data but failed on the whole dataset of 1,222,515 cells. When I run g.predict_initial_states(allow_overlap=False), I get the error RuntimeError: No coarse-grained stationary distribution found.

I checked the script step by step; g.coarse_stationary_distribution is empty when I run it on the whole dataset. Could you please help me look into this? Thanks in advance.

The log file is below.

```
Computing Schur decomposition
Adding `adata.uns['eigendecomposition_fwd']`
       `.schur_vectors`
       `.schur_matrix`
       `.eigendecomposition`
    Finish (1:59:34)
Computing `15` macrostates
Adding `.macrostates`
       `.macrostates_memberships`
       `.coarse_T`
       `.coarse_initial_distribution
       `.coarse_stationary_distribution`
       `.schur_vectors`
       `.schur_matrix`
       `.eigendecomposition`
    Finish (4:33:05)
Writing `GPCCA[kernel=RealTimeKernel[n=1222515], initial_states=None, terminal_states=None]` to `test.initial_terminal_state.fate_probabilities.pickle`
Adding `adata.obs['term_states_fwd']`
       `adata.obs['term_states_fwd_probs']`
       `.terminal_states`
       `.terminal_states_probabilities`
       `.terminal_states_memberships
    Finish`
Writing `GPCCA[kernel=RealTimeKernel[n=1222515], initial_states=None, terminal_states=['0_1', '0_2', '0_3', '14', '18_1', '18_2', '18_3', '19_1', '19_2', '5_1', '5_2', '5_3', '6_1', '6_2', '6_3']]` to `test.initial_terminal_state.fate_probabilities.pickle`
Traceback (most recent call last):
  File "cellrank_macrostates.test.py", line 45, in <module>
    g.predict_initial_states(allow_overlap=False)
  File "~/miniconda3/envs/cellrank/lib/python3.11/site-packages/cellrank/estimators/terminal_states/_gpcca.py", line 368, in predict_initial_states
    raise RuntimeError("No coarse-grained stationary distribution found.")
RuntimeError: No coarse-grained stationary distribution found.
```

Package versions:

cellrank==2.0.0 scanpy==1.9.5 anndata==0.9.2 numpy==1.24.4 numba==0.57.1 scipy==1.11.2 pandas==1.5.3 pygpcca==1.0.4 scikit-learn==1.1.3 statsmodels==0.14.0 python-igraph==0.10.8 scvelo==0.3.0 pygam==0.8.0 matplotlib==3.6.3 seaborn==0.12.2

cellrank_check_script.txt

Marius1311 commented 11 months ago

Hmm, not sure what's going on here. Do you have any idea, @michalk8?

xyxuq commented 10 months ago

I took random subsets of the whole dataset, increasing the subset size by 5% each time. At most 30% of the cells (366,754) ran successfully.
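A minimal sketch of the subsetting experiment above, assuming the subsets were nested (each one a prefix of a single random permutation of the cells) and that sizes were rounded down; the comment doesn't say whether the subsets were nested or drawn independently, and the seed here is arbitrary. With that rounding, 30% of 1,222,515 cells gives the 366,754 reported.

```python
import numpy as np

N_CELLS = 1_222_515  # size of the full dataset reported in the issue


def nested_subset_sizes(n_cells, step_pct=5):
    """Sizes for step_pct%, 2*step_pct%, ..., 100% of n_cells, rounded down."""
    return [n_cells * pct // 100 for pct in range(step_pct, 101, step_pct)]


def nested_subsets(n_cells, step_pct=5, seed=0):
    """Yield sorted index arrays of growing random subsets.

    Assumption: every subset is a prefix of one random permutation, so each
    subset contains all cells of the previous one.
    """
    order = np.random.default_rng(seed).permutation(n_cells)
    for size in nested_subset_sizes(n_cells, step_pct):
        yield np.sort(order[:size])


sizes = nested_subset_sizes(N_CELLS)
print(sizes[5])  # 30% of the cells -> 366754
```

Each index array could then be used to slice the AnnData object (e.g. adata[idx].copy()) before rebuilding the kernel and estimator; that part depends on the attached script, which isn't reproduced here.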

michalk8 commented 10 months ago

> I checked the script step by step; g.coarse_stationary_distribution is empty when I run it on the whole dataset.

The (coarse) stationary distribution is not guaranteed to exist. Calling g.predict_initial_states(allow_overlap=False) sometimes needs it, namely when only one initial macrostate is detected automatically.

To overcome this, if you know how many initial macrostates you expect, you can pass it as g.predict_initial_states(n_states=..., allow_overlap=False), since this won't require access to the coarse stationary distribution.
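Some background on why the distribution can be missing: a Markov chain with transition matrix P has a unique stationary distribution π (π P = π, entries summing to 1) when eigenvalue 1 of P is simple, which is guaranteed for an irreducible chain. If the transition graph splits into several closed classes (for example disconnected components, or several absorbing groups of cells), eigenvalue 1 is repeated and π is not unique. The NumPy sketch below illustrates this criterion on toy matrices; it is not CellRank's or pyGPCCA's implementation, and whether this is what happens on the full 1.2M-cell dataset is an assumption the thread does not confirm.

```python
import numpy as np


def stationary_distribution(P, tol=1e-10):
    """Return pi with pi @ P == pi and sum(pi) == 1, or None if it is not unique.

    Uses left eigenvectors of P (eigenvectors of P.T). If eigenvalue 1 appears
    more than once (several closed classes), no unique pi exists.
    """
    evals, evecs = np.linalg.eig(P.T)
    is_one = np.isclose(evals, 1.0, atol=tol)
    if is_one.sum() != 1:
        return None
    pi = np.real(evecs[:, is_one][:, 0])
    return pi / pi.sum()  # normalise (also removes the arbitrary sign)


# Irreducible 2-state chain: unique pi = [5/6, 1/6]
P_irreducible = np.array([[0.9, 0.1],
                          [0.5, 0.5]])

# Reducible chain: state 0 is absorbing and states {1, 2} form a second closed class
P_reducible = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.5, 0.5],
                        [0.0, 0.5, 0.5]])

print(stationary_distribution(P_irreducible))  # approximately [0.833, 0.167]
print(stationary_distribution(P_reducible))    # None
```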

ramadatta commented 8 months ago

> > I checked the script step by step; g.coarse_stationary_distribution is empty when I run it on the whole dataset.

> The (coarse) stationary distribution is not guaranteed to exist. Calling g.predict_initial_states(allow_overlap=False) sometimes needs it, namely when only one initial macrostate is detected automatically.

> To overcome this, if you know how many initial macrostates you expect, you can pass it as g.predict_initial_states(n_states=..., allow_overlap=False), since this won't require access to the coarse stationary distribution.

Hi @michalk8,

I tried your suggestion above, but I still get the same error.

```
g.fit(cluster_key="annotation_cell_states", n_states=[0, 25], n_cells=15)
Computing Schur decomposition
Adding `adata.uns['eigendecomposition_fwd']`
       `.schur_vectors`
       `.schur_matrix`
       `.eigendecomposition`
    Finish (0:00:14)
WARNING: Minimum value must be larger than `1`, found `2`. Setting `min=2`
WARNING: In most cases, 2 clusters will always be optimal. If you really expect 2 clusters, use `n_states=2`. Setting `min=3`
Calculating minChi criterion in interval `[3, 25]`
Computing `22` macrostates
Adding `.macrostates`
       `.macrostates_memberships`
       `.coarse_T`
       `.coarse_initial_distribution
       `.coarse_stationary_distribution`
       `.schur_vectors`
       `.schur_matrix`
       `.eigendecomposition`
    Finish (0:04:44)
GPCCA[kernel=PseudotimeKernel[n=93458], initial_states=None, terminal_states=None]
g.predict_initial_states(n_states=22, allow_overlap=False)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[87], line 1
----> 1 g.predict_initial_states(n_states=22, allow_overlap=False)

File ~/anaconda3/envs/trajectories_1/lib/python3.11/site-packages/cellrank/estimators/terminal_states/_gpcca.py:368, in GPCCA.predict_initial_states(self, n_states, n_cells, allow_overlap)
    366 stat_dist = self.coarse_stationary_distribution
    367 if stat_dist is None:
--> 368     raise RuntimeError("No coarse-grained stationary distribution found.")
    370 states = list(stat_dist[np.argsort(stat_dist)][:n_states].index)
    371 return self.set_initial_states(states, n_cells=n_cells, allow_overlap=allow_overlap)

RuntimeError: No coarse-grained stationary distribution found.
```

I have tried every n_states from 1 to 22 and still get the same error. Can this be fixed? Many thanks!

Marius1311 commented 5 months ago

Hmm, any idea, @michalk8?

shaln commented 5 months ago

Hi, I thought I'd mention that I've been getting the same error too, but only when computing the initial states, both with and without specifying n_states.

I was able to compute the terminal states with no errors.