msmdev / pyGPCCA

pyGPCCA - python GPCCA: Generalized Perron Cluster Cluster Analysis package to coarse-grain reversible and non-reversible Markov state models.
https://pygpcca.readthedocs.io/en/latest/index.html
GNU Lesser General Public License v3.0
20 stars 4 forks source link

Threshold too tight for Schur decomposition #27

Closed WeilerP closed 3 years ago

WeilerP commented 3 years ago

Description

Using CellRank with pyGPCCA on an Arabidopsis single-cell dataset, the threshold determining if values are negative is too tight when computing the Schur decompositon.

Reproducible example

import scanpy as sc
import scvelo as scv
import cellrank as cr
from cellrank.tl.estimators import GPCCA

sc.pp.filter_genes(adata, min_cells=10)
scv.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)

# annotate highly variable genes, but don't filter them out
sc.pp.highly_variable_genes(adata)
print(f"This detected {np.sum(adata.var['highly_variable'])} highly variable genes. ")

scv.pp.moments(adata)

ctk = CytoTRACEKernel(adata)
ctk.compute_transition_matrix()
ctk.compute_projection()

g_fwd = GPCCA(ctk)
g_fwd.compute_schur(n_components=20)
Traceback ```pytb --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /code/cellrank/cellrank/tl/estimators/_gpcca.py in compute_macrostates(self, n_states, n_cells, use_min_chi, cluster_key, en_cutoff, p_thresh) 178 try: --> 179 self._gpcca = self._gpcca.optimize(m=n_states) 180 except ValueError as e: ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in optimize(self, m) 1015 # Xm = np.copy(X[:, :m]) -> 1016 chi, rot_matrix, crispness = _gpcca_core(self._p_X[:, :m]) 1017 # check if we have at least m dominant sets. If less than m, we warn. ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in _gpcca_core(X) 638 --> 639 rot_matrix, chi, fopt = _opt_soft(X, rot_matrix) 640 ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in _opt_soft(X, rot_matrix) 518 if np.any(chi < -10 * EPS): --> 519 raise ValueError(f"Some elements of chi are significantly negative: < {-10 * EPS}.") 520 else: ValueError: Some elements of chi are significantly negative: < -2.220446049250313e-15. During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) ~/.ch_tmp/ipykernel_27625/1301213408.py in ----> 1 g_fwd.compute_macrostates(n_states=3, cluster_key='celltype') 2 g_fwd.plot_macrostates(discrete=True, legend_loc='right', size=100) /code/cellrank/cellrank/tl/estimators/_gpcca.py in compute_macrostates(self, n_states, n_cells, use_min_chi, cluster_key, en_cutoff, p_thresh) 184 n_states += 1 185 logg.warning(f"{e}\nIncreasing `n_states` to `{n_states}`") --> 186 self._gpcca = self._gpcca.optimize(m=n_states) 187 188 self._set_macrostates( ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in optimize(self, m) 1014 # Reduce X according to m and make a work copy. 1015 # Xm = np.copy(X[:, :m]) -> 1016 chi, rot_matrix, crispness = _gpcca_core(self._p_X[:, :m]) 1017 # check if we have at least m dominant sets. If less than m, we warn. 1018 nmeta = np.count_nonzero(chi.sum(axis=0)) ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in _gpcca_core(X) 637 rot_matrix = _initialize_rot_matrix(X) 638 --> 639 rot_matrix, chi, fopt = _opt_soft(X, rot_matrix) 640 641 # calculate crispness of the decomposition of the state space into m clusters ~/miniconda3/envs/cellrank/lib/python3.8/site-packages/pygpcca/_gpcca.py in _opt_soft(X, rot_matrix) 517 if np.any(chi < 0.0): 518 if np.any(chi < -10 * EPS): --> 519 raise ValueError(f"Some elements of chi are significantly negative: < {-10 * EPS}.") 520 else: 521 chi[chi < 0.0] = 0.0 ValueError: Some elements of chi are significantly negative: < -2.220446049250313e-15. ```
msmdev commented 3 years ago

@Marius1311 I think we could safely lower the negative threshold for chi here, mb. even down to 10e-12? What is your opinion on this?

Marius1311 commented 3 years ago

Sure, agreed. @WeilerP, what value would we have to change the threshold to, so that this passes?

WeilerP commented 3 years ago

Sure, agreed. @WeilerP, what value would we have to change the threshold to, so that this passes?

I set EPS to 1e-10 but didn't check at which bound it starts failing.

msmdev commented 3 years ago

Sure, agreed. @WeilerP, what value would we have to change the threshold to, so that this passes?

I set EPS to 1e-10 but didn't check at which bound it starts failing.

@WeilerP, were exactly did you set EPS to 1e-10?

WeilerP commented 3 years ago

Ah sorry, should have been more specific. I manually set EPS in pygpcca/utils/_constants.py and installed pyGPCCA from source in editable mode via pip install -e pyGPCCA s.t. I could run the above code.

msmdev commented 3 years ago

Ah sorry, should have been more specific. I manually set EPS in pygpcca/utils/_constants.py and installed pyGPCCA from source in editable mode via pip install -e pyGPCCA s.t. I could run the above code.

@WeilerP , it is not intended to manipulate EPS in pygpcca/utils/_constants.py. This could have all kinds of unintended effects. Please revert this. Could you please instead print out the actual values of chi in _opt_soft(X, rot_matrix) before the error is raised? Thus, we can see what the actual negative elements are.

msmdev commented 3 years ago

@WeilerP, is the issue still occurring? We are currently working on printing out the deviations in such cases, see #29 .

WeilerP commented 3 years ago

@msmdev, sorry for the delay. The issue seems to be numerical since I could not reproduce the error on a different machine. Will come back to it should the error pop up again. #29 should help investigate it further.

msmdev commented 3 years ago

@WeilerP, alright - no problem! Come back anytime :) In the meantime we will try to make things more convenient.