Problem with Krylov-Schur decomposition

rostyhn commented 2 years ago

Hi, I have been corresponding with Dr. Reuter, and it seems like we have discovered an issue with the Krylov-Schur decomposition.

I am using the latest version of PyGPCCA (1.0.4), and using PETSc version 3.18 compiled with complex support and SLEPc version 3.18. The matrices I am using are attached as nano_pt_tm and nano_pt_700_tm respectively; they are stored as Python pickles of scipy.sparse.csr_matrices. The nano_pt matrix is a truncated version of the 700K matrix. Download them here: matrices.tar.gz.

These matrices are transition matrices corresponding to a molecular dynamics simulation of a platinum nano-particle at 700K, generated using ParSplice which uses LAMMPS internally; of course, this is different than the usual biological data.

What is interesting is that the Brandts method for decomposing the transition matrix works on both matrices, but it is far too slow for my needs, especially since the 700K matrix is one of the smallest matrices that I plan to work with.

The following code should set up the minimal working example:

  import pickle
  import pygpcca as gp
  m = None
  with open("nano_pt_tm.pickle", "rb") as f:
      m = pickle.load(f)
  gpcca = gp.GPCCA(m, z="LM", method="krylov")
  gpcca.optimize((2,20))

Here are the errors that I continually run into.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [12], line 2
      1 gpcca = gp.GPCCA(sparse.csr_matrix(m.values), z="LM", method="krylov")
----> 2 gpcca.optimize((2,9))

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:1005, in GPCCA.optimize(self, m)
   1001 n_closed_components = len(closed_components)
   1002 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
   1003 
   1004 # Calculate Schur matrix R and Schur vector matrix X, if not adequately given.
-> 1005 self._do_schur_helper(max(m_list))
   1007 if TYPE_CHECKING:
   1008     assert isinstance(self._p_X, np.ndarray)

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:853, in GPCCA._do_schur_helper(self, m)
    851                 logging.info("Using pre-computed Schur decomposition")
    852 else:
--> 853     self._p_X, self._p_R, self._p_eigenvalues = _do_schur(
    854         self._P, eta=self._eta, m=m, z=self._z, method=self._method
    855     )

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:254, in _do_schur(P, eta, m, z, method, tol_krylov)
    251     P_bar = np.diag(np.sqrt(eta)).dot(P).dot(np.diag(1.0 / np.sqrt(eta)))
    253 # Make a Schur decomposition of P_bar and sort the Schur vectors (and form).
--> 254 R, Q, eigenvalues = sorted_schur(P_bar, m, z, method, tol_krylov=tol_krylov)  # Pbar!!!
    256 # Orthonormalize the sorted Schur vectors Q via modified Gram-Schmidt-orthonormalization,
    257 # if the (Schur)vectors aren't orthogonal!
    258 if not np.allclose(Q.T.dot(Q * eta[:, None]), np.eye(Q.shape[1]), rtol=1e6 * EPS, atol=1e6 * EPS):

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_sorted_schur.py:410, in sorted_schur(P, m, z, method, tol_krylov)
    408     R, Q, eigenvalues = sorted_brandts_schur(P=P, k=m, z=z)
    409 elif method == "krylov":
--> 410     R, Q, eigenvalues, _ = sorted_krylov_schur(P=P, k=m, z=z, tol=tol_krylov)
    411 else:
    412     raise ValueError(f"Unknown method `{method!r}`.")

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_sorted_schur.py:277, in sorted_krylov_schur(P, k, z, tol)
    271 # getInvariantSubspace() gets an orthonormal basis of the computed invariant subspace.
    272 # It returns a list of vectors.
    273 # The returned real vectors span an invariant subspace associated with the computed eigenvalues.
    274 # We take the sequence of 1-D arrays and stack them as columns to make a single 2-D array.
    276 print(E.getInvariantSubspace())
--> 277 Q = np.column_stack([x.array for x in E.getInvariantSubspace()])
    279 try:
    280     # otherwise, R would be of shape `(k + 1, k)`
    281     E.getDS().setExtraRow(False)

File <__array_function__ internals>:180, in column_stack(*args, **kwargs)

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/numpy/lib/shape_base.py:656, in column_stack(tup)
    654         arr = array(arr, copy=False, subok=True, ndmin=2).T
    655     arrays.append(arr)
--> 656 return _nx.concatenate(arrays, 1)

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate

It seems like there isn't any error checking to see if the invariant subspace returned by the SLEPc solver is empty. When the values are set to 2,20:

/home/rostyhn/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_sorted_schur.py:295: UserWarning: The number of converged eigenpairs is `16
`, but `20` were requested.
  warnings.warn(

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [13], line 2
      1 gpcca = gp.GPCCA(sparse.csr_matrix(m.values), z="LM", method="krylov")
----> 2 gpcca.optimize((2,20))

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:1005, in GPCCA.optimize(self, m)
   1001 n_closed_components = len(closed_components)
   1002 # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
   1003 
   1004 # Calculate Schur matrix R and Schur vector matrix X, if not adequately given.
-> 1005 self._do_schur_helper(max(m_list))
   1007 if TYPE_CHECKING:
   1008     assert isinstance(self._p_X, np.ndarray)

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:853, in GPCCA._do_schur_helper(self, m)
    851                 logging.info("Using pre-computed Schur decomposition")
    852 else:
--> 853     self._p_X, self._p_R, self._p_eigenvalues = _do_schur(
    854         self._P, eta=self._eta, m=m, z=self._z, method=self._method
    855     )

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_gpcca.py:254, in _do_schur(P, eta, m, z, method, tol_krylov)
    251     P_bar = np.diag(np.sqrt(eta)).dot(P).dot(np.diag(1.0 / np.sqrt(eta)))
    253 # Make a Schur decomposition of P_bar and sort the Schur vectors (and form).
--> 254 R, Q, eigenvalues = sorted_schur(P_bar, m, z, method, tol_krylov=tol_krylov)  # Pbar!!!
    256 # Orthonormalize the sorted Schur vectors Q via modified Gram-Schmidt-orthonormalization,
    257 # if the (Schur)vectors aren't orthogonal!
    258 if not np.allclose(Q.T.dot(Q * eta[:, None]), np.eye(Q.shape[1]), rtol=1e6 * EPS, atol=1e6 * EPS):

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_sorted_schur.py:425, in sorted_schur(P, m, z, method, tol_krylov)
    421     Q, R, eigenvalues = Q[:, :m], R[:m, :m], eigenvalues[:m]
    424 # check the returned schur decomposition
--> 425 _check_schur(P=P, Q=Q, R=R, eigenvalues=eigenvalues, method=method)
    427 return R, Q, eigenvalues

File ~/Programs/Python/NeoMDWeb/.venv/lib/python3.10/site-packages/pygpcca/_sorted_schur.py:158, in _check_schur(P, Q, R, eigenvalues, method)
    154 # check whether things are real
    155 #for q in Q:
    156 #    print(q)
    157 if not np.all(np.isreal(Q)):
--> 158     raise TypeError(
    159         f"The orthonormal basis of the subspace returned by `method={method!r}` is not real. "
    160         "G-PCCA needs real basis vectors to work."
    161     )
    163 dummy = np.dot(P, csr_matrix(Q) if issparse(P) else Q)
    164 if issparse(dummy):

TypeError: The orthonormal basis of the subspace returned by `method='krylov'` is not real. G-PCCA needs real basis vectors to work.

Here, the subspace returned is not empty, but gets split into complex values, which doesn't work with GPCCA. After I received this error, I thought I might have to recompile PETSc with real support. Unfortunately, I was met with this error instead:

[0] EPSSolve() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/eps/interface/epssolve.c:146
[0] EPSSolve_KrylovSchur_Default() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/eps/impls/krylov/krylovschur/krylovschur.c:259
[0] BVMatArnoldi() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvkrylov.c:91
[0] BVOrthonormalizeColumn() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:402
[0] BVOrthogonalizeGS() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:177
[0] BVOrthogonalizeCGS1() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:101
[0] BV_SquareRoot_Default() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/include/slepc/private/bvimpl.h:362
[0] BV_SafeSqrt() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/include/slepc/private/bvimpl.h:127
[0] Missing or incorrect user input
[0] The inner product is not well defined: nonzero imaginary part -nan.

Thank you very much for all the work you have put into PyGPCCA, I hope that we can resolve this issue.

msmdev commented 2 years ago

@rostyhn, thank you for putting your issue on GitHub! I will do my best to help you with your problem. There is one question I would like to ask you for now: At which point exactly does the error for PETSc with real support occur? During compilation of PETSc or does it compile and the error occurs later during execution of pyGPCCA?

rostyhn commented 2 years ago

It occurs during the execution of PyGPPCA. On Fri, Nov 4, 2022 at 11:46 AM Bernhard Reuter @.***> wrote:

@rostyhn https://github.com/rostyhn, thank you for putting your issue on GitHub! I will do my best to help you with your problem. There is one question I would like to ask you for now: At which point exactly does the error for PETSc with real support occur? During compilation of PETSc or does it compile and the error occurs later during execution of pyGPCCA?

— Reply to this email directly, view it on GitHub https://github.com/msmdev/pyGPCCA/issues/48#issuecomment-1303996808, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALOF7TAN2G4S3Q6UIMZVWZLWGVKXVANCNFSM6AAAAAARXMGITM . You are receiving this because you were mentioned.Message ID: @.***>

msmdev commented 2 years ago

@rostyhn: ok, thanks. Could you give me some details on the OS you are using and from which source (pip, conda, ...) and how exactly you installed pyGPCCA, PETSc, and SLEPc?

msmdev commented 2 years ago

It occurs during the execution of PyGPPCA. … On Fri, Nov 4, 2022 at 11:46 AM Bernhard Reuter @.> wrote: @rostyhn https://github.com/rostyhn, thank you for putting your issue on GitHub! I will do my best to help you with your problem. There is one question I would like to ask you for now: At which point exactly does the error for PETSc with real support occur? During compilation of PETSc or does it compile and the error occurs later during execution of pyGPCCA? — Reply to this email directly, view it on GitHub <#48 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALOF7TAN2G4S3Q6UIMZVWZLWGVKXVANCNFSM6AAAAAARXMGITM . You are receiving this because you were mentioned.Message ID: @.>

Is the error message you gave the complete output or is there some more pyGPCCA related output indicating, e.g., the exact line in the pyGPCCA source code where the exception occurs?

rostyhn commented 2 years ago

I am using Arch Linux, with kernel version 6.0.6. I installed PyGPCCA with pypoetry, and then the python bindings for SLEPc / PETSc inside the virtual environment created by poetry with pip. The C versions of PETSc and SLEPc were built using the Arch user repository (AUR) packages found here and here. If you check the PKGBUILD for both packages, you can see that PETSc was compiled with

--with-shared-libraries=1 
--with-petsc4py=1 
--with-mpi-f90module-visibility=0

and SLEPc has no special flags turned on. The PETSc complex file is also on the [AUR](), and it has the same flags as the regular version, except --with-scalar-type=complex.

@rostyhn: ok, thanks. Could you give me some details on the OS you are using and from which source (pip, conda, ...) and how exactly you installed pyGPCCA, PETSc, and SLEPc?

rostyhn commented 2 years ago

It seems like the errors start in the sorted_schur.py file, after the decomposition is complete and it is being checked for errors. .../site-packages/pygpcca/_sorted_schur.py:158

msmdev commented 2 years ago

It seems like the errors start in the sorted_schur.py file, after the decomposition is complete and it is being checked for errors. .../site-packages/pygpcca/_sorted_schur.py:158

Even those?:

[0] EPSSolve() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/eps/interface/epssolve.c:146
[0] EPSSolve_KrylovSchur_Default() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/eps/impls/krylov/krylovschur/krylovschur.c:259
[0] BVMatArnoldi() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvkrylov.c:91
[0] BVOrthonormalizeColumn() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:402
[0] BVOrthogonalizeGS() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:177
[0] BVOrthogonalizeCGS1() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/src/sys/classes/bv/interface/bvorthog.c:101
[0] BV_SquareRoot_Default() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/include/slepc/private/bvimpl.h:362
[0] BV_SafeSqrt() at /home/rostyhn/.cache/yay/slepc/src/slepc-3.18.0/include/slepc/private/bvimpl.h:127
[0] Missing or incorrect user input
[0] The inner product is not well defined: nonzero imaginary part -nan.

These appear like internal errors of SLEPc to me, thus, they should happen before pyGPCCA error checking.

msmdev commented 2 years ago

@rostyhn, I'm asking so much about your problems with PETSc with real support, since you actually need to compile both PETSc and SLEPc with real support to work with pyGPCCA. You can see our compile config here. Actually, the simplest and mb. best option would be to install miniconda and then set up a conda environment including pygpcca and all dependencies via conda create -c conda-forge --name pygpcca_env pygpcca. In doing so, you would let conda install mpi4py/PETSc/petsc4py/SLEPc/slepc4py/pyGPCCA and other dependencies using our configs. If you don't want to use conda, you would have to compile them using the proper flags and not using --with-scalar-type=complex. After reinstalling with real support, you could rerun your experiment and then we will see if this already solves all of your problems (it will at least partially solve them) or not.

msmdev commented 2 years ago

@rostyhn: After successfully reinstalling everything with real support, before running gpcca.optimize((2,20)), you should run gpcca.minChi((2,20)) to calculate the minChi indicator for each cluster number n in [2, 20] and check which n are leading to an almost feasible (minChi approx. 0) starting guess. Then, run gpcca.optimize only for the n leading to a almost feasible starting guess according to the minChi indicator, e.g., gpcca.optimize(4), gpcca.optimize(8), if n=4 and n=8 lead to a minChi indicator near zero. While this is not a necessary approach (and might not always work, since there might be no minChi near zero), it will greatly speedup your optimization, since you will only run the actual optimization for n with a almost feasible (aka "good") starting guess.

rostyhn commented 2 years ago

It seems like installing with conda did indeed solve the issue... I will take a look at my configs to figure out why it doesn't work in my specific setup. I very much appreciate your patience and taking time to help me with my problem. Thank you! Once I get my config working, I would be happy to do a write up of how to set everything up manually to help others in the future.

msmdev commented 2 years ago

It seems like installing with conda did indeed solve the issue... I will take a look at my configs to figure out why it doesn't work in my specific setup. I very much appreciate your patience and taking time to help me with my problem. Thank you! Once I get my config working, I would be happy to do a write up of how to set everything up manually to help others in the future.

I'm very happy that we could solve your problem together. :) Your offer to fix and write up your manual setup for the community is very welcome. If you experience any further issues or if you have questions about any aspects of the method, results, and interpretation, don't hesitate to come back to me.

rostyhn commented 2 years ago

So it doesn't seem to be as simple as just a configuration issue... I managed to properly configure everything on my end, and I do get the results that I need, but the invariant subspace seems to be empty unless gpcca.minChi() is ran before running the optimization.

Also, here are my notes on setting up PyGPCCA without using conda:

#!/bin/sh
# make sure to install PETSc / SLEPc dependencies - this is system specific, so I left it out
# also, make sure that your virtual environment is activated - specific to the user, so I left that out

# need numpy, cython, mpi4py
pip install numpy cython mpi4py

# build PETSc
git clone https://gitlab.com/petsc/petsc
pushd petsc
git checkout --track origin/release
export PETSC_DIR=$PWD
export PETSC_ARCH=arch-linux-c-opt
python ./configure --with-cc=mpicc --with-fc=0 --with-cxx=mpicxx --with-debugging=0 --with-mpi=1 --with-shared-libraries=1 --with-mpi-f90module-visibility=0
make all
make check

# this is the key to building petsc4py successfully! otherwise, it will fail
NUMPY_INCLUDE="$(python -c 'import numpy; print(numpy.get_include())')"
ln -sfv "$NUMPY_INCLUDE/numpy" "$PETSC_DIR/$PETSC_ARCH/include"

# build & install petsc4py in current virtual environment
pushd src/binding/petsc4py
python setup.py install
popd 

# build SLEPc
git clone https://gitlab.com/slepc/slepc
pushd slepc
git checkout --track origin/release
export SLEPC_DIR=$PWD
python ./configure
make all
make check

# build & install slepc4py
pushd src/binding/slepc4py
python setup.py install
popd

# finally install pygpcca
pip install pygpcca

msmdev commented 2 years ago

@rostyhn, I was suspecting that there might be more than complex vs. real support to your issue. If I understand you correctly, even with real support you run into the "empty subspace issue" in line 277 (Q = np.column_stack([x.array for x in E.getInvariantSubspace()])) of sorted_schur for several cluster numbers n? Could you please give me some info on the n were this happens (which n, what are the associated minChi indicators)?

msmdev commented 2 years ago

Also, thank you for your notes! :)

rostyhn commented 2 years ago

@rostyhn, I was suspecting that there might be more than complex vs. real support to your issue. If I understand you correctly, even with real support you run into the "empty subspace issue" in line 277 (Q = np.column_stack([x.array for x in E.getInvariantSubspace()])) of sorted_schur for several cluster numbers n? Could you please give me some info on the n were this happens (which n, what are the associated minChi indicators)?

Without running minChi, this occurs for all n. If I run minChi beforehand, this problem disappears. This is what the minChi indicators look like for 2-20:

[-3.3306690738754696e-16,
 -0.019951925344584558,
 -0.024399348200346116,
 -0.10449125293120395,
 -0.10246537138915512,
 -0.010101483041100888,
 -0.01010164681142638,
 -0.2928589814706285,
 -0.21745401919272433,
 -0.17767704164279008,
 -0.09852022334462318,
 -0.048983077053167144,
 -0.048983077053169184,
 -0.046671525349176506,
 -0.049414772220244416,
 -0.10790689560114214,
 -0.06390117778283277,
 -0.06386227625599543,
 -0.06386227625599965]

msmdev / pyGPCCA

Problem with Krylov-Schur decomposition #48