theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

PETSc Error when computing the Schur vectors #484

Closed: wangjiawen2013 closed this issue 3 years ago

wangjiawen2013 commented 3 years ago

I ran some of the cellrank examples following cellrank's documentation but can't compute the Schur vectors and the Schur matrix.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)
g.compute_schur(n_components=6)

PETSc Error --- Open MPI library version
  Open MPI v4.1.0, package: Open MPI conda@77195665eba6 Distribution, ident: 4.1.0, repo rev: v4.1.0, Dec 18, 2020
does not match what PETSc was compiled with 4.0, aborting
The MPI_Comm_size() function was called before MPI_INIT was invoked.
This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[Gilbert2:40504] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

martijn-cordes commented 3 years ago

I am also running into the same issue. Running cellrank 1.2.0.

michalk8 commented 3 years ago

Seems like a version mismatch between PETSc and OpenMPI (4.1.0 was released ~3 weeks ago on conda). I will have to check further whether this also happens with a pip installation. In the meantime, I found version 4.0.2 on conda that could temporarily fix this issue: https://anaconda.org/anaconda/openmpi
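If it helps with debugging, here is a quick diagnostic sketch (not CellRank-specific, just the standard mpi4py/petsc4py introspection calls, assuming the imports themselves succeed) to compare the MPI that the Python bindings were built against with the MPI library loaded at runtime:

import mpi4py
import petsc4py

# Build-time configuration of the Python bindings (compiler wrappers, paths).
print(mpi4py.get_config())
print(petsc4py.get_config())

# MPI library actually loaded at runtime; this should report the same
# Open MPI version that PETSc was compiled against.
from mpi4py import MPI
print(MPI.Get_library_version())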

Marius1311 commented 3 years ago

@michalk8, how should we proceed here?

wangjiawen2013 commented 3 years ago

I ran cellrank 1.1.0 with OpenMPI 4.1.0. It seems that the sign of the Schur vectors is opposite to that shown at https://cellrank.readthedocs.io/en/latest/auto_examples/estimators/compute_schur_vectors.html#sphx-glr-auto-examples-estimators-compute-schur-vectors-py (screenshot of the Schur vectors plot attached).

The same thing happened when computing the Schur matrix, https://cellrank.readthedocs.io/en/latest/auto_examples/estimators/compute_schur_matrix.html#sphx-glr-auto-examples-estimators-compute-schur-matrix-py (screenshot of the Schur matrix attached). Why are the results different (is this caused by the petsc4py and slepc4py libraries)? Does it affect the downstream computation? By the way, cellrank 1.2.0 worked well without the petsc4py and slepc4py libraries: I got the same results as the documentation, with the correct sign.

Marius1311 commented 3 years ago

Hi @wangjiawen2013, don't worry about the sign structure of the Schur vectors; this is expected from theory, as the Schur decomposition is not unique (see https://en.wikipedia.org/wiki/Schur_decomposition). This does not affect the downstream results: our macrostates and coarse-grained transition probabilities are unique, even though the Schur decomposition is not.
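As a small illustration of why the sign flips are harmless (independent of CellRank, just NumPy/SciPy): flipping the sign of individual Schur vectors still gives a valid Schur decomposition of the same matrix, with the same spectrum.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.random((5, 5))

T, Q = schur(A)  # real Schur form, A = Q @ T @ Q.T
D = np.diag([1.0, -1.0, 1.0, -1.0, 1.0])  # flip the sign of some Schur vectors
Q2, T2 = Q @ D, D @ T @ D  # another, equally valid Schur decomposition

assert np.allclose(Q2 @ T2 @ Q2.T, A)  # still decomposes the same matrix
assert np.allclose(np.diag(T2), np.diag(T))  # the (quasi-)diagonal is unchanged

Which of the equivalent decompositions you end up with simply depends on the numerical backend (SLEPc via petsc4py/slepc4py vs. the default solver), so the flipped signs in your plots are not a bug.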

Marius1311 commented 3 years ago

Hi @wangjiawen2013, I'm assuming that your issue has been solved, so I'm closing this; feel free to reopen if you have more questions related to it.

dawe commented 3 years ago

Hi all, I had this very same issue today: OpenMPI 4.1.1 and PETSc 3.15 (both from conda-forge). Is the only solution to downgrade openmpi (as pointed out above by @michalk8)?

michalk8 commented 3 years ago

> Hi all, I had this very same issue today: OpenMPI 4.1.1 and PETSc 3.15 (both from conda-forge). Is the only solution to downgrade openmpi (as pointed out above by @michalk8)?

Locally, on my machine, I can't seem to reproduce this (not sure why the mismatch is happening in the first place). Here are the steps I did:

conda create --name petsc_test python=3.8
conda activate petsc_test
conda install -c bioconda -c conda-forge cellrank-krylov

Then I ran the Kernels and Estimators tutorial and it worked as expected. My versions:

import petsc4py
import slepc4py
import mpi4py

print(petsc4py.__version__)
print(slepc4py.__version__)
print(mpi4py.__version__)
# 3.15.0
# 3.15.0
# 3.0.3

I've also tried the conda install in my main env with pre-installed openmpi==4.1.0, petsc4py==3.14.1 and slepc4py==3.14.0, and it also worked. I am attaching the environment.yml, just for the sake of completeness: env.yml.txt

dawe commented 3 years ago

It seems to work on my local computer as well. It may then be something with the cluster I'm using: SLURM and its interaction with OpenMPI. I'll check this with the sysadmin (SLURM here is configured with MPI support).
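For anyone debugging a similar cluster setup, a minimal sketch of checking what the launcher injects into the job environment (the SLURM_/OMPI_/PMIX_ prefixes are the usual ones; the exact variables may differ per cluster):

import os

# Print launcher-related environment variables: SLURM jobs typically set
# SLURM_* variables, while Open MPI / PMIx launchers set OMPI_* / PMIX_* ones.
for key in sorted(os.environ):
    if key.startswith(("SLURM_", "OMPI_", "PMIX_")):
        print(key, "=", os.environ[key])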