Hi @roofya,
that sounds very strange. I assume you're using the SLEPc/PETSc libraries from cellrank-krylov. If so, can you please post the output of:
python -c "import slepc4py; import petsc4py; print(slepc4py.__version__, petsc4py.__version__)"
At the moment, the only thing that comes to mind is this line densifying the matrix (https://github.com/msmdev/msmtools/blob/krylov_schur/msmtools/util/sorted_schur.py#L283) when SLEPc/PETSc is NOT installed (however, after testing this locally, my notebook doesn't crash).
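For context, the dispatch around that line is roughly of this shape (an illustrative sketch only, not the actual msmtools code; the function and variable names are made up):

# Illustrative sketch only, not the actual msmtools code: if slepc4py/petsc4py are
# importable, the sparse Krylov-Schur path can be used, otherwise the matrix is densified.
import scipy.sparse as sp

def _has_krylov():
    try:
        import petsc4py  # noqa: F401
        import slepc4py  # noqa: F401
        return True
    except ImportError:
        return False

def choose_schur_backend(P):
    """Return (backend, matrix): 'krylov' keeps P sparse, 'brandts' densifies it."""
    if _has_krylov():
        return "krylov", P   # sparse SLEPc/PETSc Schur decomposition
    if sp.issparse(P):
        P = P.toarray()      # the densification step linked above; memory-heavy for large P
    return "brandts", P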
As the next step, could you please start your notebook as
jupyter notebook --debug > log.txt 2>&1
and post the log.txt here (ideally as an attachment, since it might get big)?
Finally, what's your Python version and OS? I've tested this in a fresh conda environment with cellrank-krylov (Python 3.8.5, Debian bullseye) and no crash happened.
Apart from the above, maybe this thread can help to solve the issue: https://github.com/jupyter/notebook/issues/1892
@michalk8 Thank you so much, it seems the problem was related to SLEPc/PETSc. I uninstalled and reinstalled them and now it works fine.
Awesome! Thanks @michalk8 for fixing this so quickly! @roofya, great that you're checking out CellRank, let us know via issues in case you encounter any other problems - we're happy to help.
Good day,
I am having the same problem. When running cr.tl.terminal_states the kernel dies. I decided to create a new env, installing all packages through conda, but I still have the same issue. In the command-line output while executing the notebook, there is the following error message:
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[n0078.compute.hpc:71628] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
When installing through conda in the new env, I noticed the following message:
For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environmental variable OMPI_MCA_opal_cuda_support=true before launching your MPI processes. Equivalently, you can set the MCA parameter in the command line: mpiexec --mca opal_cuda_support 1 ...
I tried to export OMPI_MCA_opal_cuda_support=true before launching the Jupyter notebook, but the kernel still dies. I also tried to install SLEPc/PETSc via pip, but I always get an error and the packages are not installed.
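For reference, a minimal script run outside Jupyter can show whether petsc4py itself fails at MPI_Init_thread (just a sketch, assuming the conda-forge petsc4py build; setting the MCA variable from Python is an assumption and may instead need to be exported before the process starts):

# Minimal diagnostic sketch (assumes the conda-forge petsc4py build); run it as a plain
# script outside Jupyter to see whether MPI initialization fails there as well.
import os

# Assumption: Open MPI typically reads MCA parameters from the environment at init time,
# so set the variable before petsc4py is imported/initialized.
os.environ.setdefault("OMPI_MCA_opal_cuda_support", "true")

import petsc4py
petsc4py.init()  # PETSc initializes MPI here if it is not already initialized
from petsc4py import PETSc

print("PETSc version:", PETSc.Sys.getVersion())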
Answering the questions you asked the other person with the same problem:
python -c "import slepc4py; import petsc4py; print(slepc4py.__version__, petsc4py.__version__)"
3.13.0 3.13.0
What do you think the problem might be? Thanks in advance for your help!
I did a little digging; both of the links below mention that this can happen when Open MPI was already present on the system and can be solved by reinstalling it:
Hope this helps.
@ccruizm , did this solve your issue?
Unfortunately, it did not. I have access to another server, so I created a new conda env with cellrank there and it ran with no issues. I do not understand why this happens on my other HPC. I had the 'same' issue before with scVelo (https://github.com/theislab/scvelo/issues/198). I could not find out why it kept killing the kernel, but it ran on the other server. However, after they updated the package (from v0.2.0 to v0.2.2) the problem was solved, so I do not know why this keeps happening.
Any thoughts? Thanks!
Hi @michalk8 @Marius1311
I'm experiencing the same issue as @roofya. My kernel always dies at Computing Schur decomposition when I run cr.tl.initial_states. I'm using my own dataset, which has 23k cells. I have removed and reinstalled the dependencies you mentioned, but the problem persists. I also tried downsampling my data to 6k cells, but that didn't help.
I ran
python -c "import slepc4py; import petsc4py; print(slepc4py.__version__, petsc4py.__version__)"
and got 3.16.1 3.16.1.
I have also attached the log.txt.
Do you have any suggestions for this problem? Thank you!
Hi @Doris-Fu, could you please check whether this works if you don't use SLEPc/PETSc? You can do that easily by running in the low-level mode (see the kernels and estimators tutorial) and passing method="brandts" to estimator.compute_schur(), see https://cellrank.readthedocs.io/en/stable/api/cellrank.tl.estimators.GPCCA.compute_schur.html#cellrank.tl.estimators.GPCCA.compute_schur
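In case it helps, a rough sketch of that low-level route (following the CellRank 1.x API used elsewhere in this thread; the file name, kernel weights and number of components are placeholders):

# Rough sketch of the low-level route (CellRank 1.x API; names and weights are placeholders).
import cellrank as cr
import scanpy as sc

adata = sc.read("my_data.h5ad")  # placeholder: your own AnnData with velocities computed

vk = cr.tl.kernels.VelocityKernel(adata).compute_transition_matrix()
ck = cr.tl.kernels.ConnectivityKernel(adata).compute_transition_matrix()
combined = 0.8 * vk + 0.2 * ck   # same weighting idea as weight_connectivities=0.2

g = cr.tl.estimators.GPCCA(combined)
g.compute_schur(n_components=20, method="brandts")  # pure-Python Schur, no SLEPc/PETSc
g.compute_macrostates(cluster_key="clusters")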
@Marius1311 Yes, this worked for me! Thanks a lot!
Hi there,
I would also like to report that I am having the same issue as @Doris-Fu: the kernel dies immediately after Computing Schur decomposition. The slepc4py and petsc4py versions are as follows:
python -c "import slepc4py; import petsc4py; print(slepc4py.__version__, petsc4py.__version__)"
3.17.2 3.17.4
One observation: all my slepc/petsc packages are installed via conda from conda-forge. I tried to reinstall them with pip and got an error message.
I have tried running the low-level mode with method='brandts' in estimator.compute_schur() and g.compute_absorption_probabilities(use_petsc=False) to get around SLEPc/PETSc.
I have ~50k cells; will this work in terms of running time? Will the analysis differ at all with vs. without SLEPc/PETSc?
Thanks.
Hi @lengfei5, the results will be equivalent, just the compute time will be much longer if you use brandts.
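For completeness, the PETSc-free continuation of the sketch above might look roughly like this (again assuming the CellRank 1.x GPCCA estimator g from before; use_petsc is the parameter mentioned earlier in this thread):

# Continuing the earlier sketch: PETSc-free downstream steps on the GPCCA estimator `g`.
g.set_terminal_states_from_macrostates()             # pick terminal states from the macrostates
g.compute_absorption_probabilities(use_petsc=False)  # SciPy-based solver instead of PETSc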
Hi all, I ran into the same error. I thought it could be something related to Jupyter, so I launched it in the terminal. I am using a cluster through SLURM, with srun into a node. The error was far more informative:
The application appears to have been direct launched using "srun", but OMPI was not built with SLURM's PMI support and therefore cannot execute. There are several options for building PMI support under SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or PMI-2 support. SLURM builds PMI-1 by default, or you can manually install PMI-2. You must then build Open MPI using --with-pmi pointing to the SLURM PMI library location.
An error occurred in MPI_Init_thread on a NULL communicator MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job) [srcn02:06565] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!"
Hello,
I'm experiencing the same issue as above. I was able to work around the error for compute_schur() by specifying method="brandts"; however, I am now receiving the same error message when trying to run compute_fate_probabilities(). I am also on an HPC using SLURM.
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This
requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.
Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[qnode2038:59792] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Do you have any idea how this could be resolved, @michalk8?
Hi, my workaround was to launch JupyterLab with sbatch instead of an interactive session using srun. Apparently, the --with-pmi support is automatically available with sbatch but not with srun. Everything worked after that. You just need to check which node your sbatch job is on and then open the tunnels (or whatever you need) to open your Jupyter session in the browser. HTH, Jose
@josegarciamanteiga thank you, that seemed to solve the issue! I am just getting the following warning when I run compute_fate_probabilities():
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: qnode0465
Local device: mlx5_0
--------------------------------------------------------------------------
However the function seems to be completing without errors, so I'm guessing this is okay to ignore.
So I'm using PETSc/SLEPc for a completely different application, but this is one of the only results on Google for someone getting the same obscure error as me. Running on a Sun Grid Engine-managed cluster, I get that error and only a single CPU core is used (despite requesting three), even though on a regular computer it parallelises very efficiently. I'm not using any kind of MPI; I only specify #$ -pe smp 3. So I wonder whether the lack of parallelism and that error are related.
Hi, I'm running the Pancreas Basics tutorial and it works fine up to the velocity part, but for cr.tl.terminal_states(adata, cluster_key='clusters', weight_connectivities=0.2) I get:
Computing transition matrix based on velocity correlations using 'deterministic' mode
Estimating softmax_scale using 'deterministic' mode
100% 2531/2531 [00:03<00:00, 714.91cell/s]
Setting softmax_scale=3.7951
100% 2531/2531 [00:01<00:00, 1420.24cell/s]
Using a connectivity kernel with weight 0.2
Computing transition matrix based on connectivities
Finish (0:00:00)
Computing eigendecomposition of the transition matrix
Adding .eigendecomposition, adata.uns['eig_fwd']
Finish (0:00:00)
Computing Schur decomposition
and suddenly the kernel appears to have died and gets restarted. I really appreciate your help with fixing this problem. Thank you!