theislab / cellrank

CellRank: dynamics from multi-view single-cell data
https://cellrank.org
BSD 3-Clause "New" or "Revised" License

PETSC error at cr.tl.initial_states #588

Closed · mehrankr closed 3 years ago

mehrankr commented 3 years ago

I installed cellrank in a new environment in python3.8 using

conda install -c conda-forge -c bioconda cellrank-krylov

I think the recipe needs to be updated to require the latest networkx; otherwise, PAGA compatibility breaks with a matplotlib error.
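
As a quick sanity check (a minimal sketch; which networkx version PAGA actually needs is an assumption on my part), one can print what the environment resolved to:

import networkx
import matplotlib

# If networkx is too old relative to matplotlib, PAGA plotting breaks
print(networkx.__version__, matplotlib.__version__)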

The conda command above currently installs cellrank 1.3.1. When running some of the scvelo and cellrank functions, particularly

cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1)

I get the following error:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:19<00:00, 948.84cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:16<00:00, 1134.85cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
WARNING: For 1 macrostate, stationary distribution is computed

This used to happen for:

cr.tl.terminal_states(
            adata, cluster_key='Cluster', weight_connectivities=0.2)

But after changing it to:

cr.tl.terminal_states(
            adata, cluster_key='Cluster', weight_connectivities=0.2,
            model="monte_carlo",
            n_jobs=1, method='brandts', n_states=2)

the error no longer occurred.

Very surprisingly, the same issue sometimes arises (though not always) when running:

scv.tl.recover_dynamics(adata, n_jobs=1, n_top_genes=1000)

and

scv.tl.velocity(adata, mode='dynamical')

Versions:

cellrank==1.3.1 scanpy==1.7.2 anndata==0.7.6 numpy==1.20.2 numba==0.53.1 scipy==1.6.3 pandas==1.2.4 pygpcca==1.0.2 scikit-learn==0.24.2 statsmodels==0.12.2 python-igraph==0.9.1 scvelo==0.2.3 pygam==0.8.0 matplotlib==3.4.2 seaborn==0.11.1

michalk8 commented 3 years ago

Hi @mehrankr

I believe this is the same issue as in https://github.com/theislab/cellrank/issues/473 (not sure why, but in some cases, PETSc's parallelization doesn't play nicely with the way we parallelize, which is by default through processes). Usually, changing the backend, i.e. cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1, backend='threading'), worked, so I'd try this first.
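
For clarity, a minimal sketch of that workaround (same arguments as in your snippet, only the backend changed):

import cellrank as cr

# Use joblib's threading backend instead of the default process-based one,
# so PETSc is not shared across forked worker processes
cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1, backend='threading')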

cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1)

Hmm, this should not really happen, especially for n_jobs=1 (based on #473, this should be fine).

scv.tl.velocity(adata, mode='dynamical')
scv.tl.recover_dynamics(adata, n_jobs=1, n_top_genes=1000)

Very strange, since scvelo doesn't use PETSc; the parallelization we rely on here was only added in 0.2.3 (I assume PETSc has been loaded through cellrank). I will take a closer look at this function for problematic parts.
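
As a quick (hypothetical) check on that assumption, you could see whether petsc4py was already pulled in by cellrank before the scvelo call:

import sys

# True if petsc4py/slepc4py were imported earlier in the session (e.g. via cellrank)
print('petsc4py' in sys.modules, 'slepc4py' in sys.modules)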

But after changing it to: ...

This is expected, since method='brandts' uses scipy under the hood, not PETSc, to compute the Schur vectors.
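
For context, a minimal standalone illustration (not CellRank's internal code) of getting Schur vectors with SciPy, which is what the 'brandts' path relies on instead of PETSc/SLEPc:

import numpy as np
from scipy.linalg import schur

# Toy row-stochastic matrix standing in for a cell-cell transition matrix
A = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

# Real Schur decomposition: A = Q @ R @ Q.T; the columns of Q are the Schur vectors
R, Q = schur(A, output='real')
print(np.diag(R))  # eigenvalues of A (real here; complex pairs would show up as 2x2 blocks)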

Marius1311 commented 3 years ago

Hi @mehrankr, did these tips help you resolve your problem?

mehrankr commented 3 years ago

Unfortunately no, I'm still getting the same error:

In [337]:         cr.tl.initial_states(
     ...:             adata, cluster_key='Cluster', n_jobs=1,
     ...:                 backend='threading')
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:22<00:00, 808.30cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:18<00:00, 959.66cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
WARNING: For 1 macrostate, stationary distribution is computed
WARNING: The following states could not be mapped uniquely: `['lup_1']`

Marius1311 commented 3 years ago

mhm, @michalk8 , could you look into this please?

mehrankr commented 3 years ago

Thanks for following up. Send me an email and we can arrange to share the loom file if needed: mkarimzadeh@vectorinstitute.ai

michalk8 commented 3 years ago

Hi @mehrankr ,

just to be completely sure, does the code above (https://github.com/theislab/cellrank/issues/588#issuecomment-842428157) actually raise a Python exception (or crash the ipykernel), or does it simply print the error to the console? It seems that it just prints the [0]PETSC ERROR right after the progress bar (where joblib does its parallelization), and that it has successfully computed the stationary distribution and mapped the cluster labels (the 2nd warning regarding lup_1 comes from this call, which happens after the stationary distribution has been computed, and therefore after any PETSc usage).
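
In the meantime, one (hypothetical) way to sanity-check that the computation finished is to look at what was written back to the AnnData object; the exact key names below are an assumption, so adjust if they differ on your side:

# Columns written by cr.tl.initial_states should show up in adata.obs
print([c for c in adata.obs.columns if 'initial' in c.lower()])
print(adata.obs.get('initial_states'))  # key name is an assumption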

If it crashes or raises an exception, I will ping you over email for the data. Lastly, could you please print the output of the following command?

python -c "import petsc4py; import slepc4py; print(petsc4py.__version__); print(slepc4py.__version__)"

mehrankr commented 3 years ago

Hi @michalk8,

It doesn't crash actually. It simply prints the message out. As long as you can confirm this warning hasn't affected any of the processes and doesn't affect the results, I think we can close this.

The output is:

python -c "import petsc4py; import slepc4py; print(petsc4py.__version__); print(slepc4py.__version__)"
3.15.0
3.15.0

michalk8 commented 3 years ago

It doesn't crash actually. It simply prints the message out.

Thanks for confirming this. I can see the same error in our CI, as well as in jupyter's log, i.e. the code below:

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed()
cr.tl.terminal_states(adata)
cr.tl.lineages(adata, n_jobs=1, backend='threading')

produces the same PETSc broken-pipe error (attachment: petsc_pipe), and the results are unaffected. I see it printed to the console when using just ipykernel (attachment: petsc_error_2).

As long as it doesn't throw an error or crash the kernel as in #473, it should be fine.

Marius1311 commented 3 years ago

I'm closing, as I think you figured out that this is not critical.