Marius1311 closed this issue 3 years ago.
Which multiprocessing backend are you using (the default is loky)? Could you try with backend='threading'?
Could you also try with e.g. n_jobs=2?
Also, do you have a longer error log? The one above does not really help.
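For reference, a minimal sketch of what I mean (the call and arguments are the same ones from your snippet; only n_jobs and backend change):

```python
# hedged sketch: same estimator call as above, but with the threading backend
# and fewer workers, so nothing is sent to separate worker processes
g_fwd.compute_absorption_probabilities(
    use_petsc=True,
    solver="gmres",
    n_jobs=2,
    backend="threading",
)
```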
Updates:
6 jobs gives
---------------------------------------------------------------------------
TerminatedWorkerError Traceback (most recent call last)
<ipython-input-31-fb9aa4c52d9e> in <module>
----> 1 g_fwd.compute_absorption_probabilities(use_petsc=True, solver='gmres', n_jobs=6)
~/Projects/cellrank/cellrank/tl/estimators/_base_estimator.py in compute_absorption_probabilities(self, keys, check_irred, solver, use_petsc, time_to_absorption, n_jobs, backend, show_progress_bar, tol, preconditioner)
479
480 # solve the linear system of equations
--> 481 mat_x = _solve_lin_system(
482 q,
483 s,
~/Projects/cellrank/cellrank/tl/_linear_solver.py in _solve_lin_system(mat_a, mat_b, solver, use_petsc, preconditioner, n_jobs, backend, tol, use_eye, show_progress_bar)
463
464 # can't pass PETSc matrix - not pickleable
--> 465 mat_x, n_converged = parallelize(
466 _solve_many_sparse_problems_petsc,
467 mat_b,
~/Projects/cellrank/cellrank/ul/_parallelize.py in wrapper(*args, **kwargs)
99 pbar, queue, thread = None, None, None
100
--> 101 res = jl.Parallel(n_jobs=n_jobs, backend=backend)(
102 jl.delayed(callback)(
103 *((i, cs) if use_ixs else (cs,)),
~/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~/miniconda3/envs/cellrank_revision/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
437 raise CancelledError()
438 elif self._state == FINISHED:
--> 439 return self.__get_result()
440 else:
441 raise TimeoutError()
~/miniconda3/envs/cellrank_revision/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
The exit codes of the workers are {SIGABRT(-6), SIGABRT(-6), SIGABRT(-6), SIGABRT(-6), SIGABRT(-6)}
Oh, after that error, 2 jobs also fails. Let me restart my kernel and try again.
Okay, after restarting the kernel, it does work with 6 jobs.
For 8 jobs, it still fails; longer error log below:
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Couldn't close file
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
application called MPI_Abort(MPI_COMM_WORLD, 50162059) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=50162059
:
system msg for write_line failure : Bad file descriptor
/Users/marius/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:318: UserWarning: resource_tracker: There appear to be 2 leaked file objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/Users/marius/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:318: UserWarning: resource_tracker: There appear to be 8 leaked semlock objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/Users/marius/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:318: UserWarning: resource_tracker: There appear to be 2 leaked folder objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/Users/marius/miniconda3/envs/cellrank_revision/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:333: UserWarning: resource_tracker: /var/folders/mx/0hyv8t2s26jdj79f55kvc_b80000gn/T/joblib_memmapping_folder_5593_40c65181fc744a9ca57ec0230f7941dd_2713869d33424586bc1f571723ca1820: FileNotFoundError(2, 'No such file or directory')
warnings.warn('resource_tracker: %s: %r' % (name, e))
[I 10:11:57.137 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 04d152c9-f0a9-4298-8f39-1a8425757e3b restarted
Using backend='threading' with 8 jobs, as in g_fwd.compute_absorption_probabilities(use_petsc=True, solver='gmres', n_jobs=8, backend='threading'), works!
I'm closing this; I think this problem is specific to my machine.
Hello, I've been using CellRank recently and have been running into this same error, so far only when running cr.tl.lineages() with the parameter backwards=True, as well as with cr.tl.initial_states(). I'm using the most recent version of CellRank (v1.1.0).
Hi @BioFalcon, the most recent version is 1.2, could you update and try again? If that doesn't help, could you try passing backend='threading'? The above is a parallelisation issue; if nothing else helps, you can probably resolve it by turning off parallelisation, i.e. setting n_jobs=1.
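A minimal sketch of both fallbacks, assuming cr.tl.lineages forwards these keyword arguments to compute_absorption_probabilities (as in the calls discussed above):

```python
import cellrank as cr

# fallback 1: keep parallelisation, but use threads instead of worker processes
cr.tl.lineages(adata, backward=True, backend="threading")

# fallback 2: turn parallelisation off entirely
cr.tl.lineages(adata, backward=True, n_jobs=1)
```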
Hi, I tried doing all of the above and am still getting the error. Somehow, output is still being inserted into the adata object, but I was wondering whether this might impact downstream analyses.
Hi @BioFalcon, I will reopen this issue for you and @michalk8 will look into this with you - however, that will take until next week as it's exam season at the moment, sorry about that!
@BioFalcon, can you please post the error you are getting when you run the function without parallelisation, i.e. cr.tl.lineages(n_jobs=1)?
I'm assuming that this issue has been solved.
Hi,
I get the same issue. I tried setting all parameters as discussed here:
cr.tl.lineages(adata, backward=False, n_jobs=1, solver='gmres', backend='threading')
My Jupyter kernel still crashes.
However, I receive the following warning:
Computing absorption probabilities
WARNING: There is only `1` terminal state, all cells will have probability `1` of going there
As a workaround, I tried using the kernels and estimators API directly, to no avail:
from cellrank.tl.kernels import VelocityKernel
vk = VelocityKernel(adata)
vk.compute_transition_matrix()

from cellrank.tl.kernels import ConnectivityKernel
ck = ConnectivityKernel(adata).compute_transition_matrix()

# weighted combination of the velocity and connectivity kernels
combined_kernel = 0.8 * vk + 0.2 * ck

from cellrank.tl.estimators import GPCCA
g = GPCCA(combined_kernel)
print(g)

g.compute_schur(n_components=5)
g.plot_spectrum()
It always crashes at the step where the Schur decomposition is calculated.
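For anyone hitting the same crash, a hedged sketch of a possible fallback: pygpcca also provides a dense ('brandts') Schur solver that avoids the Krylov/SLEPc path. Whether compute_schur exposes a method argument in this version is an assumption here, so please check the docs for your CellRank release:

```python
# hedged sketch: try the dense Brandts solver instead of the default Krylov/SLEPc one
# (the `method` argument is an assumption -- verify it in your CellRank version's docs)
g.compute_schur(n_components=5, method="brandts")
g.plot_spectrum()
```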
This is the output of cellrank.logging.print_versions():
cellrank==1.5.1 scanpy==1.9.1 anndata==0.8.0 numpy==1.21.2 numba==0.56.3 scipy==1.9.3 pandas==1.5.1 pygpcca==1.0.4 scikit-learn==1.1.3 statsmodels==0.13.5 python-igraph==0.10.2 scvelo==0.2.4 pygam==0.8.0 matplotlib==3.6.2 seaborn==0.11.2
Thanks!
Never mind, reinstalling the environment seems to have helped.
When computing absorption probabilities on the lung dataset using g_fwd.compute_absorption_probabilities(use_petsc=True, solver='gmres', n_jobs=8), my kernel dies and I get the following error message in the terminal.
Versions:
cellrank==1.1.0+gb36eac8 scanpy==1.6.0 anndata==0.7.4 numpy==1.19.5 numba==0.52.0 scipy==1.5.3 pandas==1.1.3 scikit-learn==0.23.2 statsmodels==0.12.0 python-igraph==0.8.3 scvelo==0.2.2 pygam==0.8.0 matplotlib==3.2.2 seaborn==0.11.0
Update 1: Running just g_fwd.compute_absorption_probabilities(n_jobs=8) works fine.
Update 2: Using just a single core, i.e. g_fwd.compute_absorption_probabilities(use_petsc=True, solver='gmres', n_jobs=1), also works fine.