Closed: franzoni315 closed this issue 5 years ago.
Hi @franzoni315
So it looks like there are race conditions when the ipython kernel is launched from parallel processes. Using a thread pool instead gets further more often without hanging, but at any high level of parallelism it still hits the race conditions. I ran a few times under different conditions and eventually got:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/mseal/.py2local/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-121>", line 2, in initialize
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 467, in initialize
    self.init_sockets()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 239, in init_sockets
    self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 181, in _bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
    raise ZMQError(errno)
ZMQError: Address already in use
And
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3231, in atexit_operations
    self.history_manager.end_session()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 580, in end_session
    self.writeout_cache()
  File "<decorator-gen-23>", line 2, in writeout_cache
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 60, in needs_sqlite
    return f(self, *a, **kw)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 786, in writeout_cache
    self._writeout_input_cache(conn)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 770, in _writeout_input_cache
    (self.session_number,)+line)
DatabaseError: database disk image is malformed
I also noticed the race conditions on exit occur every run and are causing session saves to fail (not a big deal, but it points to the reuse of overlapping session_number values).
I can also reproduce this failure with a simple bash for loop over papermill. I'll open up a ticket on the ipython project to figure out what the root cause is and see if there's a change in papermill that would fix this.
FWIW, this problem also affects papermill on jupyter in python 3.
Wondering: what is the status of this issue? I can confirm that this problem is still present in Python 3 when using Papermill. Will multiprocessing become doable with Papermill?
It will become doable -- we need a release of multiple upstream libraries, and there's still one pending PR testing an edge case we haven't fixed for one of those releases. Give the community a couple more weeks here; there are a lot of moving parts and this has been unsupported for a long time in the upstream projects. You can watch for the nbconvert 5.5.1 release announcement on Discourse and on the jupyter mailing list. That will be the last release needed to get it resolved.
Hi guys, I'm having the same problem, with a somewhat unpredictable:
RuntimeError: Kernel didn't respond in 60 seconds
I updated nbconvert from GitHub according to:
Disable IPython History in executing preprocessor #1017
...but still bump into the same problem.
I also tried the solution from Papermill on HPC/Dask #364, but in that case some of the packages I use (ta-lib, for example) raise errors when running.
Are you running papermill in a concurrent setting (this isn't supported in the upstream libraries yet)? If not, what kernel are you launching? Maybe it's slow to start up?
@MSeal kernel: python 3.6, linux. I don't even know what a 'concurrent setting' is, sorry. Is there a simple way to check (even a link for me to dive into would be more than appreciated)?
Hm, and maybe you're right about it being too slow. Can I increase this 60s limit somewhere?
By python 3.6 I am assuming you mean your kernel is an ipython kernel running Python 3.6? Your kernel and papermill processes don't necessarily share a python version.
You can raise the 60 second limit with --start_timeout <num_seconds_to_wait>.
Checking for a concurrent setting means: are you launching papermill from inside a thread or a multiprocessing setup?
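For reference, the timeout can also be raised from the Python API; here is a minimal sketch, assuming your papermill version exposes the matching start_timeout keyword on execute_notebook (the notebook paths are placeholders):

import papermill as pm

# Wait up to 300 seconds for the kernel to start responding instead of the default 60.
# "input.ipynb" / "output.ipynb" are placeholder paths for illustration.
pm.execute_notebook("input.ipynb", "output.ipynb", start_timeout=300)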
@MSeal thank you for the clarification.
I use an ipython kernel then. The conda environment was created specifying python=3.6, and I use JupyterLab (if that provides any clarity).
Papermill is run by the schedule library, using the following async code:
import asyncio
import schedule

async def myCoroutine():
    # Run any pending scheduled papermill jobs until Stop is set
    while Stop == False:
        schedule.run_pending()
        await asyncio.sleep(1)

asyncio.run_coroutine_threadsafe(myCoroutine(), asyncio.get_event_loop())
I might have understood it totally wrong, but from my understanding this is a concurrent setting, right? And since you're saying "this isn't supported in upstream libraries yet", if I'm understanding it correctly, for the time being I cannot do anything about it. Am I correct? I apologize in advance for my lack of coding knowledge/experience.
Yes. Having notebooks launched from a coroutine means there can be concurrent executions, so it's the same issue described above. The good news is that all of the known issues are PR'd or merged pending a release to fix this problem, so it should be fixed in the next few weeks.
@MSeal thank you very much for clarification. You've been very helpful.
This base issue should now be resolved with the nbconvert 5.6.0 release!
Hi, I might just be overlooking something, but I think I'm still experiencing this issue even after upgrading nbconvert. It seems to be an upstream issue with nbconvert, because I get the same issues when calling the execute API directly. Let me know if I should migrate this question to that repository.
To replicate:
import multiprocessing as mp
import nbconvert
assert "5.6." in nbconvert.__version__
from nbconvert.preprocessors import ExecutePreprocessor
import nbformat
import os
import papermill as pm

def run_pm(fn):
    pm.execute_notebook(fn, fn, request_save_on_cell_execute=False)

def run(fn):
    with open(fn) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=None, kernel_name="python3")
    ep.startup_timeout = 300
    ep.preprocess(nb, {"metadata": {"path": os.getcwd() + "/"}})
    with open(fn, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)

fn = "test.ipynb"
test.ipynb has a single cell that prints the word "testing". The following works fine:
run_pm(fn)
run(fn)
But the following two code snippets each break:

pool = mp.Pool(1)
pool.map(run_pm, [fn])
pool.close()
pool.join()

and

pool = mp.Pool(1)
pool.map(run, [fn])
pool.close()
pool.join()

with the error RuntimeError: Kernel didn't respond in 60 seconds in the first case and RuntimeError: Kernel didn't respond in 300 seconds in the second.
I'm using Python 3.7. I've been able to replicate this with both nbconvert 5.6.0 and 5.6.1.
Thanks!
What are your versions of ipython, jupyter_client, and jupyter_core in your environment? And how are you running the two Pool snippets at the same time? If you have threads above the mp calls it will break at a C level because of https://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them.
I ran the two Pool snippets in serial, not concurrently. Sorry for the confusion; I've updated my comment to (I hope) reflect that.
My Python 3 package manager lists ipython 7.3.0, jupyter-client 5.2.4 and jupyter-core 4.4.0.
Sorry for the very late reply, catching up from the holidays.
I believe the issue is that you have old jupyter_client / jupyter_core versions.
Upgrade those to 5.3.4 and 4.6.1 respectively and the error should go away.
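If you want to confirm which versions a given environment actually imports (rather than what the package manager reports), a quick check from Python is:

import jupyter_client
import jupyter_core
import IPython

# Print the versions the current interpreter actually resolves.
print(jupyter_client.__version__, jupyter_core.__version__, IPython.__version__)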
Resolved, thank you!
Hello, I am trying to run multiple parameterized notebooks in parallel. Currently, I am using papermill inside Jupyter Notebook, and if I try to use a multiprocessing pool to map a list of parameters and pass them to pm.execute_notebook, I get RuntimeError: Kernel didn't respond in 60 seconds. I am running everything with Python 2.7. This is the code I use:
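(The snippet itself is not reproduced in this thread; a hypothetical minimal sketch of the pattern described, with placeholder notebook paths and parameter values, would be:)

import multiprocessing as mp
import papermill as pm

# Hypothetical parameter sets; each entry produces its own output notebook.
param_list = [{"alpha": 1}, {"alpha": 2}]

def run_one(params):
    # Each call starts its own kernel, which is where the concurrency problem shows up.
    output = "output_{}.ipynb".format(params["alpha"])
    pm.execute_notebook("template.ipynb", output, parameters=params)

pool = mp.Pool(2)
pool.map(run_one, param_list)
pool.close()
pool.join()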
It works correctly using the standard python map. Btw, is there a known way to produce multiple notebooks in parallel with papermill?
Thanks!