palewire opened 4 years ago
Hmm I wonder if this is related to some of the async stuff that @davidbrochart had worked on in the last few PRs?
@palewire could you provide an example of a notebook or code snippet that reliably creates the problem?
I am unable to replicate the bug, which we've only seen so far on MacBooks running Python 3.8 installed via Homebrew. I'm a neckbeard who uses Ubuntu 16.04, and I'm not seeing any errors there.
Here's basically what's happening in the code. We have this function to run notebooks via nbclient:
```python
import nbformat
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError


def _execute_notebook(name, path):
    """
    Private method to execute the provided notebook and handle errors.
    """
    input_path = f"{name}.ipynb"
    output_path = f"{name}-output.ipynb"
    with open(input_path) as f:
        nb = nbformat.read(f, as_version=4)
    client = NotebookClient(
        nb,
        timeout=600,
        kernel_name='python3',
        allow_errors=False,
        force_raise_errors=True,
        resources={'metadata': {'path': path}}
    )
    try:
        client.execute()
    except CellExecutionError:
        msg = f'Error executing the notebook "{input_path}".\n\n'
        msg += f'See notebook "{input_path}" for the traceback.'
        print(msg)
        raise
    finally:
        with open(output_path, mode='w', encoding='utf-8') as f:
            nbformat.write(nb, f)
```
Then we feed notebook paths into the function from a list. It's something like this:
```python
def run():
    print("Running notebooks")
    notebook_list = [
        "notebook-1",
        "notebook-2",
        "notebook-3",
        "notebook-4",
    ]
    for notebook in notebook_list:
        notebook_filename = f'./_notebooks/{notebook}'
        print(f"- {notebook_filename}.ipynb")
        _execute_notebook(
            notebook_filename,
            path='_notebooks/'
        )
```
Hmm, I'm unlikely to be able to help directly because I only have an older Ubuntu 18.04 machine and a newer Ubuntu 20.04 machine as well. Setting the ulimit lower before running might reproduce it, though, since OSX has a much lower default limit. I can give it a try later in the week if someone else hasn't narrowed it down. Is the error consistent or sporadic? Does it appear after a few notebooks have run, or on the first one? It might be that we're not cleaning up file handles somewhere in the dependency chain.
As @MSeal suggested, I could reproduce the bug on Ubuntu 20.04 with Python 3.8 by setting `ulimit -n 16`, although it crashed in the shell channel creation (in ZMQ) instead of the heartbeat channel (in `asyncio.new_event_loop`). But both are caused by a new socket creation, so increasing the ulimit might solve the issue, provided that we are not leaking socket allocations.
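To illustrate the failure mode described above without a full notebook run, here is a minimal sketch. The `reproduce_fd_exhaustion` helper is hypothetical (not part of nbclient): it lowers the process's soft file-descriptor limit, then opens plain sockets until the OS refuses, which produces the same `OSError` ("Too many open files") that kernel channel creation hits when each new kernel allocates ZMQ sockets under a low ulimit.

```python
import resource
import socket


def reproduce_fd_exhaustion(limit=64):
    """Lower this process's soft fd limit, then open sockets until the
    OS refuses -- mimicking the 'Too many open files' crash on macOS."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(limit, soft), hard))
    sockets = []
    try:
        while True:
            # Each kernel channel opens a ZMQ socket; each socket costs one fd.
            sockets.append(socket.socket())
    except OSError as exc:
        # EMFILE: the per-process open-file limit was reached.
        return len(sockets), exc
    finally:
        for s in sockets:
            s.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The count of sockets opened before failure will be a bit below the limit, since the interpreter already holds a few descriptors; leaked sockets from earlier kernels shrink that headroom further, which matches the "fails after a few notebooks" pattern.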
One thing we could do is start running tests on an OSX (or even Windows) build on GitHub Actions, at least once we've got the bug reproducible in test form.
Good point, sounds like a good idea in any case.
This error was encountered by three different users on three different MacBooks, all running the same code.
I am guessing a ulimit of 256, 512, or 1024 is more typical for a given Mac. From reading up on it, the default and maximum ulimits have shifted around within this range on macOS depending on the OS version and the available RAM.
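For reference, affected users can check their effective limits from a shell. This is a read-only sketch; the actual values vary by OS version and configuration, as noted above:

```shell
# Per-process open-file limits; macOS soft defaults are often 256,
# much lower than typical Linux defaults (1024 or higher).
ulimit -n    # soft limit: what a process hits first
ulimit -Hn   # hard limit: the ceiling the soft limit can be raised to
```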
This is a recurrent issue, e.g. with multiple kernels in JLab.
Likely related to the comments near the end of this thread: https://github.com/jupyter/jupyter_client/pull/548
Yep, got another occurrence in executablebooks/jupyter-book#867.
Some users following a pattern similar to the one described in #48 are getting new errors after upgrading to 0.3. When they downgrade to 0.2, the errors are gone. Here's a sample traceback.