ipython / ipykernel

IPython Kernel for Jupyter
https://ipykernel.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Creation and shutdown of InProcessKernel object leaks file descriptors. #219

Status: Open. mdickinson opened this issue 7 years ago

mdickinson commented 7 years ago

This issue is part bug report, part question.

The bug report: running the script below gives me a "Too many open files" exception on OS X 10.9. (Python 2.7.10, ipykernel 4.5.2, traitlets 4.3.1, zmq 16.0.0.) Using lsof, it looks as though every cycle of shell creation and destruction leaks two open unix sockets, and these sockets are (indirectly) kept alive until the end of the process via an atexit handler.

The question: what's the recommended way of cleaning up a (kernel_manager, kernel) pair created as below? I'm looking for something that releases all relevant system resources (threads, file handles, etc). Is there something more I can do or should be doing beyond the shutdown_kernel call?

Background: we have a large application that makes use of an in-process kernel. Several of the tests in our test suite create and destroy an IPython kernel. That leads to us running out of file handles later in the test suite; this is especially a problem on OS X, where the default number of file handles per process is 256. At the moment, we're artificially increasing that limit to get our test suite to pass (using ulimit -n 1024), but that doesn't seem like the right solution. :-)

import gc
from ipykernel.inprocess.manager import InProcessKernelManager

def create_and_destroy_shell():
    kernel_manager = InProcessKernelManager()
    kernel_manager.start_kernel()
    kernel_manager.shutdown_kernel()
    gc.collect()

# Raises "Too many open files" OSError on OS X, with the default
# process limits (<= 256 open files).
for _ in xrange(200):
    create_and_destroy_shell()
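For what it's worth, a small diagnostic sketch makes the per-cycle growth visible. It reuses create_and_destroy_shell from above and assumes psutil is installed (psutil isn't part of the report, just a convenient way to count descriptors):

import psutil  # assumption: psutil is available; used only to count fds

proc = psutil.Process()

def leaked_fds_per_cycle():
    # num_fds() is POSIX-only, which matches the OS X setup here.
    before = proc.num_fds()
    create_and_destroy_shell()
    return proc.num_fds() - before

# Should print 2 per cycle if the lsof observation above is right.
print("leaked %d file descriptors this cycle" % leaked_fds_per_cycle())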
mdickinson commented 7 years ago

I should add that this appears to be a regression since ipykernel 4.3.1. (It was upgrading to the most recent IPython / Jupyter packages that caused our test suite to start failing.)

mdickinson commented 7 years ago

> and these sockets are (indirectly) kept alive until the end of the process via an atexit handler

It looks as though there's more to it than this. There are atexit handlers registered for each IOPubThread, but each (kernel, kernel_manager) pair is also kept alive by the InProcessInteractiveShell.configurables attribute, which contains one CommManager object for each pair. (There's a singleton shell that's keeping all of this alive.) Oh, and there's also an atexit handler for the shell.
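As a stopgap, the shape of a workaround looks roughly like this (it leans on traitlets' SingletonConfigurable machinery and the configurables list mentioned above; none of it is an official cleanup path):

from ipykernel.inprocess.ipkernel import InProcessInteractiveShell

def drop_shell_references():
    # The shell is a traitlets SingletonConfigurable, so the class itself
    # holds the reference keeping the shell, and everything registered in
    # shell.configurables, alive until process exit.
    if InProcessInteractiveShell.initialized():
        shell = InProcessInteractiveShell.instance()
        del shell.configurables[:]  # release the per-kernel CommManager objects
        InProcessInteractiveShell.clear_instance()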

takluyver commented 7 years ago

The in-process kernel machinery is something we never use ourselves, and now regret that we created. Complicated problems like this arise fairly often because it doesn't fit well with the rest of our architecture. Can we help you work out whether you can do what you want without it?
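For reference, the out-of-process equivalent is roughly this, using jupyter_client's start_new_kernel (details vary a little between versions):

from jupyter_client.manager import start_new_kernel

# Start a kernel in a subprocess and talk to it over ZMQ; the sockets
# belong to the client and go away with stop_channels/shutdown_kernel.
km, kc = start_new_kernel(kernel_name="python3")
try:
    kc.execute("x = 1 + 1")
finally:
    kc.stop_channels()
    km.shutdown_kernel()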

mdickinson commented 7 years ago

Thanks for the reply; that's good to know! We're in the somewhat slow process of moving away from the in-process machinery anyway for this particular application. Based on your response, it sounds as though we should (a) speed that transition up, and (b) in the meantime, just hack in what we can to keep our test suite passing (or stick with older versions of the IPython packages).

takluyver commented 7 years ago

That sounds like a good plan. @minrk may be able to help you come up with a better fix, since the parts of ipykernel you're talking about are things he has touched recently while fixing other issues. But in the long run, I think avoiding in-process kernels is the best approach.

Min, do you want to try to deprecate InProcessKernelManager? I know we don't really want to support it, but there are use cases where it's not easy to recommend what people can do instead.

minrk commented 7 years ago

We could kick InProcess out to a standalone package, but the mocking it has to do covers some decidedly non-public APIs right now, so we would need to do some serious cleaning up in order for that to make a lot of sense.

Deprecating doesn't seem like the right term, since there aren't really alternatives when inprocess is what you really want. I do think it's pretty rare that it is actually desirable, as the whole point of the kernel architecture is remote execution. I don't know a decent way to communicate this stuff.

I think in general we could do a better job of tracking objects for cleanup throughout the stack, so that it's easier to set up and tear down things. This is related to #218, where the IOPub threads were left running, causing some cleanup conflicts. We should probably make it clearer and easier for destroying an InteractiveShell to tear down its history thread, for destroying a Kernel to tear down its IOPub thread, etc.
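To illustrate the kind of ownership I mean, a hypothetical teardown helper might look like this (the attribute names are ipykernel/IPython internals of the moment, not public API):

def teardown_kernel(kernel):
    # Sketch only: each layer stops the threads it started.
    iopub = getattr(kernel, "iopub_thread", None)
    if iopub is not None:
        iopub.stop()  # join the IOPub socket thread
    hm = getattr(kernel.shell, "history_manager", None)
    if hm is not None and getattr(hm, "save_thread", None) is not None:
        hm.save_thread.stop()  # join the history-saving thread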

mdickinson commented 7 years ago

For the record, in our new code that avoids the InProcessKernel we're still having similar issues. Instead of an InProcessInteractiveShell we now have a singleton ZMQInteractiveShell whose configurables trait is keeping all sorts of other things alive. Tracking down all the references to that ZMQInteractiveShell object is proving rather hard.

So it looks as though the socket leaks have more to do with the base classes and general architecture than with the InProcess* classes specifically.
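The rough diagnostic I've been using, for what it's worth (ZMQInteractiveShell is a SingletonConfigurable, so instance() hands back the singleton in question):

import gc
from ipykernel.zmqshell import ZMQInteractiveShell

# Everything that still refers to the singleton shell is a candidate
# for the reference keeping it (and its configurables) alive.
if ZMQInteractiveShell.initialized():
    shell = ZMQInteractiveShell.instance()
    for ref in gc.get_referrers(shell):
        print("%s: %s" % (type(ref), repr(ref)[:100]))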

I have lots more details to give if they're useful. Should I open another issue, or keep using this one?

takluyver commented 7 years ago

Probably open a new one, thanks!