takluyver / jupyter_kernel_mgmt

Experimental new kernel management framework for Jupyter
https://jupyter-kernel-mgmt.readthedocs.io/en/latest/
Other
15 stars 8 forks source link

[WIP] Fix kernelapp #41

Closed kevin-bates closed 4 years ago

kevin-bates commented 4 years ago

These changes update kernelapp to be closer to what they should be (tests passes), but the application still doesn't work correctly.

Problem areas:

  1. The connection file is no longer on the kernel manager instance, and the launch method returns the connection information (not a file), so logging the connection file name along with a hit to use --existing isn't applicable. I have just dumped the connection info, but that's probably a security violation now that I thin about ti.
  2. Need to run some async methods using the run_sync() utility method. There are probably issues in this area.
  3. Event loop already running issues...

When I run python -c "from jupyter_kernel_mgmt.kernelapp import main; main()" from the command line, I see the following (which seems correct)...

[KernelApp] Starting kernel 'pyimport/kernel'
[KernelApp] Connection information: {'shell_port': 56018, 'iopub_port': 56019, 'stdin_port': 56020, 'control_port': 56021, 'hb_port': 56022, 'ip': '127.0.0.1', 'key': '37575c3f-e6744a55aecfd4a3f4bccbd7', 'transport': 'tcp', 'signature_scheme': 'hmac-sha256'}

I then locate the process id (essentially simulating what the test is doing) and issue kill pid, then get the following...

[KernelApp] Shutting down on signal 15
ERROR:tornado.application:Exception in callback functools.partial(<bound method KernelApp.shutdown of <jupyter_kernel_mgmt.kernelapp.KernelApp object at 0x10c876588>>, 15)
Traceback (most recent call last):
  File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/Users/kbates/repos/oss/gateway-experiments/jupyter_kernel_mgmt/jupyter_kernel_mgmt/kernelapp.py", line 52, in shutdown
    client.wait_for_ready()
  File "/Users/kbates/repos/oss/gateway-experiments/jupyter_kernel_mgmt/jupyter_kernel_mgmt/client.py", line 407, in wait_for_ready
    loop.run_sync(lambda: self.loop_client.wait_for_ready(), timeout=timeout)
  File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/site-packages/tornado/ioloop.py", line 526, in run_sync
    self.start()
  File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 148, in start
    self.asyncio_loop.run_forever()
  File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/asyncio/base_events.py", line 409, in run_forever
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

So although the test "passes", these issues are masked.

Fixes: #40

takluyver commented 4 years ago

Whew, OK.

Connection files: Maybe we need to think more about what connection files are and what parts are responsible for them. At the moment, they're used both to tell the kernel what ports it should bind to (provider/launcher responsibility), and to tell other applications what ports they should connect to (application responsibility?). Using the same connection file for both may not work if the kernel is launched somewhere remote. As a later step, I also want to move away from the launcher telling the kernel what ports to use.

Maybe the solution/workaround for now is for the application to write the connection info to a second connection file for the --existing information, separate from the one the provider creates to launch the kernel?

Sync & async: I think the issue is that we're using an awkward jumble of synchronous and asynchronous APIs (which is my fault). The application runs its own event loop, but then tries to use a bunch of synchronous APIs, which are now wrappers around async ones. It should either use the async APIs inside its event loop, or stick to synchronous code.

kevin-bates commented 4 years ago

I'm okay creating a second file. I think the test might require adjustment since I believe it assumes only one kernel-xxx.json file exists in runtime dir, but that's solvable. Also I'm a little weary about introducing another occurrence of secure_write given the plethora of issues that have cropped up since its creation - primarily down to Windows and NFS issues at this point.

I'll try to spend some "quality time" on the async issues, although I'm essentially on holiday through the New Year (Happy New Year! :tada:).

kevin-bates commented 4 years ago

Closing in favor of #42. (It's also the answer to the universe!)