ipython / ipykernel

IPython Kernel for Jupyter
https://ipykernel.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Capture output coming from C and C++ libraries #110

Open · dpiparo opened this issue 8 years ago

dpiparo commented 8 years ago

Hi,

thanks for this excellent Kernel! This is already a known behaviour: the output printed to the screen by C/C++ libraries is not captured in the notebook. This can happen with ctypes or with Python bindings for C++ functions. It would be terrific to have this output both captured and correctly interleaved with the output coming from Python, and then printed "progressively" in the notebook. For example, this loop could print a line from the Python world and one from the C world every 0.5 seconds, interleaved:

    from ctypes import *
    libc = CDLL("libc.dylib")
    import time

    for i in xrange(10):
        print "This is Python"
        libc.printf("This is C\n")
        time.sleep(.5)

At CERN we solved this issue for the ROOT kernel in a custom way. For Python, we converged on a partial solution, which allows the outputs to be correctly interleaved but not printed "live" in an asynchronous way.

I wonder whether a general solution to this rather common issue, at least in HEP, could be envisaged, e.g. a "baby-sitting process" capturing the output.

Cheers, Danilo

takluyver commented 8 years ago

(@dpiparo contacted me by email, and I suggested he open an issue so that the 'nanny process' idea we've been thinking of for a while gets back on our radar)

dpiparo commented 8 years ago

@takluyver , thanks for the clarification.

minrk commented 8 years ago

I made the wurlitzer package based on what we learned making the Cling kernel, so right now you can capture any C output in a notebook with:

    import wurlitzer

    with wurlitzer.sys_pipes():
        run_c_code()  # placeholder for any call that writes to the C-level stdout/stderr
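
For example (a usage sketch assuming a POSIX system with wurlitzer installed; sys_pipes is wurlitzer's context manager, the rest is illustrative), the ctypes snippet from the original report becomes:

    import ctypes
    import time
    from wurlitzer import sys_pipes

    libc = ctypes.CDLL(None)          # load the C library (works on Linux/macOS)

    with sys_pipes():
        for i in range(3):
            print("This is Python")
            libc.printf(b"This is C\n")
            time.sleep(0.5)

Note that C stdio may still buffer the printf output until its buffer is flushed, so whether the two streams appear interleaved depends on buffering.
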
dpiparo commented 8 years ago

I confirm Wurlitzer works great: thanks for sharing it! It successfully captures all output without freezing, and it is very easy to use. Technically, one could think of enabling it by default, hooked into pre_execute and post_execute.
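
A hedged sketch of what that could look like, assuming an interactive IPython/Jupyter session (the event names pre_execute and post_execute are IPython's; the helper functions here are hypothetical and error handling is omitted):

    from wurlitzer import sys_pipes

    _capture = None                      # currently open sys_pipes() context, if any

    def _start_capture():
        global _capture
        _capture = sys_pipes()
        _capture.__enter__()

    def _stop_capture():
        global _capture
        if _capture is not None:
            _capture.__exit__(None, None, None)
            _capture = None

    ip = get_ipython()                   # only defined inside an IPython session
    ip.events.register("pre_execute", _start_capture)
    ip.events.register("post_execute", _stop_capture)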

The limitation I see, as Wurlitzer stands today, is that the output can be printed only at the end of the execution of the code/cell. In the presence of long calculations, the user may get the impression that the system is stuck, even if the underlying library is trying to print some sort of progress indicator.

Another improvement one might wish for is the ability to properly "interleave" output from C(++) and Python by default.

takluyver commented 8 years ago

Looking at the code, I think Wurlitzer should be capable of forwarding output 'live', without waiting until the end of the context. It uses a thread to pull data from the read end of the pipe and forward it to Python's sys.std*.
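
For context, a minimal sketch of that technique (not wurlitzer's actual code): it assumes a POSIX system and that sys.stdout is not backed by fd 1, as is the case in ipykernel, where sys.stdout is a ZMQ-backed stream.

    import ctypes
    import os
    import sys
    import threading

    libc = ctypes.CDLL(None)

    def start_capture(fd=1):
        saved = os.dup(fd)                  # remember the real stdout fd
        read_end, write_end = os.pipe()
        os.dup2(write_end, fd)              # C-level writes to fd now land in the pipe
        os.close(write_end)

        def forward():
            while True:
                data = os.read(read_end, 1024)
                if not data:                # write end closed: capture is finished
                    break
                sys.stdout.write(data.decode(errors="replace"))
                sys.stdout.flush()

        thread = threading.Thread(target=forward, daemon=True)
        thread.start()
        return saved, read_end, thread

    def stop_capture(saved, read_end, thread, fd=1):
        libc.fflush(None)                   # flush C stdio buffers into the pipe
        os.dup2(saved, fd)                  # restore the original fd; the pipe's last writer closes
        os.close(saved)
        thread.join()
        os.close(read_end)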

minrk commented 8 years ago

@dpiparo it should print outputs as they arrive, as long as you are using ipykernel >= 4.3.

The only impediment I see to integrating wurlitzer into IPython by default is how reliable it is on different platforms.

minrk commented 8 years ago

Since it is making ctypes calls, I could certainly imagine it causing crashes on weird systems. And I just assume it doesn't work on Windows.

takluyver commented 8 years ago

I'm pretty sure there's no dup2 on Windows, so there's probably no practical way to do this in-process there.

dpiparo commented 8 years ago

On the other hand, if this were integrated, Linux/macOS would gain many nice features while Windows would stay as it is. I am not sure, though, how similar cases have been handled in the past within the Jupyter project.

minrk commented 8 years ago

We have plenty of things that only work on certain platforms (subprocesses behave more nicely on non-Windows thanks to pexpect, and web terminals only work on non-Windows, also because they use pexpect), so I don't see that as a big issue.

The only thing standing between me and proposing that we use it on non-Windows is the stability question. I don't know enough to say if or when it won't work, even on posixy systems.

takluyver commented 8 years ago

I don't think we should integrate that into the Python kernel: most stuff works well enough without it, and now there is an easy solution we can point people to when they do need it. I think the longer-term answer is to finally get round to the nanny process we've talked about for a long time, because that would let us capture the std streams at the OS level on all platforms, without needing ctypes or anything.
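
For illustration only, a rough sketch of that idea, under the assumption that the supervising process simply relays the child's OS-level stdout/stderr (a real nanny would presumably publish them over the kernel's messaging channels); run_with_nanny is a hypothetical name:

    import subprocess
    import sys
    import threading

    def run_with_nanny(cmd):
        # Launch the kernel as a child with its stdout/stderr attached to OS pipes, so
        # anything written at the file-descriptor level (Python, C, C++, ...) is captured.
        child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        def forward(pipe, sink):
            for chunk in iter(lambda: pipe.read1(1024), b""):
                sink.write(chunk.decode(errors="replace"))
                sink.flush()

        threads = [
            threading.Thread(target=forward, args=(child.stdout, sys.stdout), daemon=True),
            threading.Thread(target=forward, args=(child.stderr, sys.stderr), daemon=True),
        ]
        for t in threads:
            t.start()
        child.wait()
        for t in threads:
            t.join()
        return child.returncode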

Maybe this is something we can discuss at the dev meeting in a couple of weeks.

minrk commented 8 years ago

Sure.

tomerk commented 5 years ago

Hi, this is a pretty big issue for all TensorFlow users (because print operators run in the C++ internals): https://github.com/tensorflow/community/pull/14

Is it possible to revisit / reprioritize this?