ipython / ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.
https://ipython.readthedocs.org
BSD 3-Clause "New" or "Revised" License

OS-level stdout/stderr capture #1230

Open bfroehle opened 12 years ago

bfroehle commented 12 years ago

Consider a simple C function void say_hi() { printf("Hello World!\n"); } which you compile and call from Python via Cython, Boost.Python, ctypes, or any other means.
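
For concreteness, a minimal ctypes sketch, assuming the function has been compiled into a shared library (the libsayhi.so name is illustrative):

import ctypes

# Load the compiled library; say_hi()'s printf writes straight to the
# OS-level stdout file descriptor, bypassing Python's sys.stdout.
lib = ctypes.CDLL("./libsayhi.so")
lib.say_hi()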

Calling say_hi() works as desired in both the plain Python console and regular IPython:

>>> say_hi()
Hello World!

However in any of the multiprocess IPython interfaces (parallel, qtconsole, console, notebook) the command appears to do nothing at all:

In [3]: say_hi()

In [4]:

Of course, what is actually happening is that the text is written directly to the C stdout file descriptor, so the zmq stdout redirection never has a chance to grab it.

In most cases, the easiest way to deal with this problem is to rewrite your C source to use Pythonic methods, e.g. PySys_WriteStdout:

#define printf PySys_WriteStdout

But sometimes the printf statements lie in libraries over which you have little (or no) control of the actual source code. In that case it would be convenient for IPython to capture the OS-level stdout and stderr file descriptors and send their contents over the zmq streams. This functionality was once in IPython (cf. 97262e9521f384fe8f55566bdfa1b4794e55bd8b), but AFAIK it has not been resurrected in the zmq era.

A similar approach is likely feasible, but there are many potential issues in implementation (e.g., ipython console might be tricky because there we need a functioning stdout for printing to the terminal!).

At a minimum I think we need to document this behavior for confused users. Beyond that it'd be great to offer a simple script, or configuration profile option which enables this OS level stdout/stderr capture.

minrk commented 12 years ago

Looking at Gael's code, it seems like we can (and should) just trap and forward the FDs during user execution.

takluyver commented 12 years ago

This link was found by Nathaniel Smith on the scipy-user mailing list: http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/redirecting-standard-io.html

njsmith commented 12 years ago

Turning redirection on and off will prevent this from working for any code that spawns threads; I'm not sure how big a deal that is these days. (E.g., does matplotlib end up running in a thread? Getting plots to work in rpy2 requires threads.) It might make more sense to just turn it on whenever you reset sys.std{out,err}. (I assume this happens once for the qtconsole, never for regular ipython-in-a-terminal-window, etc.)

Two notes on Gael's code linked above:

while True:
    data = os.read(pipefd, 4096)  # pipefd: the read end of the redirection pipe
    sys.stdout.write(data)

hannes-brt commented 10 years ago

Is this issue still under consideration? I would love to see this fixed, since I use PyCUDA and would like to use printf inside CUDA kernel functions for debugging.

stonebig commented 10 years ago

Hi all,

I miss the beautiful summaries from R that used to appear in my DOS window, using IPython 2.3 and rpy2-2.4.4 + the html.py patch.

Does anyone have an easy solution? I didn't understand the suggestion at https://github.com/ipython/ipython/issues/3100#issuecomment-39423852

I think the problem can be solved by using capture.output() in R.

takluyver commented 10 years ago

I think this could work for capturing R output, but:

  1. It's tricky, and there should be better ways to redirect output at the R level.
  2. We're not sure if the techniques we know to do this will work nicely on Windows, which is where you have the problem.

So I think this is not the droid you're looking for.

njsmith commented 10 years ago

Solving this is totally doable but requires some knowledge of system programming. The idea is to use OS-level APIs to redirect stdout and stderr to pipes, and then spawn some threads which read from the other end of those pipes and dump whatever they find into the python-level sys.stdout/sys.stderr. On OSX/Linux this is something like

import os
import sys

# Redirect the OS-level stdout (file descriptor 1) into a pipe.
stdout_read_end, stdout_write_end = os.pipe()
os.dup2(stdout_write_end, 1)

# In a thread: drain the pipe and forward to the Python-level sys.stdout.
while True:
    data = os.read(stdout_read_end, 4096)
    sys.stdout.write(data)

(and similar for stderr, substituting 2 for 1 in the dup2 call). Plus some fiddling on Python 3 to handle character encoding issues (what comes out of the pipe will be bytes, and on Py3 sys.stdout wants str).
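
Putting that together, a self-contained sketch of the thread-based approach on POSIX, with the Python 3 decoding included. It assumes sys.stdout/sys.stderr have already been replaced by objects that do not write back to file descriptors 1 and 2 (as in the ZMQ kernels), otherwise the forwarding would loop; forward_fd is an illustrative name, not an existing IPython API:

import os
import sys
import threading

def forward_fd(fd, stream, bufsize=4096):
    """Redirect the OS-level file descriptor fd into the Python-level stream."""
    read_end, write_end = os.pipe()
    os.dup2(write_end, fd)  # fd now points at the pipe's write end

    def drain():
        while True:
            data = os.read(read_end, bufsize)
            if not data:  # the write end was closed
                break
            # Pipes carry bytes; on Python 3 sys.stdout wants str.
            stream.write(data.decode("utf-8", errors="replace"))
            stream.flush()

    threading.Thread(target=drain, daemon=True).start()

forward_fd(1, sys.stdout)
forward_fd(2, sys.stderr)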

There are equivalent but different APIs on win32 -- see the ntlworld link above for details on what the above code is doing, and how to do the same on win32.

IPython should certainly be doing this as a general feature -- it's not specific to R, there are lots of old codes that write directly to the system stdout/stderr. Solving it in general would presumably solve it for R as well.

I'm not an IPython developer, but my guess is that if someone shows up with a patch based on the above, they will be grateful. (I know I will be!)

takluyver commented 10 years ago

my guess is that if someone shows up with a patch based on the above, they will be grateful.

Probably. I have this mentally filed as "we should fix this, but complaints about it are rare, and the fix would involve quite a bit of added complexity." So I'd want a PR for this to be well engineered and well tested on different platforms.

stonebig commented 10 years ago

I don't expect a well-engineered piece of software soon. Any rubber-band solution will do for me.

Something like this (but working):

%%R
   ....: X=c(1,4,5,7); Y = c(2,4,3,9)
   ....: write(summary(lm(Y~X)))

Error in cat(list(...), file, sep, fill, labels, append) : 
  argument 1 (type 'list') cannot be handled by 'cat'

stonebig commented 10 years ago

Getting closer:

%%R
X=c(1,4,5,7);Y = c(2,4,3,9)
s <- capture.output({summary(lm(Y~X)) })
%R s
Out[8]:

array(['', 'Call:', 'lm(formula = Y ~ X)', '', 'Residuals:',
       '    1     2     3     4 ', ' 0.88 -0.24 -2.28  1.64 ', '',
       'Coefficients:', '            Estimate Std. Error t value Pr(>|t|)',
       '(Intercept)   0.0800     2.3000   0.035    0.975',
       'X             1.0400     0.4822   2.157    0.164', '',
       'Residual standard error: 2.088 on 2 degrees of freedom',
       'Multiple R-squared:  0.6993,\tAdjusted R-squared:  0.549 ',
       'F-statistic: 4.651 on 1 and 2 DF,  p-value: 0.1638', ''], 
      dtype='<U56')

Curiously, it's still also written to the DOS console.

stonebig commented 10 years ago

Closer again:

a = %R s
for line in a[2:-1]:
    print(line)

lm(formula = Y ~ X)

Residuals:
    1     2     3     4 
 0.88 -0.24 -2.28  1.64 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0800     2.3000   0.035    0.975
X             1.0400     0.4822   2.157    0.164

Residual standard error: 2.088 on 2 degrees of freedom
Multiple R-squared:  0.6993,  Adjusted R-squared:  0.549
F-statistic: 4.651 on 1 and 2 DF,  p-value: 0.1638

njsmith commented 10 years ago

@stonebig: Comments about capture.output should probably go to #3100, because that's the bug specifically about handling output in the %R magic, and capture.output is an R-specific trick. (Basically it's the R equivalent of assigning to sys.stdout.) This bug is about implementing a more general mechanism.
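
For comparison, the Python-level analogue of assigning to sys.stdout looks like the following sketch; it captures print() output, but not text written directly to the OS-level file descriptor, which is exactly the gap this issue is about:

import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):  # temporarily swaps sys.stdout
    print("captured at the Python level")
print("got:", buf.getvalue(), end="")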

stonebig commented 10 years ago

OK, reposting there.

vbraun commented 10 years ago

Besides being non-portable (who still cares about win32?), the thread solution also has a race unless you can run the thread without the GIL. And I don't think you can, at least not easily. Once the main thread blocks with the GIL held, you are toast.

IMHO the right place to grab stdout/err is in the parent process, not the python kernel process. The parent can easily redirect the output with Popen. The parent can also act freely if the buffer is full, simply select() over the spawned kernels. You need some machinery in the child process to minimize buffers and flush them out, but that part is easy.
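
A rough sketch of that parent-process idea, assuming POSIX select() and using a stand-in child command rather than IPython's real kernel-launch machinery:

import os
import select
import subprocess

# Spawn the "kernel" with stdout/stderr attached to pipes the parent owns.
proc = subprocess.Popen(
    ["python", "-u", "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
fds = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
while fds:
    ready, _, _ = select.select(list(fds), [], [])
    for fd in ready:
        data = os.read(fd, 4096)
        if not data:  # EOF: the child closed this stream
            del fds[fd]
        else:
            print("[%s] %s" % (fds[fd], data.decode("utf-8", "replace")), end="")
proc.wait()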

njsmith commented 10 years ago

Supposedly, code never blocks while holding the GIL :-)

I guess the case that would break is poorly written Python wrappers that take the GIL, then hold it while calling into some C code that dumps >4 KiB of chatter to stdout (without using Python mechanisms -- why are you holding the GIL if you're calling code that doesn't cooperate with the Python runtime?). I suppose this is possible, but I'm not sure it's worth spending much time worrying about it...

Is it really the case that IPython kernels are always spawned by some IPython monitor process who has access to their stdout/stderr, even for remote kernels etc.? I just tried running a command-line IPython interpreter here and it didn't spawn any children at all.

If the GIL is a serious issue then it wouldn't be too hard to write a Cython version of that loop which drops the GIL while running, or else spawn a monitor subprocess instead of a monitor thread.

(Not sure what you mean by calling the thread solution non-portable -- you need OS-specific mechanisms to set up the redirections in the first place, but threads are a much easier way to get portable async IO than anything involving select. Note that Windows select() does not work on pipes.)


takluyver commented 10 years ago

The plain IPython terminal interface doesn't use the kernel machinery at all. Where things do use a kernel, it is normally spawned as a separate process. The two cases in which that isn't true are:

vbraun commented 10 years ago

os.system('ls -al') can easily produce more than 4kb.

The problem with the GIL is not that you can't read from the fd; it is that you need the GIL to put the data back into Python's own ostream system. So you'd need to have your own buffer and wait until the GIL is available. The better solution would be to just forget about capturing Python's output streams and send stdout directly to ZMQ. At that point you might just as well move it up to the parent process; much simpler than the whole thread contraption (e.g. no fork headaches).

njsmith commented 10 years ago

os.system doesn't create any problem here. Like all builtin or stdlib functions, it releases the GIL while blocking for output. To create a problem you specifically need some buggy wrapper that generates large quantities of output while holding the GIL.

AFAIK python-level threads don't cause any problems with fork, not sure what you're referring to there. (os.fork calls PyOS_AfterFork() which cleans up defunct threads etc.)

Given that IPython currently has no standard way of arranging parent/child processes, doesn't necessarily use ZMQ, and has no way to do the bookkeeping for which output from a child process goes with which cell, I'm not really convinced that it's as simple as you think. If you think you can make it work though then I'd be happy to be proven wrong :-)


vbraun commented 10 years ago

But os.system is a very easy way to get a lot of stuff into stdout, and people use it all the time. How about os.system('ls -al &'), then print a single line at the wrong time. Or somebody uses printf to debug their cython module spewing lots of output. IMHO it's easy enough that it will happen.

Fork and threads can work, but there are potential mistakes one can make, and if you make them it's going to hurt. I'm not saying that it's impossible to deal with this in a thread, but a whole load of potential concurrency issues might crop up. A separate process is easier and, if you execute remotely, you want an independent monitor process anyway.

Of course you don't need any of that on a local terminal, but it's also trivial to turn off: just don't capture output in the Popen call, much as IPython already has system_piped vs. system_raw methods, etc.

njsmith commented 10 years ago

But os.system is a very easy way to get a lot of stuff into stdout, and people use it all the time.

So... I guess it's a good thing then it plays well with all the proposals here? I think I'm missing something?

How about os.system('ls -al &'), then print a single line at the wrong time.

Can't see any problems here either. If it takes a while for our thread to run then potentially the ls -al command will fill up the pipe buffer and get blocked until our thread does run and empty it again, but that's fine -- it's not a deadlock, because ls -al being blocked does not stop our Python program from eventually making progress, releasing the GIL, and letting our thread run.

Or somebody uses printf to debug their cython module spewing lots of output.

I'm not sure how many cython users prefer to wrap printf instead of just using python's far-superior printing tools (which go through sys.stdout instead of OS-level stdout), but, yes, if someone were to do this, and emit text faster than cython could release the GIL, then with the thread approach we could conceivably deadlock. It is a good point.

I'm inclined to think that this would still be better than nothing (without this feature you just can't use printf for debugging at all!), and am worried that your approach is too impractical to be implemented regardless of its advantage in this somewhat unusual case. But it is a good point.

vbraun commented 10 years ago

But once the buffer is full, a single printf from a shared library will lock the main process, in a very difficult-to-debug way. All I'm saying is that the "4kb ought to be enough for everybody" argument is not valid.

Thinking a bit more about it, I forgot the most important point: if your Cython code dies, you really want to be able to see the final SIGILL/segfault/whatever. And it's highly unlikely that a thread in the same process space will be able to capture that and send it anywhere before the whole process is terminated.

njsmith commented 10 years ago

I think the SIGILL/segfault case belongs in a different bug, because the way you detect and report those has no overlap with catching stdout/stderr. Only a monitor process can see those errors at all to generate a message in the first place, so it is a reasonable argument for having some monitor process. OTOH "kernel suddenly disappeared" tells you almost as much.


vbraun commented 10 years ago

I disagree; the kernel suddenly disappearing can mean lots of things, from loss of network connectivity to the OOM killer, many of which don't even point at a programming mistake.

Also, a monitoring process isn't going to magically know that the child died with a SIGILL. The default signal handler in the child will print useful diagnostics like "SIGILL" before dying, but to see that you need to capture stdout/err. Likewise the glibc "double free" message. And it's highly unlikely that you can trash the heap and then reliably capture SIGSEGV, pass the error from the signal handler to the messaging thread, pass it to ZMQ, and live long enough for ZMQ to flush its buffers. But printing something and maybe dumping core, that has a chance.

And on top of that there is a load of resource-exhaustion errors that you can get from ld.so (say, "cannot create TLS data structures") that are super useful should you hit them and that you have no way to capture in-process.

njsmith commented 10 years ago

Processes terminated by unexpected signals don't produce any output unless they've explicitly registered a signal handler to do so, and most don't, because it can't be done reliably. (Similarly, core dumps are generated by the kernel; they don't involve the process itself at all, because that's the only way to make them work reliably.) The SIGSEGV messages that you're used to seeing are generated by your shell, not the dying process: the parent process gets notified if the child exits due to a signal via the same mechanism that lets it see the exit status. There's nothing magical about it.
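
A short illustration of that mechanism, assuming a POSIX system: a child killed by a signal shows up in the parent as a negative returncode, with no cooperation from the child at all.

import signal
import subprocess

# The child kills itself with SIGSEGV; the parent reads it from the exit status.
proc = subprocess.Popen(
    ["python", "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)
proc.wait()
if proc.returncode < 0:  # negative means "killed by signal" on POSIX
    print("child died from signal", signal.Signals(-proc.returncode).name)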

I still think it would be a better use of your energy to figure out how to solve the engineering challenges of your proposal than to keep coming up with elaborate edge cases to justify its necessity :-). I've already agreed that it would be a nice approach if it can be made to work at all, I'm just worried about the perfect being the enemy of the good.


vbraun commented 10 years ago

AFAIR it's perfectly fine to write() from a signal handler; you just can't printf().

I don't really see it as difficult; really, it's much easier than trying to capture output in-process, feed it back into Python, and then capture it again. It's mostly tedious because too much is invested in doing it the wrong way. That's why I'm spending my energy here: to get over the "it's good enough as is" energy barrier...

  1. Implement a monitor process that must sit in front of every ZMQ kernel. The only alternative is a plain terminal: no ZMQ, no stdout capturing.
  2. Add another ZMQ connection from the monitor to IPython, and move the kernel start/interrupt functionality to the monitor.
  3. Then: capture stdout, send it to IPython, and have IPython interleave it with Python output.
  4. Get rid of system_piped.

miquelmarti commented 7 years ago

Is there any progress on this? Anyone got a branch somewhere with the main functionality working, even if ugly? Thanks!

takluyver commented 7 years ago

Min made a Python package wurlitzer which you can use inside IPython to capture output at the OS level. We're still planning to do something which Jupyter kernels can opt in to.
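
For reference, a minimal sketch along the lines of wurlitzer's documented pipes() context manager; call_noisy_c_function is a hypothetical stand-in for code that writes to the OS-level stdout:

from wurlitzer import pipes

with pipes() as (out, err):
    call_noisy_c_function()  # hypothetical: printf()s to file descriptor 1

print(out.read())  # the captured C-level output, now at the Python level

wurlitzer also ships an IPython extension (%load_ext wurlitzer) that forwards C-level output for the whole session.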

miquelmarti commented 7 years ago

Found it a little later and it's working like a charm! Thanks!!

gidden commented 7 years ago

Hi folks, does anyone know of a package similar to wurlitzer that also supports Windows? As best I can tell, it depends on fcntl, which has no equivalent on Windows.

joewandy commented 4 years ago

@gidden I just realised too that wurlitzer doesn't work on Windows. Did you ever find a solution to that?

eudoxos commented 4 years ago

I am sorry to be adding yet another "me too" type of response, but I keep bumping into this problem with compiled Python packages. Has anyone got anything marginally usable, e.g. using wurlitzer, even if not nicely packaged?

JamieShelley commented 4 years ago

BUMP! (P.S. also tried wurlitzer; Linux only!)

I've had a few ideas, if anyone is interested: for each low-level function called, create an additional listener; this builds up a std::string/char* that is then returned and printed. Granted, this means you are blocked until the function completes, and it requires a very functional architecture, but it's the best I've got currently.

I'm in the midst of creating a wrapper on top of the low-level module which will provide at least something. P.S. this is really goddamn annoying...

JamieShelley commented 4 years ago

As I'm using pybind11, I'm trying this out first, not hoping for much: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/utilities.html
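
For anyone trying the same route, a sketch of the usage documented on that page. It assumes the C++ side of the module calls py::add_ostream_redirect(m, "ostream_redirect"), and my_module / noisy_function are hypothetical names; note this redirects C++ std::cout/std::cerr, not raw printf to the file descriptor.

import my_module  # hypothetical pybind11 extension exposing ostream_redirect

# Forward C++ std::cout/std::cerr into Python's sys.stdout/sys.stderr for
# the duration of the block.
with my_module.ostream_redirect(stdout=True, stderr=True):
    my_module.noisy_function()  # hypothetical C++ function printing via std::cout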