I believe this might turn out to be the same as #20. I can reproduce it with this script:
import ctypes
import wurlitzer

libc = ctypes.CDLL(None)

def main():
    sz = 64000
    while True:
        sz = sz + 100
        print(sz)
        buf = b'1' * sz
        with wurlitzer.pipes(stderr=False) as (stdout, _):
            libc.printf(b"%s\n", buf)
        print(len(stdout.read()))

if __name__ == "__main__":
    main()
and you're right, it occurs when the buffer exceeds 64k (65536 bytes). The hang occurs in stdout_w.close() here, which blocks if the read pipe is full. Looks like we need to add an intermediate consumer on the reader side to flush the read pipe, as sketched below.
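For illustration only (this is not wurlitzer's actual code), an intermediate consumer is just a background thread that keeps draining the read end of an os.pipe() into a Python-side buffer, so a large write never blocks on a full pipe buffer:

import io
import os
import threading

def drain(read_fd, sink):
    # Copy everything from read_fd into sink until the write end is closed.
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF: all write ends are closed
            break
        sink.write(chunk)

read_fd, write_fd = os.pipe()
captured = io.BytesIO()
reader = threading.Thread(target=drain, args=(read_fd, captured))
reader.start()

os.write(write_fd, b"x" * 200_000)  # well over a 64k pipe buffer
os.close(write_fd)                  # signals EOF so the drain loop exits
reader.join()
os.close(read_fd)
print(len(captured.getvalue()))     # 200000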
It should work if you use io.StringIO:
import io

from wurlitzer import pipes

with pipes(stdout=io.StringIO(), stderr=io.StringIO()) as (stdout, stderr):
    result = c_function(args)

# seek(0) resets the cursor to the beginning, so we can read what was written
stdout.seek(0)
lines = stdout.readlines()
...
Should be fixed by #49, which does the above by default.
On further investigation, it is related to #20, but not quite the same. Both are hangs on full pipes, but #20 hangs on the dup2 pipe, while this one hangs on the capture pipe. This one is easy to fix (#49), while #20 is hard to avoid other than saying "don't write too much to stdout while holding the GIL", because we can't read from pipes without acquiring the GIL (unless we rewrite wurlitzer in C).
Excellent. I'll need to wait for the 3.0.0 release to check it out.
Just released 3.0, let me know how it goes!
I tried running it this way, but I'm still seeing the same behavior when it gets too many bytes.
out = StringIO()
with pipes(stdout=out, stderr=STDOUT):
    result = c_func(args)

for line in out.readlines():
    print(line)
Is there a limit that you programmed in that might be configurable?
Do you have an example you can extract? I've tested with code that produces multiple megabytes of output and it works. It's still possible you are hitting #20 (i.e. your program holds the GIL while producing low-level output), in which case there is an upper limit of the pipe buffer size. #51 should try to bump this to the system max pipe size, but I don't know how many systems that works on.
Can you tell me what you get for wurlitzer._get_max_pipe_size() with 3.0?
Extracting an example could be difficult, unfortunately.
However, I estimate the size of the output when not using wurlitzer to be about 3.2MB. Meanwhile, _get_max_pipe_size() gives me a value of 1,048,576.
Also, F_SETPIPE_SZ is, unfortunately, not available. I'm not sure why, since I've got:
▶ uname -srm
Linux 3.10.0-1062.1.2.el7.x86_64 x86_64
▶ python3.8
Python 3.8.11 (default, Jul 9 2021, 16:02:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fcntl
>>> dir(fcntl)
['DN_ACCESS', 'DN_ATTRIB', 'DN_CREATE', 'DN_DELETE', 'DN_MODIFY', 'DN_MULTISHOT', 'DN_RENAME', 'FASYNC', 'FD_CLOEXEC', 'F_DUPFD', 'F_DUPFD_CLOEXEC', 'F_EXLCK', 'F_GETFD', 'F_GETFL', 'F_GETLEASE', 'F_GETLK', 'F_GETLK64', 'F_GETOWN', 'F_GETSIG', 'F_NOTIFY', 'F_RDLCK', 'F_SETFD', 'F_SETFL', 'F_SETLEASE', 'F_SETLK', 'F_SETLK64', 'F_SETLKW', 'F_SETLKW64', 'F_SETOWN', 'F_SETSIG', 'F_SHLCK', 'F_UNLCK', 'F_WRLCK', 'LOCK_EX', 'LOCK_MAND', 'LOCK_NB', 'LOCK_READ', 'LOCK_RW', 'LOCK_SH', 'LOCK_UN', 'LOCK_WRITE', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'fcntl', 'flock', 'ioctl', 'lockf']
Also F_SETPIPE_SZ is, unfortunately, not available.
Python's fcntl doesn't bother to expose it, but you can still use the hardcoded value F_SETPIPE_SZ = 1031. Check cat /proc/sys/fs/pipe-max-size to see if you have a limit higher than 1MB (that doesn't seem to be common). If so, you can try with pipes(bufsize=5_000_000).
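For reference, here is a minimal, self-contained sketch of using that hardcoded constant on a throwaway pipe (1031/1032 are the Linux F_SETPIPE_SZ/F_GETPIPE_SZ values, and 5_000_000 is just an example size):

import fcntl
import os

# Linux fcntl constants that Python's fcntl module does not expose
F_SETPIPE_SZ = 1031
F_GETPIPE_SZ = 1032

r, w = os.pipe()
try:
    fcntl.fcntl(w, F_SETPIPE_SZ, 5_000_000)
except OSError as e:
    # EPERM typically means the request exceeds /proc/sys/fs/pipe-max-size
    print("could not resize pipe:", e)
print("pipe buffer size:", fcntl.fcntl(w, F_GETPIPE_SZ))
os.close(r)
os.close(w)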
If your C code does hold the GIL the whole time it is writing all of this data, I don't think there is much we can really do, unless we introduce intermediate files to store an arbitrary-length buffer. The code has to yield some time to give the Python thread a chance to consume the pipe.
Well, this is interesting:
>>> import sys, fcntl
>>> fcntl.fcntl(sys.stdout.fileno(), 1031, 5_000_000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
This also fails with a 1 instead. I've also tried setting the bufsize argument to pipes, using StringIO, etc.
I can see that in https://github.com/minrk/wurlitzer/commit/ff1ec5fedd092b8a62c552b61fb9fade2863324f you are ignoring this OSError. Have you seen it work? Probably I'm doing something wrong. I was confused by why you're setting it for pipe_in, but not understanding the underlying os.pipe stuff is what led me here in the first place. :)
Thanks for all your help!
Indeed, it wasn't working and I was suppressing the error telling us so. #52 fixes it to actually set the size, and warns on failure.
It should actually do what it was claiming in 3.0.1
Well, other than seeing the warning about the OSError, I'm not seeing much difference.
I replicated some of the code from _setup_pipes in the IDE and found that exceeding the 1MB value led to a PermissionError:
>>> from os import pipe
>>> from fcntl import fcntl
>>> pipe_out, pipe_in = pipe()
>>> fcntl(pipe_in, 1031, 8_000_000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
>>> pipe_in
4
>>> fcntl(pipe_in, 1031, 65535)
65536
>>> fcntl(pipe_in, 1031, 4_000_000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
>>> fcntl(pipe_in, 1031, 2_000_000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
>>> fcntl(4, 1031, 1_000_000)
1048576
>>> fcntl(4, 1032)
1048576
>>> fcntl(4, 1031, 1048577)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
Curious, I attached an audit hook (sys.addaudithook) and found this:
>>> fcntl(pipe_in, 1031, 8_000_000)
('exec', (<code object <module> at 0x2b95ac49bf50, file "<stdin>", line 1>,))
('fcntl.fcntl', (4, 1031, 8000000))
('sys.excepthook', (<built-in function excepthook>, <class 'PermissionError'>, PermissionError(1, 'Operation not permitted'), <traceback object at 0x2b95abc10c80>))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
('open', ('<stdin>', 'r', 524288))
('open', ('<stdin>', 'r', 524288))
('open', ('/path/to/python/3.8.11/1/el-7-x86_64/lib/python38.zip/<stdin>', 'r', 524288))
('open', ('/path/to/python/3.8.11/1/el-7-x86_64/lib/python3.8/<stdin>', 'r', 524288))
('open', ('/path/to/python/3.8.11/1/el-7-x86_64/lib/python3.8/lib-dynload/<stdin>', 'r', 524288))
('open', ('/path/to/app-py3.8/lib/python3.8/site-packages/<stdin>', 'r', 524288))
('open', ('path/to/app/<stdin>', 'r', 524288))
PermissionError: [Errno 1] Operation not permitted
('compile', (None, '<stdin>'))
I don't entirely know what that means, but is it possible that my installation of Python 3.8 does not permit raising the pipe size above /proc/sys/fs/pipe-max-size?
I think it's actually your Linux kernel that prevents exceeding pipe-max-size. That really is a limit enforced by the system, like the open fd limit, etc. If you have permission (CAP_SYS_RESOURCE), you can raise the limit first:
echo 16777216 > /proc/sys/fs/pipe-max-size  # 16MB
cat /proc/sys/fs/pipe-max-size
Then it should handle up to 16MB before the thread needs to start consuming things.
I think you're right, and no I don't have permissions (and my case wouldn't be very safe or portable if it were necessary anyway).
I will have to come at this issue from another angle, probably. Thanks for all your help and I hope that wurlitzer became more robust because of all this!
Thanks for all your help and I hope that wurlitzer became more robust because of all this!
It absolutely did, thanks for your testing!
You should be able to handle arbitrary sizes with files. I'll think about adding an option to use files in wurlitzer, but in the meantime you can do it without wurlitzer (it's simpler with files because you don't need the read thread). The context manager below copies the os.dup logic from wurlitzer, without the pipes and threads, and should work for fully arbitrary output sizes:
import ctypes
import os
import sys
from contextlib import contextmanager

libc = ctypes.CDLL(None)

# simplified linux-only version from wurlitzer
c_stdout_p = ctypes.c_void_p.in_dll(libc, 'stdout')
c_stderr_p = ctypes.c_void_p.in_dll(libc, 'stderr')

@contextmanager
def capture_to_file(stdout="./stdout", stderr="./stderr"):
    stdout_f = stderr_f = None
    if stdout:
        stdout_f = open(stdout, mode="wb")
        real_stdout = sys.__stdout__.fileno()
        save_stdout = os.dup(real_stdout)
        os.dup2(stdout_f.fileno(), real_stdout)
    if stderr:
        stderr_f = open(stderr, mode="wb")
        real_stderr = sys.__stderr__.fileno()
        save_stderr = os.dup(real_stderr)
        os.dup2(stderr_f.fileno(), real_stderr)
    try:
        yield stdout, stderr  # filenames, unlike wurlitzer's pipes
    finally:
        # flush to capture the last of it
        libc.fflush(c_stdout_p)
        libc.fflush(c_stderr_p)
        if stdout:
            os.dup2(save_stdout, real_stdout)
            stdout_f.close()
        if stderr:
            os.dup2(save_stderr, real_stderr)
            stderr_f.close()

...

with capture_to_file() as (stdout_fname, stderr_fname):
    produce_output()

with open(stdout_fname) as f:
    stdout = f.read()
os.remove(stdout_fname)
full example in this gist.
Thanks, I'll give that a shot when I get a chance!
Bug description
The following code works for most C function calls, but hangs when the C-side tries to output thousands of lines. Unfortunately, I am not able to quantify the number of kB at which it hangs:
Naturally, when the c_function is called without the context manager, all lines are printed to stdout and the application completes normally. I have also tried changing the arguments to pipes, and I've tried sys_pipes(), etc.
Expected behaviour
I know that pipes filling up can cause this, so I suspect that the stdout buffer is exceeding 64k. If you have any suggestions as to how to increase that size, I'm open to that as well; however, adding that as an argument to pipes would be ideal.
Actual behaviour
How to reproduce
Try creating a C function that outputs many kB of code and determine at which point it hangs.
Your personal set up