pyinvoke / invoke

Pythonic task management & command execution.
http://pyinvoke.org
BSD 2-Clause "Simplified" License

Python io.BytesIO binary stream as {out,err}_stream container #853

Open reporter4u opened 2 years ago

reporter4u commented 2 years ago

Hi, I'm trying to capture and process, in real time, the output of a command started with run in asynchronous mode, passing an in-memory buffer as the out_stream option. So I used the BytesIO class from the built-in io library. Is this class supported by Invoke?

Here is a code example:

import io
import time
from invoke import run

out = io.BytesIO()

run('ls -la && sleep 2 && ls -la', warn=True, pty=True, out_stream=out, asynchronous=True)

time.sleep(1)

while True:
    myline=out.readline()
    print(myline)
    time.sleep(1)
    if not myline:
        out.close()
        break

If you run this code, you'll see that it doesn't print any binary data, even though BytesIO supports the readline() method. It seems that run is not writing to the out binary buffer object. Furthermore, if you set asynchronous to False, the script exits with this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/invoke/util.py", line 237, in run
    super(ExceptionHandlingThread, self).run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 756, in handle_stdout
    self._handle_output(
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 730, in _handle_output
    self.write_our_output(stream=output, string=data)
  File "/usr/local/lib/python3.9/dist-packages/invoke/runners.py", line 718, in write_our_output
    stream.write(encode_output(string, self.encoding))
TypeError: a bytes-like object is required, not 'str'

AFAIK, io.BytesIO() is not a 'str' object; it's a binary stream backed by an in-memory bytes buffer.
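
The TypeError can be reproduced without Invoke. This is a minimal sketch, assuming (as the write_our_output frame in the traceback suggests) that Invoke passes a str to out_stream.write(). The message complains about the type of the argument being written, not about the type of the stream object:

```python
import io

message = None
try:
    # A binary stream only accepts bytes-like objects in write().
    io.BytesIO().write("some command output")
except TypeError as exc:
    message = str(exc)

print(message)  # a bytes-like object is required, not 'str'
```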

Is it possible to use a BytesIO object (for binary data) as the buffer for {out,err}_stream? If not, could this be implemented in Invoke? I need to process the output stream in real time, rather than only after the command has finished (it runs for a very long time in my project), and without using file-backed objects.
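
If the data really has to end up in a BytesIO, one option is to give Invoke a text layer on top of it. The sketch below assumes out_stream only needs to accept str through write() and flush(). io.TextIOWrapper encodes each str into the underlying binary buffer. This fixes the TypeError only. Reading while the command is still writing runs into the same stream-position issues as a plain StringIO.

```python
import io

raw = io.BytesIO()
# Text layer over the binary buffer: write() takes str and stores UTF-8 bytes.
# newline="" disables newline translation; write_through=True skips the
# wrapper's internal buffering so bytes reach `raw` immediately.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="", write_through=True)

text.write("foo\n")  # the kind of str that out_stream.write() receives
text.flush()
print(raw.getvalue())  # b'foo\n'
```

Hypothetically, this would be passed as run(..., out_stream=text). Closing the wrapper, or letting it be garbage-collected, also closes raw. Call text.detach() first if the BytesIO must outlive the wrapper.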

Thank you in advance for your help!

Roberto

neozenith commented 2 years ago

Hi, I have added this to my triage queue. I'll need to dive into the code to understand what is happening there.

Thanks for providing the example and the stack trace. That is super helpful.

leamingrad commented 2 years ago

I've just run into this issue myself (in the context of using a file rather than BytesIO). The issue is that run expects out_stream (and err_stream) to be text streams rather than bytes streams. I think the example above would work if you used StringIO rather than BytesIO.

reporter4u commented 2 years ago


I tried StringIO before BytesIO, and it didn't work either, with asynchronous set to True or False (I need asynchronous=True). You can try this yourself by replacing the line out = io.BytesIO() with out = io.StringIO(). You'll see that it doesn't print anything. When I opened this issue, I posted the example with io.BytesIO because of the error TypeError: a bytes-like object is required, not 'str'. AFAIK, {out,err}_stream and sys.stdout are all file-like objects.

This is my environment:
OS: SMP Debian 5.10.113-1
Distribution: Debian 11
Python: 3.9.2
Invoke: 1.6.0

Can you try it in your Python environment?

PS: This probably doesn't matter, but I also tried changing the warn and pty parameters, without any effect.

leamingrad commented 2 years ago

I've looked at this a bit more. The issue is that calling readline on the StringIO won't work as expected, because writing to the stream moves the stream position forward to the end of the written data. A subsequent readline reads from that position, so it returns nothing. For example:

In [1]: import io

In [2]: buffer = io.StringIO()

In [3]: buffer.write("test")
Out[3]: 4

In [4]: buffer.readline()
Out[4]: ''

I'm not sure how to read from the same stream that you are writing to, but from a quick Google search, this thread has a few options.
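
Here is one way to read from a stream that is still being written. The sketch assumes a single writer appends to a StringIO and that nobody calls seek() on it. The reader keeps its own offset and slices getvalue(), which returns the whole buffer without moving the stream position. TailReader is a hypothetical helper, not part of Invoke or the io module. Polling it from a thread other than the writer's relies on CPython behavior, because io.StringIO makes no documented thread-safety guarantee.

```python
import io


class TailReader:
    """Hypothetical helper: returns complete lines appended to a StringIO
    since the last call, without touching the stream position."""

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0    # how much of getvalue() has been consumed
        self.partial = ""  # trailing text not yet terminated by "\n"

    def new_lines(self):
        new = self.stream.getvalue()[self.offset:]
        self.offset += len(new)
        # The last element of split() is an unfinished line (possibly "").
        *complete, self.partial = (self.partial + new).split("\n")
        return [line + "\n" for line in complete]


buf = io.StringIO()
tail = TailReader(buf)

buf.write("foo\nba")
first = tail.new_lines()
print(first)   # ['foo\n']

buf.write("r\n")
second = tail.new_lines()
print(second)  # ['bar\n']
```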

reporter4u commented 2 years ago

In your example, you should call seek() before readline():

Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> output = io.StringIO()
>>> output.write('First line.\n')
12
>>> print('Second line.', file=output)
>>> output.seek(0)
0
>>> output.readline()
'First line.\n'
>>> output.readline()
'Second line.\n'
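
There is a caveat when combining seek() with a stream that run is still writing to. An io.StringIO object has a single position shared by reads and writes, and write() overwrites text starting at that position. Assuming Invoke writes with ordinary write() calls, a reader that seeks back can cause later output to overwrite unread text. This stdlib-only sketch shows the effect:

```python
import io

buf = io.StringIO()
buf.write("First line.\n")
buf.write("Second line.\n")

buf.seek(0)
first = buf.readline()  # the "reader" consumes 'First line.\n'; position is now 12

# A "writer" sharing the same object now writes at position 12, not at the end,
# overwriting the unread start of 'Second line.\n'.
buf.write("X\n")
print(repr(buf.getvalue()))  # 'First line.\nX\ncond line.\n'
```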

You wrote: "I'm not sure how to read from the same stream as you are writing it." That is exactly what I need.

Regardless of StringIO versus BytesIO, my main goal is to have some kind of in-memory buffer that I can read line by line while run (with asynchronous=True) is writing into it. My process runs for a long time, sometimes hours.
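
For the goal described here, the following sketch avoids shared stream positions altogether. It uses a small file-like class whose write() splits incoming text into lines and pushes them onto a queue.Queue, and the consumer blocks on that queue. This assumes Invoke only calls write() and flush() on out_stream. (The traceback shows write_our_output calling stream.write().) LineQueueStream and fake_command are hypothetical names, and fake_command only simulates output arriving in chunks.

```python
import queue
import threading
import time


class LineQueueStream:
    """Hypothetical out_stream replacement: assumes the writer only calls
    write(str) and flush(); each complete line is pushed onto a queue."""

    def __init__(self):
        self.lines = queue.Queue()
        self._partial = ""  # text received after the last "\n"

    def write(self, data):
        # Chunks may end mid-line, so keep the unfinished tail for later.
        *complete, self._partial = (self._partial + data).split("\n")
        for line in complete:
            self.lines.put(line)
        return len(data)

    def flush(self):
        pass  # nothing to do; the unfinished line stays buffered

    def close(self):
        # Emit any unterminated last line, then None as an end-of-output sentinel.
        if self._partial:
            self.lines.put(self._partial)
            self._partial = ""
        self.lines.put(None)


def fake_command(stream):
    # Hypothetical stand-in for command output arriving in arbitrary chunks.
    for chunk in ["fo", "o\nba", "r\n", "baz"]:
        stream.write(chunk)
        time.sleep(0.05)
    stream.close()


stream = LineQueueStream()
threading.Thread(target=fake_command, args=(stream,)).start()

received = []
while True:
    line = stream.lines.get()  # blocks until a line or the sentinel arrives
    if line is None:
        break
    received.append(line)
    print(f"got: {line}")
```

With Invoke, stream would presumably be passed as run(..., out_stream=stream, asynchronous=True). Something would then need to call stream.close() after the returned promise's join() completes, for example from a helper thread, because Invoke is not expected to close a stream it was handed. With pty=True, lines typically arrive with a trailing "\r", so line.rstrip("\r") may be useful.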

With io.StringIO and asynchronous=True, it seems that run stops writing into out_stream once the script starts reading from it. Here is another example, using my script test.py in a folder that contains four other dummy files:

import io
import time
from invoke import run

out = io.StringIO()

run('echo "foo" && ls -la && sleep 3 && echo "bar" && ls -la', warn=True, pty=True, out_stream=out, asynchronous=True)

time.sleep(1)

out.seek(0)

for i in range(1,19):
    myline=out.readline()
    print(f'line {i}: {myline}')

print('\n\n\n#out.getvalue() content')
print(out.getvalue())
out.close()

As you can see, the command output is incomplete, both in the for loop and in the print(out.getvalue()) call. This is what it prints:

line 1: foo

line 2: totale 12

line 3: drwxr-xr-x  2 test test 4096 11 mag 11.47 .

line 4: drwxr-xr-x 15 test test 4096 11 mag 07.54 ..

line 5: -rw-r--r--  1 test test    0 11 mag 07.55 file1

line 6: -rw-r--r--  1 test test    0 11 mag 07.55 file2

line 7: -rw-r--r--  1 test test    0 11 mag 07.55 file3

line 8: -rw-r--r--  1 test test    0 11 mag 07.55 file4

line 9: -rw-r--r--  1 test test  385 11 mag 11.47 test.py

line 10: 
line 11: 
line 12: 
line 13: 
line 14: 
line 15: 
line 16: 
line 17: 
line 18: 

#out.getvalue() content
foo
totale 12
drwxr-xr-x  2 test test 4096 11 mag 11.47 .
drwxr-xr-x 15 test test 4096 11 mag 07.54 ..
-rw-r--r--  1 test test    0 11 mag 07.55 file1
-rw-r--r--  1 test test    0 11 mag 07.55 file2
-rw-r--r--  1 test test    0 11 mag 07.55 file3
-rw-r--r--  1 test test    0 11 mag 07.55 file4
-rw-r--r--  1 test test  385 11 mag 11.47 test.py

Now, if you increase the delay to time.sleep(15), it prints the complete command output. This is likely because the command terminates before the sleep expires, and therefore before the script starts reading from out_stream. This is the output:

line 1: foo

line 2: totale 12

line 3: drwxr-xr-x  2 test test 4096 11 mag 11.51 .

line 4: drwxr-xr-x 15 test test 4096 11 mag 07.54 ..

line 5: -rw-r--r--  1 test test    0 11 mag 07.55 file1

line 6: -rw-r--r--  1 test test    0 11 mag 07.55 file2

line 7: -rw-r--r--  1 test test    0 11 mag 07.55 file3

line 8: -rw-r--r--  1 test test    0 11 mag 07.55 file4

line 9: -rw-r--r--  1 test test  385 11 mag 11.51 test.py

line 10: bar

line 11: totale 12

line 12: drwxr-xr-x  2 test test 4096 11 mag 11.51 .

line 13: drwxr-xr-x 15 test test 4096 11 mag 07.54 ..

line 14: -rw-r--r--  1 test test    0 11 mag 07.55 file1

line 15: -rw-r--r--  1 test test    0 11 mag 07.55 file2

line 16: -rw-r--r--  1 test test    0 11 mag 07.55 file3

line 17: -rw-r--r--  1 test test    0 11 mag 07.55 file4

line 18: -rw-r--r--  1 test test  385 11 mag 11.51 test.py

#out.getvalue() content
foo
totale 12
drwxr-xr-x  2 test test 4096 11 mag 11.51 .
drwxr-xr-x 15 test test 4096 11 mag 07.54 ..
-rw-r--r--  1 test test    0 11 mag 07.55 file1
-rw-r--r--  1 test test    0 11 mag 07.55 file2
-rw-r--r--  1 test test    0 11 mag 07.55 file3
-rw-r--r--  1 test test    0 11 mag 07.55 file4
-rw-r--r--  1 test test  385 11 mag 11.51 test.py
bar
totale 12
drwxr-xr-x  2 test test 4096 11 mag 11.51 .
drwxr-xr-x 15 test test 4096 11 mag 07.54 ..
-rw-r--r--  1 test test    0 11 mag 07.55 file1
-rw-r--r--  1 test test    0 11 mag 07.55 file2
-rw-r--r--  1 test test    0 11 mag 07.55 file3
-rw-r--r--  1 test test    0 11 mag 07.55 file4
-rw-r--r--  1 test test  385 11 mag 11.51 test.py
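
The time.sleep(15) run shows the complete output only because the command happens to finish first. A deterministic version of the same idea is to wait for the writer explicitly before reading. In this stdlib-only sketch, a thread stands in for the running command. With asynchronous=True, the promise returned by Invoke should have a join() method that plays the role of worker.join() here, though that is an assumption to verify against the Invoke version in use. fake_command is a hypothetical name.

```python
import io
import threading
import time


def fake_command(stream):
    # Hypothetical stand-in for 'echo "foo" && sleep 3 && echo "bar"'.
    stream.write("foo\n")
    time.sleep(0.2)
    stream.write("bar\n")


out = io.StringIO()
worker = threading.Thread(target=fake_command, args=(out,))
worker.start()

# Wait for the writer to finish instead of guessing a long enough sleep.
worker.join()
lines = out.getvalue().splitlines()
print(lines)  # ['foo', 'bar']
```

This returns the full output only after the command finishes. Processing lines while the command is still running needs a reader that does not share the writer's stream position, such as a queue-backed stream.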