rust-cli / team

CLI working group
https://rust-cli.github.io/book
MIT License

Can the working group provide libraries that help CLI apps which unknowingly use pipes to use them correctly? #132

Open i30817 opened 5 years ago

i30817 commented 5 years ago

I can't count the number of times I've gone through this sequence:

  1. 'I can use a named pipe here to save memory in interprocess communication.'
  2. I send the write end to the CLI app in the other process via its command line output option.
  3. I try to open the read end.
  4. I hang indefinitely, because the writer found something it didn't like and exited with an error without bothering to open the write end (so it was never closed, so EOF was never sent).
  5. I hack around it by introducing a buggy delay between starting the process and reading input from the pipe, to check whether the child process killed itself.

I was thinking the WG could 'encourage' this not to happen by providing an argparse equivalent where the easiest way to write a CLI app ends with output files 'already open when you get them', and thus closed automatically on process exit. Does this make sense?

BurntSushi commented 5 years ago

Can you provide a minimal example that someone else can run and that demonstrates the problem you're talking about? (I for one am very confused by your description and suggested solution.)

Additionally, could you show how your problem is mitigated in some other environment? (e.g., via argparse?)

i30817 commented 5 years ago

OK, this problem normally arises from using named pipes.
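First, a minimal sketch of the hang (assuming a POSIX system; the sh -c 'exit 1' child here is my stand-in for any CLI app that exits with an error before opening the output file it was given):

import os
import subprocess
import tempfile

fifo_path = os.path.join(tempfile.mkdtemp(), 'fifo')
os.mkfifo(fifo_path)

# the child stands in for a CLI app that errors out before ever opening
# the fifo it was handed as an 'output file'
subprocess.Popen(['sh', '-c', 'exit 1'])

# open() on a fifo blocks until a writer also opens it; the child never
# did and never will, so this blocks forever
with open(fifo_path, 'rb') as fifo:
    fifo.read()

In my actual code it looks like this: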

import hashlib
import os
import shutil
import subprocess
import tempfile
import zlib
from contextlib import contextmanager

# app-specific exception; a minimal definition so the snippet runs
class NonFatalError(Exception):
    pass

@contextmanager
def named_pipes(n=1):
    dirname = tempfile.mkdtemp()
    try:
        paths = [os.path.join(dirname, 'romhack_pipe' + str(i)) for i in range(n)]
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        shutil.rmtree(dirname)

def get_checksums():
    hash_md5   = hashlib.md5()
    hash_sha1  = hashlib.sha1()
    hash_crc32 = 0
    size = 0

    buf = yield
    while len(buf) > 0:
        size += len(buf)
        hash_md5.update(buf)
        hash_sha1.update(buf)
        hash_crc32 = zlib.crc32(buf, hash_crc32)
        buf = yield

    crc = '{:08x}'.format(hash_crc32 & 0xffffffff)
    md5 = hash_md5.hexdigest()
    sha1 = hash_sha1.hexdigest()

    yield (size, crc, md5, sha1)

def producer(arguments, generator_function):
    ''' Appends an output fifo to the end of the argument list before
        applying the generator function to that fifo. Make sure the command
        is set up so that its last argument is an output file.
    '''
    BLOCKSIZE = 2 ** 20
    process = None
    next(generator_function)  # prime the generator up to its first yield
    with named_pipes() as pipes:
        pipe = pipes[0]
        arguments.append(pipe)
        with subprocess.Popen(arguments,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL) as process:
            with open(pipe, 'rb') as fifo:
                byt = fifo.read(BLOCKSIZE)
                while len(byt) > 0:
                    generator_function.send(byt)
                    byt = fifo.read(BLOCKSIZE)
    # if an error occurred, avoid writing bogus checksums
    if process.returncode != 0:
        raise NonFatalError('error during patching')

    return generator_function.send([])

In the code above, the with open(pipe, 'rb') as fifo: can hang if the subprocess quit with an error without ever having opened the output file; i.e. it never took part in the fifo protocol, so the caller waits forever. The subprocess just wrote an error to stderr (usually) and went away without touching the 'output file' it was given.

This can be avoided with a 'trick' using system anonymous pipes, which happen to have a 'fake but real' filesystem interface on Linux, and it also gives much better code:

# get_checksums() is unchanged from the version above

def producer(arguments, generator_function):
    ''' Appends the output 'file' to the end of the argument list before
        applying the generator function to the stream. Make sure the command
        is set up so that its last argument is an output file.
    '''
    # we read from the child's stderr and the patcher writes its output there,
    # so even if there is an error the pipe still gets closed and we see EOF
    arguments.append("/dev/stderr")  # not portable to windows (maybe with CON but i'd have to test it)
    next(generator_function)  # prime the generator up to its first yield
    with subprocess.Popen(arguments, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE) as process:
        for byt in process.stderr:
            generator_function.send(byt)
    # if an error occurred, avoid writing bogus checksums
    if process.returncode != 0:
        raise NonFatalError('error during patching')

    return generator_function.send([])

Notice that the named pipe context manager is gone and producer is much simpler.

In the above there is no chance of that kind of deadlock, because the code makes sure the subprocess's output 'file' is its own stderr. An actual error will corrupt the stream, but in that case we don't want the output anyway; and since stderr is something every process opens, the process will close it on exit, which signals the reader to stop waiting and check the return value.
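The key property can be seen in isolation (a minimal sketch; sh -c again stands in for an arbitrary failing CLI app):

import subprocess

# even when the child exits immediately with an error, reading its stderr
# cannot hang: stderr is always open in the child and is closed on exit,
# so the read loop sees EOF and falls through to the returncode check
with subprocess.Popen(['sh', '-c', 'echo oops >&2; exit 1'],
                      stdout=subprocess.DEVNULL,
                      stderr=subprocess.PIPE) as process:
    for line in process.stderr:
        pass  # producer() above would do generator_function.send(line)

print(process.returncode)  # 1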

Anyway, I was thinking that encouraging people who write CLI apps to get an already opened file handle, like argparse's parser.add_argument('-o', metavar=('output-file'), type=FileType('a+', encoding='utf-8')) appears to do, would increase the number of apps that can work with named fifos, by making the easier path be to already have an open file, which is then closed on process exit.
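For reference, a minimal sketch of that argparse behaviour (the option name and the final write are just illustration): FileType opens the file during parse_args(), before any of the app's own logic runs, and the handle is closed automatically when the process exits.

import argparse

parser = argparse.ArgumentParser()
# argparse opens the file itself while parsing, so by the time the app
# sees args.o it is already an open file handle, not a path string
parser.add_argument('-o', metavar='output-file',
                    type=argparse.FileType('a+', encoding='utf-8'))
args = parser.parse_args()

if args.o is not None:
    args.o.write('already open when you get it\n')
# args.o is closed automatically on process exit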

I don't know what happens in the case where the 'destination' is overwritten by a move at the end (this is common for programs that try to support the same file as input and output). Probably a failure of the 'stream', but not a hang.

One thing I thought was a problem that actually wasn't: I thought mkfifo was portable (it isn't, so you might as well use /dev/stderr). There isn't actually a portable way to do this kind of 'almost subprocess-code-agnostic' interprocess communication on both Windows and Linux/Unix, so it's kind of discouraging.