stripe / subprocess

A port of Python's subprocess module to Ruby
MIT License
208 stars 17 forks source link

Apply captured encoding settings on output from partial reads in Process#communicate #57

Closed andrew-breunig closed 5 years ago

andrew-breunig commented 5 years ago

Purpose

Handle encoding of output from Process#communicate identically whether a block is provided or not.

Context

Process#communicate depends on Ruby's IO#read_nonblock via Process#drain_fd. Strings returned from read_nonblock are handled as ASCII-8BIT to prevent data loss. (For a demonstration of this behavior, download and run this snippet.) If output from a subprocess has a different encoding, this can lead to decoding errors.

When not provided a block, Process#communicate handles this by capturing encoding settings at the beginning of the method and re-applying these settings at the end of the method. However, when provided a block this method yields output before re-applying these settings, so the output still retains the ASCII-8BIT settings applied by read_nonblock.

This results in an unexpected (at least for me) discrepancy between the encoding settings of output from Process#communicate, depending on the form of the method invoked. Any consumer that depends on output from partially completed commands must provide a block (see PR #36), and so becomes subject to potential decoding errors.

Design

This branch resolves the issue by re-applying captured encoding settings before yielding output to the provided block.

One consideration: as written here, encoding settings will be re-applied every time the method attempts to read output. As noted in this comment on PR #36, this step can be omitted when no output is present on STDOUT or STDERR. Since performing this step on an empty string is a no-op, and the previous implementation does not perform this check before re-applying encoding settings, the proposed changes favor readability.

Testing

This branch modifies the "multiwrite script" in the test file to include non-ASCII characters, and updates dependent assertions. This modification reveals the potential for decoding errors when providing a block to Process#communicate.

Note that the modified tests fail without the included changes, but pass with the included changes.

"你好 世界" is Google's translation of "Hello World" into Chinese.

evan-stripe commented 5 years ago

Sorry for the delay on taking a look at this. Good catch! This all seems reasonable to me