Closed andrew-breunig closed 4 years ago
Previous commit built on macOS/ruby@2.4 and macOS/ruby@2.5, but failed to build on macOS/ruby@2.3. Fixed up commit (test name typo) failed to build on macOS for all three ruby versions, with this message from TravisCI:
An error occurred while generating the build script.
Developed locally on macOS/ruby@2.3.8. Also tested on macOS/ruby@2.6.5.
Purpose
Preserve encoding of output from Process#communicate with no block given across multiple cycles of IO selection.
Context
The previous PR #57 on this repository changed the way that encoding is preserved by Process#communicate, so that output yielded to a block would preserve encoding similarly to returned output when no block is given. Tests included in that branch demonstrated that encoding was still preserved with no block given within a single IO write cycle, but failed to account for cases where no block is given and the subprocess writes across multiple cycles of IO selection.
Under those conditions, the previous changes introduce a new failure mode in which attempts to append data to a write buffer raise an Encoding::CompatibilityError. This results from applying captured encoding to the write buffer after the first IO selection cycle, then attempting to append data which is read from the subprocess pipe via IO#read_nonblock.
See the previous PR description and linked snippet for more information on the encoding behavior of #read_nonblock.
Approach
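The failure mode described under Context can be reproduced in isolation. The following is a minimal sketch, not code from this repository: the variable names are hypothetical, and the `.b` calls stand in for pipe reads on the assumption that IO#read_nonblock returns binary (ASCII-8BIT) strings.

```ruby
# "你好 世界", written with escapes so the literal is UTF-8 regardless of
# the source file's encoding.
payload = "\u4F60\u597D \u4E16\u754C"

# Hypothetical stand-ins for two reads from the subprocess pipe. `.b` tags
# them as binary, which is the assumed behavior of IO#read_nonblock.
first_chunk  = payload.b.byteslice(0, 7)
second_chunk = payload.b.byteslice(7..-1)

buffer = String.new                      # empty binary write buffer
buffer << first_chunk                    # first IO selection cycle
buffer.force_encoding(Encoding::UTF_8)   # captured encoding applied too early

error = begin
  buffer << second_chunk                 # second cycle: UTF-8 buffer, binary chunk
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts error.class  # prints Encoding::CompatibilityError
```

The append fails because both sides contain non-ASCII bytes but carry different encodings, so Ruby cannot pick a compatible encoding for the result.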
This new failure mode arises from the use of IO#read_nonblock along with the application of captured encoding to the write buffer before the subprocess has finished writing to it.
One way to solve this problem is to apply captured encoding only once at each place that Process#communicate can yield or return output. Taken from the context prior to #57, this would mean applying the existing encoding step in three places (one of which exists already).
That approach has the benefit of least possible encoding modification, but it has the detriment of requiring duplicated logic: even extracted to a method, that step must be applied at each new output introduced to #communicate. That maintenance complication is the root cause of the current as well as the previous PR.
This branch proposes instead to apply the encoding directly to the data read from the subprocess pipe, before appending to the write buffer. While this represents increased encoding modification for subprocesses which span IO selection cycles when no block is given to #communicate, note that String#force_encoding changes only the encoding tag of the string, and does not transcode its contents. This branch assumes the performance detriment is therefore negligible compared to the maintenance benefit of handling encoding in one place, at the input.
Testing
This branch returns the "multiwrite script" in the test file to its original state prior to #57 and instead creates a new pair of assertions describing Subprocess::Process#communicate and its encoding behavior. The subprocess is given a command designed to output non-ASCII data* over multiple IO selection cycles, and the assertions validate encoding preservation both with a block given and with no block given.
Note that the modified tests fail without the included change, but pass with the included change:
Without Change
With Change
*"你好 世界" is Google's translation of "Hello World" into Simplified Chinese.
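For illustration, the approach proposed above (tagging each chunk with the captured encoding before it is appended) might look like the sketch below. This is not the code in this branch: `drain_with_encoding` is a hypothetical helper, an in-process IO.pipe stands in for the subprocess pipe, and the encoding is passed in explicitly rather than captured from the pipe. The 4-byte read size is chosen only to force several read cycles.

```ruby
# Hypothetical helper (not the gem's code): drain a pipe with small
# non-blocking reads, tagging each chunk with the given encoding before
# appending it to the buffer.
def drain_with_encoding(reader, encoding, max_bytes)
  buffer = String.new.force_encoding(encoding)
  loop do
    IO.select([reader])                  # wait until readable or at EOF
    begin
      chunk = reader.read_nonblock(max_bytes)
    rescue IO::WaitReadable
      next                               # nothing to read yet; select again
    rescue EOFError
      break                              # writer closed and pipe is drained
    end
    # force_encoding only relabels the bytes; it does not transcode them.
    buffer << chunk.force_encoding(encoding)
  end
  buffer
end

payload = "\u4F60\u597D \u4E16\u754C"    # "你好 世界" as UTF-8
reader, writer = IO.pipe
writer.write(payload)
writer.close

# Reading 4 bytes at a time takes several cycles and splits multibyte
# characters across chunks.
output = drain_with_encoding(reader, Encoding::UTF_8, 4)
reader.close

puts output.encoding
puts output.valid_encoding?
puts output == payload
```

Because every chunk and the buffer share one encoding, appends never mix encodings, even when a chunk ends in the middle of a character. The sketch checks only the fully assembled string.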