stripe / subprocess

A port of Python's subprocess module to Ruby
MIT License
208 stars 17 forks source link

Apply captured encoding to returned output across IO selection cycles #58

Closed andrew-breunig closed 4 years ago

andrew-breunig commented 4 years ago

Purpose

Preserve encoding of output from Process#communicate with no block given across multiple cycles of IO selection.

Context

The previous PR #57 on this repository changed the way that encoding is preserved by Process#communicate, so that output yielded to a block would preserve encoding similarly to returned output when no block is given. Tests included in that branch demonstrated that encoding was still preserved with no block given within a single IO write cycle, but failed to account for cases where no block is given and the subprocess writes across multiple cycles of IO selection.

Under those conditions, the previous changes introduce a new failure mode in which attempts to append data to a write buffer raise an Encoding::CompatibilityError. This results from applying captured encoding to the write buffer after the first IO selection cycle, then attempting to append data which is read from the subprocess pipe via IO#read_nonblock.

See the previous PR description and linked snippet for more information on the encoding behavior of #read_nonblock.

Approach

This new failure mode arises from the use of IO#read_nonblock along with the application of captured encoding to the write buffer before the subprocess has finished writing to it.

One way to solve this problem is to apply captured encoding only once at each place place that Process#communicate can yield or return output—taken from the context prior to #57, this would mean using this existing code:

stdout.force_encoding(stdout_encoding) if stdout_encoding
stderr.force_encoding(stderr_encoding) if stderr_encoding

in three places (one of which exists already):

  1. https://github.com/stripe/subprocess/blob/b914fb4748ec7fbcc7433cfe92a89e8ba20ce0c1/lib/subprocess.rb#L426
  2. https://github.com/stripe/subprocess/blob/b914fb4748ec7fbcc7433cfe92a89e8ba20ce0c1/lib/subprocess.rb#L482
  3. https://github.com/stripe/subprocess/blob/b914fb4748ec7fbcc7433cfe92a89e8ba20ce0c1/lib/subprocess.rb#L490-L491

That approach has the benefit of least possible encoding modification, but it has the detriment of requiring duplicated logic—even extracted to a method, that step must be applied at each new output introduced to #communicate. That maintenance complication is the root cause of the current as well as the previous PR.

This branch proposes instead to apply the encoding directly to the data read from the subprocess pipe, before appending to the write buffer. While this represents increased encoding modification for subprocesses which span IO selection cycles when no block is given to #communicate, note that String#force_encoding modifies only the external encoding of the string, and does not transcode its contents. This branch assumes the performance detriment is therefore negligible compared to the maintenance benefit of handling encoding in one place, at the input.

Testing

This branch returns the "multiwrite script" in the test file to its original state prior to #57 and instead creates a new pair of assertions describing Subprocess::Process#communicate and its encoding behavior. The subprocess is given a command designed to output non-ASCII data* over multiple IO selection cycles, and the assertions validate encoding preservation both with a block given and with no block given.

Note that the modified tests fail without the included change, but pass with the included change:

Without Change

abreunig:subprocess(abreunig-comm-buffer-encoding)$ bundle exec rake test
Run options: --seed 8808

# Running:

...............................E.........

Finished in 2.940539s, 13.9430 runs/s, 20.0643 assertions/s.

  1) Error:
Subprocess::Subprocess::Process::#communicate#test_0001_preserves encoding with no block given:
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
    /Users/abreunig/subprocess/lib/subprocess.rb:374:in `block in drain_fd'
    /Users/abreunig/subprocess/lib/subprocess.rb:372:in `loop'
    /Users/abreunig/subprocess/lib/subprocess.rb:372:in `drain_fd'
    /Users/abreunig/subprocess/lib/subprocess.rb:430:in `block in communicate'
    /Users/abreunig/subprocess/lib/subprocess.rb:636:in `block in catching_sigchld'
    /Users/abreunig/subprocess/lib/subprocess.rb:633:in `pipe'
    /Users/abreunig/subprocess/lib/subprocess.rb:633:in `catching_sigchld'
    /Users/abreunig/subprocess/lib/subprocess.rb:414:in `communicate'
    /Users/abreunig/subprocess/test/test_subprocess.rb:430:in `block (4 levels) in <top (required)>'

41 runs, 59 assertions, 0 failures, 1 errors, 0 skips

With Change

abreunig:subprocess(abreunig-comm-buffer-encoding)$ bundle exec rake test
Run options: --seed 46378

# Running:

.........................................

Finished in 2.930579s, 13.9904 runs/s, 20.4738 assertions/s.

41 runs, 60 assertions, 0 failures, 0 errors, 0 skips

*"你好 世界" is Google's translation of "Hello World" into Simplified Chinese.

andrew-breunig commented 4 years ago

Previous commit built on macOS/ruby@2.4 and macOS/ruby@2.5, but failed to build on macOS/ruby@2.3. Fixed up commit (test name typo) failed to build on macOS for all three ruby versions, with this message from TravisCI:

An error occurred while generating the build script.

Developed locally on macOS/ruby@2.3.8. Also tested on macOS/ruby@2.6.5.