Closed metcalf closed 8 years ago
r? @zenazn
LGTM
I'll deploy this internally but I don't have Rubygems bits so I won't deploy publicly.
Am I missing something, or is this pretty backwards-incompatible? What would happen to someone currently using something like check_output(['gzip', whatever])
?
Hmm, I think you're right that would break but I think I'd rather break that than have Subprocess perpetually mishandle non-ASCII character sets by default. I don't really see a way of fixing that without introducing that incompatibility. This won't be a problem if you pass in your own pipe/file for stdout
with the correct encoding so it's still very much possible to use Subprocess
for binary things.
Yeah, I'm not saying that this behavior is wrong, I'm just concerned because we have a looooot of untested subprocess
-using code in places like puppet-config - I'm fairly sure that this will break artifact generation in Jenkins, for instance (which does a similar thing to above). At the very least we should bump the version to 2.0 since this is a breaking API change in a public open source module, but I think we'll also need to be very careful with the rollout (puppet-config does ensure => latest
a lot when installing global gems, among other things).
Alternately, we could require the encoding to be specified explicitly if it's not binary? It wouldn't be great, but it would avoid breaking things.
@doy I wrote up a little test script that reproduces the behavior of our build script and confirmed that this change does not break it:
# coding: utf-8
require 'rubygems'
gem 'subprocess', '1.3.2'
require 'subprocess'
require 'fileutils'
require 'tmpdir'
CONTENTS = "I'm a little teåpot"
Dir.mktmpdir do |tmpdir|
indir = File.join(tmpdir, 'input')
outdir = File.join(tmpdir, 'output')
outfile = File.join(tmpdir, 'output.tgz')
FileUtils.mkdir_p(indir)
FileUtils.mkdir_p(outdir)
File.write(File.join(indir, 'myfile'), CONTENTS)
pigz = Subprocess.popen(
['pigz'],
stdout: File.open(outfile, 'w'),
stdin: Subprocess::PIPE
)
tar = Subprocess.popen(
['tar', '-c', '-C', indir, '--exclude=.git', '.'],
stdout: pigz.stdin
)
pigz.stdin.close
[pigz, tar].each do |p|
unless p.wait.success?
raise Subprocess::NonZeroExit.new(p.command, p.status)
end
end
Subprocess.check_call(
%W{tar -xz -C #{outdir} -f #{outfile}},
)
puts File.read(File.join(outdir, 'myfile'))
end
I also confirmed that this works even if you don't pipe things directly from one process to another but read them in memory in Ruby first:
# coding: utf-8
require 'rubygems'
gem 'subprocess', '1.3.2'
require 'subprocess'
require 'fileutils'
require 'tmpdir'
CONTENTS = "I'm a little teåpot"
Dir.mktmpdir do |tmpdir|
indir = File.join(tmpdir, 'input')
outdir = File.join(tmpdir, 'output')
outfile = File.join(tmpdir, 'output.tgz')
FileUtils.mkdir_p(indir)
FileUtils.mkdir_p(outdir)
File.write(File.join(indir, 'myfile'), CONTENTS)
tarred = Subprocess.check_output(
['tar', '-c', '-C', indir, '--exclude=.git', '.'],
stdout: Subprocess::PIPE
)
Subprocess.check_call(
['pigz'],
stdout: Subprocess::PIPE,
stdin: Subprocess::PIPE
) do |p|
out, = p.communicate(tarred)
File.write(outfile, out)
end
Subprocess.check_call(
%W{tar -xz -C #{outdir} -f #{outfile}},
)
puts File.read(File.join(outdir, 'myfile'))
end
Okay, that's a bit more reassuring. I still think that we should bump the version to 2.0.0 and be a bit careful about the rollout, but it's probably a reasonable thing to do.
I'll let @zenazn weigh in on the version bump. Given that I can't repro any backwards incompatibility here, I don't think it's justified.
Well, there are clearly things that will behave differently if given a utf8-encoded string of characters instead of the equivalent string of bytes - it's just that writing them to a file or writing them back into an input pipe of another subprocess aren't those things, since those will always be interpreted as bytes regardless. Depends on how strictly you care about following semantic versioning, I guess?
Subprocess always returns output strings encoded as ASCII-8BIT even if the IO uses a different encoding.
read_nonblock
really can't do anything else since it's reading raw bytes and might split up Unicode characters. Instead, we need to set the encoding of the complete string after we're done receiving it.external_encoding
isn't readable on closed IOs so we have to read that at the beginning ofcommunicate
.