piotrmurach / tty-command

Execute shell commands with pretty output logging and capture stdout, stderr and exit status.
https://ttytoolkit.org
MIT License
400 stars 34 forks source link

binary streamed :in ? #66

Closed jrochkind closed 7 months ago

jrochkind commented 9 months ago

ruby: 3.2.2 TTY::Command: 0.10.1

I am trying to stream a Down::ChunkedIO object from down to a TTY::Command.

Where the content could be arbitrary binary content. But binary content streamed in: raises a Encoding::UndefinedConversionError

A simple reproduction:

require 'down'
require 'tty/command'

down_io = Down.open("https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg")

runner = TTY::Command.new

runner.run("cat > out.jpg", in: down_io)

But this raises Encoding::UndefinedConversionError

/Users/jrochkind/.rubies/ruby-3.2.2/lib/ruby/3.2.0/delegate.rb:349:in `write': "\xFF" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from /Users/jrochkind/.rubies/ruby-3.2.2/lib/ruby/3.2.0/delegate.rb:349:in `block in delegating_block'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:200:in `convert_to_fd'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:136:in `convert'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:115:in `block in normalize_redirect_options'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:113:in `each'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:113:in `reduce'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:113:in `normalize_redirect_options'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/child_process.rb:24:in `spawn'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command/process_runner.rb:44:in `run!'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command.rb:185:in `execute_command'
    from /Users/jrochkind/.gem/ruby/3.2.2/gems/tty-command-0.10.1/lib/tty/command.rb:104:in `run'

I expect this to instead properly stream to process, in this case (for reproduction purposes only) resulting in writing output to out.jpg.

I think I can fix the problem by changing this line:

https://github.com/piotrmurach/tty-command/blob/6aeb499dc323aebca7450f95f3fc8360a0d1ee83/lib/tty/command/child_process.rb#L198

To:

tmp = ::Tempfile.new(::SecureRandom.uuid.split("-")[0], :encoding => 'ascii-8bit')

If I make that change, the reproduction script no longer raises, and has the outcome I expect -- writing the bytes to local out.jpg.

I'm not sure if that change might break anything else, or if this is simply a bug that can be fixed?

But wait, streaming in: writes a a tempfile?

However, looking at that code and seeing it's writing the stream to a tempfile on disk.... may defeat the purpose of what I'm trying to do anyway. I was trying to improve performance by avoiding the write to disk, and streaming the bytes from network directly to the process. (Which is not cat in my real use case of course, that was just a simple reproduction).

But if TTY::Command is going to write whatever in: stream I give it to disk in a temporary file anyway, I'm not going to get the performance increase I was hoping for. So this may still be a bug that should be fixed, but maybe won't actually help me out.

Is is not feasible to stream the bytes directly to the process without creating the intermediate temp file? In my real use case, these are very large files from the network, and I was hoping to stream them directly to a spawned process stdin without first buffering them all on disk and taking the performance hit of the write-to-disk then read-from-disk.

piotrmurach commented 7 months ago

Hi Jonathan,

Thanks for submitting this issue.

However, looking at that code and seeing it's writing the stream to a tempfile on disk.... may defeat the purpose of what I'm trying to do anyway. I was trying to improve performance by avoiding the write to disk, and streaming the bytes from network directly to the process. (Which is not cat in my real use case of course, that was just a simple reproduction).

But if TTY::Command is going to write whatever in: stream I give it to disk in a temporary file anyway, I'm not going to get the performance increase I was hoping for. So this may still be a bug that should be fixed, but maybe won't actually help me out.

That's not how tty-command works. I believe you have jumped to the wrong conclusion based on spurious understanding.

The tty-command is a wrapper around Process.spawn. It doesn't aim to reinvent the wheel and instead piggybacks on the spawn method and its options. This is key. The spawn method provides the following redirection options and allowed values:

redirection:
    key:
      FD              : single file descriptor in child process
      [FD, FD, ...]   : multiple file descriptor in child process
    value:
      FD                        : redirect to the file descriptor in parent process
      string                    : redirect to file with open(string, "r" or "w")
      [string]                  : redirect to file with open(string, File::RDONLY)
      [string, open_mode]       : redirect to file with open(string, open_mode, 0644)
      [string, open_mode, perm] : redirect to file with open(string, open_mode, perm)
      [:child, FD]              : redirect to the redirected file descriptor
      :close                    : close the file descriptor in child process

Further, the file descriptor can be given as:

      :in     : the file descriptor 0 which is the standard input
      :out    : the file descriptor 1 which is the standard output
      :err    : the file descriptor 2 which is the standard error
      integer : the file descriptor of specified the integer
      io      : the file descriptor specified as io.fileno

The tty-command examines these redirect options. In almost all cases, the TTY::Command passes these through to spawn unchanged which is the first line in the convert_to_fd method:

return object if fd?(object)

However, when the object provided is a string it intercepts it and first checks if it is a file, and if so lets it through:

  if object.is_a?(::String) && ::File.exist?(object)
    return object
  end

Finally, in an exceptional circumstance when the redirect is a string that doesn't reference a file, it converts it to a Tempfile so that it can conform to the spawn interface. It then can be passed to spawn, for example, as a value for :in option. Why have I made this exception? I wanted to be able to pass a keyboard input as a string like "input\ninput\ninput\n" for the benefits of tools like tty-prompt.

Now back to your original issue of providing Down::ChunkedIO object as a value for :in option. As shown above, the spawn method can accept FD(:in, :out, :err, integer, IO),string and :close as values for :in option. However, your expectation is for :in to accept what is essentially a random object called IO but not an IO for all intents and purposes and stream the content down to process. Trying this out with spawn:

down_io = Down.open("https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg")
Process.spawn("cat > out.jpg", in: down_io)

You get an error:

 `spawn': wrong exec redirect action (ArgumentError)

Therefore, I don't believe this to be a bug. If anything, the TTY::Command should raise an error for any value that doesn't conform to the spawn interface. The Down::ChunkedIO object certainly doesn't conform to any of the expected values.

As for the encoding change in the Tempfile, this will be useful in cases when the input string contains ASCII 8-bit characters. However, the file should probably be written as binary to avoid any encoding issues.

Not sure what the best solution for your case is. Ideally, TTY::Command instance would expose the child process streams and make them available to down_io to write in chunks. However, this is not a feature that this gem offers yet.

jrochkind commented 7 months ago

Thank you that was very helpful!

However, your expectation is for :in to accept what is essentially a random object called IO but not an IO for all intents and purposes and stream the content down to process

The basic problem is that ruby provides no actual specification of what an "IO" is! Down::ChunkedIO is trying it's best to provide something that is an IO for all intents and purposes -- and it does succesfully serve as an IO in many contexts -- but ruby just doesn't make it easy (or possibly even feasible at all) for anything other than the stdlib objects to be IO "for all intents and purposes". :(