rtomayko / posix-spawn

Ruby process spawning library
Other
520 stars 52 forks source link

Add partial output to POSIX::Spawn::Child #47

Closed arrbee closed 10 years ago

arrbee commented 10 years ago

This always keeps the output of a spawned child, even when an exception needs to be raised. To make this work, this requires that the exec! method of the child can be deferred from the creation of the POSIX::Spawn::Child object, too, so this also adds a :defer option to the constructor.

arrbee commented 10 years ago

I considered several different options for adding partial output to POSIX::Spawn::Child and this is what I felt was the best option. I'm curious what you'll think...

My first thought was to make the out and err attributes contain the partial stdout and stderr data even when the operation is interrupted. The problem with this is that POSIX::Spawn::Child immediately runs the command in the constructor and so throwing an exception prevents the object from ever being created, so there is no object to query the partial results from.

My second though was to add extra parameters, either a block to be invoked with the partial results or options (:output and :error) that could be invoked on each incremental read if provided. However, this approach feels quite strange to me. Essentially you start using POSIX::Spawn::Child for the side effects of accumulating output instead of looking at the return value, which seems to violate the design of the API.

My third idea was to add a :noraise option that indicates that the :timeout and :max options should terminate the child process but not raise an exception. In those cases, instead of calling raise, we would add an extra attr_reader (maybe :exception that would be set to the exception object without it being raised). The problem with this design is that it provides no incremental output for an externally interrupted command, for example if used with timeout.

My last thought was to go back to the original approach (i.e. the out and err attributes contain partial output) and just add a :defer property to the constructor which skips the immediate call to exec!. Then we make the exec! method public.

I implemented this last approach and it seems to work, although I don't feel it's lovely. What do you think?

arrbee commented 10 years ago

By the way, I don't have a development environment where the test suite actually passes (just Mac OS and Ubuntu), so I'm not totally confident here, but the new tests that I added do seem to pass, so I'm optimistic. :grin:

tmm1 commented 10 years ago

Hmm, not sold on the :defer option.

so throwing an exception prevents the object from ever being created, so there is no object to query the partial results from.

One alternative would be to stick the data in the exception object.. something like:

begin
  POSIX::Spawn::Child.new(....)
rescue MaximumOutputExceeded => e
  puts e.out # or e.child.out or whatever
end

We'd have to implement that for both Timeout and MaximumOutput errors (via a base class I guess?). Not sure if it ends up being any cleaner than what you have here..

arrbee commented 10 years ago

One concern with putting the output in the exception is that externally raised exceptions, such as from:

begin
  timeout 10 do
    p = POSIX::Spawn::Child(...)
  end
rescue Timer::Error => boom
   ...
end

can't work since the exception is no longer being created internally.

arrbee commented 10 years ago

(Although I agree that :defer is ugly, don't get me wrong...)

tmm1 commented 10 years ago

Yea it totally breaks on external exceptions. Hrm.

Another alternative would be a new method instead of a new option.. POSIX::Spawn::Child.build(...).exec! or something.

I actually don't have a strong preference.. If the new :defer option is the easiest/fastest way to get this going then let's just do that.

arrbee commented 10 years ago

I like the idea of an alternative constructor to prepare a child without exec'ing it. That actually seems more natural than an option to existing constructor. Let me write it up...

arrbee commented 10 years ago

Okay, well, I like this approach better. I named the alternative constructor prepare because it kind of reminded me of a SQL prepared statement, but that's easy to change if necessary.

I'm not sure if constructing the new object with allocate and then explicitly sending the :process_args is too strange, but I wasn't sure about a better way to do it (without having to leave in some form of the :defer argument that would be used internally).

rtomayko commented 10 years ago

:heart:

I definitely prefer the alternative constructor approach as well.

I have a slight preference for "build" over "prepare" just due to what seems to be more common naming in Ruby projects with similar patterns.

I'd probably lean toward the internal arg over allocate. It's not as sophisticated but a bit simpler to follow IMO. Going into a library and seeing allocate overridden always has a weird smell to it, especially when you're trying to track down a weird bug or something.

rtomayko commented 10 years ago

I'd love to get a small example of the deferred form here once we settle on method naming and whatnot:

https://github.com/rtomayko/posix-spawn/blob/master/README.md#posixspawnchild

arrbee commented 10 years ago

I have a slight preference for "build" over "prepare" just due to what seems to be more common naming in Ruby projects with similar patterns.

Awesome, thanks! This is exactly the type of thing that I just don't do enough Ruby to have a feel for.

I went back to the internal argument instead of using allocate. I like the results much better; however, I did have to extract the options from the *args value in order to merge in the extra internal option. This is another case where I'm not sure if I coded it in an idiomatic manner, so let me know.

I also took a stab at adding some docs to README.md.

tmm1 commented 10 years ago

:+1: to using build over prepare, and implementing it with an internal :defer instead of the allocate.

I also wonder if Child objects should be allowed to be re-used. Maybe exec! should raise if it's called again.

arrbee commented 10 years ago

I have no opinion re calling exec! more than once. If you are spawning a process with side effects, I suppose you might want to build it and then run it repeatedly. There is no advantage to doing so, from what I can tell, so maybe we should raise to discourage that pattern.

I'll remove the lines that do output truncation when the limit is hit. I wrote them when I was concerned that the output could potentially exceed the max by an unlimited amount, but since it is bounded by the BUFSIZE, I don't think it's important.

arrbee commented 10 years ago

I'll remove the lines that do output truncation when the limit is hit.

Actually, how strongly do you feel about this? Knowing the exact limit on the result makes it a little easier to write tests. I guess I could just test the child.out is >= to the limit, instead of being precise...

tmm1 commented 10 years ago

Yea you could use assert_operator obj, :>=, limit

arrbee commented 10 years ago

Okay, thanks! I'm going to sit on this for the evening and I'll wrap it up tomorrow morning.

arrbee commented 10 years ago

Okay, I removed the out and err truncation logic, made an assertion helper for the tests that checks that the output matches or exceeds a given string repeated out to length X, and added a sentence to the docs clarifying the behavior.

I'm feeling pretty good about this now. I like it that with this approach there is very little actual code changed - probably 90% of the changes are documentation and tests.

rtomayko commented 10 years ago

Looks great :sparkles:

Let's go through the process of rolling it out on github.com (updating gitrpc, seeing github/github tests pass, deploy, etc) and then I'll merge and cut a gem release here.

arrbee commented 10 years ago

fwiw, this is now running in production for github/github and seems to be working for accessing partial results even after a timeout.

adelcambre commented 10 years ago

This has been out on prod for a few months now (and I couldn't figure out where POSIX::Spawn::Child.build was defined). Probably worth merging now.