Ensure the IO always gets closed, exactly once

matthewd commented 8 years ago

This solves a few problems together:

the one we actually hit, was that (per @nurse) the IO is created without autoclose: false -- which could lead to it eventually closing an unrelated file if the fd is reassigned before the next GC. I think this is also the problem described in #43.
as the IO is part of the published interface, allowing it to think it's open when the fd has been closed elsewhere is also risky... again, if the fd is reused, operations on the IO could affect some other open file.
for further robustness, we can use a finalizer to ensure the fd gets automatically closed if the Notifier goes out of scope without an explicit close call. This does necessitate an extra wrapper class (to outlive the finalization of the parent Notifier, and track whether the fd has been closed yet), but it seems a fairly modest increase in [purely internal] complexity.

Thanks!

:heart: :green_heart: :blue_heart: :yellow_heart: :purple_heart:

chancancode commented 8 years ago

cc @e2

pixeltrix commented 8 years ago

it's that the finalizer gets called with a parameter

Well, that's not very clear from the documentation at: http://ruby-doc.org/core-2.2.3/ObjectSpace.html#method-c-define_finalizer

:confused:

e2 commented 8 years ago

ObjectSpace used to be expensive on JRuby, so some people may complain about that part. (I'm not sure, though).

Other than that, this is awesome (especially wrapping the fd)!

Great work guys!

nurse commented 8 years ago

As far as I understand, what the wrapper does looks the same thing what IO object does.

To avoid finalizer magic, using IO.new with autoclose as it is now, and io.close intead of Native.close(@fd) or setting @io.autoclose as false in Notifier#close seems sufficient.

matthewd commented 8 years ago

@nex3 sorry it took me so long to get back to this :pensive:

I've moved the Handle class to a new file, documented the instance methods, and rearranged things as @nurse suggested, so MRI will lean entirely on the built-in IO instance (previously only constructed on the first call to to_io or readpartial, but surely every caller would immediately hit the latter anyway), and the new class is just used to shim in the same behaviour for JRuby.

https://travis-ci.org/rails/rails/builds/143159530 shows a bunch of successful runs using this branch (FIX=1), with a much spottier record for those running with 0.9.7.

TBH, I've not looked into why 2.3.1 seems so consistently happier than 2.2.5, but I believe this change is still necessary for correctness. My guess is it's something incidental: GC occurs in a different place, an internal algorithm has changed so that particular sequence doesn't hit an fd conflict, etc.

matthewd commented 8 years ago

@e2 iterating ObjectSpace is traditionally expensive, but I'm not sure whether that's equally true for defining finalizers. They're certainly not cheap, but in this case, I believe they're pretty much the only tool for the job.

ioquatix commented 7 years ago

Can you please rebase on master and I will review.

matthewd commented 7 years ago

Rebased.

I ignored the (mutually conflicting!) changes from be4cf88b4dab0bb3a899b16a0df8a6fcf1502e16 and 99d2101eaa71adc40737ae45cb03d3ae86fdad4a, because I believe those are both attempts to hack around the real problem: in a properly behaving application, IOs don't suddenly become closed out of nowhere.

If something is going rogue and closing random IOs (as this library does, without this PR), it's fairly desirable that we get errors and thus notice it; not all random IOs belong to us.

ioquatix commented 7 years ago

Awesome, thanks for this.

You are absolutely right with your logic.

matthewd commented 7 years ago

:heart:

I'm incidentally similarly suspicious of the BADF rescue, but that's been there long enough that I wasn't feeling brave enough to kill it just yet.

ioquatix commented 7 years ago

I think the key thing is to write some specs that cover the acceptable behaviour. Once we've got something like this, we can be a bit more heavy handed.

matthewd commented 7 years ago

@ioquatix did this get force-pushed out of existence? :confused:

Neither the commit nor its changes seem to be on either branch.

ioquatix commented 7 years ago

did this get force-pushed out of existence? 😕

Yes. My apologies, this PR, caused both guard and listen test suites to fail. There were a couple of messy PRs so I ended up cherry picking stuff for the 0-x-stable branch and rolling back a bunch of changes to master.

I would like to re-apply this PR, but we need to make sure it doesn't break listen or guard. Right now the test process is manual, but I'm planning to make it automatic, by having the test process checkout those gems, and run their test suites, on travis.

There is one more thing to consider.

The only reason for this is because IO.from_fd was broken on JRuby right? - if we use IO.from_fd, and set autoclose, it will have the same effect as the original PR. Well, it turns out JRuby (apparently) now supports #from_fd. It's a tricky situation, but this PR in it's current form may not be a great solution, it might just be simper to use IO#from_fd.

Thoughts?

ioquatix commented 7 years ago

At the very least, I suggest the first step should be to make a spec that fails. I know this isn't totally relevant since it's not an IO instance but an FD that's being leaked, but this is what I wrapped all my async reactor tests with: https://github.com/socketry/async-rspec/blob/master/lib/async/rspec/leaks.rb

matthewd commented 7 years ago

The Handle class was a work-around for JRuby not supporting IO.from_fd, yes -- but the change to use it on MRI was needed to prevent double-close: I noticed this was missing because the rails/rails suite started failing again [on MRI].

ioquatix commented 7 years ago

@matthewd Yeah, and I'm so sorry for puling the rug out from under you. Let's try to get this PR back on track.

Do you think you can rebase your work on master?

A spec to catch the failure and future regressions would be awesome.

Test by installing this gem, and then running listen and guard test suite.

I've invited you as part of the team for this gem so you can commit to master as you see fit, or make a PR to discuss further.

matthewd commented 7 years ago

No worries! Yeah, I'll poke around, and see if I can repro the problem you hit.

zachad commented 7 years ago

We're running into this as a problem as well when combined with Puma and under moderate load. Because puma creates a unix socket FD under the hood, it increases the chances of GC closing the wrong FD on finalization.

Since this was merged but then lost, is there another open issue to track for this bug?

ioquatix commented 7 years ago

Right now this is back to @matthewd but if I have time I may look into it.

matthewd commented 7 years ago

Here's where I got to so far: https://github.com/matthewd/rb-inotify/commits/temp

That branch contains #76 and #77, then a test showing the problem (e1757c766503ab8638bea79dad40d3c23f741d6f), and finally a direct cherry-pick from this PR (56716f8a860ff510c7f877979ca15b222c6a643e).

The Travis runs show a pass for the combined tests, the expected failure on the new test, and then lots of failures with this change.

So I believe I've got both a test for the problem we're trying to fix, and a repro of the problem @ioquatix described.. I just haven't had a chance to now dig in and understand what's going wrong.

guard / rb-inotify

Ensure the IO always gets closed, exactly once #49