Open mjq opened 10 years ago
@mjq do you have any sample code that reproduces this? That's the best place to start for a test.
Sorry, sure. Simplified, it's:
var follow = require('follow');
var db = '...';
var feed = new follow.Feed({db: db, include_docs: true});
feed.on('change', function(change) {
  console.log('got change %d', change.seq);
});
feed.on('error', function(err) {
  console.log('got error %s, restarting in 5s', err.message);
  setTimeout(function() {
    console.log('restarting');
    feed.restart();
  }, 5000);
});
feed.start();
Normally, the logs would look like
got change 5
got change 6
got change 7
But if the first attempt to reach the database times out and the database then responds shortly after, you'll see
got error "Timeout confirming database: <db name>", restarting in 5s
restarting
got change 5
got change 5
got change 6
got change 6
got change 7
got change 7
@mjq this is fascinating, I've never seen this happen. destroy_req should be called by the die function, but it seems like there is a race condition leaving two requests? I'll have to dig deeper on this when I have a minute.
@jcrugzz die() destroys self.pending.request, but the request in confirm() is a local variable, so if it isn't destroyed in confirm(), nothing will destroy it (or so it seems to me).
A simpler bug to test, repro and fix may just be: confirm() takes longer than the timeout, but db_response is called anyway (even though the timeout killed the feed).
Since db_response only applies to the success case, that alone is weird/wrong behaviour, and just fixing that (e.g. by destroying the request in the timeout fn) should prevent the double-listener stuff.
re: race conditions: We've got a single process simultaneously following an ever-changing set of a few thousand databases (with all those databases on the same CouchDB box). So, when requests to that box start stalling... well, if there's a race condition to be found, we'll find it, heh.
I'm giving this patch a trial by fire right now, but I don't know how long it will take for us to trigger the bug again.
@mjq gotcha, this is before it is piped into the changes-stream. Let me know if you can reproduce that, but that looks like a valid fix. Super edge case, but I can see the potential for it happening.
@mjq @jcrugzz Are you going to fix this?
+1 - This is still an issue
Here's the relevant code block to follow along.
In Feed.prototype.confirm, a request is made to check if the DB is reachable, and a timeout is set to detect a slow response from Couch. If the timeout is hit, the Feed is killed (self.die is called). But the request object isn't destroyed. That means that if Couch responds after the timeout, the happy-path callback db_response still gets called.
Normally, this isn't that noticeable, since the Feed object is dead and everything short-circuits. But if the user called restart() on the feed in response to the error, dead will be false, and the Feed ends up getting set up twice (once in response to the timed-out request, and once due to restart()). This results in every change listener getting called twice.
The fix would seem to be adding
destroy_req(req);
here, before calling die. I haven't figured out how to write a test for this though. Any ideas?
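One idea for a test, as a rough sketch only: model the race with hand-driven stand-ins so the order of "timeout", "restart" and "late response" is fully controlled, with no slow CouchDB needed. Everything below (ToyFeed, FakeRequest, simulate, the destroyOnTimeout flag) is hypothetical and is not follow's real code; it only mirrors the sequence described above, and a real test would still have to hook into Feed.prototype.confirm somehow.

```javascript
// Toy model of the race. ToyFeed and FakeRequest are hypothetical
// stand-ins, NOT follow's real internals.

// A pending request that "responds" only when respond() is called by hand,
// and that ignores the response once it has been destroyed.
function FakeRequest(onResponse) {
  this.destroyed = false;
  this.onResponse = onResponse;
}
FakeRequest.prototype.destroy = function() {
  this.destroyed = true;
};
FakeRequest.prototype.respond = function() {
  if (!this.destroyed) this.onResponse();
};

function ToyFeed(destroyOnTimeout) {
  this.destroyOnTimeout = destroyOnTimeout; // toggles the proposed fix
  this.dead = true;
  this.subscriptions = 0;
  this.onChange = function() {};
}

// Stands in for start()/restart(): revive the feed, send a confirm request.
ToyFeed.prototype.start = function() {
  var self = this;
  self.dead = false;
  return new FakeRequest(function() { self.dbResponse(); });
};

// Stands in for the confirm timeout firing before the database answers.
ToyFeed.prototype.timeout = function(req) {
  if (this.destroyOnTimeout) req.destroy();
  this.dead = true;
};

// Stands in for the success callback: sets up one change subscription.
ToyFeed.prototype.dbResponse = function() {
  if (this.dead) return; // the short-circuit only helps while the feed is dead
  this.subscriptions += 1;
};

// Deliver one change to every subscription that was set up.
ToyFeed.prototype.deliver = function(seq) {
  for (var i = 0; i < this.subscriptions; i++) this.onChange(seq);
};

function simulate(destroyOnTimeout) {
  var feed = new ToyFeed(destroyOnTimeout);
  var seen = [];
  feed.onChange = function(seq) { seen.push(seq); };

  var first = feed.start();   // initial confirm request goes out
  feed.timeout(first);        // timeout fires first, feed dies
  var second = feed.start();  // user restarts, so dead is false again
  first.respond();            // late response to the timed-out request
  second.respond();           // normal response to the restart
  [5, 6, 7].forEach(function(seq) { feed.deliver(seq); });
  return seen;
}

console.log(simulate(false).join(' ')); // prints "5 5 6 6 7 7"
console.log(simulate(true).join(' '));  // prints "5 6 7"
```

With the request left alive, the late response sets the feed up a second time and every change is seen twice; destroying it in the timeout path leaves a single subscription.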