iriscouch / follow

Very stable, very reliable, NodeJS CouchDB _changes follower
Apache License 2.0

Error: Cannot find wait timer #44

isaacs closed this issue 10 years ago.

isaacs commented 10 years ago

Occasionally this happens:

Error: Cannot find wait timer
    at Feed.got_activity (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:355:21)
    at Feed.on_couch_data (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:412:8)
    at Changes.handle_confirmed_req_event (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/feed.js:308:30)
    at Changes.EventEmitter.emit (events.js:95:17)
    at Changes.emit_changes (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:223:12)
    at Changes.write_continuous (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:176:8)
    at Changes.write (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/lib/stream.js:124:17)
    at Request.ondata (stream.js:51:26)
    at Request.EventEmitter.emit (events.js:95:17)
    at IncomingMessage.<anonymous> (/home/node/node_modules/npm-fullfat-registry/node_modules/follow/node_modules/request/request.js:840:12)

??

davglass commented 10 years ago

We are seeing this too while trying to replicate the npm registry.

Why is this a fatal error? https://github.com/iriscouch/follow/blob/master/lib/feed.js#L354-L355

  if(! self.pending.wait_timer)
    return self.die(new Error('Cannot find wait timer'))

  clearTimeout(self.pending.wait_timer)
  self.pending.wait_timer = null

Since you are clearing the timeout directly after checking if it's there, shouldn't this be:

  if(self.pending.wait_timer)
    clearTimeout(self.pending.wait_timer)

  self.pending.wait_timer = null

It doesn't make much sense to me to die when the timer is missing, just before you are going to clear it anyway.
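The lenient version works because Node's `clearTimeout()` silently ignores `null` and `undefined`, so a missing timer needs no special handling at all. A minimal sketch, using a hypothetical standalone `pending` object rather than follow's real `Feed`:

```javascript
// Hypothetical standalone state; follow's real Feed keeps much more than this.
const pending = { wait_timer: null }

// Lenient version: clearTimeout() ignores null/undefined, so a
// missing timer is tolerated instead of being treated as fatal.
function got_activity() {
  clearTimeout(pending.wait_timer)
  pending.wait_timer = null
}

got_activity()  // no timer armed: a harmless no-op

pending.wait_timer = setTimeout(() => {
  throw new Error('timer should have been cleared')
}, 50)
got_activity()  // timer armed: cleared before it can fire
console.log(pending.wait_timer)  // null
```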

jcrugzz commented 10 years ago

@davglass I'm guessing the assumption is that something is wrong if that timer has already been cleared or does not exist.

Regardless, this module will be refactored as a wrapper around my changes-stream module once I get better test coverage. You can check out the current WIP in the refactor branch. This will solve some of the inconsistencies.
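To make that assumption concrete, here is the strict check in isolation. `StrictFeed`, its `arm()` helper and the `died` field are hypothetical stand-ins, not follow's real internals; only the `pending.wait_timer` test and the `die()` call mirror the feed.js snippet quoted above. The sketch shows that a second burst of activity arriving before anything re-arms the timer is treated as fatal.

```javascript
// Hypothetical stand-in for follow's Feed; NOT the library's real class.
class StrictFeed {
  constructor() {
    this.pending = { wait_timer: null }
    this.died = null  // records the error passed to die()
  }

  // Arm the inactivity timer (hypothetical helper).
  arm(ms, onTimeout) {
    this.pending.wait_timer = setTimeout(onTimeout, ms)
  }

  die(err) {
    this.died = err
  }

  // Strict check, as in feed.js: a missing timer is fatal.
  got_activity() {
    if (!this.pending.wait_timer)
      return this.die(new Error('Cannot find wait timer'))

    clearTimeout(this.pending.wait_timer)
    this.pending.wait_timer = null
  }
}

const feed = new StrictFeed()
feed.arm(60000, () => {})
feed.got_activity()  // timer present: cleared normally
feed.got_activity()  // nothing re-armed the timer in between: dies
console.log(feed.died.message)  // Cannot find wait timer
```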

isaacs commented 10 years ago

We still see this pretty often in our production followers. It doesn't happen often enough to throw the worker into a tailspin, and we use seq-file to restart right where we left off. But still, kinda annoying.
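The restart-from-checkpoint approach mentioned here boils down to persisting the last processed `seq` and handing it back as the starting point on restart. A minimal sketch with plain `fs` calls; it is not seq-file's API, and the `saveSeq`/`loadSeq` names and checkpoint path are hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical checkpoint location; seq-file chooses and manages its own file.
const SEQ_FILE = path.join(os.tmpdir(), 'follow-example.seq')

// Persist the last processed sequence. JSON keeps both numeric seqs
// and opaque string seqs intact.
function saveSeq(seq) {
  fs.writeFileSync(SEQ_FILE, JSON.stringify(seq))
}

// Load the checkpoint on startup; fall back to 0 (the beginning of
// the _changes feed) if no checkpoint has been written yet.
function loadSeq() {
  try {
    return JSON.parse(fs.readFileSync(SEQ_FILE, 'utf8'))
  } catch (err) {
    if (err.code === 'ENOENT') return 0
    throw err
  }
}

saveSeq(1234)            // after successfully handling the change with seq 1234
const since = loadSeq()  // on restart, resume from the saved position
console.log(since)       // 1234
```

In a real follower, `since` would be passed as the feed's starting point (follow accepts a `since` option). Saving only after a change is fully handled means a crash replays the changes since the last save rather than skipping any.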

jcrugzz commented 10 years ago

OK, so the root cause is a new request being created while the feed is paused: the wait_timer expires and triggers on_timeout() -> retry(). When the feed is then resumed, got_activity() hits this particular failure.
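That sequence can be replayed deterministically. In this sketch, `PausableFeed` and all of its methods are hypothetical simplifications of feed.js (the real `retry()` issues a new HTTP request, and real timers fire on their own); timer expiry is simulated by calling `on_timeout()` directly, and the assumption that `retry()` leaves no timer armed is labeled in the code.

```javascript
// Hypothetical, simplified model of the race; NOT follow's real feed.js.
class PausableFeed {
  constructor() {
    this.pending = { wait_timer: 'armed' }  // truthy stand-in for a live timer handle
    this.paused = false
    this.buffer = []    // data chunks that arrived while paused
    this.requests = 1   // number of requests started so far
    this.error = null   // what die() would receive
  }

  pause() { this.paused = true }

  // Simulated timer expiry while paused: on_timeout() -> retry().
  on_timeout() {
    this.pending.wait_timer = null  // the expired timer no longer exists
    this.retry()
  }

  // Assumption: the new request has started but has not armed a timer yet.
  retry() { this.requests += 1 }

  on_data(chunk) {
    if (this.paused) {
      this.buffer.push(chunk)
      return
    }
    this.got_activity()
  }

  // Resuming flushes buffered data straight into got_activity().
  resume() {
    this.paused = false
    while (this.buffer.length > 0) {
      this.buffer.shift()
      this.got_activity()
    }
  }

  // Same strict check as feed.js: a missing timer is fatal.
  got_activity() {
    if (!this.pending.wait_timer) {
      this.error = new Error('Cannot find wait timer')
      return
    }
    this.pending.wait_timer = null
  }
}

const feed = new PausableFeed()
feed.pause()
feed.on_data('{"seq":1}')  // arrives while paused, so it is buffered
feed.on_timeout()          // wait_timer expires during the pause -> retry()
feed.resume()              // buffered data reaches got_activity() with no timer
console.log(feed.requests, feed.error.message)  // 2 Cannot find wait timer
```

Dropping the check, as the v0.11.1 fix eventually did, turns this case into a no-op instead of a fatal error.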

@davglass Removing that check does seem reasonable as a stopgap, so I will do some testing and publish a new version.

jcrugzz commented 10 years ago

This is fixed in v0.11.1. Removing the check actually worked.

davglass commented 10 years ago

:thumbsup: