grantcarthew / node-rethinkdb-job-queue

A persistent job or task queue backed by RethinkDB.
https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki
MIT License
157 stars 16 forks source link

Add rethinkdb-changefeed-reconnect #77

Open zllovesuki opened 6 years ago

zllovesuki commented 6 years ago

I have an application running v2.1.0 and updated the dependency to v3.1.3, and the queue stops working after some period of time.

It works when I initially start up the queue, the handler can process incoming jobs. However, after some time the handler was no longer called (.process(function(job, done) {}))

Could you walk me through the changes? I've eliminated other dependency in the application.

grantcarthew commented 6 years ago

Hi @zllovesuki As per the Change Log the only breaking change from v2 to v3 was the event signatures. Specifically the error event signatures.

Read more about the error event signatures here: https://github.com/grantcarthew/node-rethinkdb-job-queue/wiki/Event.error

So check your event callbacks.

zllovesuki commented 6 years ago

@grantcarthew I think it's the same error that if a changefeed becomes unavailable, the Queue just silently ignores it and no errors were thrown.

grantcarthew commented 6 years ago

Oh. Not seen that before. There is a package called rethinkdb-changefeed-reconnect that might do the job here. Was wondering if it was worth adding it to the queue dependencies.

zllovesuki commented 6 years ago

It's similar to #72 .

It would be great if you could implement it. :)

zllovesuki commented 6 years ago

Actually, now that I think about it, it may not seem so easy.

My application connects to a proxy node (same application on the same node connects to the proxy on the same node), and each proxy node then connects to the actual data nodes.

If the proxy node drops (which is connection closed), then the liveness probe will pick it up and nothing terrible will happen.

However, if the data node drops, the error is silently ignored. If I use changefeeds by itself (.changes()), RethinkDB actually complains something like "changefeed unavailable".

zllovesuki commented 6 years ago

Right now my workaround is to query server_status and check for changes in time_connected (so the application can tell that a data node was dropped and it should restart).

disarticulate commented 6 years ago

@grantcarthew I used the rethinkdb-changefeed-reconnect almost immediately since the connections can be flaky.

I'd consider it a necessary dependency.

grantcarthew commented 6 years ago

OK, I just had a closer look at the project and will add this to the To Do list. My head isn't in this space at the moment so it will not be done asap. Sorry. If anyone wants to contribute?

I imagine exposing the reconnect package options onto the existing rethinkdb-job-queue connection options would work well.

disarticulate commented 6 years ago

This is the code I had when I inquired about broadcasting queues:

var processChangefeed = require('rethinkdb-changefeed-reconnect')
...
const queueChangeFeedWatcher = (name, q, queueHandler, vm) => {
  // wait for the q ready
  q.on('ready', (queueId) => {
    var r = q.r
    var newTasksFeed = () => { return r.table(name).changes() } // get the changes
    var handleNewTask = ({new_val, old_val}) => { //handle updates to changes
      if ('data' in new_val
        && new_val.status === 'completed'
        && new_val.data.broadcast
        && new_val.data.broadcasters.length > 0) { // watch for a broadcast flag
        var [host,proc,cmd,count,uuid] = new_val.queueId.split(':')
        if (new_val.data.broadcasters.indexOf(hostID()) === -1
            && queueHandler[cmd]) { // check whether processor has already managed
          vm.log('[queueChangeFeedWatcher] > ', new_val.data.broadcasters, host, proc, cmd)
          queueHandler[cmd] (// run handler from handler objects
            new_val,
            (err, result) => {},
            (job, cb) => {}
          )
        }
      }
    }

    var handleError = (err) => {
      vm.log(err.stack)
    }
    processChangefeed(newTasksFeed, handleNewTask, handleError, { 
      changefeedName: name,
      attemptDelay: 30000,
      maxAttempts: 30,
      silent: false,
      logger: console
    })
  })
}

I roughed out what it looks like would take to just swap the reconnect driver.

https://github.com/grantcarthew/node-rethinkdb-job-queue/compare/master...disarticulate:patch-1

grantcarthew commented 6 years ago

Ah, that's where I saw it! From your issue @disarticulate. I couldn't remember.

That master compare is a good start thanks, however as I said before the reconnect options need to be exposed as part of the public API. Again I imagine either properties of the cxn object or a nested object on the cxn object. There should also be an options to disable the reconnect feature by ether a specific switch or lack of options.

For reference I did some research into where we are at with respect to reliable change feeds. Here is the main discussion as far as I can tell: https://github.com/rethinkdb/rethinkdb/issues/3471

It doesn't look like it will be anytime soon so the rethinkdb-changefeed-reconnect module looks like the way forward for now.

grantcarthew commented 6 years ago

I just published a minor change updating dependencies. I'm not adding features at this time. If someone is interested in working on this feature, please do so. I see this feature as the most useful addition to the queue.

sagivf commented 6 years ago

@grantcarthew Can you help clear up the issue here? Which changefeed might need restarting?

Is it the one mentioned here - Queue.changeFeed which should only affect custom client event handling functionality if I understand correctly?

Or is it something basic to rethinkdb-job-queue which might make job processing stop working? Thanks 😄

grantcarthew commented 6 years ago

Hi @sagivf, been a while. I'm not using RethinkDB at the moment and have been working on another package called perj.

The plan is to add rethinkdb-changefeed-reconnect as a dependency to the job queue and replace the change feed with connections mannaged by rethinkdb-changefeed-reconnect.

The code that needs to be updated would be here: https://github.com/grantcarthew/node-rethinkdb-job-queue/blob/master/src/queue-db.js#L25

I'm busy now so can not make the the changes needed.

Adding you as a collaborator.

https://github.com/grantcarthew/node-rethinkdb-job-queue/invitations

grantcarthew commented 6 years ago

Added you as a maintainer for the NPM package also @sagivf.

sagivf commented 6 years ago

Thanks @grantcarthew, been to long, good to hear from you :)
I will try to dive into the code next week.

Just to get started can you clarify how changefeeds are used in rethinkdb-job-queue? Are they just for events the user wants to tap into? Or is it core to the way the distributed queuing works? Is the management and communication of the distributed jobs done via some sort of pulling?

I realise you are not currently dedicated to this project, but you've built a great tool here, one which I use a lot an example. So I hope you stay around in some capacity.

I now the whole RethinkDB situation has been discouraging but just so you know there is an attempt to fork and revive it (https://spectrum.chat/rebirthdb) It seems to be moving along nicely, thelinuxlich has been leading it nicely and we are doing interesting things like rebuilding the admin in electron and separating it from the main repo along with some other important cleanup and plans down the road.

If this effort succeeds I would really love for https://github.com/grantcarthew/node-rethinkdb-job-queue to take a front stage as it is really useful and works great (great docs to 😄).

See you around buddy.

thelinuxlich commented 6 years ago

Also, since you probably dived deeply in changefeeds, you can help us format what needs to be done for 2.6, where part of the focus may be directed in providing changefeed reliability and performance.

grantcarthew commented 6 years ago

Change Feeds

It's great to hear that you are happy using rjq. I don't get much feedback apart from stars.

To answer your questions @sagivf, change feeds are core to how the distributed queue works. Polling is only carried out by the Queue Master. All other workers rely on the change feed. If the feed stops, they stop.

They have been implemented in a very simple way though. Rather than setup a heap of feeds, there is only one change feed applied to the whole table: https://github.com/grantcarthew/node-rethinkdb-job-queue/blob/master/src/queue-db.js#L28

This causes all the Queue objects to receive changes for any changes on the queue. The changes are then processed locally in the queue-change.js file. I have no doubt that there is a better way of doing this.

With this in mind, you can see the benefit of switching to an automatically reconnecting feed project.

RethinkDB Project

Yes I had seen the communications by @thelinuxlich and am a little dissapointed the guys with RethinkDB collaborator permissions can't add an active user like @thelinuxlich in to help. I even tweeted as such asking for help: https://twitter.com/GrantCarthew/status/996555096426074112

I moved away from RethinkDB because of the ghost town that is the original repository. I tried out PostgreSQL for a while and have now moved to MongoDB. This really hurts as you can imagine. I spent six months building this Job Queue with the goal of using it extensively.

I can't really help out with the RethinkDB fork though @thelinuxlich. I would love to spend my time working on such projects but this is not my primary job. The time I spend on open source is my own time. Not surprisingly it doesn't generate any income to help with life. Sorry.

thelinuxlich commented 6 years ago

@grantcarthew no problem man, any time you can spend helping us test would be great, take your time ^^

sagivf commented 6 years ago

I think the project wasn't adopted much due to: a) timing - about the same time as the start up closed. b) not enough PR.

@thelinuxlich I think we can help on the PR side as Job processing (and Job scheduling) is something most stacks need very early on and it's great when it just fits into your existing DB stack.

thelinuxlich commented 6 years ago

Sure, let's fork and bring the project to the org

thelinuxlich commented 6 years ago

I think this can get better with the write hooks feature in 2.4

sagivf commented 6 years ago

@thelinuxlich you mean use the right hooks instead of changefeeds? @grantcarthew Would you like to move it into the org or would you prefer it under your account?

thelinuxlich commented 6 years ago

Yeah, use write hooks as state machine for jobs

grantcarthew commented 6 years ago

I can move it guys. What's the address?