vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection
Other
385 stars 68 forks source link

JobQueue might stop working when Meteor is unavailable #203

Open lightpriest opened 7 years ago

lightpriest commented 7 years ago

TL;DR: I'm using Job.processJobs from an external node application (worker), with the DDP/ddp-login client. It works as expected, but if the DDP server is unavailable for longer than the getWork interval, it stops processing new jobs.

It took me a while to figure out what's going on, but I think I got it. I first noticed that while developing a Meteor application. When it restarted for code changes, the worker sometimes stopped processing new jobs. When this happens, issuing a "soft" shutdown never seems to end, although jobQueue.length, jobQueue.running and jobQueue.idle are all 0.

I tracked it down to the point where meteor-job calls the DDP method "getWork". If the DDP server is down at this point, the callback is never fired and "getWork" enters a state where it cannot run again (@_getWorkOutstanding is not false'd).

This seems like a larger issue with node-ddp-client method calls when the server is unavailable. The following snippet would not fire the "test callback" when the server is down.

const DDP = require('ddp');

const ddp = new DDP({ host: '127.0.0.1', port: 3000, useSockJs: true });

ddp.connect((connectError, wasReconnect) => {
  if (!wasReconnect) {
    setInterval(() => {
      console.log('calling test');
      ddp.call('test', [], (err, result) => { console.log('test callback', err, result); });
    }, 2000);
  }
});

One workaround suggestion is to pause all job queues when ddp is disconnected, and resume them when it reconnected.

const jobQueues = [];

ddp.connect((connectError, wasReconnect) => {
  jobQueues.forEach(q => q.resume());
  if (!wasReconnect) {
    // ... setup job queues
  }
});
ddp.on('socket-close', (code, message) => {
  jobQueues.forEach(q => q.pause());
});

Come to think of it, this is more of an issue in how node-ddp-client behaves when the DDP server is down, because meteor-job handles errors in the method callback properly.

vsivsi commented 7 years ago

Hi, thanks for reporting this behavior. When running in the pure node.js environment, meteor-job is completely dependent on the node-ddp package for communication with the server, and to the extent that it doesn't handle the connection state well, meteor-job (or any other package built on top of it) will suffer.

Unfortunately, in my experience, there is no one organization or author willing or able to step-up and actively maintain that package. I have the permissions to commit and publish updates to that package on github and npm, and I occasionally will respond to high priority requests, particularly when they come in the form of a good quality pull-request. But unfortunately, as seems to be the case with all of the other contributors over there, I don't have time to go through that package and do everything that really should eventually be done.

If you can identify a focused set of changes to the ddp npm package that will help this issue (ideally as a PR) I would be happy to help integrate and test them and publish the results to npm for everyone's benefit.