vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection
Other
388 stars 68 forks source link

job.cancel() with { dependents: false } fails #243

Open gbhrdt opened 7 years ago

gbhrdt commented 7 years ago

I am running job.cancel() inside a running job. Getting this Mongo exception when trying to cancel a job with the dependents option set to false:

W20170616-15:26:44.401(2)? (STDERR) { [MongoError: $and/$or/$nor must be a nonempty array]
W20170616-15:26:44.402(2)? (STDERR)   name: 'MongoError',
W20170616-15:26:44.403(2)? (STDERR)   message: '$and/$or/$nor must be a nonempty array',
W20170616-15:26:44.404(2)? (STDERR)   waitedMS: 0,
W20170616-15:26:44.404(2)? (STDERR)   ok: 0,
W20170616-15:26:44.405(2)? (STDERR)   errmsg: '$and/$or/$nor must be a nonempty array',
W20170616-15:26:44.406(2)? (STDERR)   code: 2 }
I20170616-15:26:45.158(2)? Exception in callback of async function: MongoError: $and/$or/$nor must be a nonempty array
I20170616-15:26:45.158(2)?     at Object.Future.wait (/Users/marius/.meteor/packages/meteor-tool/.1.4.0-1.bycbho++os.osx.x86_64+web.browser+web.cordova/mt-os.osx.x86_64/dev_bundle/server-lib/node_modules/fibers/future.js:449:15)
I20170616-15:26:45.159(2)?     at [object Object]._.extend._nextObject (packages/mongo/mongo_driver.js:1021:47)
I20170616-15:26:45.159(2)?     at [object Object]._.extend.forEach (packages/mongo/mongo_driver.js:1055:22)
I20170616-15:26:45.160(2)?     at [object Object].Cursor.(anonymous function) [as forEach] (packages/mongo/mongo_driver.js:904:44)
I20170616-15:26:45.160(2)?     at JobCollectionBase._idsOfDeps (packages/vsivsi_job-collection/src/shared.coffee:270:9)
I20170616-15:26:45.161(2)?     at JobCollectionBase._DDPMethod_jobCancel (packages/vsivsi_job-collection/src/shared.coffee:710:18)
I20170616-15:26:45.161(2)?     at JobCollection.<anonymous> (packages/vsivsi_job-collection/src/server.coffee:147:22)
I20170616-15:26:45.162(2)?     at packages/vsivsi_job-collection/src/server.coffee:85:40
I20170616-15:26:45.162(2)?     at methodCall (/Users/marius/Dev/cleverpush-worker/.meteor/local/build/programs/server/packages/vsivsi_job-collection.js:72:18)
I20170616-15:26:45.163(2)?     at Job.cancel (/Users/marius/Dev/cleverpush-worker/.meteor/local/build/programs/server/packages/vsivsi_job-collection.js:1321:14)
I20170616-15:26:45.163(2)?     at server/jobs/updateNotificationCounts.js:65:29
I20170616-15:26:45.164(2)?     at runWithEnvironment (packages/meteor/dynamics_nodejs.js:110:1)
I20170616-15:26:45.165(2)?     - - - - -
I20170616-15:26:45.165(2)?     at Function.MongoError.create (/Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/error.js:31:11)
I20170616-15:26:45.165(2)?     at Object.queryCallback [as cb] (/Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/cursor.js:197:36)
I20170616-15:26:45.166(2)?     at /Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/connection/pool.js:443:18
I20170616-15:26:45.167(2)?     at authenticateStragglers (/Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/connection/pool.js:399:16)
I20170616-15:26:45.168(2)?     at [object Object].messageHandler (/Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/connection/pool.js:421:5)
I20170616-15:26:45.168(2)?     at Socket.<anonymous> (/Users/marius/.meteor/packages/npm-mongo/.1.5.45.rebn1z++os+web.browser+web.cordova/npm/node_modules/mongodb-core/lib/connection/connection.js:294:22)
I20170616-15:26:45.169(2)?     at emitOne (events.js:77:13)
I20170616-15:26:45.169(2)?     at Socket.emit (events.js:169:7)
I20170616-15:26:45.170(2)?     at readableAddChunk (_stream_readable.js:153:18)
I20170616-15:26:45.170(2)?     at Socket.Readable.push (_stream_readable.js:111:10)
I20170616-15:26:45.170(2)?     at TCP.onread (net.js:536:20)
vsivsi commented 7 years ago

Hi, can you provide more information about how to reproduce this error?

Specifically, what do you mean by:

... a job with the dependents option set to false.

Can you please share a working minimal reproduction (a working Meteor project that does nothing but create a JobCollection and then create a job or jobs that trigger this error).

Just from your minimal description and the included stack trace I can't easily intuit the code that will reproduce this error.

Thanks!

gbhrdt commented 7 years ago

Hi @vsivsi

jus see this example:

import { Meteor } from 'meteor/meteor';
import { _ } from 'lodash';

const PushJobs = new JobCollection('pushJobs');

// process 'removeOldJobs'
PushJobs.processJobs('removeOldJobs', { concurrency: 1, workTimeout: 20000 }, (job, cb) => {
  const oldJobsTime = new Date((new Date()).getTime() - 30 * 1000); // 30 secs

  const updateCountJobs = PushJobs.find({ type: 'updateNotificationCounts', status: { $in: PushJobs.jobStatusCancellable }, updated: { $lte: oldJobsTime, $exists: true } }, { fields: { _id: 1 } 
  }).fetch();
  if (PushJobs.cancelJobs(_.map(updateCountJobs, '_id'), {
      dependents: false // prevent checking for dependent jobs that don't exist anyway
  })) {
    PushJobs.removeJobs(_.map(updateCountJobs, '_id'));
  }

  job.done();
  cb();
});

// process 'updateNotificationCounts' dummy job
PushJobs.processJobs('updateNotificationCounts', { concurrency: 1, workTimeout: 20000 }, (job, cb) => {
  // do nothing
  job.done();
  cb();
});

Meteor.startup(() => {
  // start the job that removes the old jobs
  new Job(PushJobs, 'removeOldJobs', {})
      .priority('normal')
      .repeat({
        schedule: PushJobs.later.parse.recur().every().minute()
      })
      .save({ cancelRepeats: true });

  // fill the DB with dummy jobs to be removed
  Meteor.setInterval(() => {
    new Job(PushJobs, 'updateNotificationCounts', {})
      .priority('normal')
      .save();
  }, 5000);
});

This is what I mean by a job dependents option set to false.

By doing that, I can reproduce the error easily, my jobs don't have any dependencies and meteor-job-collection seems to query mongoDB with an empty array. Just leave the example running for about one minute and the error will show.

vsivsi commented 7 years ago

Okay, thanks for that. I see the bug, weird that no one has ever encountered this before. Probably because cancelling dependent jobs is both the default behavior, and the normally correct thing to do. (Why would you want dependent jobs to remain waiting forever after their antecedent is cancelled? )

Anyway, I'll fix this and it will be in the next release whenever that is. In the meantime, just don't do this (set dependents: false) and you'll get no error.

Thanks for the bug report!

vsivsi commented 7 years ago

Fixed on master branch. Thanks again! I'll leave this open until I publish a release with this fix.

gbhrdt commented 7 years ago

Thanks for the fix!

We were creating and also cancelling many jobs at once and we detected a very high CPU usage at our mongo DB. This was probably due to the dependency check queries, which are executed every time when cancelling the job. if you don't use job dependencies, I would recommend anyone to disable that option, if you have many jobs running, because of performance reasons.

vsivsi commented 7 years ago

Oh, that's a good tip! Thx.