vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection
Other
388 stars 68 forks source link

Is it possible that the same job is running on two different containers at the same time? #267

Closed vfonic closed 6 years ago

vfonic commented 6 years ago

I've got a strange error that I can only reproduce on live site where I have more than one container running my meteor app.

I'm "observing" the Jobs and RepeatingEvents collections. When document is added, I trigger the 'createRepeatingEvent' queue. I've posted shortened version of my code and added some extra comments.

What's happening in // do some work is that I query for the last event created and use its date to create next event instance. What happens is that, sometimes, I end up with "next event instance" being created twice. I think it might be some race-condition in which both containers query for last event, get the same document and then create next event instance for the same same future date. Other times, one container queries for last event and creates new instance before the second container queries. Is this possible? If not, what could be the explanation? How can I make the queue to only be processed by one container?


const Jobs = new JobCollection('queue');
export default Jobs;

if (Meteor.isServer) {
  Jobs.startJobServer();

  // remove all jobs on server start to prevent duplicate jobs for
  // the same repeating event to be running
  Jobs.remove({});

  const creatingRepeatingEventsQueue = Jobs.processJobs('createRepeatingEvent', { pollInterval: 1000000000 }, (job, cb) => {
    // do some work

    job.done(message);
    job.remove();

    if (moreWork()) {
      new Job(Jobs, 'createRepeatingEvent')
        .retry({ retries: 5, wait: 5 * 1000, backoff: 'exponential' })
        .save();
    }

    return cb();
  });

  Jobs.find({ type: 'createRepeatingEvent', status: 'ready' }).observe({
    added() { creatingRepeatingEventsQueue.trigger(); },
  });

  RepeatingEvents.find().observe({
    added() {
      // crete a new job and run it immediately
      new Job(Jobs, 'createRepeatingEvent')
        .retry({ retries: 5, wait: 5 * 1000, backoff: 'exponential' })
        .save();

      // create a new job and schedule it to run once every 24h
      const time = new Date().toTimeString().split(' ')[0];
      const pattern = Jobs.later.parse.recur().on(time).time();
      new Job(Jobs, 'createRepeatingEvent')
        .repeat({ schedule: pattern })
        .retry({ retries: 5, wait: 5 * 1000, backoff: 'exponential' })
        .save();
    },
  });
}
vsivsi commented 6 years ago

I assume that RepeatingEvents is just a vanilla Meteor collection?

If so, then yes, if you have two containers each observing that same collection, then they should both get triggered with an added event on it. And that will lead to there being two createRepeatingEvent jobs (well four actually, since each added() creates two).

And some of the time those jobs will run at the same time (one in each container) and the find in // do more work will succeed for both of them.

Race conditions are tricky, as I'm sure you know. Anytime you have a find(), do some stuff, write(), then whatever's in that write() had better be idempotent, or you are going to get into trouble if you have multiple processes that can be processing that same find in parallel.

vsivsi commented 6 years ago

Just to clarify...

In this case though, if I'm understanding correctly, it seems that the problem is that you have multiple containers observing RepeatingEvents and each of them will create createRepeatingEvent jobs for each event. If you somehow don't do that... then the race will disappear.

vfonic commented 6 years ago

Thanks for the reply!

Is there any way to use only one container to run jobs on?

EDIT:

If you somehow don't do that... then the race will should disappear.

Exactly. :)

EDIT 2: I guess what prevented this on server startup is Jobs.remove({}); which removes all the jobs, including those from the other container.

vsivsi commented 6 years ago

Not sure I understand... You can decide where to run Jobs.processJobs( ... ).

If you only run that in one container, then jobs will only run in that one. But again, I think the core issue here is you are creating multiple createRepeatingEvent jobs for each observed add on RepeatingEvents.

So if you can just run this code on one container, you should be golden.

  RepeatingEvents.find().observe({
    added() {  ...  }

This really doesn't seem to be a JobCollection question per se, more an issue with Collection.find().observe() running in multiple containers.

vfonic commented 6 years ago

You're right. I thought maybe JobCollection has some mechanism to easily run code on one container only. I'll do some googling. Thanks for help!