vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection

Is depends() working properly? #93

Closed skosch closed 9 years ago

skosch commented 9 years ago

Hi, first off, job-collection is fantastic, thanks for sharing it with the world!

I have two jobs that need to run in sequence. The input data to the second job depends on the result of the first. I'm creating the second job in the save() callback of the first. However, the second job is called before the first one finishes, and so it's not getting the result it needs.

I haven't found any point in the code that actually makes use of the .depends field—if such code exists, could you point me to it so I may understand it?

There is a test case:

Tinytest.addAsync 'Dependent jobs run in the correct order', (test, onComplete) ->
  jobType = "TestJob_#{Math.round(Math.random()*1000000000)}"
  job = new Job testColl, jobType, { order: 1 }
  job2 = new Job testColl, jobType, { order: 2 }
  job.save (err, res) ->
    test.fail(err) if err
    test.ok validId(res), "job.save() failed in callback result"
    job2.depends [job]
    job2.save (err, res) ->
      test.fail(err) if err
      test.ok validId(res), "job2.save() failed in callback result"
      count = 0
      q = testColl.processJobs jobType, { pollInterval: 250 }, (job, cb) ->
        count++
        test.equal count, job.data.order
        job.done()
        cb()
        if count is 2
          q.shutdown { level: 'soft', quiet: true }, () ->
            onComplete()

... however, this only makes sure that the jobs run in the correct order, not that the first job is completed before the second one starts. Am I missing something, or is the actual depends() functionality simply not there?

PS. Note that it does work if I disregard the depends() function and instead set up the second job within a Tracker.autorun() block that checks for status === "completed". I'll do that for now; it'd be cool though if that wasn't needed.

vsivsi commented 9 years ago

Hi, thanks for your question. Depends is not one of the more heavily used features of job-collection, and there's always the possibility of a bug, but in my experience it does work as advertised.

... however, this only makes sure that the jobs run in the correct order, not that the first job is completed before the second one starts.

So long as the first job does no work after calling job.done(), then "running in the correct order" and "the first job is completed before the second one starts" are the same thing, right?

Using the code above as an example, job2 has a dependency on job, and will not become 'ready' to run until the job.done() is called and executed on the server. You can convince yourself that this works by adding a call to job.delay(1000) in the above code. That gives job2 every opportunity to run first, but it won't because of the dependency on job. In fact, I just added this to the test on the dev branch, and everything is still good: https://github.com/vsivsi/meteor-job-collection/blob/dev/test/job_collection_tests.coffee#L202

If you have some simple repro code that demonstrates a dependent job running before its antecedent job has completed, please share it and I'll be happy to take a look and see if there's a bug somewhere that has thus far evaded detection.

vsivsi commented 9 years ago

Also, you can see that dependencies are resolved within the job.done() call here: https://github.com/vsivsi/meteor-job-collection/blob/dev/src/shared.coffee#L1036-L1073
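
The gating behavior described above can be sketched in a few lines of plain Node. This is an illustrative toy, not job-collection's actual implementation: a job with unmet dependencies sits in 'waiting', and it is the antecedent's done() call that promotes it to 'ready'.

```javascript
// Toy model of the depends() gate. Real job-collection does this with
// Mongo documents on the server; the state machine is the same idea.
class MiniJob {
  constructor(name) {
    this.name = name;
    this.status = 'ready';  // runnable immediately unless it has dependencies
    this.dependents = [];   // jobs waiting on this one
    this.pendingDeps = 0;   // antecedents that have not yet completed
  }
  depends(jobs) {
    for (const j of jobs) {
      j.dependents.push(this);
      this.pendingDeps++;
    }
    if (this.pendingDeps > 0) this.status = 'waiting';
    return this;
  }
  done() {
    this.status = 'completed';
    // Resolve dependencies inside done(), as job-collection does:
    for (const d of this.dependents) {
      d.pendingDeps--;
      if (d.pendingDeps === 0) d.status = 'ready';
    }
  }
}

const job1 = new MiniJob('job1');
const job2 = new MiniJob('job2').depends([job1]);
console.log(job2.status); // 'waiting' — no worker can pick job2 up yet
job1.done();
console.log(job2.status); // 'ready' — promoted by job1.done()
```

Note that nothing in this sketch makes job1's *result* visible to job2; it only orders them, which is all depends() promises.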

skosch commented 9 years ago

Holy cow, that was a quick reply. Thanks (also for your link to shared.coffee, which I had missed somehow). Things are clearer now.

I'm not sure I can tease a working minimal example out of my code right now, because I'm using autorun everywhere now. Here is what I had, more or less (untested):

var job1 = new Job(allJobs, "Type1", {
  someData: "abc"
}).save(function(err, jobId) {
  var job2 = new Job(allJobs, "Type2", {
    someStuff: allJobs.findOne(jobId).result
  }).depends([job1]).save(...);
});

On second look, the problem in my particular case is that the creation of the data object I'm passing to job2 happens right away. Even though job2 may run strictly after job1, it nevertheless gets stale (i.e. nonexistent) data. Can this be circumvented somehow?

vsivsi commented 9 years ago

Quick because I just happen to be putting the finishing touches on the 1.2.0 release of job-collection right now, so if there's a bug, I want to kill it for this release.

I've annotated your code above to explain what's going on...

var job1 = new Job(allJobs, "Type1", {
  someData: "abc"
}).save(function(err, jobId) {
  var job2 = new Job(allJobs, "Type2", {
    someStuff: allJobs.findOne(jobId).result // job1 has almost certainly not run yet at this point!
  }).depends([job1]).save(...);
});

The callback to job1.save() is invoked as soon as that job is written into the server database. The job itself runs at some point in the future (milliseconds or years...). So in your code above, there is no result to find (the job exists in the database, but it hasn't run yet).

There are two strategies for solving your problem, besides the autorun approach, which I don't recommend...

  1. Write the _id of job1 in the data object of job2, and write job2's worker to perform the find when it runs. At that point it is guaranteed to exist (assuming job1.remove() hasn't been run...)
  2. Don't use .depends() and instead write job1's worker to create job2 and save it to the server. That way the .find() can be eliminated because at that point job1's worker knows the result and can just directly write it into job2's data.
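
Strategy 1 might look like this in plain Node, with a Map standing in for the Mongo-backed collection (all names here are illustrative, not job-collection API):

```javascript
// Strategy 1: job2's data carries only job1's _id; job2's worker performs
// the lookup when it actually runs, by which time job1's result exists.
const collection = new Map(); // _id -> job document (stand-in for Mongo)

// job1's worker runs first and records its result in the collection
function job1Worker(id) {
  collection.set(id, { _id: id, status: 'completed', result: { answer: 42 } });
}

// job2's worker receives { job1Id: ... } and fetches the result at run time;
// because of depends(), job1 is guaranteed to have completed by now
function job2Worker(data) {
  const antecedent = collection.get(data.job1Id);
  return antecedent.result;
}

job1Worker('job1-id');
const out = job2Worker({ job1Id: 'job1-id' });
console.log(out.answer); // 42
```

The key difference from the snippet above is *when* the lookup happens: inside job2's worker, not at job2's creation time.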

Anyway, if the code you provided above is representative, I think .depends() is working fine, you just need to figure out the mechanism you want to use to pass job1's result to job2.

skosch commented 9 years ago

Yep, that agrees with what I wrote above then. The autorun is definitely slow, so I'd rather avoid it. I think the best solution will be to just combine the two jobs into one.

Thanks for your help. :)

vsivsi commented 9 years ago

If the work of both jobs needs the same resources and doesn't require some delay, then combining jobs is a fine solution.

Splitting work into multiple jobs is most valuable when:

  1. the jobs need to run on multiple physical machines: e.g. job1 splits a piece of work into 100 smaller independent subtasks. It creates 100 jobs to work on the subtasks (which can run on 100 machines if necessary). It also creates a merge/cleanup job that depends on the 100 subtask jobs. Basically a classic map/reduce pattern.
  2. when some part of the work needs to be delayed: e.g. job1 does some work, and then 1 hour later job2 finishes it.
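
The fan-out/fan-in shape in point 1 can be sketched like so. Plain Node with illustrative names, no real job queue: the "merge job" fires only once every subtask has reported completion, which is the role depends() plays in job-collection.

```javascript
// Fan-out/fan-in: N independent subtasks, plus a merge step gated on a
// dependency counter, mirroring a merge job that .depends() on all subtasks.
function fanOutFanIn(inputs, work, merge, onResult) {
  let remaining = inputs.length;          // merge "depends" on every subtask
  const partials = new Array(inputs.length);
  inputs.forEach((input, i) => {
    // each subtask could run on a different machine; here they run inline
    partials[i] = work(input);
    remaining--;
    if (remaining === 0) onResult(merge(partials)); // all dependencies resolved
  });
}

let total;
fanOutFanIn(
  [1, 2, 3, 4],
  (x) => x * x,                            // each subtask squares its input
  (parts) => parts.reduce((a, b) => a + b, 0), // merge job sums the partials
  (result) => { total = result; }
);
console.log(total); // 30
```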

Anyway I hope that helps.

vsivsi commented 9 years ago

BTW, I'm going to check out Crimson for some work I'm about to start working on... I love random connections.

skosch commented 9 years ago

Ha, pleased to hear that :) I combined the two jobs and everything works now.