vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection

workers hanging forever #167

Closed: greatramu closed this issue 8 years ago

greatramu commented 8 years ago

Hello Vaughn, thanks for the excellent package. We are finding the job collection and the file collection extremely helpful in our project. I am using jobs to handle thumbnail generation, with code heavily inspired by your thumbnail demo program.

The problem is that in some seemingly trivial cases, fc.upsertStream() in a job worker never calls its callback. Both the worker callback and job.done() are called inside that fc.upsertStream() callback, so whenever this happens the worker is left in limbo and there is no way to return it to the pool. I am using jc.processJobs() with a concurrency of 4, so if 4 such cases occur, all 4 workers are blocked and our job queue effectively stops.
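The starvation described above can be modelled with a toy concurrency-limited queue. This is a minimal sketch under stated assumptions: `TinyQueue` and `worker` are hypothetical names, and the code is not job-collection's implementation. It only models the rule that a worker slot is freed solely when the worker calls its callback, so four callbacks that never fire leave a fifth task waiting forever.

```javascript
// Hypothetical, minimal concurrency-limited queue (NOT job-collection's
// implementation). A slot is released only when the worker calls cb.
class TinyQueue {
  constructor(worker, concurrency) {
    this.worker = worker; // function (task, cb)
    this.concurrency = concurrency; // max tasks in flight
    this.running = 0;
    this.pending = [];
    this.completed = [];
  }

  push(task) {
    this.pending.push(task);
    this._drain();
  }

  _drain() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const task = this.pending.shift();
      this.running += 1;
      // The only place a slot is given back to the pool.
      this.worker(task, () => {
        this.running -= 1;
        this.completed.push(task);
        this._drain();
      });
    }
  }
}

// Hypothetical worker: tasks marked `hang` never call cb, mimicking an
// upstream callback (such as the upsertStream one) that never fires.
function worker(task, cb) {
  if (task.hang) return; // bug: cb is never called
  setImmediate(cb);
}

const q = new TinyQueue(worker, 4);
for (let i = 0; i < 4; i++) q.push({ id: i, hang: true });
q.push({ id: 4, hang: false }); // healthy task, but no free slot
```

After these pushes, all four slots are held by hung tasks and the healthy task never leaves `pending`, which matches the "queue stops" symptom.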

I tried two approaches: manually cancelling and restarting the job, and setting a timeout on the job queue. Neither works: the job returns to the "ready" state, but no workers are available to process it, because they are all still waiting for the callback from the previous run. The only remedy is to restart the server, which kills all these zombie workers.

I would really appreciate any pointers toward a fix. Thanks!

vsivsi commented 8 years ago

Hi, jc.processJobs() is an async queue in an async system, and as such the worker must eventually call the callback it is provided. There's no way around this. It's fundamentally no different from using the popular node.js async library. Any async function that is given a callback is responsible for eventually calling it, or things will break, by design. This includes catching all throws, etc.
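The "always call the callback, including on throws" rule can be enforced defensively with a wrapper. This is a sketch, not part of job-collection: `guardWorker` and `timeoutMs` are hypothetical names, and the watchdog timeout is an added assumption on top of what the reply states. The wrapper guarantees the callback runs exactly once, whether the worker finishes, throws synchronously, or never calls back.

```javascript
// Hypothetical helper, NOT part of job-collection's API: wraps a
// (task, cb) worker so that cb runs exactly once, whether the worker
// finishes normally, throws synchronously, or exceeds a watchdog timeout.
function guardWorker(worker, timeoutMs) {
  return function guarded(task, cb) {
    let finished = false;
    let timer = null;

    // Single exit point: later or duplicate calls are ignored.
    const finish = (err) => {
      if (finished) return;
      finished = true;
      if (timer !== null) clearTimeout(timer);
      cb(err || null);
    };

    // Watchdog: release the slot even if the worker never calls back.
    timer = setTimeout(
      () => finish(new Error(`worker timed out after ${timeoutMs} ms`)),
      timeoutMs
    );

    try {
      worker(task, finish);
    } catch (err) {
      finish(err); // a synchronous throw must still release the slot
    }
  };
}

// Usage with hypothetical workers: one hangs, one throws, one succeeds.
const results = [];
const record = (err) => results.push(err ? err.message : "ok");

const hangs = guardWorker((task, cb) => {
  /* never calls cb, like a callback that never fires */
}, 50);
const throws = guardWorker(() => {
  throw new Error("boom");
}, 50);
const works = guardWorker((task, cb) => setImmediate(cb), 50);

hangs({ id: 1 }, record);
throws({ id: 2 }, record);
works({ id: 3 }, record);
```

The watchdog only frees the queue slot; it cannot cancel the stuck operation itself, and any late callback from it is ignored. A real worker would still need to record the job's outcome before calling back, and fixing the underlying hang remains the proper solution.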

So from your description, I see no bug/issue for job-collection.

You also mention that fc.upsertStream() seems to have a failure mode where it does not call its callback. If you can reproduce that, please file a bug in the file-collection repository with a clear reproducing case (preferably a sample app that demonstrates the issue) and I'll be happy to look into it.

Thanks! -V

vsivsi commented 8 years ago

See also:

https://github.com/vsivsi/meteor-job-collection/issues/164
https://github.com/vsivsi/meteor-job-collection/issues/158

greatramu commented 8 years ago

Thank you! I figured it out. My issues were mainly due to incorrect use of globals.
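The thread does not say which globals were involved, so the following is a purely hypothetical illustration of one common way shared module-level state produces exactly this symptom when a worker runs with concurrency greater than 1. `badWorker`, `goodWorker`, and `currentCb` are invented names. When two jobs overlap, the second overwrites the shared callback, so the first job's callback is never called and its slot is never released.

```javascript
// Hypothetical illustration, not the code from this issue.
let currentCb = null; // BAD: one slot shared by every concurrent job

function badWorker(task, cb) {
  currentCb = cb; // a second job overwrites the first job's callback
  setImmediate(() => currentCb(task.id)); // may call the wrong job's cb
}

function goodWorker(task, cb) {
  // GOOD: each call keeps its own callback in its own closure
  setImmediate(() => cb(task.id));
}

const badLog = [];
badWorker({ id: 1 }, (id) => badLog.push(`cb1 got ${id}`));
badWorker({ id: 2 }, (id) => badLog.push(`cb2 got ${id}`));

const goodLog = [];
goodWorker({ id: 1 }, (id) => goodLog.push(`cb1 got ${id}`));
goodWorker({ id: 2 }, (id) => goodLog.push(`cb2 got ${id}`));
```

With the global, `cb2` fires twice and `cb1` never fires, which in a job queue means one worker is stuck forever. Keeping per-job state in local variables or closures avoids this.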