Open mrrooijen opened 13 years ago
Digging into Delayed Job, it looks like they lock jobs before they are even fetched: https://github.com/collectiveidea/delayed_job/blob/master/lib/delayed/backend/base.rb (see the `reserve` method around line 36).
This way they can ensure a job hasn't already been picked up, because `reserve` locks it for a specific worker (by process id). The worker then runs a query along the lines of `where(:pid => Process.pid)` to fetch the already-reserved job, avoiding conflicts with other workers.
What do you think?
Here's the Delayed Mongoid extension, also with a reserve method: https://github.com/collectiveidea/delayed_job_mongoid/blob/master/lib/delayed/backend/mongoid.rb
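For reference, the reservation idea could be sketched like this. This is a minimal in-memory sketch of the pattern, not DJ's actual code; `JobStore`, `reserve`, and `reserved_for` are hypothetical names, and the `Mutex` stands in for the database's atomic update:

```ruby
require "thread"

# Sketch of the Delayed Job-style "reserve" idea: a worker atomically
# stamps a job with its own pid before processing, so two workers can
# never claim the same job. All names here are hypothetical.
class JobStore
  def initialize(jobs)
    @jobs  = jobs        # array of { id:, locked_by: nil } hashes
    @mutex = Mutex.new   # stands in for the datastore's atomic update
  end

  # Reserve the next unclaimed job for the given worker pid.
  def reserve(pid)
    @mutex.synchronize do
      job = @jobs.find { |j| j[:locked_by].nil? }
      job[:locked_by] = pid if job
      job
    end
  end

  # Fetch jobs already reserved by this worker, like where(:pid => Process.pid).
  def reserved_for(pid)
    @jobs.select { |j| j[:locked_by] == pid }
  end
end

store = JobStore.new((1..4).map { |i| { id: i, locked_by: nil } })
store.reserve(1111)
store.reserve(2222)
p store.reserved_for(1111).map { |j| j[:id] }  # => [1]
```

The key point is that reservation and fetching are two separate steps: the reservation is the only contended operation, and it is atomic.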
Hi Mike,
Navvy doesn't reserve jobs right now, since it was originally built to process jobs using a single worker. It would be cool to support this, but it's not on my priority list at the moment. I would love to do some work on it together, since you've apparently looked into this already. Could you set up something simple (just supporting one adapter should be fine) so we can discuss and decide on the best way to implement it?
Thanks for reporting! :)
That's cool. I'm probably releasing the gem that needs this for both DJ and Navvy. I already wrote about 90% of it with DJ in mind, because I originally thought Heroku had its own version of DJ for some reason and couldn't run any other worker, until I read that it just invokes whatever `rake jobs:work` does. Then, looking at your source, I noticed your Heroku comment, but by that time it was more or less finished.
The nice thing is that it's pretty easy to port from DJ to Navvy, since my gem is quite generic. I actually already had a working Navvy branch, but because of the concurrency issue I left it at that; once we fix that issue I'll finish it up. What's nice about Navvy is that the support for the various backends (Mongoid, ActiveRecord, Sequel) lives in a single gem rather than being scattered like DJ's, so all the methods are abstract: `Job.next` will just get the next job regardless of which mapper is being used, which greatly simplifies things on my end. For example, I was able to throw away the rewritten backends I needed for DJ.
Anyway, here's a brief background on what I'm trying to accomplish with my gem:
It's purely for Heroku users whose web applications need anywhere from barely any to a lot of background processing power. Once the gem is in your Gemfile, it hooks into `Delayed::Job` or `Navvy::Job` to inject some code. When your application is deployed to Heroku, it starts at 0 workers. Whenever a new job gets enqueued, it checks how many jobs there are and, depending on what you configured, spawns the necessary number of workers to process those jobs concurrently. Once that's done, it immediately scales the workers back down to 0, saving a ton of money. Really, what do we need idle workers for? Paying only for what you use is much more "cloud-like".
So you can set a job/worker ratio in the configuration file. Example: "Spawn 1 worker if there are 1-14 jobs", "Spawn 2 workers if there are 15-29 jobs", "Spawn 5 workers if there are 30 or more jobs".
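The ratio lookup described above could be sketched as a simple threshold table (`RATIOS` and `workers_for` are made-up names; the real gem's configuration format may differ):

```ruby
# Map a queue size to a worker count via configured thresholds,
# mirroring the example in the text:
# 1..14 jobs => 1 worker, 15..29 => 2 workers, 30+ => 5 workers.
RATIOS = [
  [30, 5],   # 30 or more jobs => 5 workers
  [15, 2],   # 15..29 jobs     => 2 workers
  [1,  1],   # 1..14 jobs      => 1 worker
].freeze

def workers_for(job_count, ratios = RATIOS)
  return 0 if job_count <= 0   # empty queue => scale back to zero
  ratios.each { |min, workers| return workers if job_count >= min }
  0
end

p workers_for(10)   # => 1
p workers_for(17)   # => 2
p workers_for(42)   # => 5
p workers_for(0)    # => 0
```

Ordering the table from highest threshold to lowest means the first match wins, so each queue size maps to exactly one worker count.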
Running multiple workers concurrently isn't more expensive, since workers are pro-rated to the second. Five concurrent workers cost five times as much per second as one, but they also finish the queue roughly five times faster, so the total bill comes out about the same as one worker grinding through all the jobs by itself. You save time without spending more money; there's only upside, because billing is pro-rated by the second.
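With per-second proration the cost argument is just arithmetic; a tiny illustration with made-up numbers (the rate below is hypothetical, not Heroku's actual pricing):

```ruby
rate_per_second = 0.001   # hypothetical $ per worker-second
total_seconds   = 600     # total seconds of queued work

# One worker does all 600 s; five workers do 120 s each, concurrently.
cost_one  = 1 * (total_seconds / 1) * rate_per_second
cost_five = 5 * (total_seconds / 5) * rate_per_second

puts cost_one == cost_five   # true: same total cost, ~5x less wall-clock time
```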
So every time the queue changes, it checks the configuration, compares it to the current number of queued jobs, and spawns the appropriate number of workers (as defined by you). This is sweet because if you have some really long-running jobs, you might want 5 workers for only 10 jobs (each job might take a minute), whereas if jobs generally take around 3-5 seconds, you could set it to just 1 worker, since it'll run through those 10 jobs quite fast. So, depending on the application, you can tune it.
The nice thing is that, in most cases, low-traffic applications only need to process data for maybe 2 hours a month. Rather than having 1 worker running 30 days straight, it'll actually only run for those 2 hours, so your monthly bill isn't $36 but roughly $0.10. :) People will be more likely to use Heroku even for smaller apps when this is the case, because almost every application wants workers, just not at $36/month when they only process for 2 hours. That's like spawning 2 dynos for 100 visitors a month: pointless and a waste of money.
In any case, the first version will probably be released tomorrow, after I finish cleaning it up and think of a name. Eventually I'd like to port it to Navvy, so Navvy users can also benefit from this.
I'll also look into Navvy later to see what can be done about "locking" jobs to enable concurrency.
What do you think?
Hi,
When running multiple Navvy workers concurrently, it seems that workers can pick up each other's jobs when they're invoked at exactly the same moment.
For example, say I have 25 jobs queued and I spawn 4 workers at the same time: 2 workers take on the first job in the queue and the other 2 take on the second job (instead of each of the 4 workers picking up its own job).
I already saw that `Job.next(1)` will ensure that only the next available job is fetched and picked up. However, since I spawn multiple workers at exactly the same time, I believe the first worker picks up the job, but before it can mark it as `started`, another worker has already pulled down the same record and assumes it can also process that job. (So it basically gets marked as `started` twice, by two different workers.)

Also, I noticed this in, for example, the Mongoid extension:
Is there a reason why you don't include a `where(:started_at => nil)` there? Judging by this, and by the fact that you can pass in a `limit` to pull in an array of objects to process, it seems like Navvy wasn't meant to run multiple workers concurrently. Or am I missing something?

In any case, using `Job.next(1)` in the rake task and adding `where(:started_at => nil)` to the Mongoid extension partially solves the problem, but there will still be times when a single job is processed by 2 or more workers, which of course is not what you want and will cause issues. I'm looking for a good solution.

I guess the main question is: can (or should) Navvy be able to run multiple workers concurrently?
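One way to close that race is to make the claim itself atomic: a worker only owns a job if it is the one that flipped `started_at` from nil to a timestamp, instead of reading first and updating later. Here is a minimal thread-based sketch of the idea (`ClaimableQueue` is a made-up name; in a Mongo-backed setup the `Mutex` would be replaced by an atomic findAndModify-style update):

```ruby
require "thread"

# Sketch: claiming a job means atomically flipping :started_at from nil
# to a timestamp; only the worker that wins the flip processes the job.
# The Mutex stands in for an atomic datastore update (findAndModify-style).
class ClaimableQueue
  def initialize(count)
    @jobs  = (1..count).map { |i| { id: i, started_at: nil } }
    @mutex = Mutex.new
  end

  # Returns a job this caller now exclusively owns, or nil if none left.
  def claim
    @mutex.synchronize do
      job = @jobs.find { |j| j[:started_at].nil? }
      job[:started_at] = Time.now if job
      job
    end
  end
end

queue  = ClaimableQueue.new(25)
claims = []
lock   = Mutex.new

# Four "workers" racing at the same moment, as in the report above.
threads = 4.times.map do
  Thread.new do
    job = queue.claim
    lock.synchronize { claims << job[:id] }
  end
end
threads.each(&:join)

p claims.sort  # => [1, 2, 3, 4]
```

Because the read-and-mark happens in a single atomic step, no two workers can ever end up owning the same job, no matter how simultaneously they start.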
Let me know, thanks!