michelsalib / BCCResqueBundle

The BCC resque bundle provides integration of php-resque into Symfony2. It is inspired by Resque, a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.

Preventing duplicate entries in Queue. Adding only 1 job to a queue. #102

Open · epicwhale opened 10 years ago

epicwhale commented 10 years ago

This is more of a query than an issue.

If I have a Products queue with a list of products that need to be updated using data from a remote source, I want to ensure that no two products get updated at the same time, as that causes some concurrency issues.

What are the recommended solutions to approach this problem?

  1. Make sure my jobs are concurrency-proof (using database locks, etc.), which is difficult to achieve with NoSQL.
  2. Ensure that the same Job is not being processed in parallel. (How can I achieve this?)
  3. Ensure that there are no duplicate jobs in the queue. (How can I achieve this?)

Any other solutions would be appreciated.

danhunsaker commented 10 years ago

Option 2 - Set up a dedicated queue for sequential jobs, and assign only one worker to it. Assign all jobs that need to be done sequentially to that queue instead of the default(s). Mission accomplished.
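As a rough sketch of what that looks like with this bundle (I'm assuming the usual Job subclass with a $queue property and the bcc_resque.resque service; the class name, queue name, and arguments are just illustrative):

```php
<?php

use BCC\ResqueBundle\Job;

// Illustrative job: everything pushed onto the 'product_sync' queue is
// handled only by whichever worker(s) listen on that queue.
class ProductUpdateJob extends Job
{
    public $queue = 'product_sync';

    public function run($args)
    {
        // Update a single product from the remote source. With exactly one
        // worker on 'product_sync', no two of these jobs run concurrently.
    }
}
```

```php
// Enqueueing, e.g. from a controller:
$job = new ProductUpdateJob();
$job->args = array('product_id' => 42);
$this->get('bcc_resque.resque')->enqueue($job);
```

Then start exactly one worker bound to that queue (something like `app/console bcc:resque:worker-start product_sync` under Supervisor, if I remember the command right) and leave your other workers on the default queue(s).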

epicwhale commented 10 years ago

@danhunsaker I forgot to mention this earlier, but the complication here is that our use case is an e-commerce platform syncing products for multiple stores' product queues. New stores can be added automatically, and each new store should have its own product queue...

But this has to happen dynamically, as we can't re-configure running workers, queue names, etc. in Supervisor/Linux each time we add a store.

danhunsaker commented 10 years ago

Does each store need a unique sync queue, or can the application operate with a single sync queue and a separate work queue for each store for other operations?

epicwhale commented 10 years ago

@danhunsaker each store doesn't need a unique sync queue for anything except product syncing right now. Everything else can be pooled into a common 'default' queue.

I need to make sure that one store's product sync is not delayed because another store has too many items in the queue; that would create a quality-of-service issue. Hence, one queue per store for product sync.

Do share your thoughts!
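To make that concrete, roughly what I have in mind is deriving the queue name from the store at enqueue time, so nothing has to be pre-configured per store (the job class and variables below are just illustrative):

```php
// One product-sync queue per store, named dynamically. ProductUpdateJob is
// a made-up job class; $storeId comes from wherever the sync is triggered.
$job = new ProductUpdateJob();
$job->queue = 'product_sync_' . $storeId;
$job->args  = array('store_id' => $storeId, 'product_id' => $productId);

$this->get('bcc_resque.resque')->enqueue($job);
```

The part I don't see how to solve is the worker side: each of those queues would still need its own single worker, and that's exactly what I can't keep reconfiguring in Supervisor every time a store is added.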

danhunsaker commented 10 years ago

My first response to that is virtualization. Each new store spins up a new VM, and anything it needs to do separately from other stores is done there. If I were implementing it, I'd have each store's web interface in its own VM, and possibly separate workers out as well. You'd still have a common worker pool in its own VM, and all the workers would connect to a single Redis instance.

However, that's often beyond the available technology, and it's probably too late in your dev cycle to set things up that way anyhow. So you'll want another approach. Ruby's Resque has a plugin that allows certain queues to be marked as sequential when they're created, and then ensures that jobs are not removed from those queues while another worker is already processing a job from them. I haven't looked at its code to see how portable it would be to how PHP-Resque operates, but it's a starting point, I think.

epicwhale commented 10 years ago

@danhunsaker that does seem like overkill, especially since I'm building a SaaS service and want this to scale to a few hundred and then thousands of customers.

I did see the Ruby material on serializing jobs in a queue with locking, but it looks a tad too complicated to replicate and manage: http://www.bignerdranch.com/blog/never-use-resque-for-serial-jobs/ (it's almost like maintaining another stand-alone project within my project). I don't have the benefit of time there.

Maybe for the products queue I should be exploring some other alternative? Do you know of any other background-job or MQ solution that could support this and has a good bundle/library for PHP/SF2?

mrbase commented 10 years ago

@epicwhale I'm currently looking at Gearman, which has a bundle and is under active development, though it needs SF 2.4.

Whether it meets your requirements I don't know, but it's simple, fast, and it scales.

Otherwise, look at http://queues.io/ for a fine collection of queue systems.

mrbase commented 10 years ago

And there is a PECL extension for PHP as well: http://www.php.net/manual/en/book.gearman.php

danhunsaker commented 10 years ago

In my experience, virtualization scales better, and is more secure to boot. But my experience varies wildly from that of many others, who haven't had any problem using such platforms as cPanel and WordPress for all of their needs. I just got tired of one site being able to consume the full resources of my servers, with no reliable way to restrict their activities without affecting anyone else. Also got tired of one hacked site infecting everything on the server. As with anything, your mileage will vary.

Resque wasn't really designed for sequential operation, and making it do so anyway will always be a hack. Even scheduled tasks are a hack, really. So PHP-Resque may not be your best fit. As to alternatives, there are many, and @mrbase has presented some useful starting points. I can't speak to Symfony interop, because I don't use Symfony. To me, SF2 is overkill. :-) I'm sure I'll encounter a project where Symfony makes sense eventually, though.

Best of luck!

mdjaman commented 10 years ago

@danhunsaker How do you do Option 2: set up a dedicated queue for sequential jobs, assign only one worker to it, and route all jobs that need to be done sequentially to that queue instead of the default(s)?

epicwhale commented 9 years ago

Why didn't anyone suggest the enqueueOnce(..) function in this bundle? I also noticed that it isn't documented for some reason...

cc: @danhunsaker

danhunsaker commented 9 years ago

Possibly because that's not actually what was asked for. It wasn't preventing more than one of a job at a time from being queued. It was preventing more than one of a job at a time from being run. Very different approach, then.

Also, the fact it's undocumented doesn't help.

epicwhale commented 9 years ago

Point 3 in the question was "Ensure that there are no duplicate jobs in the queue. (How can I achieve this?)". I guess this solves that?

Yes, this seems to be a hidden gem, the enqueueOnce(..) function. Has it been tested / used in production?

danhunsaker commented 9 years ago

Better to write idempotent jobs, but yeah, that would probably also work.

I honestly don't recall.
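To illustrate what I mean by idempotent: have the job check the current state before writing, so that a duplicate (or retried) job is harmless. A rough sketch, assuming the bundle's ContainerAwareJob base class plus Doctrine; the entity, its methods, and the fetchRemoteProduct() helper are all stand-ins:

```php
<?php

use BCC\ResqueBundle\ContainerAwareJob;

class SyncProductJob extends ContainerAwareJob
{
    public function run($args)
    {
        $em      = $this->getContainer()->get('doctrine')->getManager();
        $product = $em->find('AppBundle:Product', $args['product_id']);
        $remote  = $this->fetchRemoteProduct($args['product_id']);

        // If this revision was already applied, a duplicate or re-run of the
        // job is a no-op instead of a second, conflicting write.
        if ($remote['revision'] <= $product->getSyncedRevision()) {
            return;
        }

        $product->applyRemoteData($remote);
        $product->setSyncedRevision($remote['revision']);
        $em->flush();
    }

    private function fetchRemoteProduct($id)
    {
        // Call the remote store's API here; returning a fixed shape so the
        // sketch stays self-contained.
        return array('revision' => 1, 'name' => 'Example product');
    }
}
```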

danhunsaker commented 9 years ago

I take that back. I don't know how much testing enqueueOnce() has gotten, but it's brand new, within the last couple of weeks, so that's why it's neither documented nor mentioned above. It didn't exist yet. Somehow I completely forgot working with the contributor on that one.

Hopefully we'll see some documentation on that soon.

darkromz commented 9 years ago

@epicwhale I came across this thread while looking for the same thing, preventing duplicate jobs from being added to the queue, and then saw your comment about "enqueueOnce". As you also mentioned yourself, I can't seem to find anything about it; can you give any code examples of how to use it?

epicwhale commented 9 years ago

@darkromz it's been a long time since I've worked with anything around this library; have a look at the function itself, maybe? https://github.com/michelsalib/BCCResqueBundle/blob/b4dfd5ae76a12da591ef12f5932838196083676c/Resque.php#L83
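From a quick look at that code, usage appears to be the same as a normal enqueue, just via the *Once method; roughly (the job class is illustrative, and I haven't run this recently):

```php
// Skip the push if an identical job (same class and args) already appears
// to be pending on the target queue.
$resque = $this->get('bcc_resque.resque');

$job = new ProductUpdateJob();
$job->args = array('product_id' => 42);

$resque->enqueueOnce($job);
```

As far as I can tell it only guards against duplicates that are still waiting in the queue, not against a copy a worker has already picked up, which matches @danhunsaker's queued-vs-run distinction above.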

darkromz commented 9 years ago

Thanks for the quick reply, I will give it a look.