lightster / hodor

🚪 A worker queue that is evolving to a job queue
MIT License

Figure out how to handle RabbitMQ failures after the superqueuer has already committed the DB transaction #241

Open lightster opened 8 years ago

lightster commented 8 years ago

If the superqueuer commits the DB transaction successfully, but the connection to RabbitMQ is then lost, the jobs will never be cleared out of queued_jobs.

lightster commented 8 years ago

Perhaps if all queues were published on the same AMQP channel, we could batch publish all jobs to RabbitMQ at once. Then, in the queued_jobs table, we could keep track of a 'superqueuer transaction ID' and move those jobs back to pending_jobs if the superqueuer failed during the RabbitMQ publish.
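A rough sketch of that idea, written in Python with an in-memory SQLite database purely for illustration. The column names, the `superqueuer_txn_id` column, and the `publish_batch` callable (standing in for a batch publish on a single AMQP channel) are all assumptions, not hodor's actual schema or API:

```python
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical, simplified versions of the two tables.
    conn.executescript("""
        CREATE TABLE pending_jobs (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE queued_jobs (
            id INTEGER PRIMARY KEY,
            payload TEXT,
            superqueuer_txn_id TEXT
        );
    """)


def superqueue(conn, publish_batch):
    """Move pending jobs to queued_jobs, then batch publish them.

    Returns True if the publish succeeded, False if the jobs were
    moved back to pending_jobs after a publish failure.
    """
    txn_id = uuid.uuid4().hex  # the 'superqueuer transaction ID'

    # Step 1: the DB transaction (committed when the block exits).
    with conn:
        rows = conn.execute(
            "SELECT id, payload FROM pending_jobs ORDER BY id"
        ).fetchall()
        conn.executemany(
            "INSERT INTO queued_jobs (id, payload, superqueuer_txn_id) "
            "VALUES (?, ?, ?)",
            [(job_id, payload, txn_id) for job_id, payload in rows],
        )
        conn.executemany(
            "DELETE FROM pending_jobs WHERE id = ?",
            [(job_id,) for job_id, _ in rows],
        )

    # Step 2: publish everything at once, after the commit.
    try:
        publish_batch([payload for _, payload in rows])
    except Exception:
        # Step 3: compensate by moving this transaction's jobs back.
        with conn:
            conn.execute(
                "INSERT INTO pending_jobs (id, payload) "
                "SELECT id, payload FROM queued_jobs "
                "WHERE superqueuer_txn_id = ?",
                (txn_id,),
            )
            conn.execute(
                "DELETE FROM queued_jobs WHERE superqueuer_txn_id = ?",
                (txn_id,),
            )
        return False
    return True
```

This only covers the case where the superqueuer process survives the publish failure and can still reach the database. If the process dies, or the compensating transaction itself fails, the jobs stay tagged with their transaction ID in queued_jobs.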

Also, it would be good to consider how we can make the situation self-healing. If for some reason the DB transaction is committed, the RabbitMQ publishes fail, and then the DB transaction fails, how can we roll it back? Perhaps we monitor the transaction IDs and "rollback" a transaction if a certain number of newer transactions are processed first?
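One way the "monitor the transaction IDs" idea might look, again as a Python/SQLite sketch under stated assumptions: a hypothetical `superqueuer_txns` table with monotonically increasing integer IDs and a `status` column (`publishing`, `published`, `rolled_back`), plus a made-up threshold `stale_after`. None of these names come from hodor itself:

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables; status is 'publishing', 'published',
    # or 'rolled_back'.
    conn.executescript("""
        CREATE TABLE superqueuer_txns (
            id INTEGER PRIMARY KEY,
            status TEXT NOT NULL
        );
        CREATE TABLE pending_jobs (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE queued_jobs (
            id INTEGER PRIMARY KEY,
            payload TEXT,
            superqueuer_txn_id INTEGER
        );
    """)


def sweep_stale_txns(conn, stale_after=3):
    """Roll back transactions still marked 'publishing' once at least
    `stale_after` newer transactions have been published.

    Returns the list of transaction IDs that were rolled back.
    """
    with conn:
        stale = [
            row[0]
            for row in conn.execute(
                "SELECT t.id FROM superqueuer_txns t "
                "WHERE t.status = 'publishing' "
                "AND (SELECT COUNT(*) FROM superqueuer_txns n "
                "     WHERE n.id > t.id AND n.status = 'published') >= ? "
                "ORDER BY t.id",
                (stale_after,),
            ).fetchall()
        ]
        for txn_id in stale:
            # Move the jobs back so a later run can pick them up again.
            conn.execute(
                "INSERT INTO pending_jobs (id, payload) "
                "SELECT id, payload FROM queued_jobs "
                "WHERE superqueuer_txn_id = ?",
                (txn_id,),
            )
            conn.execute(
                "DELETE FROM queued_jobs WHERE superqueuer_txn_id = ?",
                (txn_id,),
            )
            conn.execute(
                "UPDATE superqueuer_txns SET status = 'rolled_back' "
                "WHERE id = ?",
                (txn_id,),
            )
    return stale
```

A sweep like this could run at the start of each superqueuer pass. One caveat: if some messages did reach RabbitMQ before the failure, moving the jobs back to pending_jobs would publish them a second time, so the workers would need to tolerate duplicates.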