samsondav / rihanna

Rihanna is a high performance postgres-backed job queue for Elixir
MIT License
439 stars 47 forks source link

We should enforce the order in which retries can happen. #9

Closed Adzz closed 6 years ago

Adzz commented 6 years ago

Jobs are not guaranteed to be commutative. That means you could imagine a scenario like this:

3 (non commutative) jobs fail, and appear on the failed queue. Imagine for example, switching a boolean value.

That means on the retry list we have:

  1. A job which wants to go from true -> false
  2. A job which wants to go from false -> true
  3. A job which wants to go from true -> false

Now imagine a dev retires the job, as far as I can see there's nothing guaranteeing the jobs run in the correct order - so 3 could finish, then 1 could finish, then 2 could finish.

That would leave the system in an incorrect state.

samsondav commented 6 years ago

@Adzz Interesting question, and you are correct that Rihanna makes no guarantees about the order that jobs will be executed in. However, this is a problem inherent to all distributed job queues and is unsolvable for any level of concurrency greater than 1.

Let's look at an example:

We enqueue two jobs, Job A and Job B. The dispatcher reads these jobs sequentially and distributes them to two different workers. They are executed like this:

Job A starts
 |
 |         Job B starts
 |            |
 |            |
 |            |
 |            |
 |         Job B ends
 |
 |
Job A ends

Even though Job A was distributed to a worker first, there is no guarantee that it would FINISH before Job B. What if worker A is slower?

The only way to guarantee ordering in a distributed system is to create a worker lock such that only one job can ever be executing at the same time (serializable jobs):

Job A starts
 |
 |         
 |            
Job A ends           
Job B starts           
 |          
 |        
Job B ends

But this means you essentially don't have any concurrency! So there is no point in a distributed job queue at all for this example. The only solution for a concurrent system is to write your jobs in such a way that ordering does not matter.

Does that make sense?

samsondav commented 6 years ago

Closing as not mathematically possible.