uwiger / locks

A scalable, deadlock-resolving resource locker
Mozilla Public License 2.0

Deadlock due to outdated lock_info #37

Open xinhaoyuan opened 6 years ago

xinhaoyuan commented 6 years ago

I was testing locks with my own test case. I believe there is a bug in the lock_info handling of locks_server and locks_agent that can cause a deadlock.

My testcase has 3 concurrent clients/agents, namely C1, C2, and C3, and 3 locks, [1], [2], and [3].

Here is how the bug happened (in sketch):

  1. C1, C2, and C3 competed for the locks. Thanks to the deadlock-resolution algorithm, C1 and C2 eventually acquired all the locks and finished.

  2. During the resolution process, C3 received the lock_info for [2] (via locks_agent:send_indirects/1) even though C3 had not yet reached the point of requesting it, which means C3 was not in [2]'s queue.

  3. The locks_server removed its local lock_info entry for [2] because the queue was now empty. This effectively reset the vsn of the lock_info.

  4. C3 started requesting [2], but the locks_server responded with a lock_info whose vsn was lower than the one C3 had already been told about. Thus C3 got stuck.
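
To make the sequence above easier to follow, here is a toy Python model of it. This is not the library's Erlang code: `ToyServer`, `ToyAgent`, and `run_scenario` are hypothetical names, the vsn is assumed to be a simple counter that starts again from 1 for a fresh entry, and the agent is assumed to ignore any lock_info whose vsn is not higher than the one it already knows. The real locks_server and locks_agent may differ in all of these details.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for a lock server (not the real locks_server)."""
    # lock id -> (vsn, queue of agent names)
    entries: dict = field(default_factory=dict)

    def request(self, lock, agent):
        # A missing entry starts from vsn 0 (assumption), then is bumped to 1.
        vsn, queue = self.entries.get(lock, (0, []))
        self.entries[lock] = (vsn + 1, queue + [agent])
        return vsn + 1

    def release(self, lock, agent):
        vsn, queue = self.entries[lock]
        queue = [a for a in queue if a != agent]
        if queue:
            self.entries[lock] = (vsn + 1, queue)
        else:
            # The entry is dropped once the queue is empty,
            # so the version counter is lost with it.
            del self.entries[lock]


@dataclass
class ToyAgent:
    """Hypothetical stand-in for a lock agent (not the real locks_agent)."""
    name: str
    known_vsn: dict = field(default_factory=dict)

    def handle_lock_info(self, lock, vsn):
        # Assumption: info that is not newer than what we know is ignored.
        if vsn <= self.known_vsn.get(lock, 0):
            return False
        self.known_vsn[lock] = vsn
        return True


def run_scenario():
    server = ToyServer()
    c3 = ToyAgent("C3")

    server.request("[2]", "C1")
    forwarded_vsn = server.request("[2]", "C2")
    # Step 2: C3 learns about [2] indirectly, without being in its queue.
    c3.handle_lock_info("[2]", forwarded_vsn)

    # Steps 1 and 3: C1 and C2 finish; the empty entry is removed.
    server.release("[2]", "C1")
    server.release("[2]", "C2")

    # Step 4: C3 finally requests [2] and gets a fresh, lower vsn.
    fresh_vsn = server.request("[2]", "C3")
    accepted = c3.handle_lock_info("[2]", fresh_vsn)
    return forwarded_vsn, fresh_vsn, accepted
```

In this model, `run_scenario()` returns `(2, 1, False)`: C3 was told about vsn 2, the recreated entry starts again at vsn 1, and C3 discards the reply that would have told it that it is now in the queue.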

I tried to fix this by not removing lock_info entries in locks_server, but with that change the test fails in other ways. Maybe the change breaks the algorithm?