uwiger / locks

A scalable, deadlock-resolving resource locker
Mozilla Public License 2.0

locks_leader cannot coexist with nodes not running locks application #11

Closed garret-smith closed 9 years ago

garret-smith commented 9 years ago

locks_leader assumes that all connected nodes are running the 'locks' application. If a node that is not running 'locks' connects to a node running a locks_leader process, the locks_leader process deadlocks.

Steps to reproduce

Start node 'a'. Start the 'locks' application. Start a locks_leader process.

Observe that the locks_leader process on node 'a' is the leader and responsive.

Start named node 'b' without starting the 'locks' application. Connect it to 'a'.

Observe that the locks_leader process on node 'a' is now stuck in safe_loop and no longer responds to normal messages.

Investigation

locks_leader receives a nodeup message, processed on line 558 of locks_leader.erl.

The new node is not in nodes, so include_node (line 693) is called.

include_node calls locks_agent:lock_nowait.

locks_agent sends a {locks_agent, _, 'waiting'} message, handled on line 571.

The process gives up leadership, causing it to enter safe_loop, but a response from the newly connected node will never come since it is not running 'locks'.
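For illustration only, here is one way a process could check whether a peer node is running a given application before waiting on it. This is a minimal sketch using standard OTP calls (`rpc:call/5`, `application:which_applications/0`). It is not the fix from any PR and not part of the locks API. The module name `locks_peer_check`, both function names, and the 5-second timeout are assumptions made up for this example.

```erlang
%% Hypothetical helper module; not part of the 'locks' application.
-module(locks_peer_check).
-export([runs_app/2, eligible_nodes/1]).

%% Returns true if App appears among the running applications on Node.
%% Any RPC failure (node down, timeout) is treated as "not running".
runs_app(Node, App) ->
    %% The 5000 ms timeout is an arbitrary choice for this sketch.
    case rpc:call(Node, application, which_applications, [], 5000) of
        Apps when is_list(Apps) ->
            %% which_applications/0 returns [{Name, Description, Vsn}].
            lists:keymember(App, 1, Apps);
        {badrpc, _Reason} ->
            false
    end.

%% Returns the currently connected nodes on which App is running.
eligible_nodes(App) ->
    [N || N <- nodes(), runs_app(N, App)].
```

With something like this, a nodeup handler could skip nodes for which `locks_peer_check:runs_app(Node, locks)` is false. The check is racy, because the application may start on the peer after the call returns, so a real solution would also need to handle late starts.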

I'm not sure what to do next. I don't know the locks application well enough to attempt a fix. Any guidance would be helpful.

uwiger commented 9 years ago

Hi Garrett,

Could you test PR #12?

garret-smith commented 9 years ago

This is looking great. I tested 2-node and 3-node setups, in which a non-locks node connects to one or two locks nodes.

garret-smith commented 9 years ago

Thanks for the quick turnaround!

uwiger commented 9 years ago

Anything for you, Garrett. ;-)

uwiger commented 9 years ago

PR merged into master.