Closed garret-smith closed 9 years ago
Hi Garrett,
Could you test PR #12?
This is looking great. Tested 2-node and 3-node setups, where a non-locks node connects to {1,2} other locks nodes.
Thanks for the quick turnaround!
Anything for you, Garrett. ;-)
PR merged into master
locks_leader assumes that all connected nodes are running the 'locks' application. If a node that is not running 'locks' connects to a node running a locks_leader process, the locks_leader process deadlocks.
Steps to reproduce
Start node 'a'. Start the 'locks' application. Start a locks_leader process.
Observe that the locks_leader process on node 'a' is the leader and responsive.
Start named node 'b'. Connect to 'a'.
Observe that the locks_leader process on node 'a' is now stuck in safe_loop and no longer responds to normal messages.
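The steps above could be run roughly as the following shell session. This is only a sketch: my_leader is a hypothetical locks_leader callback module (any working callback module should do), [] is a placeholder initial state, the host name is a placeholder, and locks_leader:start_link/2 is assumed to be the entry point used to start the process.

```erlang
%% --- On node 'a' (started with: erl -sname a) ---
{ok, _Started} = application:ensure_all_started(locks).

%% my_leader is a hypothetical callback module, [] its initial state.
{ok, Pid} = locks_leader:start_link(my_leader, []).

%% Check which function the process is currently executing.
erlang:process_info(Pid, current_function).

%% --- On node 'b' (started with: erl -sname b), 'locks' NOT started ---
%% net_adm:ping('a@yourhost').   % placeholder host name

%% --- Back on node 'a', after 'b' has connected ---
%% Run the same check again and compare with the earlier result.
erlang:process_info(Pid, current_function).
```

If the process is stuck as described, the second check would be expected to report locks_leader:safe_loop as the current function.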
Investigation
locks_leader receives a nodeup message, processed on line 558 of locks_leader.erl.
The new node is not in nodes, so include_node (line 693) is called.
include_node calls locks_agent:lock_nowait.
locks_agent sends a {locksagent, , 'waiting'} message, handled on line 571.
The process gives up leadership, causing it to enter safe_loop, but a response from the newly connected node will never come since it is not running 'locks'.
I'm not sure what to do next. I don't know the locks application well enough to attempt a fix. Any guidance would be helpful.
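One direction that might be worth exploring is to check whether a newly connected node actually runs 'locks' before trying to include it. The module below is only a sketch of such a check: locks_node_check and runs_locks/1 are hypothetical names that are not part of the locks application, and this is not necessarily how the problem should be (or was) fixed. It uses only the standard rpc, application and lists modules.

```erlang
-module(locks_node_check).
-export([runs_locks/1]).

%% Hypothetical helper, not part of the locks application.
%% Returns true if the 'locks' application is currently running on Node,
%% and false if it is not running or the node cannot be reached in time.
-spec runs_locks(node()) -> boolean().
runs_locks(Node) ->
    %% Ask the remote node for its running applications (5 second timeout).
    case rpc:call(Node, application, which_applications, [], 5000) of
        Apps when is_list(Apps) ->
            %% Each entry is {Application, Description, Version}.
            lists:keymember(locks, 1, Apps);
        {badrpc, _Reason} ->
            false
    end.
```

A nodeup handler could call locks_node_check:runs_locks(Node) and skip nodes for which it returns false, at the cost of one extra remote call per connecting node.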