Closed Revisor closed 8 years ago
The fix is better than my, but there is still a fundamental problem. Calling connect method should establish connection and by removing my fix, it won't.
Thank you for having a look at the proposal. Can you please explain under what circumstances calling the connect()
method can land us without a working connection (other than if no node is available at all)?
Finding the first working node is the purpose of findAvailableConnection()
and this method has only two outcomes - either it returns a connected node, or it throws an exception if no node works.
I described one case in my original ticket.
When adding into the queue the Queue::checkConnected
is called, the isConnected
there returns false
, so the Client::connect
is called, which then forwards message to the Connection/Manager::connect
, that function that selects a random node and calls $this->switchToNode($currentNode);
. The problem in switchToNode
is that it tries to be smart and do not connect to a node if we are swittching to the same one. The problem is that this function doesn't know the status of the connection of that particular node.
Thanks for the great description, now I understand the whole problem at last. I'll update my proposal.
Here's a new version. I have fixed the bug reported by Miha and improved the reconnection overall:
The reason for this order is that the HELLO most likely contains newer information about the cluster than the initial credentials.
And I have added two tests for this. The first one tests reconnection based on HELLO, the second one tests reconnection that falls back to the credentials.
I'm looking forward to your code review and criticism.
@Revisor a total +1 on my side and <3 the mock overdose on the test. Let me know if there's anything else you want to add or if we are good for merging
I have slept on it, read it again and I have nothing else to add. I'm not sure if that much mocking doesn't make the tests brittle. But we need them, so...
When switching nodes, the prioritizer can recommend to stay on the current node. The current node however could have lost its connection in the meantime. In that case we will try and reconnect, or if not possible, connect to the next best possible node.
The check whether the current node has disconnected has been moved so that it is checked every time we execute any command, not just after the GETJOB command.
I tried to write a test for this situation, but ended up with a brittle monster of a test depending on the order of method calls among several mocks. :-/