Closed TwitchChen closed 6 years ago
Yesterday I found that the cluster was misbehaving: the node could not get KV entries or anything else.
Can you describe exactly which node is not working as you expect? Is every node broken or just one? Can you show your server and client configs?
2018/11/14 04:00:01 [ERR] memberlist: Failed push/pull merge: Node '30264' protocol version (0) is incompatible: [1, 5] from=100.68.156.174:51128
Hmm, that's strange since all the nodes you pasted in the list there say they support protocol version 2. 0 is not even a valid protocol version. I wonder if this is a red herring and is caused by something like a vulnerability scanner trying to talk to the node on its gossip port but sending it garbage?
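To illustrate why a version of 0 gets rejected, here is a minimal sketch of the kind of range check that would produce that message. This is not memberlist's actual code: the function name and signature are made up for illustration, and the [1, 5] bounds are simply taken from the log line above.

```python
from typing import Optional


def check_protocol_version(node: str, version: int,
                           vmin: int = 1, vmax: int = 5) -> Optional[str]:
    """Return an error string if `version` is outside [vmin, vmax], else None.

    Hypothetical helper: the default bounds come from the "[1, 5]" in the
    log line, not from reading the memberlist source.
    """
    if version < vmin or version > vmax:
        return (f"Node '{node}' protocol version ({version}) "
                f"is incompatible: [{vmin}, {vmax}]")
    return None


# A node advertising version 2 passes; one advertising 0 is rejected.
print(check_protocol_version("30264", 2))
print(check_protocol_version("30264", 0))
```

Under this sketch, any peer (or stray client) whose handshake decodes to version 0 fails the check regardless of which Consul version the real agents run, which would be consistent with garbage arriving on the gossip port.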
@banks @TwitchChen I suspect that's the exact same bug as https://github.com/hashicorp/consul/issues/3217 -> this is not linked to any specific Consul version; we have it from time to time (especially when there are many elections, such as when upgrading the Consul version). It also happens when all agents are on the exact same version.
The fix we use in that case is to restart all servers sequentially; it works every time for us (we had this up to versions 1.2.x, but we never found the exact root cause).
In that case, what also might happen is that a few agents can see each other, but cannot see all servers. Sometimes restarting those agents works, but when it does not, restarting all servers sequentially is the only reliable way we have found to remove this issue.
Thanks @pierresouchay I agree this seems to be the same issue. I'll close this as a duplicate for now since the other is already in our backlog for attention (it's a long backlog sadly!)
Dupe of #3217. Thanks for reporting this @TwitchChen.
We have a Consul cluster with three servers and some nodes; all the Consul agents are on version 0.9.2.
Yesterday I found that the cluster was misbehaving: the node could not get KV entries or anything else.
I checked the server's log and found this:
At 2018/11/14 04:00:01 we restarted some nodes, and then the log reported that node 30264's protocol version (0) is incompatible.
But I don't know what caused the problem.