Description
This bug report is related to the limitations described here. But I wanted to write down my findings; I'm not sure whether these can be resolved, or maybe I am missing something.
We are encountering an issue while setting up Besu networks on different Kubernetes clusters with the aim of enabling discovery between nodes across clusters.
We set up one node on cluster A and a second node on cluster B. When starting the node on cluster B, we configured the enode of the node on cluster A as a bootnode. Both nodes have a LoadBalancer configured with a different discovery port for UDP than the RLPx port for TCP, as described in the limitations. In addition, we also configured the KubernetesNATManager, which discovers the port mapping from the LoadBalancer. All of this seems to be working correctly: if I use admin_nodeInfo, it gives me the right enode URL with the discport that was found by the KubernetesNATManager.
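For concreteness, the Service shape just described looks roughly like the sketch below; the names and port numbers are illustrative, not our exact manifest. The RLPx TCP port is exposed unchanged, while the UDP discovery port is exposed on a different external port, and that mapping is what the KubernetesNATManager picks up from the LoadBalancer.

```yaml
# Sketch of a LoadBalancer Service for one Besu node, with the discovery (UDP) port
# exposed on a different external port than the RLPx (TCP) port. Names and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: besu-node-a
spec:
  type: LoadBalancer
  selector:
    app: besu-node-a
  ports:
    - name: rlpx
      protocol: TCP
      port: 30303        # external TCP port, same as the port Besu listens on in the pod
      targetPort: 30303
    - name: discovery
      protocol: UDP
      port: 40404        # external UDP port differs from the in-pod discovery port
      targetPort: 30303
```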
The problems start to happen when the discovery mechanism starts to do its thing. The PING/PONG mechanism does not take the port forwarding from the KubernetesNATManager into account. You see the node on cluster B sending a PING with the "wrong" udpPort configured. I'm putting wrong in quotes because it is the udpPort configured on the Besu service itself, not the one from the LoadBalancer. The node on cluster A receives this PING and you see it has a very wrong enode URL.
The enode returned from admin_nodeInfo is this: enode://9440beacdc22fa80e18f1ca9093c79d7ff520d99b8223e15aab7279a2949794d55bbaa3077172b9f5c8c44eb08394d33b763591b3a6ef48d3cbf70bed31bd333@18.159.175.159:30303?discport=40404, so the IP address is wrong, but the discport is also wrong. The issue with the IP address is solved in PR #6225; I tested this and it seems to work. The discport, I guess, is the port of the connection that was set up for the UDP protocol by the node on cluster B. The logs are still from a test case I did on the latest 24.1.2 branch, which is why the IP address is wrong.

In the end this discovery between these two nodes succeeds, at least pre PR #6225, because the node on cluster A can talk to the node on cluster B and the other way around. But things go wrong if I set up another node on cluster C with the node of cluster A as bootnode. It will receive the neighbours of cluster A, which contain the node on cluster B, but with the enode URI it had built during the discovery phase. And the node on cluster C cannot connect to the node on cluster B using this enode URI, since it doesn't go through the LoadBalancer.
After PR #6225, the PING/PONG between the node on cluster A and the node on cluster B also doesn't succeed, because the enode is built using the IP address from the PingPacketData, which is the IP address of the LoadBalancer, but with a wrong discport.

Acceptance Criteria
- The from endpoint in PingPacketData should take into account the port mapping from the KubernetesNATManager. I'm not familiar enough with the code, but should we grab the discoveryPort from the NATService here? (See the sketch after this list.)
- handleIncomingPacket should also build up the enode for the DiscoveryPeer taking into account the discovery port configured in the PingPacketData, in the same way it is already taking the host from the packet.
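To make the first point concrete, here is a minimal, self-contained sketch of the idea, assuming the NAT manager can hand over the UDP and TCP port mappings it read from the Service. The PortMapping and Endpoint types below are stand-ins made up for illustration; they are not Besu's actual NatService, PingPacketData or Endpoint classes, and the host and port values are placeholders.

```java
import java.util.Optional;

// Illustration only: stand-in types, not Besu's classes. It shows the behaviour asked for
// above: prefer the externally reachable UDP port from the LoadBalancer mapping when
// building the "from" endpoint of a PING, and fall back to the locally bound port
// when no mapping is known.
public class AdvertisedEndpointSketch {

  // Stand-in for a port mapping a KubernetesNATManager could derive from the Service.
  record PortMapping(int internalPort, int externalPort) {}

  // Stand-in for the endpoint that ends up in the PING packet: [host, udpPort, tcpPort].
  record Endpoint(String host, int udpPort, int tcpPort) {}

  static Endpoint advertisedFromEndpoint(
      final String advertisedHost,
      final int localUdpPort,
      final int localTcpPort,
      final Optional<PortMapping> udpMapping,
      final Optional<PortMapping> tcpMapping) {
    // Prefer the external ports exposed by the LoadBalancer; otherwise keep the local
    // ports (which is effectively what happens today, per this report).
    final int udpPort = udpMapping.map(PortMapping::externalPort).orElse(localUdpPort);
    final int tcpPort = tcpMapping.map(PortMapping::externalPort).orElse(localTcpPort);
    return new Endpoint(advertisedHost, udpPort, tcpPort);
  }

  public static void main(final String[] args) {
    // Placeholder values matching the Service sketch above: UDP 30303 -> 40404, TCP 30303 -> 30303.
    final Endpoint from =
        advertisedFromEndpoint(
            "198.51.100.10", // placeholder LoadBalancer IP
            30303,
            30303,
            Optional.of(new PortMapping(30303, 40404)),
            Optional.of(new PortMapping(30303, 30303)));
    // Prints: Endpoint[host=198.51.100.10, udpPort=40404, tcpPort=30303]
    System.out.println(from);
  }
}
```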
Steps to Reproduce (Bug)

Set up the clusters as described above, with a LoadBalancer exposing different external ports for RLPx (TCP) and discovery (UDP), and configure the KubernetesNATManager to look at that LoadBalancer.

Related issues: