Research about discovery over UDP or recursively

y0sher commented 5 years ago

The discovery protocol we use is Kademlia. The Kademlia paper suggests UDP as the transport layer, mainly because during a lookup we rapidly query many nodes that we have never contacted before and may never see again.

Currently our implementation uses TCP. We should dig further to understand whether we would be better off switching to a different transport for discovery (and breaking from the normal connection-session flow), or maybe implementing a recursive Kademlia protocol.

avive commented 5 years ago

My opinion on this is that we should stay with TCP for the next milestone, both for simplicity (it already works) and for the additional reliability that TCP provides over raw UDP packets, e.g. retransmission of lost packets. How many nodes does a node query as part of the protocol? Isn't this limited to a low number, e.g. 5? The goal is for each node to have n=5 random connections so we can establish the gossip network. Once these connections are established, are there many other connection attempts to new nodes?

zalmen commented 5 years ago

As part of this work we should also revisit our decision to use TCP for our secured session. @noamnelke you had some thoughts on this topic - please share them as a comment

avive commented 5 years ago

I'd like to chime in as a survivor of the p2p 1.0 space back in 2004. We need to keep in mind that we are going to ask validators to be routable on a port so they can accept incoming connections from other remote Internet nodes. The less they need to do (configuring access points and routers), the better it is for us, as it makes it easier to set up a node. We are going to lose some interested validators who can't configure their local access point, e.g. college dorms, desktops at work behind a restrictive firewall, etc. So, if possible, we should choose 1 protocol and not 2 (e.g. UDP and TCP). Bitcoin Core full nodes use TCP and are required to open 1 TCP port to the Internet as part of the initial setup. So, if we switch to UDP, does it mean that all communication is going to be UDP, or do we still need TCP to be working as well? Sure, there's an overhead with TCP, but it gives us automatic retransmission of occasionally lost packets in multi-packet messages. Is the overhead really material with modern PCs? Curious to hear what @iddo333 thinks about this as well...

noamnelke commented 5 years ago

I'm actually of the opinion that we should switch everything to UDP.

Setup simplicity is a big factor: NAT traversal is more straightforward, and sometimes only possible, over UDP. Implementing STUN would make it possible for most users to run a node with no need for port forwarding.

While it would be possible to coordinate TCP hole punching over UDP, it adds complexity and won't always work.

Another reason I'm for using UDP exclusively is that it makes connections more incidental. Over UDP, transmitting a message to, or receiving a message from, a new peer only requires computing and possibly caching a shared secret. This is not free and we should absolutely be selective about it, but it enables each node to maintain more connections using the same resources.

Implementing our own delivery acknowledgement mechanism will require more work, but it enables behaviors that wouldn't be possible over TCP. E.g. try another peer when delivery fails instead of retrying the same one; or prefer peers with shorter acknowledgement times.

Existing implementations of Kademlia, as well as other gossip implementations, reached the same conclusion and use UDP.

avive commented 5 years ago

These are good arguments. I support this as long as we only ask validators to open 1 UDP/TCP port to become routable.

noamnelke commented 5 years ago

The point of STUN is that they don't even have to do that. It's a way to get local NATs (essentially home/small office routers) to naturally assign and forward an external port to our node, without user intervention.
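To make the mechanism a bit more concrete, here is a minimal Go sketch of the wire format involved, based on the STUN message layout in RFC 5389: building a Binding Request header and decoding the IPv4 value of an XOR-MAPPED-ADDRESS attribute, which is how a node learns the external address and port its NAT assigned. The function names are hypothetical; sending the request over a UDP socket, parsing the full response and handling IPv6 are left out.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"net"
)

const (
	stunBindingRequest = 0x0001     // message type: Binding Request
	stunMagicCookie    = 0x2112A442 // fixed value defined by RFC 5389
)

// newBindingRequest builds the 20-byte header of a STUN Binding Request
// with no attributes: type, length, magic cookie, 96-bit transaction ID.
func newBindingRequest() ([]byte, error) {
	msg := make([]byte, 20)
	binary.BigEndian.PutUint16(msg[0:2], stunBindingRequest)
	binary.BigEndian.PutUint16(msg[2:4], 0) // attribute length: none
	binary.BigEndian.PutUint32(msg[4:8], stunMagicCookie)
	if _, err := rand.Read(msg[8:20]); err != nil { // random transaction ID
		return nil, err
	}
	return msg, nil
}

// decodeXorMappedIPv4 decodes the 8-byte value of an IPv4
// XOR-MAPPED-ADDRESS attribute into the external IP and port.
func decodeXorMappedIPv4(value []byte) (net.IP, int, error) {
	if len(value) != 8 || value[1] != 0x01 { // family 0x01 means IPv4
		return nil, 0, errors.New("not an IPv4 XOR-MAPPED-ADDRESS value")
	}
	// The port is XORed with the top 16 bits of the magic cookie,
	// the address with the whole cookie.
	port := binary.BigEndian.Uint16(value[2:4]) ^ uint16(stunMagicCookie>>16)
	addr := binary.BigEndian.Uint32(value[4:8]) ^ stunMagicCookie
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, addr)
	return ip, int(port), nil
}
```

The node would send the request to a STUN server over the same UDP socket it uses for p2p traffic, so the mapping reported back is the one other peers can reach.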

y0sher commented 5 years ago

@avive during bootstrap a node might query the whole network. The alpha constant in Kademlia only limits the number of concurrent queries. Since Kademlia queries are iterative, we only need one request-response exchange with each node along the route of our bootstrap. That means we establish a secured TCP connection with a session for each of these nodes. To make room for peers connecting for other protocols we then have to close those connections, which results in a lot of connection setup and teardown while bootstrapping. Also, we currently have no way to identify the purpose of a connection when it is formed, only after the first message.
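The point about alpha can be sketched in Go as follows. This is a deliberately simplified model, not the go-spacemesh implementation and not full Kademlia: it ignores XOR-distance ordering and the k-closest stopping rule, and simply keeps querying newly learned nodes until none are left, which is the worst case described above. `queryFunc` and `iterativeLookup` are hypothetical names.

```go
package main

import "sync"

// queryFunc is a hypothetical single request-response exchange: it asks
// the node at addr for the peers it knows and returns their addresses.
type queryFunc func(addr string) []string

// iterativeLookup contacts each node at most once, in rounds of at most
// alpha concurrent queries, and returns the set of nodes contacted.
// alpha bounds concurrency only; it does not bound the total number of
// nodes queried.
func iterativeLookup(seeds []string, alpha int, query queryFunc) map[string]bool {
	if alpha < 1 {
		alpha = 1
	}
	contacted := make(map[string]bool)
	pending := append([]string(nil), seeds...)

	for len(pending) > 0 {
		// Pick up to alpha nodes that have not been contacted yet.
		var batch []string
		for len(pending) > 0 && len(batch) < alpha {
			next := pending[0]
			pending = pending[1:]
			if !contacted[next] {
				contacted[next] = true
				batch = append(batch, next)
			}
		}

		// Query the batch concurrently; each goroutine writes only to
		// its own slot, so no lock is needed.
		results := make([][]string, len(batch))
		var wg sync.WaitGroup
		for i, addr := range batch {
			wg.Add(1)
			go func(i int, addr string) {
				defer wg.Done()
				results[i] = query(addr)
			}(i, addr)
		}
		wg.Wait()

		for _, r := range results {
			pending = append(pending, r...)
		}
	}
	return contacted
}
```

Every entry in the returned set corresponds to one short-lived exchange. Over TCP with a secured session, each of those is a full connection setup and teardown.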

noamnelke commented 5 years ago

I find the replies in this thread interesting and relevant: https://twitter.com/lopp/status/1076476329560850433?s=21

spacemeshos / go-spacemesh

Research about discovery over UDP or recursively #259