vacp2p / research

Thinking in code
MIT License
62 stars 4 forks source link

discv5 #19

Closed decanus closed 4 years ago

decanus commented 4 years ago

closes #15

decanus commented 4 years ago

So @oskarth, I was working a bit with @kdeme and we both had the same thought. I actually started googling and on most devices there is no practical way to get your external IP. This to me means that mobile devices cannot sign an ENR and participate in the DHT in 2 ways. They can only run lookups. So the only real experiment that has any form of practical effect is figuring out how many lookups a "light" node needs to execute to find interesting nodes, this is totally dependent on the ordering of the DHT and if we were to take into account what @arnetheduck said yesterday may not even be relevant. IMO topics is what is needed here, has the lowest overhead and always guarantees that we ideally get what we are looking for when doing 1 lookup.

decanus commented 4 years ago

So also talking to @kdeme, a node can reply with the external IP in a PONG message so people can update their ENRs. But considering the short connection windows + the fact that mobile gets new IP address much more often it is questionable if this will work well or even work at all.

kdeme commented 4 years ago

Yes, the important part is to know how expensive (how many lookups) it will be to get n interesting nodes in a bunch of nodes (randomly filled DHT). This will also depend on some of the parameters I mention here: https://github.com/vacp2p/research/issues/15#issuecomment-601657832 Obviously, topic advertisement is an important one there, and depending on the ratio of interesting versus general nodes, one could probably already conclude that it is necessary.

The fact of short connection windows + frequently updated ip-address will probably cause a mobile device not to be discoverable from other nodes in the DHT (chance of others finding a correct ENR in the DHT). At least, not when the mobile device is moving ( guess this also depends somewhat per mobile operator, ipv6, etc.).

oskarth commented 4 years ago

We've known for a long time that mobile phones wouldn't be publicly connectable. The purpose of the DHT isn't to find mobile phone, it is to find nodes that provide a useful service. The questions that should be answered are still the same, are they not? I don't see how this changes anything related to dependencies of topics etc.

I'm curious about the ENR signing of IP requirement. Do we have more info on this?

decanus commented 4 years ago

@oskarth connection info in the ENR should be signed https://github.com/ethereum/devp2p/blob/master/enr.md. This includes IP etc, even though these fields are optional an ENR without them would be usefless.

decanus commented 4 years ago

So we seem to have a problem @kdeme, there are times when nodes just return nothing in the lookup.

kdeme commented 4 years ago

So I ran the code and did a little investigation

First I noticed a lot of revalidation failures, and many of them are with the bootstrap node (makes sense, all are spamming this one initially). I also noticed a lot messages arriving after the timeouts (and thus being dropped).

I added some logging to see how many entries each node has in its DHT, it turned out to be very low for the nodes started later, see first file here: https://gist.github.com/kdeme/e498f4a9efc1dbb21931f74648d2bcec

So I increased the timeouts (handshakeTimeout = 5.seconds, responseTimeout = 10.seconds). Results for the entries in the DHT seem better (see second file in gist), however, the lookups are still rather bad. And there are some strange cases where it needs several lookups or sometimes even fails when a node is located in 99 of the DHTs.

So, it looks like there is some bug in the findNode call, I will further investigate this.

Even with the increased timeouts, there are still a lot of messages that get lost. We could probably improve this by doing the lookups (that now happen in the loops) in a more controlled way. Or by adding them "manually" (yet randomly) with addNode, as we have all the nodes their information (ENR).

decanus commented 4 years ago

@kdeme @oskarth just posting this here for persistence, when I now start a network by pairing nodes randomly using the addNode function, ensuring that start function is called on no node other than the bootstrap node we seem to get a success rate more close to 60-80%, rather than a failure rate to that. This would mean that there is something wrong with the lookup loop used by the discv5 implementation in nim.

Also important to note, when I call start on only the bootstrap node, it seems like the results are the best.

Important to note: The pairing currently adds 20 random records to every node, magically this is a better result then the DHT which sometimes has nodes that have 99 nodes.

oskarth commented 4 years ago

Questions on baseline test results: https://gist.github.com/decanus/dea759e3471b0a5dcebba1218eb070c7

PEERS_PER_NODE = 16

This is the setting, but is this the reality? That's what I would assert in log statements

If a node returns 0 nodes when called with a distance of 256, we consider it to be failed. We limited max lookups to 100, it seems unreasonable to assume it would go higher, and none of the results seemed to go higher.

IIRC you saw behavior where a node return 0 nodes several times in a raw, but then it returned more. Is that issue fixed and do we know the origin of that?

by pairing nodes randomly using the addNode function

The important property to me is that each node should have kademlia connectivity. This may or may not happen by adding random nodes. I'd make sure the nodes in the network have connectivity by logging/asserting that they should have a node in each k-bucket.

kdeme commented 4 years ago

So I've investigated the code for FindNode in nim-eth better and found a crucial bug. Fix is here https://github.com/status-im/nim-eth/pull/219 More background here: https://github.com/status-im/nim-eth/issues/205#issuecomment-612095492 As you can read there, if I now also change the returned Nodes packet to return only ENRs of nodes with the correct distance (as by spec), then I think we will see an increase in required lookups.

I've ran (an older version, sorry didn't update) of your code and results are much better. Failure can still occur of course but at least it seems more consistent (less nodes in the tables of the individual nodes -> more chance on failure, etc.)