vacp2p / rfc

Modular p2p messaging stack, with a focus on secure messaging.
https://rfc.vac.dev/
115 stars 15 forks source link

Better peer discovery mechanism #32

Closed adambabik closed 3 years ago

adambabik commented 5 years ago

Problem

At the very beginning, each Status client had only a single list of peers it could connect to. Mailservers and Whisper nodes were separated. As you can imagine, this is not very clever way of sharing peers' addresses but it is simple and works great if all goes fine.

However, later, we started thinking about issues that may occur and had actually occurred. Namely, many of our peers hosted in AWS or GCP were blocked in Russia and China. So, we introduced another set of server peers in Asia + implemented the old PoC Discovery V5 algorithm from geth. When we tested it, it turned out that it consumes a lot of bandwidth and also other resources like CPU and memory. So, we want with a strategy to only turn on Discovery V5 if there are not enough peers connected and we also looked at other peer discovery algorithms like Rendezvous.

All that effort brought more and more complexity, it became harder and harder to measure the source of bandwidth usage and battery draining. It also decreased our velocity.

We need a cleaner, simpler solution for this task.

Acceptance criteria

Possible Solutions

  1. Node Discovery via DNS + DNS over HTTPS. This is actually only a part of the story because it suggests only sharing bootnodes this way and later use a dynamic peers discovery mechanism. In our case, this might be sufficient as end-to-end solution.
oskarth commented 5 years ago

Thanks for the issue!

Right now this issue seems very Status/client specific. Can we rewrite it to be more general where Status client and its specific circumstances is just a specific instance? It references Status infrastructure and "us" running the cluster, which is more of a client/downstream/deployment concern.

The problem description makes it seem like the issue in is complexity, but it doesn't really explain what the actual issue is with the current setup ('m not saying there aren't any). As far as I can the current strategy is:

Which seems like a reasonable algorithm, assuming Discovery v5 is well documented and sane and meets our needs. If it isn't or doesn't, then that seems like a more relevant spec bug/feature request.

Can we be more precise about the problems with that approach?

I also don't think we are likely to change peer discovery mechanism in version zero, but for future iterations for sure. It also seems like a much larger and general issue than Waku mode per se. I.e. it's an onoging topic in Eth2 afaik, etc.

adambabik commented 5 years ago

but it doesn't really explain what the actual issue is with the current setup

I did describe that, maybe not in a single place. The currently used Discovery V5 is not sustainable on mobile, it's too much bandwidth and resources. And the complexity is a problem because the current approach is hybrid Discovery V5 + Rendezvous + caching + peer pools + no spec.

We can try to use the new Discovery V5 but as far as I know it's still experimental and not even in go-ethereum master branch. We would still need to confirm Discovery V5 works fine in a case where nodes supporting a specific protocol are sparse.

It references Status infrastructure and "us" running the cluster, which is more of a client/downstream/deployment concern.

Because that's the reality. We can generalize it in the spec of course. I think we still should optimize for a single entity providing Waku nodes. Which entity would do that does not matter.

I also don't think we are likely to change peer discovery mechanism in version zero, but for future iterations for sure.

I think we should because it's gonna be a problem for Nim. As far as I know they only have Discovery V4 and that's useless for us (Disc v4 is not topic-based peers discovery algorithm).

I think we should do as simple as possible so either static or EIP 1459: Node Discovery via DNS but instead of bootnodes provide Waku nodes instead.

oskarth commented 4 years ago

I think we should do as simple as possible so either static or EIP 1459: Node Discovery via DNS but instead of bootnodes provide Waku nodes instead.

Agree with KISS, I wonder how this plays into the finding closest mailserver change?

--

It also seems like this also touches on a larger research issue, which is finding discovery mechanism for resource restricted devices.

Eth2 and libp2p https://github.com/ethereum/eth2.0-specs/blob/02bb92e71455adaa7da101563a6c367efe9e1cc7/specs/networking/p2p-interface.md#the-discovery-domain-discv5 as well as Swarm Kademlia light connection strategy

What about splitting this into two issues, one vac/reseach and another KISS for status/spec deploy which is more like static nodes or possible the DNS thing? cc @decanus too

adambabik commented 4 years ago

Agree with KISS, I wonder how this plays into the finding closest mailserver change?

It's still can be applied. You get a list of Waku nodes via DNS and HTTPS. You collect RTTs for all of them and pick the closest one.

What about splitting this into two issues, one vac/reseach and another KISS for status/spec deploy which is more like static nodes or possible the DNS thing?

Yes, that is exactly the approach I would like to take for the time being.

decanus commented 4 years ago

@adambabik why would a rendezvous only protocol not work?

adambabik commented 4 years ago

@decanus that would work but this protocol (1) is not used by anyone else, (2) is not researched very well , for example I have no idea what kind of security guarantees it makes.

To me, it seems like EIP 1459 is a safer choice. With rendezvous you also need to have bootnodes so solution described in EIP 1459 would still be needed. Basically, having EIP 1459 is a prerequisite to whatever dynamic peers discovery mechanism we decide on.

oskarth commented 4 years ago

@decanus what's the state of this issue?

oskarth commented 4 years ago

Issue not well defined, too broad. Should be factored out into a more minimal acceptance criteria (and more long term research Q).

oskarth commented 4 years ago

@decanus @adambabik what's happening with this issue? Are you going to factor it out into something more minimal? It's been pending for two weeks with no updates...

adambabik commented 4 years ago

We want Waku/0 with both geth and Nimbus so we should figure out the easiest solution. Like I said in the comment, EIP-1459 seems the easiest and will stay useful in the future. It requires development from both status-go and Nimbus but considerably less than implementing the current peers discovery algorithm by Nimbus.

decanus commented 4 years ago

@adambabik so I think a problem with the DNS option is an issue we also have documented in the rendezvous issue. Blocking by an adversary becomes rather simple, I am questioning what other schemes we can use in case an adversary blocks access to certain DNS entries. https://github.com/status-im/rendezvous/issues/8

adambabik commented 4 years ago

@decanus not sure we should try to solve this problem in this issue. We don't even have the simplest possible way to share node IPs except for hardcoding them in the code/app itself.

With the current solution, an adversary can block our bootnode IPs and we would need to release a new version of the app or share the new bootnode IPs with the community in some way. Blocking DNS entries might be a bit more tricky and one can always switch DNS resolver.

In my opinion, using DNS, for now, improves the current situation and also allows us to remove huge chunk of complicated architecture we need to maintain, and in the case of Nim, develop.

decanus commented 4 years ago

@adambabik I agree that DNS is probably a good solution for now at least and that it solves most of our issues, I'm highlighting that concern here so that it is persisted somewhere.

decanus commented 4 years ago

so I am currently looking into various protocols that use some form of DNS discovery, both libp2p and bitcoin come to mind although the libp2p DNS spec is for LAN based discovery. See mdns. It probably makes sense to make it similar to some other formats using a certain standard.

kdeme commented 4 years ago

I don't mind initially going for static or for EIP 1459. And using it together with Discovery v5 in the longer run. Regarding the latter, discovery v5 (latest spec) is already (far) in development in nim-eth. Which comes with ENR support, which is also needed for EIP-1459.

decanus commented 4 years ago

Oh also I thought that instead of using ENR we could use mulitaddrs. Seems like a more versatile standard. @kdeme what do you think?

decanus commented 4 years ago

@adambabik @oskarth @kdeme, I decided the best way to go forth here was quickly write up a forum post on my suggestion of how I would do this: https://forum.vac.dev/t/node-discovery-via-dns/29. If this is fine we can spec it out, if not further discuss.

fjl commented 4 years ago

I'd be so happy to see Waku use the devp2p discovery stack as-is. If you already have an implementation of discv5 proper, you can tune it for your needs.