Closed by adambabik 6 years ago.
Can we turn the discovery protocol on and off while the node is running?
Right now there is no such method, but in general yes, because discovery v5 is not used for dynamic connections in the p2p server. The tradeoff is that we will have to wait for the local Kademlia table to sync every time we need to use it.
Can the Discovery V5 protocol carry more information than the protocol name and version? How can we distinguish light Whisper nodes, or Whisper nodes with MailServer capability?
It can carry arbitrary label=value pairs. The problem with the mailserver/shh pair, however, is how to enforce N connections with shh and M with mail server, where N is higher. I proposed making the mail server a separate protocol because that allows enforcing this number on the geth node/p2p layer, while with labels we would have to enforce it externally, which has some problems.
How do we make sure that there are always at least N peers with support for a given protocol? For instance, we always want to have at least some Whisper and some LES peers connected. Also, we want to always have at least one Whisper peer with MailServer support.
I will copy the proposal from the mailserver spec: on the status-go side we should have clear per-protocol max-peer limits. For example: whisper peers < 5, mail server peers < 2, LES peers < 10, with max peers as the sum of all subprotocols, < 17 in this case. Plus eth peers, of course, if they are needed.
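The per-protocol limits above can be sketched as a small admission check. This is a minimal illustration, not the actual status-go implementation; all names (`peerBook`, `tryAdd`, the `limits` map) are hypothetical:

```go
package main

import "fmt"

// Illustrative per-protocol limits from the proposal above.
var limits = map[string]int{
	"shh":        5,  // whisper peers
	"mailserver": 2,  // mail server peers
	"les":        10, // light client peers
}

// peerBook tracks how many peers we hold per protocol.
type peerBook struct {
	connected map[string]int
}

// tryAdd admits a peer only while its protocol is under the limit.
func (b *peerBook) tryAdd(protocol string) bool {
	if b.connected[protocol] >= limits[protocol] {
		return false
	}
	b.connected[protocol]++
	return true
}

// maxPeers is the sum of all per-protocol limits (17 in this example).
func maxPeers() int {
	total := 0
	for _, n := range limits {
		total += n
	}
	return total
}

func main() {
	book := &peerBook{connected: map[string]int{}}
	for i := 0; i < 3; i++ {
		fmt.Println("mailserver admitted:", book.tryAdd("mailserver"))
	}
	fmt.Println("max peers:", maxPeers()) // prints "max peers: 17"
}
```

The point of enforcing limits per protocol rather than as one global `maxPeers` is that a flood of peers of one kind (e.g. LES) cannot crowd out the rarer mail server connections.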
As a summary: I will work on part of this idea, as it intersects with what I am doing with mail servers. For tests I suggest using status-scale, as it allows building the required topology easily and already has an example of the topology we are using for the production environment (not with v5, obviously).
I proposed to make a mail server a separate protocol cause it allows enforcing this number on a geth node/p2p layer, while with labels we will have to enforce it externally which has some problems.
So what are these (label, value) pairs useful for (I haven't read the content of the link I posted yet, but will do it today)? Using a separate protocol, we will end up with a MailServer peer that runs regular Whisper with a mail server registered (just like it is now) and this new protocol which will be used only to retrieve historic messages, right? Or will this separate protocol just act as a proxy, with messages still delivered through the shh protocol?
@dshulyak
As a summary: I will work on part of this idea, as it intersects with what I am doing with mail servers.
I will include you as a contributor then so that we can avoid work duplication.
So what these pairs (label, value) are useful for (haven't read the content of the link I posted yet but will do it today)?
I was going to use them to discover our nodes instead of random nodes that happen to be connected to our bootnode. It can be done by registering the topic `network=status` and then querying it from the mobile app; see LES for a usage example: https://github.com/ethereum/go-ethereum/blob/master/les/serverpool.go#L148-L152. But we can also add multiple topics to nodes, like `cap=whisper`, `cap=les`, `cap=mailserver`, and then get all the services with the described capabilities from the network.
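Conceptually, topic discovery is a lookup from an advertised topic to the set of nodes that registered it (go-ethereum's discv5 exposes this via topic registration and search; the sketch below only illustrates the idea with an in-memory registry, all node IDs and names are made up):

```go
package main

import "fmt"

// advertised maps a node ID to the topics it registered,
// e.g. "network=status" or "cap=whisper". Purely illustrative data.
var advertised = map[string][]string{
	"node-a": {"network=status", "cap=whisper"},
	"node-b": {"network=status", "cap=les"},
	"node-c": {"network=status", "cap=whisper", "cap=mailserver"},
}

// lookup returns the IDs of nodes advertising the given topic,
// mimicking what a discv5 topic search would yield.
func lookup(topic string) []string {
	var out []string
	for id, topics := range advertised {
		for _, t := range topics {
			if t == topic {
				out = append(out, id)
				break
			}
		}
	}
	return out
}

func main() {
	fmt.Println(len(lookup("cap=whisper")), "whisper-capable nodes")
	fmt.Println(len(lookup("cap=mailserver")), "mailserver-capable nodes")
	fmt.Println(len(lookup("network=status")), "status nodes")
}
```

With one topic per capability, the app can run separate searches per required service (whisper, LES, mail server) and fill each quota independently.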
Using a separate protocol, we will end up with a MailServer peer that runs regular Whisper with a mail server registered (just like it is now) and this new protocol which will be used only to retrieve historic messages, right? Or this separate protocol will just act as a proxy and messages will still be delivered through shh protocol?
The first variant, in my current implementation.
At the moment we have a discovery peer pool that acts as a mediator between discv5 and the p2p server (more about it here: https://github.com/status-im/status-go/pull/736). It works as expected, but there are some problems. I made an e2e test with the pool (https://github.com/status-im/status-scale/pull/4); for now the biggest problem is that discv5 is very spammy and CPU intensive (especially if we want to connect to peers fast). One optimization we must use is to guarantee that the bootnode's Kademlia table is already filled with valid peers when the mobile node tries to connect. Then the mobile peer connects quite fast and without significant network overhead; below you can find stats for mobile peers that connected with 3 static nodes in less than 5 s.
HEADERS | ingress | egress |
---|---|---|
0 | 0.056871 mb | 0.198661 mb |
1 | 0.057573 mb | 0.130130 mb |
2 | 0.048485 mb | 0.219504 mb |
3 | 0.057013 mb | 0.193834 mb |
4 | 0.057305 mb | 0.146696 mb |
5 | 0.046149 mb | 0.247809 mb |
6 | 0.058166 mb | 0.224956 mb |
7 | 0.056060 mb | 0.150866 mb |
8 | 0.046549 mb | 0.246145 mb |
9 | 0.085530 mb | 0.481873 mb |
After the mobile node connects with a sane number of static peers, we will make lookups very rare.
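The mediator idea from the PR above can be sketched as a loop that drains discovery candidates into the p2p server until enough static connections exist, after which aggressive lookups can pause. This is a toy illustration under assumed names (`server`, `pool`), not the PR's actual API:

```go
package main

import "fmt"

// server stands in for the p2p server; AddPeer is the only operation
// the mediator needs here. Names are illustrative.
type server struct{ peers []string }

func (s *server) AddPeer(id string) { s.peers = append(s.peers, id) }

// pool mediates between discovery and the server: it drains candidate
// peers found via discv5 and stops once maxPeers connections are held,
// at which point lookups can be made very rare.
func pool(found <-chan string, srv *server, maxPeers int) {
	for id := range found {
		if len(srv.peers) >= maxPeers {
			return // enough static peers; pause aggressive lookups
		}
		srv.AddPeer(id)
	}
}

func main() {
	found := make(chan string, 5)
	for _, id := range []string{"n1", "n2", "n3", "n4", "n5"} {
		found <- id
	}
	close(found)

	srv := &server{}
	pool(found, srv, 3) // mobile node wants at most 3 static peers
	fmt.Println("connected:", len(srv.peers))
}
```

Decoupling discovery from the server through a channel like this is what lets the pool throttle or stop lookups without touching the server's connection handling.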
Great stuff ❤️
To make it a bit more clear, I have a few questions:
A mobile node tries to connect to three nodes (3 is enforced with `maxPeers`?) via a regular bootnode?
Yes, every mobile node will be connected with at most 3 peers (maxPeers).
What is Headers in this table?
Those are the headers of the table: ingress and egress. The rows are indexes of the leaf nodes.
Is it possible to fill in Kademlia's DHT table manually based on static nodes in our cluster?
Not sure; there is definitely no public API for that.
These network stats are without the optimization you mentioned?
The stats are from right after the peers joined, so I don't think we will be able to make them lower. On the good side, topic peers are saved in a local leveldb used by discv5.
UPD: and right, they are with the optimization.
Unfortunately, after more tests, discv5 doesn't seem like a viable option for mobile devices. Register-topic requests result in lots of spam in the network, around 1 MB per minute. I have an idea to use only part of discv5: instead of connecting to a Kademlia table, I will try to use only topicSearchReq and listen for an answer from bootnodes.
UPD: so the tradeoff is that topics will be advertised only to bootnodes. It is a bit different from the original design, but at least we will get a nicer network profile.
HEADERS | ingress | egress |
---|---|---|
0 | 0.096202 mb | 0.072366 mb |
1 | 0.098247 mb | 0.069385 mb |
2 | 0.112502 mb | 0.073955 mb |
3 | 0.099119 mb | 0.071841 mb |
4 | 0.087120 mb | 0.077138 mb |
5 | 0.086700 mb | 0.074245 mb |
6 | 0.084472 mb | 0.080240 mb |
7 | 0.087930 mb | 0.077628 mb |
8 | 0.091138 mb | 0.076612 mb |
9 | 0.064683 mb | 0.086961 mb |
Even if we can discover peers fast, it is still not good enough for mobile usage: we would have to wait 2-5 seconds every time the mobile app is started. While discv5 has a caching layer, that layer depends on another type of discv5 message that is very rare.
Looks like we will have to introduce our own layer for persisting valid peers.
Looks like I was able to achieve the desired latency and low resource consumption.
My proposal is to use a peer pool that subscribes to the required topics and starts searching for peers from those topics in aggressive mode; once the minimal number of peers is found, it switches to low-sync mode. Additionally, we will cache peers in leveldb on every mobile device, and when the app restarts we will use that cache to fill the peer pool with the last connected peers. This guarantees that when the app restarts we won't spend any cycles (no additional latency) on finding valid peers.
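The aggressive/low-sync switching plus the restart cache can be sketched as a tiny state machine. This is a hedged illustration of the proposal's behavior, not the status-go code; `peerPool`, `newPeerPool`, and the mode names are all made up, and the real design persists the cache in leveldb rather than passing it in as a slice:

```go
package main

import "fmt"

type mode int

const (
	aggressive mode = iota // fast, frequent lookups until min peers found
	lowSync                // rare lookups once the pool is satisfied
)

// peerPool switches discovery modes based on how many peers it holds.
type peerPool struct {
	min     int
	peers   map[string]bool
	current mode
}

// newPeerPool seeds the pool from cached peers (leveldb in the real
// design), so a warm restart can begin directly in low-sync mode.
func newPeerPool(min int, cached []string) *peerPool {
	p := &peerPool{min: min, peers: map[string]bool{}}
	for _, id := range cached {
		p.peers[id] = true // reconnect to last known peers, no lookup cost
	}
	p.updateMode()
	return p
}

func (p *peerPool) add(id string) {
	p.peers[id] = true
	p.updateMode()
}

func (p *peerPool) updateMode() {
	if len(p.peers) >= p.min {
		p.current = lowSync
	} else {
		p.current = aggressive
	}
}

func main() {
	// Cold start: empty cache, so discovery begins aggressively.
	cold := newPeerPool(2, nil)
	fmt.Println("cold start aggressive:", cold.current == aggressive)
	cold.add("central-1")
	cold.add("central-2")
	fmt.Println("after discovery low-sync:", cold.current == lowSync)

	// Restart: the cache already satisfies the minimum, so the app
	// starts in low-sync mode with no discovery latency.
	warm := newPeerPool(2, []string{"central-1", "central-2"})
	fmt.Println("warm start low-sync:", warm.current == lowSync)
}
```

This matches the traffic tables below: the first run pays for aggressive discovery, while the post-restart run starts in low-sync mode and shows roughly a third of the traffic.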
So in the test I run two groups of nodes (3 peers called "central", 2 peers called "rare"); each mobile device needs to get 2 peers from the central group and 1 from the rare group. Additionally, after peers are connected, the app waits in idle mode for 10 minutes. It can take up to 5 seconds to get all required peers initially, but it will get better with a bigger cluster.
HEADERS | ingress | egress |
---|---|---|
0 | 0.093270 mb | 0.057930 mb |
1 | 0.093826 mb | 0.063409 mb |
2 | 0.097032 mb | 0.055281 mb |
3 | 0.139220 mb | 0.063438 mb |
4 | 0.095371 mb | 0.056707 mb |
Then the containers are restarted and the devices reconnect immediately, so the app starts in low-sync mode right from the beginning.
HEADERS | ingress | egress |
---|---|---|
0 | 0.038832 mb | 0.026244 mb |
1 | 0.030502 mb | 0.028702 mb |
2 | 0.040454 mb | 0.025824 mb |
3 | 0.040807 mb | 0.021741 mb |
4 | 0.043946 mb | 0.017592 mb |
We should make it clear that the device will have to spend some time on initial peer discovery (potentially 10 s on really bad connections).
Additionally, after peers are connected app waits in idle mode for 10m.
Is that idle mode enforced by us (status-go), or is it a DiscV5 feature?
These numbers look really promising! Is the code available somewhere already?
Is that idle mode enforced by us (status-go), or is it a DiscV5 feature?
I meant that in the test I am waiting for an additional 10 minutes after peers are connected :) to get a better understanding of how the traffic changes.
Everything is available: all the status-go code is here: https://github.com/status-im/status-go/pull/736. I will try to split it up later if there are problems with review.
And the test is here: https://github.com/status-im/status-scale/pull/4
I will probably also write a doc explaining how it all works, specifically in our case.
Preamble
Summary
Discovery protocol is essential for long-term scaling of the Status app. It will improve security and reliability of the app and prepare it for the beta launch.
Swarm Participants
Product Overview
The discovery protocol will allow us to dynamically change the number of peers in our cluster and also distribute the peers among various providers and countries without rebuilding the app itself, hence improving security and reliability.
The biggest challenge may be increased CPU and network usage. This needs to be carefully measured as it may lead to proposing some discovery protocol changes for light nodes.
Product Description
This is a research swarm. It must provide answers to the following questions:
Requirements & Dependencies
The discovery protocol should be testable locally with Docker, but for some more extensive tests, we may utilize a public cluster.
Minimum Viable Product
Goal Date: TBD
Description:
Success Metrics
Links
Copyright
Copyright and related rights waived via CC0.