SionoiS opened 11 months ago
Apps would self segregate from other apps for efficiency
True, at least in terms of the service nodes they use, and we specifically want to cater for this use case by making it easy for third-party service providers to be contracted by apps for their exclusive service provisioning. However, I think the advantage of decentralized services will be such that many apps will use them, provided: 1) it's easy, 2) it works reliably, 3) it's cheap to use.
(1) we can achieve with proper service discovery (discv5 topics?) and good defaults (filter being provided by default is a good start). (2) is a factor of proper SDKs and best practice documentation (e.g. subscribing for redundant services). (3) we are working on, but presumably the market will decide what is reasonable here.
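One of those best practices, subscribing for redundant services, can be made a one-liner in an SDK. A minimal sketch, where `discover` and `subscribe` are hypothetical stand-ins for real Waku SDK calls, not actual APIs:

```python
def subscribe_redundant(discover, subscribe, content_topic, redundancy=3):
    """Hypothetical SDK helper: subscribe the same filter to several
    independent service nodes so that one failing node does not lose
    messages. `discover` and `subscribe` are stand-ins, not real calls."""
    # pick the first few discovered Filter peers and subscribe to each
    peers = discover("filter")[:redundancy]
    return [subscribe(peer, content_topic) for peer in peers]
```

The point is that redundancy should be the default behavior, not something each app reinvents.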
Intro
In what context do we need to find nodes on specific shards with a specific feature set?
Let's imagine a future Waku, used by many apps, each with many users. In this context, light nodes still don't contribute and can be seen as clients only. The servers would be Waku nodes, some supporting all protocols in the family, others dedicated to Filter or Store only. This modularity makes it difficult to predict the architecture of apps built on top of Waku. I will detail two possible scenarios, one with general service providers and one without. Keep in mind that a mix of both is the most probable outcome.
App sub-networks
For this kind of app, each node would share the same exclusive peer list and also expect clients (and light nodes) to bootstrap from them. These apps would self-segregate from other apps for efficiency. In other words, the nodes serving the same app would always connect to each other for Filter, Light Push, Store and Sync, and not take part in discovery to reduce overhead. Some nodes must connect to outside nodes for Relay and discovery, but those would be special.
We should also consider the more niche kind of apps that aim for full decentralization: apps relying on edge computing, stewarded by decentralized and anonymous organisations. The form that these apps would take is even more amorphous. Any combination of nodes and protocols in the family could work thanks to Waku's modularity.
In this context, nodes would not need to "find" other nodes, most interconnections would be predetermined by which community a node is part of.
A market of general service providers
Imagine AWS, Google or Infura, but for Waku. A service provider would have a website to manage client payments and authentication. In this case, all the information needed to use the service would already be available. A more interesting case would be anonymous service providers. Smart contracts would replace payment systems, and ZK access tokens would gate the service, or maybe RLN could be used. The only missing piece would be a good way to search for them. As the network grows, it gets harder and harder to find a suitable service peer.
Paths to improvements
Some numbers:
In addition to the information below, see this Vac blog post about the limit of our current discovery mechanism and what could be done.
Misc. notes
Discv5 Service Discovery
DISCv5: robust service discovery details how the advertisement system works and provides various analyses. The specification is actionable but not implemented anywhere.
Advertisers place ads randomly along the way towards the topic via the use of a topic table. This results in ad density increasing closer to the topic. For searchers, the chances to find an ad by walking towards the topic increases as the number of peers placing ads increases.
Nodes don't accept all ads; a ticket system is used to prevent attacks and maintain fairness between different topics, as some will be more popular than others.
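The ticket mechanism can be illustrated with a toy topic table. This is a sketch, not the discv5 specification: the waiting-time policy (wait grows with topic occupancy), the slot counts and the lifetimes are all illustrative assumptions.

```python
import time

class TopicTable:
    """Toy sketch of a discv5-style topic table with tickets.
    Waiting-time policy, slot counts and lifetimes are illustrative,
    not the actual discv5 specification."""

    def __init__(self, slots_per_topic=2, ad_lifetime=900):
        self.slots_per_topic = slots_per_topic
        self.ad_lifetime = ad_lifetime
        self.ads = {}      # topic -> list of (node_id, expiry)
        self.tickets = {}  # (topic, node_id) -> earliest registration time

    def request_ad(self, topic, node_id, now=None):
        now = time.time() if now is None else now
        # drop expired ads for this topic
        live = [(n, exp) for n, exp in self.ads.get(topic, []) if exp > now]
        self.ads[topic] = live
        key = (topic, node_id)
        if key not in self.tickets:
            # popular (fuller) topics impose longer waits, keeping
            # registration fair between topics
            self.tickets[key] = now + len(live) * 10.0
        if now >= self.tickets[key] and len(live) < self.slots_per_topic:
            del self.tickets[key]
            self.ads[topic].append((node_id, now + self.ad_lifetime))
            return "registered"
        return "wait"
```

An advertiser that is told to "wait" comes back later with its ticket; attackers cannot flood a topic faster than the waiting time allows.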
What should the ad be? I see two possibilities here. The first would be to advertise shard and protocol together. This method increases the number of unique ads to advertise and reduces the number of "tickets" per ad. The lower number of "tickets" could lead to weakness against various attacks, but it reduces the number of queries required.
Another way would be to advertise protocols and shards separately. This would increase the number of "tickets" per ad and reduce the number of unique ads (1 per protocol + 1 per shard), but it would require 2 queries instead of one, plus cross-referencing the results to find matching peers.
This system would be useful in case we want to track more features in the future, e.g. content topics, new protocols, etc.
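A back-of-the-envelope comparison of the two strategies, with assumed counts of 8 shards and 4 protocols (Filter, Light Push, Store, Sync) purely for illustration:

```python
# Illustrative numbers only: 8 shards, 4 protocols.
shards, protocols = 8, 4

# Strategy 1: one ad per (protocol, shard) pair.
# Many unique ads, so fewer advertisers ("tickets") per ad,
# but a single query finds a fully matching peer.
combined_ads = shards * protocols   # 32 unique ads, 1 query

# Strategy 2: one ad per protocol plus one ad per shard.
# Few unique ads, so more advertisers per ad, but two queries
# are needed and their results must be cross-referenced.
separate_ads = shards + protocols   # 12 unique ads, 2 queries
```

The trade-off scales in opposite directions: the combined strategy grows multiplicatively with each new feature dimension, while the separate strategy grows only additively.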
Sub-DHTs
IPFS Composable DHT would allow apps to share a base DHT and to specify and discover other peers based on the features they support. The delivery date is unknown, but work is ongoing; it is not implemented anywhere yet. We could also implement this concept without waiting for IPFS: it consists mostly of biasing the peer selection in our routing table in favor of peers with similar features. This solution has not been studied for possible attacks, but since it consists of random walks, we can expect it to be resilient.
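The biasing idea can be sketched in a few lines. The scoring rule below (rank by feature overlap, break ties randomly) is an illustrative assumption, not IPFS's actual design:

```python
import random

def biased_selection(routing_table, my_features, k=16):
    """Sketch of biasing routing-table peer selection toward peers
    sharing our features (the composable-DHT idea). The scoring rule
    is illustrative. routing_table: list of (peer_id, feature_set)."""
    def score(entry):
        _, features = entry
        # larger feature overlap ranks first; ties broken randomly
        return (len(my_features & features), random.random())
    return sorted(routing_table, key=score, reverse=True)[:k]
```

Over time, each node's routing table fills with peers that share its features, effectively forming one sub-DHT per feature on top of the shared base DHT.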
Race 2 queries on sub-DHTs: one for the (Waku) protocol, the other for the shard. Finding the correct sub-DHT might be fast depending on the peers already known, but then finding a peer that fulfills the second parameter would be a random walk. Random walks are more resilient to attacks than storing values with close peers (in hash space), and in this case the concentration of suitable peers is much higher (25% and 12.5%), which speeds up discovery. As soon as one peer matches the query, that peer has a high probability of already being connected to other peers with a similar feature set. We can expect finding one or many peers sharing a feature set to be equally fast.
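A simplified sequential sketch of the racing idea (a real implementation would run the two walks concurrently over actual routing tables; here peers are just a flat list of hypothetical `(peer_id, features)` pairs):

```python
import random

def random_walk(peers, predicate, max_hops=50, rng=random):
    """Toy random walk: sample peers until one satisfies the predicate.
    A real walk would hop between routing tables, not a flat list."""
    for _ in range(max_hops):
        peer = rng.choice(peers)
        if predicate(peer[1]):
            return peer
    return None

def find_service_peer(peers, protocol, shard):
    # "race" two walks, one per parameter, then cross-reference:
    # a peer found by either walk may already satisfy both.
    by_protocol = random_walk(peers, lambda f: protocol in f)
    by_shard = random_walk(peers, lambda f: shard in f)
    for hit in (by_protocol, by_shard):
        if hit and protocol in hit[1] and shard in hit[1]:
            return hit
    return None
```

Because peers with similar feature sets tend to cluster, the first match often short-circuits the second walk entirely.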
If service providers are required to register an RLN membership, we may be able to limit Sybil attacks in our hypothetical DHT.
Meridian
Self-described as "a lightweight framework for performing network positioning", Meridian is an overlay network structured around latency, in contrast to Kademlia, which is based on XOR distance between peers. No DHT is built on top of this overlay network; its sole purpose is to answer queries about a node's position in "internet" space. Why is this useful? By itself, it cannot be used to find specific peers, but it can be a solid foundation. By combining gossip-based discovery, one routing table per feature, and service clustering, finding specific service nodes can be done efficiently. By virtue of being so lightweight, bandwidth cost and state can be increased without becoming prohibitively expensive for node operators. Although it does not solve the problem in a generally applicable way, it might be good enough for us.
Meridian is designed to find the closest peer possible which could reduce the latency of all our protocols.
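Meridian's closest-node search works by hops that must each cut the remaining latency by a constant factor. The sketch below flattens Meridian's multi-ring structure into one peer list per node and looks latencies up instead of measuring them; the `beta` reduction factor is an illustrative parameter:

```python
def meridian_closest(start, latency_to_target, rings, beta=0.5):
    """Sketch of Meridian-style closest-node search: each hop probes
    the current node's ring members and forwards the query only to a
    peer at least a factor `beta` closer to the target, stopping when
    no peer improves enough. Simplified: one flat peer list per node,
    latencies looked up rather than measured."""
    current, d = start, latency_to_target(start)
    while True:
        candidates = [(p, latency_to_target(p))
                      for p in rings.get(current, [])]
        best = min(candidates, key=lambda c: c[1], default=(current, d))
        if d > 0 and best[1] <= beta * d:
            current, d = best
        else:
            return current, d
```

Each accepted hop at least halves the distance (with `beta=0.5`), so the search converges in a logarithmic number of hops.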
Provider DB
An alternative could be to maintain a DB of all providers (a prolly-tree-based index maybe?) so that every node can keep its own curated "provider list" but sync with others for updates. The process would be to just ask peers randomly until a suitable service provider is found. Since you cannot control what peers store in their provider DB, it's hard to estimate the performance of a query. On the other hand, the system is harder to attack since there's no structure like in a DHT. There is a risk of centralization, but with easy replication and sync, it is minimal. There is a question about incentives: why would nodes store provider information?
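The random-polling cost is easy to model: if a fraction p of peers' DBs contain a matching entry, the number of queries is geometric with mean roughly 1/p. A toy sketch, where the `peers` mapping of peer IDs to known provider IDs is purely illustrative:

```python
import random

def find_provider(peers, wanted, rng=random, max_queries=100):
    """Toy sketch of polling random peers' provider DBs until one
    contains the wanted provider. peers: peer_id -> set of provider
    ids. With a fraction p of DBs holding a match, the expected
    number of queries is roughly 1/p (geometric distribution)."""
    for n in range(1, max_queries + 1):
        peer = rng.choice(list(peers))
        if wanted in peers[peer]:
            return peer, n
    return None, max_queries
```

The better replication and sync work, the larger p becomes and the cheaper each lookup gets, which is also where the incentive question bites: nodes must be willing to store and serve entries they don't use themselves.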