jm-clius opened 1 year ago
Below is an initial task breakdown to achieve the epics listed in the issue description.
Single RFC that specifies the Waku Network. Some details that should be included:
Owner: @jm-clius (with specific sections covered by relevant owners) Priority: Critical for launch
Note: This is already done.
Tracks the work necessary to design and specify the mechanism to hash content topics to the shards defined for the gen 0 network.
Owner: @SionoiS Priority: Critical for launch
Note: This is already done in nwaku.
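As a rough, non-normative illustration of the idea (the normative algorithm lives in the RFC), the sketch below hashes the application and version fields of a content topic down to a shard index. The shard count and byte selection are assumptions for illustration, not the specified values.

```python
import hashlib

GEN0_SHARD_COUNT = 8  # assumption: gen 0 launches with a small, fixed shard set


def shard_for_content_topic(content_topic: str, shard_count: int = GEN0_SHARD_COUNT) -> int:
    """Map a content topic deterministically to a shard index.

    Illustrative only: hashes the application and version fields of a
    /application/version/name/encoding content topic and reduces the
    digest modulo the shard count.
    """
    parts = content_topic.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError("expected /application/version/name/encoding")
    application, version = parts[0], parts[1]
    digest = hashlib.sha256((application + version).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Because only the application and version fields are hashed, all content topics of one application land on the same shard, keeping an application's traffic together.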
This tracks the work necessary in each client to provide API(s) to applications using req-resp protocols (store, lightpush, filter, etc.) with optional pubsub topic arguments. Each client should then use autosharding to determine the associated shards on behalf of applications. Note that underlying protocols should not be affected and each client implementation should locally populate pubsub topic fields with the shards hashed from content topics received via the API.
Owners:
Priority: Critical for launch
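A minimal sketch of what such an API change could look like. The `store_query` call and `_infer_pubsub_topic` helper are hypothetical (no real client exposes these names), and the wire request is a stand-in; only the `/waku/2/rs/<cluster>/<shard>` pubsub topic shape follows the existing static-sharding convention.

```python
import hashlib
from typing import List, Optional


def _infer_pubsub_topic(content_topic: str, cluster: int = 1, shard_count: int = 8) -> str:
    # Hypothetical autosharding helper: hash the content topic to a shard
    # and render it as a sharded pubsub topic name.
    digest = hashlib.sha256(content_topic.encode()).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"/waku/2/rs/{cluster}/{shard}"


def store_query(content_topics: List[str], pubsub_topic: Optional[str] = None) -> dict:
    # Sketch of a client-side API: the pubsub topic argument is optional.
    # When omitted, the client derives it from the content topics on behalf
    # of the application; the underlying protocol is unaffected.
    if pubsub_topic is None:
        pubsub_topic = _infer_pubsub_topic(content_topics[0])
    return {"pubsub_topic": pubsub_topic, "content_topics": content_topics}
```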
Currently the only way to manipulate a relay node's pubsub topic subscriptions is through an explicit `subscribe` call with the pubsub topic as argument, or via static configuration when setting up the node. This task tracks work to add a `subscribe` method to relevant Relay APIs allowing applications to provide desired content topics as (relay) subscribe arguments, while the node translates those into relay subscriptions for specific shards. This task should include a mechanism whereby an application is only notified of new messages on its subscribed content topics, instead of all messages on the subscribed shards.
Owners:
Priority: Critical for launch
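One way this could be structured, with hypothetical names and a pluggable content-topic-to-shard function standing in for autosharding:

```python
from collections import defaultdict


class RelaySubscriptions:
    """Sketch of a content-topic-level relay subscribe API (names are hypothetical).

    The node keeps relay subscriptions at shard granularity, but only
    forwards messages whose content topic was explicitly subscribed to.
    """

    def __init__(self, to_shard):
        self.to_shard = to_shard            # content topic -> shard (autosharding)
        self.handlers = defaultdict(list)   # content topic -> callbacks
        self.shard_refs = defaultdict(int)  # shard -> number of content topics on it

    def subscribe(self, content_topic, handler):
        shard = self.to_shard(content_topic)
        if self.shard_refs[shard] == 0:
            pass  # here the real node would join the shard's gossipsub mesh
        self.shard_refs[shard] += 1
        self.handlers[content_topic].append(handler)
        return shard

    def on_message(self, content_topic, payload):
        # Called for every message on a subscribed shard; only messages on
        # subscribed content topics reach the application.
        for handler in self.handlers.get(content_topic, []):
            handler(payload)
```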
This tracks work necessary to change req-resp protocol requirements for pubsub topic fields to be optional in wire specifications. This implies a spec change, changes to clients to use the simplified specs and changes to server-side Waku implementations to interpret client requests with no populated pubsub topics correctly.
Owners:
Priority: Not critical for launch (all main benefits of autosharding are already achieved via the API changes)
Once the critical parts of autosharding have been implemented, we should dogfood the end-to-end operation. This could constitute a soft launch of the Waku Network, with the first traffic relayed on all new network shards, albeit without any DoS mitigation mechanisms in place. Specific things to test include reviewing how well autosharding works with sharded discovery and bootstrapping (mechanisms that were developed previously for static sharding).
Owner: @SionoiS Priority: Critical for launch
This tracks the work to define and implement a relay validator that allows nodes to limit the amount of bandwidth they're willing to allocate for routing "free traffic". In this case, "free traffic" would be Waku messages with no RLN proofs.
Owners:
Priority: Critical for launch
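One plausible shape for such a validator is a token bucket over message bytes that exempts RLN-proven traffic. All class and parameter names below are illustrative, not the agreed design:

```python
class FreeTrafficValidator:
    """Sketch of a relay validator limiting bandwidth allocated to messages
    without RLN proofs, using a token bucket over message bytes."""

    def __init__(self, bytes_per_second: float, burst_bytes: float):
        self.rate = bytes_per_second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def validate(self, size: int, has_rln_proof: bool, now: float) -> bool:
        # Messages carrying a valid RLN proof are never counted as free traffic.
        if has_rln_proof:
            return True
        # Refill the bucket according to elapsed time, then try to spend.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # over the free-traffic budget: reject the message
```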
This tracks work to ensure we have proper metrics and network-wide dashboard available to monitor bandwidth usage per shard. This should also consider other types of metrics (e.g. number of applications/content topics per shard, node distribution, etc.)
Owner: @vpavlin Priority: Not critical for launch
This tracks work to guide node operators on how to configure third party tools on various platforms to set a hard limit on Waku bandwidth usage as a failsafe mechanism.
Owner: @vpavlin Priority: Not critical for launch
With autosharding, nodes' shard subscriptions (relay) are influenced by:
Owner: @vpavlin Priority: Not critical for launch
Tracks work necessary to ensure that nodes can filter discovered peers according to the shards the node is currently subscribed to and for the node to update its own discoverable ENR every time its subscription set changes.
Owners:
Priority: Critical for launch.
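The peer-filtering half of this task could be sketched as follows, assuming each discovered peer's ENR has already been decoded into a shard list (the `shards` field name is hypothetical):

```python
def select_peers_by_shard(discovered_peers, subscribed_shards):
    """Sketch: keep only discovered peers advertising at least one shard
    the node is currently subscribed to. Each peer is assumed to expose
    the shard set decoded from its ENR as peer["shards"]."""
    wanted = set(subscribed_shards)
    return [p for p in discovered_peers if wanted & set(p["shards"])]
```

The other half of the task, re-publishing the node's own ENR whenever its subscription set changes, is the mirror image: the shard list encoded in the ENR must be kept in sync with the current relay subscriptions.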
If we assume that peer discovery takes care of filtering peers by shard, these peers must now be managed in a way that makes sense in a dynamic auto-sharded environment. Some requirements/ideas:
- `n` peers for each subscribed shard (note that dynamically subscribing/unsubscribing from shards could complicate this)
Owners:
Priority: Critical for launch
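The "`n` peers per subscribed shard" idea could be sketched like this; the peer representation and `rebalance` function are hypothetical:

```python
import random


def rebalance(connected, candidates, subscribed_shards, n):
    """Sketch: for each subscribed shard, top up to n connected peers from
    the discovered candidate pool. Peers are dicts with hypothetical
    "id"/"shards" keys. Returns peer ids to dial; dropping peers on
    unsubscribed shards is left out for brevity."""
    to_dial = []
    connected_ids = {p["id"] for p in connected}
    for shard in subscribed_shards:
        have = [p for p in connected if shard in p["shards"]]
        pool = [p for p in candidates
                if shard in p["shards"] and p["id"] not in connected_ids]
        random.shuffle(pool)  # avoid always dialling the same candidates
        for p in pool[: max(0, n - len(have))]:
            to_dial.append(p["id"])
            connected_ids.add(p["id"])
    return to_dial
```

Running this periodically (and on every subscribe/unsubscribe) is one way to keep per-shard connectivity healthy as the subscription set changes.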
Ensure that new and existing Waku fleets use PostgreSQL by default.
Owner: @Ivansete-status Priority: Critical for launch
Optimising the DB schema in general for faster queries. Task also includes guidelines for optimising PostgreSQL configuration for performance, if there's anything to be done (e.g. suggestions re replication, etc.)
Owner: @Ivansete-status Priority: Critical for launch
Status and other applications are starting to use the latest version of these protocols. It is likely that specific support queries or protocol improvement requests arise from such dogfooding efforts.
Owner: ??? Priority: Critical for launch
Filter relies in many ways on the same building blocks as relay for its reliability, but in a modular, "pick your own tradeoffs" way: randomness (selecting random peers for connection/subscription, preferably with some peer cycling), periodically checking that you received all messages against a cache (this doesn't really exist yet for `filter`, but you could imagine using occasional `store` queries to achieve something similar). As such, it will be helpful to provide a configurable "reliability" SDK on top of filter for projects without the capacity to build these features from the ground up.
Owner: @danisharora099 Priority: not critical for launch
Currently a service peer (e.g. a store node) is either provided by the application, or the node expects the peer manager to suggest a suitable service peer (`selectPeer()`). This task focuses on the latter: the peer manager should provide the ability to find service peers for a specific shard, based on the content topic the application is interested in (e.g. an API call to `selectPeer(Protocol, ContentTopic)`), or kick off an ad-hoc discovery process until a suitable service peer is found. Client APIs (for filter, lightpush, peer-exchange) should automatically make use of this mechanism if the application does not explicitly set the desired service peer.
Owners:
Priority: critical for launch
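A sketch of shard-aware service peer selection, with a hypothetical in-memory peer manager; a real implementation would consult the peer store and trigger on-demand discovery rather than returning `None`:

```python
class PeerManager:
    """Sketch of shard-aware service peer selection (names hypothetical)."""

    def __init__(self, peers, to_shard):
        self.peers = peers        # each: {"id", "protocols", "shards"}
        self.to_shard = to_shard  # autosharding: content topic -> shard

    def select_peer(self, protocol, content_topic=None):
        # If a content topic is given, restrict candidates to peers serving
        # the shard that topic maps to.
        shard = self.to_shard(content_topic) if content_topic else None
        for p in self.peers:
            if protocol not in p["protocols"]:
                continue
            if shard is None or shard in p["shards"]:
                return p
        # A real implementation would kick off an ad-hoc discovery process
        # here until a suitable service peer is found.
        return None
```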
Note: This item is underdefined as its completion leads to the definition of many subtasks and a roadmap for the Store protocol
This tracks the work to specify/define the roadmap for distributed store services. This includes:
Owner: @jm-clius Priority: not critical for launch
Note: This work has already started.
Benchmarking RLN in a production environment by incorporating it into local simulations.
Owners: @alrevuelta / @rymnc Priority: critical for launch
Once we have enough confidence in RLN performance after benchmarking, RLN validation should be enabled by default for relay nodes and deployed to relevant fleets (i.e. fleets that serve RLN-protected shards). For example, once autosharding dogfooding has started, RLN relay could be enabled in the same fleets allowing RLN dogfooding to start as soon as the first members start publishing. Membership contract could at this stage still be deployed only to a testnet. Membership mechanisms are tracked as separate tasks. This task is dependent on a suitable membership contract being deployed and configured on validators.
Owner: @rymnc Priority: Critical for launch
Ensuring that on all Waku clients publishers can:
Owner: @rymnc (but delegate implementation ownership for different Waku clients) Priority: Critical for launch
To kickstart dogfooding, we need a membership mechanism mostly focused on getting internal CCs to start using RLN. This could e.g. involve providing instructions to internal CCs on how to generate, configure and keep a suitable key for RLN, collecting public key material for eligible CCs (and other parties) via a Typeform, registering a membership on their behalf in a membership contract, and ensuring that deployed nodes are aware of this contract. At this stage the membership contract can still be deployed on a testnet, although we may want to migrate these initial members to a "final" mainnet contract once stable.
Owner: @rymnc Priority: Critical for launch
Mainnet contract (and instructions/guide) where anyone willing to pay the registration transaction fee can register a new membership, with membership limit set upfront. At this stage deployed RLN validators should build memberships off mainnet.
Owner: @rymnc Priority: Critical for launch
Once the correct contracts are in place, RLN validation has been enabled by default on all nodes, and the necessary features have been implemented for publishers to generate RLN proofs, end-to-end dogfooding can commence on the Waku Network.
Owner: @rymnc Priority: Critical for launch
Although technically tracked under the general track (1), I've listed this as the last task. Once all tasks labelled as critical for launch have been completed, launch and dogfood the Waku Network MVP as set out in https://github.com/waku-org/research/issues/1 by the end of 2023.
For go-waku, I have the following observations:
Autosharding: this work has not been started in go-waku, but looking at the implementation in nwaku, I believe we should reach feature parity quickly! I'll extract the list of tasks tomorrow early in the morning my time so a milestone can be defined in go-waku for it.
Task: Validation mechanism to limit "free traffic" on the network. There's an open PR in https://github.com/waku-org/go-waku/pull/616 with this functionality. It needs to be modified to comply with the design mentioned here. I can be assigned as owner for that task.
Task: Failsafe mechanism (guide) for BW limiting: I understand that the work done here for nwaku can (probably) be reused with go-waku as well, since it seems to rely only on documentation and third-party tools.
Few observations/suggestions:
High-level suggestion: it would be good if we could indicate the expectations/priorities for each Waku implementation for the Waku Network MVP. This would also help us think specifically about what else may be required to be ready for the network launch. My assumption/understanding is that `nwaku` would be used as a service node (run only by node operators), while `go-waku` and `js-waku` would be used by developers/applications to build light/full node clients that interact with service nodes.
As the network goes live with autosharding and the gen 0 shards, is there some metric users/node operators can refer to in order to get an idea of the bandwidth usage of each shard? This would help them decide which shards to support/use. Or will this also be defined as part of the Network spec itself?
This tracks the work necessary in each client to provide API(s) to applications using req-resp protocols (store, lightpush, filter, etc.) with optional pubsub topic arguments.
Shouldn't the API take an optional content topic, or am I missing something? The idea with autosharding is that applications/users need not worry about the pubsub topic and should only care about/be exposed to the content topic, right?
Task: Peer management with shard as a dimension: You can assign this to me.
Task: Service peer selection on specific shards : You can assign this to me.
I think we should have some sort of a network view/health view (maybe a separate milestone) which provides some information regarding the health of the shards and the network. Something like https://beaconcha.in/:
- how many relay peers are available, service peers per shard.
- Current bandwidth utilization of each shard
- how many nodes for each implementation are online.
Task: Distributed store service roadmap - I can assist in case you need help.
Thanks for the comments so far! Answering one by one:
@richard-ramos
Autosharding: this work has not been started in go-waku, but looking at the implementation in nwaku, I believe we should reach feature parity quickly!
May be worth syncing with @SionoiS here on the timeline, as there are still some architectural discussions happening during PRs (in other words, nwaku's current changes may be provisional in some aspects).
Validation mechanism to limit "free traffic" on the network There's an open PR
Great! Have added you as owner for go-waku.
Failsafe mechanism (guide) for BW limiting: I understand that the work done here for nwaku...
Yes, I think these will be general guidelines that apply to all clients. Have updated ownership to reflect.
@chaitanyaprem
Good comments re contextualising role of each client. I think this also depends on the client team themselves, but generally nwaku/go-waku aim to have full range of capabilities while js-waku focuses on client-side protocols. nwaku is indeed the one we direct operators to run, but if e.g. Status integrates with the Waku Network, similar capabilities would be required in go-waku (we want Desktop nodes to be runnable as a store/filter/lightpush service node).
is there some metric users/node-operators can refer to get an idea of the bandwidth usage for each shard
Ensuring we have proper metrics is indeed implicit, especially under the 1.3 tasks, but good to have this mentioned! Existing metrics might already be sufficient. Note that the MVP for the Waku Network currently assumes that nodes simply subscribe to the shards belonging to subscribed applications, with more sophisticated auto subscribe/unsubscribe roadmapped for the future. For operator nodes (those not running a specific application), the metrics will be helpful.
Shouldn't the API take an optional content topic
Ah, I see why this may be confusing. In short: the API calls generally take a mandatory content topic and a mandatory pubsub topic. While the content topic must remain mandatory, the APIs should be modified to make it optional for applications to provide a specific pubsub topic (i.e. it can be inferred from the provided content topics).
I think we should have some sort of a network view/health view
Great idea! I think the existing Waku Network Monitor will already cover much of the required functionality here, but, indeed, we have to make sure that we have all the metrics we need and an accessible dashboard for it.
Thanks for volunteering ownership elsewhere, will add.
I added 1.4 as a milestone in the nwaku repo.
Tasks that do not have specifically assigned owners:
1.2 Autosharding for autoscaling
1.3 Node bandwidth management mechanism
1.4 Sharded peer management and discovery
2.1 Production testing of existing protocols
2.2 Sharded capability discovery for light protocols
More observations from @chaitanyaprem's and my weekly sync call:
Task: SDK for using filter/lightpush: randomness (selecting random peers for connection/subscription, preferably with some peer cycling), periodically checking that you received all messages against a cache (this doesn't really exist yet for filter, but you could imagine using occasional store queries to achieve something similar)
Task: Store/archive schema optimisations: Optimising the DB schema in general for faster queries. Task also includes guidelines for optimising PostgreSQL configuration for performance, if there's anything to be done (e.g. suggestions re replication, etc.)
Task: Failsafe mechanism (guide) for BW limiting
Task: Dynamic subscribe/unsubscribe mechanism (or similar)
Would be good to note this is important for Status
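The store-based cache cross-check mentioned for the filter SDK task could, in its simplest form, compare message hashes received via filter against what a store node holds for the same content topic and time window. This is a sketch of the idea, not an existing API:

```python
def find_missing(received_hashes, store_hashes):
    """Sketch of the cache cross-check idea for filter reliability:
    compare message hashes seen via a filter subscription against the
    hashes a store node returns for the same content topic and time
    window, and report what the filter subscription missed."""
    seen = set(received_hashes)
    return [h for h in store_hashes if h not in seen]
```

An SDK could run this periodically and re-fetch (or re-subscribe) when the missing list is non-empty, giving applications tunable reliability on top of plain filter.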
Regarding any relay specific work in js-waku: This should be done in js-waku but is not critical for launch. Cc @waku-org/js-waku-developers
Regarding any "service node" work for js-waku: not needed at this point in time (e.g. ENR update).
Regarding nwaku vs js-waku vs go-waku. Let's align on the priorities and what is critical for launch:
This does not mean that nwaku should not implement the client side of req/resp protocols, or that go-waku should not implement PostgreSQL.
What it does mean is that nwaku should prioritize operator tools (e.g. bandwidth management) over redundancy capability for filter as a client, or that go-waku should prioritize the RLN free-traffic mechanism over a PostgreSQL implementation.
Let's focus on what is critical for Waku Network Gen 0.
Do note:
Let me know if you disagree or if anything is not clear.
- go-waku as a relay and light client node, meaning relay and client side of req/rep protocols
Edit: Need to clarify rust-bindings usage from TheGraphcast as I believe they use it as a service node too, if so, we should include go-waku as service node for bindings in scope.
Task: SDK for using filter/lightpush
I would suggest we do this for go-waku and rust bindings too.
cc @waku-org/go-waku-developers
Do we need to include REST API also as an item to be tracked as part of this?
- go-waku as a relay and light client node, meaning relay and client side of req/rep protocols
Edit: Need to clarify rust-bindings usage from TheGraphcast as I believe they use it as a service node too, if so, we should include go-waku as service node for bindings in scope.
Currently TheGraph only uses the rust bindings as a "client"/relay node and does not provide filter/lightpush services.
Do we need to include REST API also as an item to be tracked as part of this?
Yes, it's a good point. I think we have to think about how the REST API needs to evolve based on the changes here. cc @jm-clius. Also, from the convo here: https://github.com/waku-org/bounties/issues/9 there may be opportunities to enhance the API too, e.g. for more flexible usage of RLN.
Do we need to include REST API also as an item to be tracked as part of this?
Indeed. This task breakdown tries to answer "what needs to be done for the network to work?". Within each client there are some implied tasks that I did not necessarily include, but which could be tracked. For example, "Autosharding API for (relay) subscriptions" implies that the API(s) within each client should be extended, depending on which APIs in those clients are used by applications. In nwaku's case this would include the REST API.
OK, maybe I should have clarified. Is the REST API a priority for go-waku for this launch? It may not be, considering the go-waku service node is not a priority for this launch.
Is REST API a priority for go-waku for this launch?
Afaict, no, it's not a requirement for the launch.
Background
For more background on the public Waku Network MVP, see https://github.com/waku-org/research/issues/1. This issue provides an overview of the proposed roadmap and parallel tracks to achieve this end goal.
For a graphic representation of the roadmap and epics, see the Miro board
Tracks
Individual epics are roughly sorted under tracks that can in most cases be tackled in parallel, although there are some interdependencies between tracks, as qualified below.
1. General Track
This tracks the main body of work to build an autoscaling, autosharded network and provides the foundation for the other tracks.
Epic 1.1: Network requirements and design
Goal: 31 Aug 2023
This tracks the work necessary to determine the specifications for a public Waku Network and a rough design to achieve this. The deliverable of this epic is to answer questions such as:
Epic 1.2: Autosharding for autoscaling
Goal: 30 Sept 2023
This epic tracks designing and implementing an autosharding strategy based on content topics and the design requirements established in 1.1. The first phase would likely be to launch with a limited number of shards (perhaps no more than 128?). More shards can be added as the network grows organically, perhaps using some "generation ID" that allows opening more shards with successive generations.
Epic 1.3: Node bandwidth management mechanism
Goal: 31 Oct 2023
As the autosharded public network grows and traffic increases per shard, we want to provide some bandwidth management mechanisms for relay nodes to dynamically choose the number of shards they support based on bandwidth availability. For example, when the network launches, it's reasonable for relay nodes to support all shards and gradually unsubscribe from shards as bandwidth increases. The minimum number of shards to support would be 1, so the network design and DoS mechanisms (see Track 3) would have to provide predictable limits on max bandwidth per shard. We could also envision a bandwidth protection mechanism that drops messages over a threshold, but this would affect the node's scoring, so it should be carefully planned.
Epic 1.4: Sharded peer management and discovery
Goal: 30 Nov 2023
This epic tracks work necessary to allow applications to interact with shards in a transparent manner. Relay nodes should preferably be subscribed to the shards carrying the content topic(s) the application is interested in. Where this is not the case, the application should be able to interact with service peers within the shards it's interested in via a peer management scheme. The peer manager should be tracking peers from all public shards or be able to discover such peers on demand.
Epic 1.5: Launch and dogfood integrated public Waku Network MVP
Goal: 31 Dec 2023
Combining outputs from all other tracks and epics, launch and dogfood the Waku Network MVP as set out in https://github.com/waku-org/research/issues/1 by the end of the year.
2. Service protocols track
This tracks work necessary to make service protocols, specifically `filter`, `lightpush`, `peer-exchange` and `store`, work in a public network and in a decentralised manner.
Epic 2.1: Production testing of existing protocols
Goal: 31 Aug 2023
All service protocols are currently in alpha/beta implementation phase and have undergone significant changes aimed at delivering the Status MVP. Filter protocol, for example, has been redesigned from the ground up, while a new PostgreSQL archive backend for the Store protocol has been added. Before building on these protocols, their use in a production environment must be properly dogfooded and hardened.
Epic 2.2: Sharded capability discovery for light protocols
Goal: 30 Sept 2023
Ability to discover peers providing `filter`, `lightpush` and `peer-exchange` services within the decentralised network. To the application this should present as a peer selection mechanism that translates content topics to the underlying shards.
Epic 2.3: Basic distributed Store services
Goal: 31 Dec 2023 (dogfooded and integrated in 2024)
Ability to discover Store nodes that provide store services for specific content topics and time ranges and for service nodes to advertise their store services as such. This includes work to allow Store service nodes to only store messages for specific content topics/applications.
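A toy model of the storage side of this epic: a store node that only archives configured content topics and serves queries by content topic and time range. Class and field names are illustrative; only the nanosecond timestamp convention follows Waku practice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoredMessage:
    content_topic: str
    timestamp: int  # nanoseconds, as in Waku message timestamps
    payload: bytes


@dataclass
class Archive:
    """Sketch: an archive restricted to specific content topics/applications."""
    stored_topics: Optional[set] = None  # None means "store everything"
    messages: List[StoredMessage] = field(default_factory=list)

    def insert(self, msg: StoredMessage) -> bool:
        # Only archive messages on the configured content topics.
        if self.stored_topics is not None and msg.content_topic not in self.stored_topics:
            return False
        self.messages.append(msg)
        return True

    def query(self, content_topics, start: int, end: int):
        # Serve store queries by content topic and time range.
        return [m for m in self.messages
                if m.content_topic in content_topics and start <= m.timestamp <= end]
```

Advertising "which topics and time ranges this node serves" is then a matter of publishing `stored_topics` plus the oldest/newest archived timestamps.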
Epic 2.4: Basic design for service incentivization
Goal: post 2023
Once services can be provided in a distributed (if not fully decentralised) manner, we need to revisit previous ideas on incentivising provision of such services (e.g. using a service credentials approach). This will become a large research track on its own. Only the first step is tracked here to show how it relates to work for the public Waku network.
3. DoS Protection track
Epic 3.1: DoS requirements and design
Goal: 31 Aug 2023
This builds off the general specification phase to answer questions specific to DoS/spam protection in the public Waku Network. The deliverable is an answer to questions such as:
Epic 3.2: Basic DoS protection in production
Goal: 31 Oct 2023
An RLN mechanism implemented and productionised, which includes:
This phase would include allowing applications to provide their own DoS protection mechanisms by opening a message validation API.
Epic 3.3: Membership for Status Communities
Goal: 30 Nov 2023
A mechanism to allow Community Owners to assign RLN membership to community members.
Epic 3.4: Production and memberships on mainnet
Goal: 2024
Expanding membership (and therefore high QoS access) of the public Waku Network to third parties and more community members. General membership that falls outside the targeted member groups may require a staking mechanism to be in place.
This may include a membership mechanism to allow Status users to "sign up" to use 1:1 chat with DoS protection.
For this epic, RLN contracts must be deployed to mainnet.
Further epics: Slashing, staking and incentivisation
Goal: 2024
Cryptoeconomic design to allow "full RLN" with staking, slashing and incentivisation of spam detection implemented. This follows the launch of the public Waku Network and extends beyond "basic" DoS protection.
4. Status Communities Track
This tracks epics performed by Status product developers for the rollout of Status Communities over Waku.
Epic 4.1: ~10 Logos internal communities
Goal: 31 Aug 2023
With the Status 10K MVP reaching feature-completeness, dogfooding of statically-sharded communities, with all the features designed around scalability of this model, must start in earnest. The suggestion is to start with a single Test Community, followed by a Status CC Community and then one for each of the existing Logos teams/communities (Vac, Waku, Codex, Nomos, etc.). Features such as Media Shards, Control Message sharding, etc. must be implemented and tested fully during this period.
Epic 4.2: ~100 statically-mapped communities
Goal: 30 Sept 2023 (and continuing to onboard more communities afterwards)
Inviting interested parties to move their communities to Status, with services such as store and bootstrapping fully provided by Status during this period. Specific static shards are assigned to each of these communities to allow Status to provide the necessary services until Community Owners (and members) can do so themselves using mechanisms introduced for Waku distributed services (see 2.3).
Epic 4.3: Open (permissionless) communities
Goal: 30 Sept 2023
This allows "anyone" to create a community, with some services provisioned by Status. This can be the mechanism for people creating communities via the app (see 4.2) and to have their data migrated to this new shard.
Epic 4.4: Open communities on public Waku network
Goal: 31 Jan 2024
Once the public Waku network has the following required features, open communities can move from their weakly-protected shared shards to the autoscaling public Waku network:
Epic 4.5: Distributed Community services
Goal: 2024
Large/statically-mapped communities will continue operating on their own shards with Status-provisioned services for as long as Status can provide this service. Distributed Waku services will also allow these communities to provision their own Store/service/bootstrap nodes or make use of third party service providers.
Risks