Closed jm-clius closed 8 months ago
tagging Testing team: @AlbertoSoutullo, @0xFugue, @Daimakaimura
NB: requirements and tasks may change as we encounter unknowns. The task breakdown below assumes that nothing more has to be done for Discovery and Bootstrapping other than proper configuration (to be described in the Scaling Strategy BCP).
Understand the expected:
for 10K community users.
Note: this is not necessarily an analytical exercise, but rather ballpark figures and sanity checking of current Status Community message rates. @Menduist has analysed message rates in large Discord servers to get a rough estimate of what we would expect to see for Status Communities. However, analysis of an existing Status Community shows significantly higher message rates and bandwidth usage. See conversation.
Tracked in: ?? Owners:
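A minimal sketch of the kind of ballpark calculation meant here. The per-user rate and message size are placeholder inputs chosen for illustration, not measured values, and the function name is hypothetical. It only computes the network-wide rate before any relay amplification.

```python
from dataclasses import dataclass


@dataclass
class Ballpark:
    msgs_per_sec: float
    payload_bytes_per_sec: float


def ballpark_load(users: int, msgs_per_user_per_hour: int, avg_msg_bytes: int) -> Ballpark:
    """Network-wide message rate and payload volume, before relay amplification."""
    msgs_per_sec = users * msgs_per_user_per_hour / 3600
    return Ballpark(msgs_per_sec, msgs_per_sec * avg_msg_bytes)


# Placeholder inputs for illustration only; replace with observed community rates.
estimate = ballpark_load(users=10_000, msgs_per_user_per_hour=36, avg_msg_bytes=1_000)
print(f"{estimate.msgs_per_sec:.1f} msg/s, {estimate.payload_bytes_per_sec / 1000:.1f} kB/s")
```

Comparing such an estimate against rates observed in an existing community is one way to do the sanity check described above.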
Sharding strategy for Waku relay in general and Status Communities specifically. This plan will consider short term and longer term strategies. This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.
Tracked in: https://github.com/vacp2p/research/issues/154 Owners:
Strategy and implementation to protect relay and store against simple DoS attack vectors. This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.
Tracked in: https://github.com/vacp2p/research/issues/164 Owners:
Already part of https://github.com/waku-org/pm/issues/8 but repeated here for completeness. Note that this includes work to allow concurrent queries.
Tracked in: https://github.com/waku-org/pm/issues/4 Owners:
Tracked in: https://github.com/vacp2p/rfc/issues/563 Owners:
Basic testing to see that PostgreSQL implementation works at expected message and query rates. (Note this is in addition to simulation with Kurtosis).
Tracked in: ?? Owner: @LNSD
Revising the RFCs and implementations in nwaku and go-waku. Already part of https://github.com/waku-org/pm/issues/8 but repeated here for completeness.
Tracked in: https://github.com/waku-org/pm/issues/5 Owners:
RFC for basic peer management strategy and implementations in nwaku and go-waku.
Tracked in: https://github.com/waku-org/nwaku/issues/1353 Owners:
This can be seen as the final goal for all the moving parts and separate tasks listed above. Output will likely take the form of one or more Best Current Practices RFCs that focus on the Status 10K use case. It will bring together the short term strategies for sharding, DoS mitigation, bootstrapping, discovery and store configuration. It may include suggestions on when to use lightpush and filter rather than relay.
Tracked in: https://github.com/vacp2p/research/issues/165 Owners:
This is in addition to simulation with Kurtosis. Individual owners of each task will be responsible for testing and dogfooding their strategies/features. This task ensures that we have considered each item for targeted network testing, including:
Tracked in: ?? Owner: @jm-clius
multiaddrs discovery: libp2p rendezvous
Although it is possible to encode multiaddrs in ENRs, which are currently being exchanged by all existing discovery methods, ENRs are limited in size and can consequently not contain more than one or two multiaddrs. We need a discovery method more suitable for multiaddrs. We have chosen libp2p rendezvous as the solution here.
Tracked in: https://github.com/vacp2p/research/issues/176 Owners:
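To illustrate the size constraint, here is a rough sketch of the byte budget. The 300-byte cap comes from EIP-778; the base record size, the multiaddr sizes and the per-entry overhead are assumptions picked for the example, and the helper name is hypothetical.

```python
from typing import Sequence

ENR_MAX_BYTES = 300  # maximum encoded record size per EIP-778


def multiaddrs_that_fit(
    base_record_bytes: int,
    multiaddr_sizes: Sequence[int],
    per_entry_overhead: int = 2,  # assumed framing cost per multiaddr
) -> int:
    """Count how many multiaddrs fit, in order, in the space left in the record."""
    remaining = ENR_MAX_BYTES - base_record_bytes
    count = 0
    for size in multiaddr_sizes:
        cost = size + per_entry_overhead
        if cost > remaining:
            break
        remaining -= cost
        count += 1
    return count


# Assumed sizes: a 180-byte record (signature, keys, ip/port fields) and
# three 50-byte multiaddrs, e.g. DNS-based websocket addresses.
print(multiaddrs_that_fit(180, [50, 50, 50]))
```

Under these assumed sizes only two of the three addresses fit, which is the limitation that motivates a separate discovery method.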
This is an outflow of the Community sharding plan as specified by @kaiserd and covers the implementation portion, including configuration and enabling shard discovery via ENRs.
Owners:
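As a rough sketch of what shard configuration and shard discovery could look like: this assumes the static shard pubsub topic naming of the Waku relay sharding RFC (`/waku/2/rs/<cluster>/<shard>`). The `Peer` type stands in for whatever shard information a node reads from a discovered ENR, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Peer:
    peer_id: str
    cluster_id: int
    shards: FrozenSet[int]  # shards the peer advertises (assumed to come from its ENR)


def shard_topic(cluster_id: int, shard: int) -> str:
    """Pubsub topic name for a static shard (format assumed from the sharding RFC)."""
    return f"/waku/2/rs/{cluster_id}/{shard}"


def peers_for_shard(peers: List[Peer], cluster_id: int, shard: int) -> List[Peer]:
    """Keep only discovered peers that advertise the wanted cluster and shard."""
    return [p for p in peers if p.cluster_id == cluster_id and shard in p.shards]


# Illustrative values only; cluster and shard numbers are made up.
discovered = [
    Peer("peerA", 1, frozenset({0, 5})),
    Peer("peerB", 1, frozenset({7})),
    Peer("peerC", 2, frozenset({5})),
]
print(shard_topic(1, 5), [p.peer_id for p in peers_for_shard(discovered, 1, 5)])
```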
This is described in https://github.com/logos-co/wakurtosis/issues/7
It covers testing the scalability of the relay protocol, specifically measuring:
Owner:
This step will either confirm our (positive) assumptions about relay scalability or highlight bottlenecks/bugs in the protocol or implementations, which must be addressed and considered in the overall network roadmap.
Owners:
This is a collaborative task flowing from the results of the first test to refine the simulation(s) and plan the next, most useful tests.
Owners:
This is an administrative step. It may require updating the RFC to match the latest implementation, moving sections around, etc.
Owner:
Grasping the content of each protocol and how it maps to real-world Waku network traffic. This is potentially an involved task, so the scope should be minimized for this MVP. This relates to Verify scaling target requirements under the Network Requirements.
Owner:
Provisioning one or more performant machines which the dev team can use for sandbox testing of features using ad-hoc Wakurtosis deployments.
Owners:
Integration test environment for nwaku. Most likely it will take the form of a pipeline that deploys a Wakurtosis network topology and runs a series of scripted integration tests for nwaku.
Owners:
Automated release pipeline for nwaku that builds a release, compiles release notes and publishes release binaries and tagged Docker images for the most common OSs/architectures.
Tracked in: https://github.com/waku-org/nwaku/issues/611 Owners:
Create a document that summarizes all the common tasks that a fleet owner generally has to do, including deployment, monitoring and debugging. This will also allow us to communicate to other platforms planning on deploying their own Waku fleets what they need to consider. The document should include a section on what Status fleet ownership specifically entails, including a procedure to log and escalate bugs/network anomalies.
Owner: @jm-clius
Based on the requirements determined above, determine who will take ownership of the Status fleets and schedule training sessions.
Owner:
5. Scalable storage: deterministic message ID
Tracked in: https://github.com/vacp2p/rfc/issues/563 Owners:
waku (protocol): @LNSD
nwaku (implementation): @LNSD
go-waku (implementation): @richard-ramos
Progress on this initiative, also known as Message Unique ID (MUID), is tracked in the following issue: waku-org/nwaku#1914
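A minimal sketch of why a deterministic message ID helps a store node avoid duplicates. The hash input used here (SHA-256 over length-prefixed pubsub topic, content topic and payload) is an assumption for illustration; the actual field set and order are defined by the RFC. SQLite is swapped in for PostgreSQL so the example is self-contained, and `message_id` and `store` are hypothetical names.

```python
import hashlib
import sqlite3


def message_id(pubsub_topic: str, content_topic: str, payload: bytes) -> str:
    """Hypothetical deterministic ID: SHA-256 over length-prefixed fields."""
    h = hashlib.sha256()
    for part in (pubsub_topic.encode(), content_topic.encode(), payload):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguous joins
        h.update(part)
    return h.hexdigest()


db = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL archive
db.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, payload BLOB)")


def store(pubsub_topic: str, content_topic: str, payload: bytes) -> bool:
    """Insert a message; return False if the same message is already stored."""
    mid = message_id(pubsub_topic, content_topic, payload)
    cur = db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?)", (mid, payload))
    return cur.rowcount == 1


# The same message arriving twice (e.g. via two relay peers) is stored once.
first = store("/waku/2/rs/1/5", "/app/1/chat/proto", b"hello")
second = store("/waku/2/rs/1/5", "/app/1/chat/proto", b"hello")
stored = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Because every node derives the same ID from the message itself, the primary key does the deduplication and no coordination between nodes is needed.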
Thoughts on current status:
- [ ] 1. Verify scaling target requirements
Several discussions have happened. The outputs I am aware of are:
@jm-clius @richard-ramos did we have more to this?
- [ ] 2. Community sharding plan https://github.com/vacp2p/research/issues/154 (note this issue also tracks 1mil work)
This can be closed as static sharding was delivered. The quoted issue also tracks 1mil work.
- [ ] 3. Simple Waku Relay DoS Mitigation https://github.com/vacp2p/research/issues/164
https://github.com/vacp2p/research/issues/164#issuecomment-1672531792
- [ ] 4. Scalable storage: nwaku archive PostgreSQL implementation https://github.com/waku-org/pm/issues/4 https://github.com/waku-org/nwaku/issues/1888
https://github.com/waku-org/nwaku/issues/1888#issuecomment-1672537221
This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k. Moving forward, we could use MUID for gossipsub seen message logic; is that something we need for 1mil?
Then, MUID is possibly going to be used for Distributed store.
@jm-clius please confirm
- [ ] 6. Scalable Storage: testing store at scale
https://github.com/vacp2p/research/issues/191#issuecomment-1672542165
@jm-clius were we thinking DST simulation for this?
- [ ] 7. Filter and lightpush improvements https://github.com/waku-org/pm/issues/5
https://github.com/waku-org/pm/issues/5#issuecomment-1672547298
- [ ] 8. Peer management strategy https://github.com/waku-org/nwaku/issues/1353
https://github.com/waku-org/nwaku/issues/1353#issuecomment-1672547801
- [ ] 9. Combine into comprehensive scaling strategy https://github.com/vacp2p/research/issues/165
@jm-clius this seems done. Not sure if we tracked an output somewhere?
- [ ] 10. Targeted dogfooding
I suggest to descope this from Waku work. By delivering this milestone we enable Status to integrate Waku tech and start dogfooding. We are tracking hardening of Waku protocols as part of https://github.com/waku-org/research/issues/3 with 2.1
- [ ] 11. New multiaddrs discovery: libp2p rendezvous https://github.com/vacp2p/research/issues/176
https://github.com/vacp2p/research/issues/176#issuecomment-1672550555
- [ ] 12. Waku static sharding implementation
Done. What issue tracked the work/output? @jm-clius
- [ ] Setup staging fleet with static sharding for Status dogfooding
Last remaining task. Are we tracking somewhere @jm-clius ? edit: is this it? https://github.com/status-im/status-go/issues/3528
- [ ] Specify fleet ownership requirements to enable Status team to maintain own fleet
The other last remaining task. Are we tracking somewhere @jm-clius ?
Thanks for revising, @fryorcraken. See my comments below.
Several discussions have happened. The outputs I am aware of are: https://github.com/vacp2p/research/issues/177 @jm-clius @richard-ramos did we have more to this?
Afaik many of the suggestions have been implemented or are in the process of being implemented, also in status-go. @richard-ramos may have better idea of current status. Perhaps the work that's being done in status-go should be tracked there, which would mean the Waku side can be closed?
This can be closed as static sharding was delivered. The quoted issue also tracks 1mil work.
I agree.
This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k. Moving forward, we could use MUID for gossipsub seen message logic; is that something we need for 1mil? Then, MUID is possibly going to be used for Distributed store.
Yes, I would close https://github.com/vacp2p/rfc/issues/563 as the only issue really needed for the 10K milestone. We also don't need to do anything else for the 1 mill milestone, but we can keep https://github.com/waku-org/pm/issues/9 open to track the work that would be necessary for the distributed store.
https://github.com/vacp2p/research/issues/191#issuecomment-1672542165
@jm-clius were we thinking DST simulation for this?
Initially, yes. But I think a reasonable step for the 10K epic would be (a) dogfooding and (b) local stress-testing of postgresql.
- Combine into comprehensive scaling strategy https://github.com/vacp2p/research/issues/165 @jm-clius this seems done. Not sure if we tracked an output somewhere?
Yes, I've gone ahead and closed the issue. The output here was just moving the RFCs to vac repo and revising them.
- Waku static sharding implementation Done. What issue tracked the work/output? @jm-clius
Main tracking issue was: https://github.com/waku-org/pm/issues/15 which I think can just be closed. There were also tracking issues in nwaku (and probably go-waku/js-waku).
Setup staging fleet with static sharding for Status dogfooding Last remaining task. Are we tracking somewhere @jm-clius ? edit: is this it? https://github.com/status-im/status-go/issues/3528
No, the first fleet that can be used for initial tests/dogfooding is tracked here: https://github.com/status-im/infra-waku/issues/1 Since this fleet has been deployed, this issue can probably be closed. This is not quite a staging fleet for Status yet; I'll link that to the issue I create for the Status fleet requirements below.
Specify fleet ownership requirements to enable Status team to maintain own fleet The other last remaining task. Are we tracking somewhere @jm-clius ?
It is now: https://github.com/waku-org/pm/issues/61 Not a very detailed issue, but should do the trick. :)
I think the suggestions from https://github.com/vacp2p/research/issues/177 have not been implemented, or at least I could not find them in the status-go code.
@jm-clius https://github.com/status-im/infra-waku/issues/1 tracks "auto-sharding". I assume you mean it can also be used for static sharding dogfooding.
Weekly Update
All software has been delivered. Pending items are:
Monthly Update
Staging fleet for Status (static sharding + Postgres) has been defined and handed over to infra: https://github.com/waku-org/nwaku/issues/1914
Stress testing of PostgreSQL in progress: INSERT done, SELECT in progress.
1k nodes simulation blogpost: https://github.com/vacp2p/vac.dev/pull/123/
Weekly Update
Integration of static sharding in go-waku is continuing (see updates below).
Testing of PostgreSQL identified some performance improvements in the implementation, which are now being implemented.
Internal instructions have been distributed to dogfood static sharding with the Waku team (Waku Discord private channel).
risks:
Weekly Update
We will run one more week of internal dogfooding of static sharding + PostgreSQL in Status Communities. Once done, and if no new issues are found, we will close this issue.
The go-waku and waku chat sdk teams will continue to support Status with their integration of Waku v2, but no major effort is scheduled in terms of software development and testing.
Weekly Update
https://github.com/waku-org/pm/issues/97 is now done. Status QA is proceeding with testing. Most changes are now focused on status-go with ad hoc bug/issue investigation from Waku team. This Milestone can now be closed :tada:
Priority Tracks: Secure Scalability
Due date: 31 May 2023
Milestone: https://github.com/waku-org/pm/milestone/5
Summary
Tasks / Epics
Extracted questions
Network requirements
1. Message Delivery and Sharding
Assumptions:
2. Discovery
Assumptions:
3. Bootstrapping
Assumptions:
4. Store nodes (Waku Archive)
Assumptions:
5. Security:
Assumptions:
Other requirements
1. Kurtosis network testing
A simulation framework and initial set of tests that can approximate:
2. Community Protocol hardening
The Community Chat Protocols specifications are moved to Vac RFC repo.
3. Nwaku integration testing
Nwaku requires integration testing and automated regression testing for releases to improve trust in stability of each release.
4. Fleet ownership
Ownership for infrastructure provided to Status communities should be established. This may require training and transfer of responsibilities, which mostly lie de facto with the nwaku team. Fleet ownership comprises the responsibility for:
Initial work
The requirements above will lead to a design and task breakdown. Roughly the order of work:
Ownership for all three items below is shared between Vac, Waku and Status teams:
(1) Agree on requirements above as the complete and minimal set to achieve the 10K scaling goal.
(2) A viable, KISS network design adhering to "Network requirements".
(3) Task breakdown of each item and ownership assignment.