vacp2p / research


Waku v2 - umbrella epic #39

Closed oskarth closed 3 years ago

oskarth commented 4 years ago

Waku v2 umbrella issue, see https://forum.vac.dev/t/waku-version-2-pitch/52/2 for context.

Problem

The Waku network is fragile and doesn't scale.

As Status moves into a user-acquisition phase and improves user retention rates, we need the infrastructure to keep up, specifically when it comes to messaging.

Based on user acquisition models, our initial goal is to support 100k DAU in September, with demand growing from there.

With the Status Scaling Model we have studied the current bottlenecks as a function of concurrent users (CCU) and daily active users (DAU). Here are the conclusions.

1. Connection limits. With 100 full nodes we reach ~10k CCU based on connection limits. This can primarily be addressed by increasing the number of nodes (cluster or user operated), assuming node discovery works. It is also worth investigating what limits the maximum number of connections per node, though this is likely to be less relevant for user-operated nodes. For a user-operated network, this means 1% of users have to run a full node. See Fig 1-2.

2. Bandwidth as a bottleneck. Memory usage does not appear to be the primary bottleneck for full nodes; the bottleneck is still bandwidth. To support 10k DAU with full nodes at an amplification factor of 25, the required Internet speed is ~50 Mbps, which is a fast home Internet connection. For ~100k DAU, only cloud-operated nodes can keep up (500 Mbps). See Fig 3-5.

3. Amplification factors. Reducing amplification factors with better routing would have a high impact, but we would likely need additional measures as well, such as topic sharding or similar. See Fig 8-13.

Figure 1-5 [see https://forum.vac.dev/t/waku-version-2-pitch/52/2]

See https://colab.research.google.com/drive/1Fz-oxRxxAFPpM1Cowpnb0nT52V1-yeRu#scrollTo=Yc3417FUJJ_0 for the full report.
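
The arithmetic behind conclusions 1 and 2 can be reproduced with a rough sketch like the one below. It is not the Status Scaling Model itself: the function names are hypothetical, the per-node connection count is simply inferred from "100 full nodes, ~10k CCU", and the assumption that bandwidth scales linearly with both DAU and amplification factor is an extrapolation from the two data points quoted above (50 Mbps at 10k DAU, 500 Mbps at 100k DAU).

```python
# Back-of-the-envelope sketch; all constants are inferred from the numbers
# quoted in this issue, not taken from the full scaling report.

CONNECTIONS_PER_NODE = 10_000 // 100  # assumption: ~10k CCU spread over 100 full nodes
BASELINE_DAU = 10_000                 # quoted data point
BASELINE_AMPLIFICATION = 25           # quoted data point
BASELINE_MBPS = 50.0                  # quoted data point


def max_ccu(full_nodes: int, connections_per_node: int = CONNECTIONS_PER_NODE) -> int:
    """Upper bound on concurrent users if connection limits are the only constraint."""
    return full_nodes * connections_per_node


def full_node_fraction(connections_per_node: int = CONNECTIONS_PER_NODE) -> float:
    """Share of users that must run a full node in a purely user-operated network."""
    return 1 / connections_per_node


def required_mbps(dau: int, amplification: float = BASELINE_AMPLIFICATION) -> float:
    """Full-node bandwidth, ASSUMING linear scaling in DAU and amplification factor."""
    return BASELINE_MBPS * dau * amplification / (BASELINE_DAU * BASELINE_AMPLIFICATION)


if __name__ == "__main__":
    print("max CCU with 100 nodes:", max_ccu(100))
    print("full node fraction:", full_node_fraction())
    print("Mbps at 100k DAU:", required_mbps(100_000))
    print("Mbps at 100k DAU, amplification 5:", required_mbps(100_000, 5))
```

Under these assumptions, cutting the amplification factor from 25 to 5 would bring the 100k DAU requirement down from 500 Mbps to 100 Mbps, which is still above the home-connection figure quoted above and is consistent with the point that additional measures would likely be needed.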

What we need to do is:

  1. Reduce amplification factors
  2. Get more user-run full nodes

Doing this means the Waku network will be able to scale, and to do so in the right way, in a robust fashion. What would a fragile way of scaling be? Increasing our reliance on a Status Pte Ltd operated cluster, which would paint us into a corner where we:

Tracks