near / near-one-project-tracking

A repository for tracking work items that NEAR One is working on.

[ProjectTracking]: congestion control #48

Open jakmeier opened 9 months ago

jakmeier commented 9 months ago

(Supersedes #11)

Goals

We want to guarantee that the Near Protocol blockchain operates stably even during congestion. This is currently impeded by a lack of cross-shard congestion control. Specifically, the delayed receipt queues of each shard may grow indefinitely during congestion, which is bad since those queues are part of the state tries.

With this project, we want to ensure all queues of receipts have a fixed limit in size. The queues in question are:

  1. Delayed receipts queue
  2. Postponed receipts queue
  3. Any new queues introduced by the congestion control solution itself

Current status: Local Congestion Control

Right now we have fully implemented Local Congestion Control, meaning that shard validators and RPC nodes will not be overloaded by transactions coming into their local transaction pools and receipts generated from local transactions.

Technically, this is achieved by introducing two limits:

For the exact implemented features see https://github.com/near/nearcore/milestone/26?closed=1

Next steps: Cross-Shard Congestion Control

To achieve our goal of bounded queues, we need global congestion control. The solutions currently under consideration fall into two categories.

For both categories, many ideas have been discussed already but there has been no clear winner so far.

To move this issue forward, we plan the following steps:

Likely, this will require protocol changes, so we should also add this step:

Links to external documentation and discussions

NEP-539 Cross-Shard Congestion Control

Congestion control design proposal documentation, March 2024.

Kick-off for global congestion control February 2024

State of congestion control in September 2023. This document also provides links to additional docs.

Current Zulip thread (feel free to drop any questions or comments there or here in GitHub)

Zulip thread September 2023

High-level overview of final proposal, as accepted in NEP-539: Slides

Estimated effort

We aim to complete most of the engineering work by 31st of May, 2024, with @wacban and @jakmeier working on it 50% each. This is unlikely to be the perfect solution (see Assumptions and Out of scope below) but it should guarantee bounded queues.

Taking NEP approval and the release schedule into account, we expect this to be live on mainnet some time around July or August 2024.

Assumptions

Pre-requisites

Work starts immediately, no pre-requisites needed.

Out of scope

walnut-the-cat commented 8 months ago

Either: We drop messages that do not fit in the fixed sizes of the queues. Then we provide a solution to deal with dropped (failed) receipts to contract developers.

Or: We apply backpressure from congested shards to other shards. Shards experiencing backpressure need to throttle the amount of outgoing receipts to the congested shards.

Maybe I am not super clear on what you meant by 'Either' and 'Or' here.

jakmeier commented 8 months ago

I just wanted to make it clear that the two categories of solutions discussed so far are split. One set of solutions involves dropping receipts on the go. The other involves backpressure. Either of those can work independently of the other.

But I guess it makes it sound as if they are incompatible. That's not true; indeed, we could combine the two approaches for the final solution.
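To make the two categories concrete, here is a minimal sketch of both on a queue with a fixed byte limit. This is a toy model only, not nearcore code and not the strategy that was eventually proposed: `BoundedQueue`, `forward_with_drop`, and `forward_with_backpressure` are hypothetical names, and receipts are reduced to their size in bytes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedQueue:
    """Toy receipt queue with a fixed byte limit (hypothetical, not nearcore code)."""
    max_bytes: int
    used_bytes: int = 0
    items: deque = field(default_factory=deque)

    def has_room(self, size: int) -> bool:
        # A receipt fits only if it keeps the queue within its byte limit.
        return self.used_bytes + size <= self.max_bytes

    def push(self, size: int) -> None:
        self.items.append(size)
        self.used_bytes += size


def forward_with_drop(queue: BoundedQueue, receipt_sizes: list) -> list:
    """Category 1: receipts that do not fit are dropped. Returns the dropped sizes."""
    dropped = []
    for size in receipt_sizes:
        if queue.has_room(size):
            queue.push(size)
        else:
            dropped.append(size)
    return dropped


def forward_with_backpressure(queue: BoundedQueue, outgoing: deque) -> int:
    """Category 2: the sender stops at the first receipt that does not fit.

    Unsent receipts stay in the sender's own outgoing buffer, so nothing is
    lost, but the sender now has to hold on to them. Returns the number sent.
    """
    sent = 0
    while outgoing and queue.has_room(outgoing[0]):
        queue.push(outgoing.popleft())
        sent += 1
    return sent
```

With a 100-byte limit and receipts of 60, 50, and 30 bytes, the drop variant loses the 50-byte receipt, while the backpressure variant sends only the first receipt and keeps the other two buffered on the sender side. A combined approach would mix the two, for example by applying backpressure first and dropping only when the sender's buffer is also full.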

jakmeier commented 8 months ago

Quick status update:

Done

We have a basic model (nearcore/10695) and even the ability to look at the results in a local Grafana dashboard (nearcore/10719).

A few sample workloads and strategies are also already included. But those are more of a demo of the model. The workloads are too simple to give a complete picture. And the strategies are mostly just demos or explorations of specific ideas in isolation. None of the strategies would be a suitable proposal.

Ongoing work

This week we are looking into specific strategies and workloads. @wacban and I discussed general ideas and what exactly we want to try this week. We will share relevant results from the experiments when we have them.

And on the way, we will improve the model and the output as necessary.

Progress vs Time Estimate

The original plan was to come up with a proposed strategy based on model output by next week. I think we are just in time to hit that mark. Then we can start working on a design document in the form of an NEP and start gathering feedback from everyone else.

jakmeier commented 8 months ago

Status update:

Done

Based on several workloads and strategies we simulated, we collected ideas and evaluated which of those are good and which are bad or useless.

This has led to two main strategies we've looked at in more detail:

We then compared the two in this document: https://docs.google.com/document/d/1wVQIF0cgilO9m-iI_P5HK6MVc0b6RAxsxtyTZ1nMnBs/edit?usp=sharing

The final result is a merge of the two ideas and results in:

Ongoing work

Progress vs Time Estimate:

Projected Solution Quality

Initially we defined a set of must-have properties and a set of aspirational properties. Let's check which of them we think we can achieve.

Must-have:

Queues in the system are bounded in how many bytes they require. (This is per chunk)

We will have limits in place. But they won't be explicit limits in bytes that we can guarantee. So this requirement will not be fulfilled as cleanly as we hoped for.

The NEP for this is approved and merged.

The implementation is merged to the nearcore master branch, stabilized, and ready for testnet deployment.

It looks like we will hit those in time.

Aspirational Properties

Queues are small enough to be kept entirely in memory

Again, we don't have hard guarantees in our solution. But we believe this trade-off is necessary and will still ensure that, in all but the most targeted malicious cases, the queues fit into memory. And certainly, it will be a strict improvement over today's system in malicious cases.

Once accepted, transactions and all following receipts will not fail.

We will satisfy this.

Bounded latency to resolve a receipt.

We will satisfy this and can still decide what we want the guaranteed bound to be, trading it against utilization in marginal cases.

In a non-congested setting, every shard can run at full speed.

Satisfied.

Transactions that only touch non-congested shards are not affected by congested shards.

Partially satisfied. Backpressure means every shard that is on the path of congesting flows will become congested and experience negative consequences, even if that shard on its own wouldn't be congested. But all other shards are completely unaffected.
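As a rough illustration of which shards end up affected, the sketch below takes a set of receipt flows (each an ordered list of shards) and the shards that are congested on their own, and returns every shard on a path leading into a congested one. This is a deliberately simplified reading of the statement above: the function name and the flow representation are assumptions, and it ignores capacities, timing, and any secondary effects.

```python
def shards_affected_by_backpressure(flows, congested):
    """Return the shards that feel backpressure in a toy model.

    flows: list of paths, each an ordered list of shard ids a flow passes through.
    congested: set of shard ids that are congested on their own.
    A shard is affected if it is congested itself or lies upstream of a
    congested shard on at least one flow.
    """
    affected = set(congested)
    for path in flows:
        # Positions on this path where the flow hits a congested shard.
        hits = [i for i, shard in enumerate(path) if shard in congested]
        if hits:
            # Everything up to the last congested shard on the path is throttled.
            affected.update(path[: hits[-1] + 1])
    return affected
```

For example, with flows `[["s0", "s1", "s2"], ["s3", "s4"]]` and only `s2` congested, the result contains `s0`, `s1`, and `s2`, while `s3` and `s4` stay unaffected because none of their traffic heads toward the congested shard.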

The same strategy can be used for any number of shards.

This seems fulfilled about as well as we can expect it to.

jakmeier commented 7 months ago

Status update:

jakmeier commented 4 months ago

Overdue update: