near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Define scalable "typical" traffic outside Social DB and integrate it into the load generator / test runner #9097

Open jakmeier opened 1 year ago

jakmeier commented 1 year ago

For the TPS benchmark, we know that we want some SocialDB workload (see #9095) but we want to combine it with some other "typical" workload.

For this, we probably want to mimic workload observed on mainnet. Ideally it should include traffic from our largest users at the moment (sweat, aurora, ...) but also non-smart-contract workload (creating accounts, near transfers, etc.).

All of this should be defined in the locust setup.

### Tasks
- [x] Basic FT transfer
- [x] SWEAT FT transfers (#9281)
- [ ] SWEAT batches
- [ ] Aurora (postponed)
- [ ] nearcrowd (postponed) (#9285)
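For the basic FT transfer task above, the payload each locust user submits is a standard NEP-141 `ft_transfer` function call. A minimal sketch of how the call arguments could be built (the helper name `build_ft_transfer_args` is illustrative, not the actual function in the nearcore locust setup):

```python
import json

# Sketch: construct the arguments for a NEP-141 `ft_transfer` call, as an
# FT-transfer locust task would submit them. NEP-141 expects the amount as
# a decimal string; the transaction must also attach exactly 1 yoctoNEAR
# as a security deposit (handled elsewhere by the transaction builder).
def build_ft_transfer_args(receiver_id: str, amount: int) -> bytes:
    return json.dumps(
        {"receiver_id": receiver_id, "amount": str(amount)}
    ).encode()

args = build_ft_transfer_args("alice.near", 10)
```

The bytes returned here would go into the `args` field of a `FunctionCall` action targeting the FT contract account.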
jakmeier commented 1 year ago

Traffic Today (Analysis)

I've looked at what mainnet traffic currently looks like. Very roughly, the summary is:

Proposed Benchmark Workload

Based on that rough analysis, I want to create a benchmark with 20M daily transactions distributed like this:

- 5M Social DB
- 5M Sweat FT transfers
- 5M other FT transfers
- 3M aurora
- 2M nearcrowd

This tries to project how traffic could grow from today's ~300k daily transactions to 20M daily transactions. One assumption we established at the start of the quarter was that 5M of that should come from Social DB. I have now defined the remainder.

I allocate 10M to FT use cases because Sweat alone already accounts for >50% of TPS today. But instead of forcing it all onto a single account, I want to spread the load across shards, so I split it evenly between Sweat and "other" FTs. That could also simulate more loyalty-program-like use cases coming to near, which isn't unlikely given the success of Sweat.

Then I thought aurora and nearcrowd shouldn't be missing either, given they have >10% of the tx volume each. Aurora is slightly larger today, so I think 3M aurora & 2M nearcrowd makes sense. Note that aurora transactions also consume much more gas than nearcrowd transactions.

Notably, DeFi is missing completely. We could add that later, but for now I want to focus on the more common and more performance-critical use cases.
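The proposed split can be sanity-checked with a little arithmetic; the numbers below are taken directly from the proposal, and the dictionary could double as relative task weights in the locust setup (the variable names are hypothetical):

```python
# Proposed 20M-daily-tx benchmark workload and the implied average TPS.
DAILY_TX = 20_000_000
workload = {
    "social_db": 5_000_000,   # fixed assumption from start of quarter
    "sweat_ft": 5_000_000,    # half of the 10M FT allocation
    "other_ft": 5_000_000,    # other half, spread across shards
    "aurora": 3_000_000,
    "nearcrowd": 2_000_000,
}
assert sum(workload.values()) == DAILY_TX

# 20M tx spread over a day (86,400 s) implies ~231 tx/s on average.
avg_tps = DAILY_TX / 86_400
print(round(avg_tps))  # 231
```

Note that this is only the average; any realistic generator would also need to model peak-hour bursts above that rate.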

cc @akhi3030 @bowenwang1996 please let me know if you have concerns regarding my proposed benchmark workload

akhi3030 commented 1 year ago

This seems like a good approach to me.

jakmeier commented 1 year ago

Currently the plan is to:

jakmeier commented 1 year ago

I have been running with Sweatcoin batches for a while now and I managed to make it work. However, there is one remaining annoying issue: record_batch can only be called by a registered oracle account, and registering an oracle can only be done by the owner of the contract account.

Currently I create one oracle for each worker. The problem then is that all users of the worker share a single oracle, which results in nonce conflicts. Sometimes this just causes a simple retry and things work fine after that. But the most annoying case is when the RPC node accepts the transaction, because the nonce is still valid at that time, but the validator then filters it out of its tx pool. The user ends up waiting forever (the cutoff is currently 30 min) and never sees a useful error.

I think I can resolve this by using multiple keys per oracle, but I'm not sure I'm willing to spend the extra effort on it just yet.
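The multiple-keys-per-oracle idea could look roughly like this: the shared oracle account gets several access keys, each worker-local user signs with a key assigned round-robin, and each key's nonce advances independently, so concurrent `record_batch` calls no longer race on one nonce sequence. A simplified sketch (class and method names are illustrative, and real nonces would be fetched from the chain rather than starting at zero):

```python
import itertools
import threading

# Sketch of per-key nonce bookkeeping for one oracle account with
# multiple access keys. Each key has its own independent nonce counter,
# so transactions signed with different keys cannot conflict.
class OracleKeyPool:
    def __init__(self, keys):
        self._nonces = {k: 0 for k in keys}   # next unused nonce per key
        self._cycle = itertools.cycle(keys)   # round-robin key assignment
        self._lock = threading.Lock()

    def next_signing_slot(self):
        # Atomically pick the next key and reserve its next nonce.
        with self._lock:
            key = next(self._cycle)
            self._nonces[key] += 1
            return key, self._nonces[key]

pool = OracleKeyPool(["oracle-key-0", "oracle-key-1"])
```

With two keys, consecutive calls yield `("oracle-key-0", 1)`, `("oracle-key-1", 1)`, `("oracle-key-0", 2)`, and so on, so two users submitting simultaneously never contend for the same nonce.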