stellar / stellar-core

Reference implementation for the peer-to-peer agent that manages the Stellar network.
https://www.stellar.org

Improve "apply-load" to cover more network settings and scenarios #4520

Open MonsieurNicolas opened 1 month ago

MonsieurNicolas commented 1 month ago

Tagging as a discussion for now.

Right now it's fairly difficult to validate proposed ledger-wide limits, yet having a good understanding of their impact can really help with

Here is a high-level way we could iterate on benchmarking capability in this context:

dmkozh commented 4 weeks ago

we should document how to build a "reference min-spec node" that can be used to run those benchmarks. I

That's a good idea, but I'm not sure which spec this should be. I'm not sure we should hinder ourselves too much, given that validators should have much more reasonable specs.

should get a new method that performs more interesting work

There are some issues with this contract, but I'm not sure they come from the work not being 'interesting' enough. I don't think randomization on the contract side is necessary. I'm also not sure we should modify any entries on the contract side at all; it might be better to just pass them through (at least for now; eventually we might start filtering out no-op writes). Basically all the IO happens based on the footprint and it's not very useful to do anything about the ledger entries on the host side. A big issue about this benchmark so far has been variance, I would work on eliminating it as much as possible before trying anything fancy.

ledger state is generated by adding contract data that will correspond to all IDs reached during a run

Yeah, I've been thinking about bucket list (BL) emulation as a next step. Appending directly to the buckets is an interesting idea; I didn't know that was possible. I have no idea how much that will change the results, though.

basically, ApplyLoad::benchmark() should always apply transactions over the same state (so same ledger number), by resetting its state after applying a ledger

That's a great idea, thanks.
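A minimal sketch of the "reset after every ledger" idea follows. It is written in Python rather than stellar-core's C++, and a plain dict stands in for ledger state. `make_initial_state`, `apply_ledger` and `benchmark` are hypothetical names, not the real `ApplyLoad` API. Every iteration starts from an identical copy of the same snapshot, so differences between iterations reflect noise in the apply path rather than a ledger that drifts from run to run.

```python
import random
import statistics
import time

def make_initial_state(num_entries: int, seed: int = 42) -> dict:
    """Hypothetical ledger snapshot: entry key -> 64-byte value, generated deterministically."""
    rng = random.Random(seed)
    return {f"key{i}": rng.randbytes(64) for i in range(num_entries)}

def apply_ledger(state: dict, txs: list) -> None:
    """Hypothetical ledger close: each tx is a list of keys it rewrites (stand-in for real work)."""
    for tx in txs:
        for key in tx:
            state[key] = bytes(reversed(state[key]))

def benchmark(initial: dict, txs: list, iterations: int) -> list:
    """Apply the same ledger `iterations` times, always starting from `initial`."""
    timings = []
    for _ in range(iterations):
        # Reset: fresh copy of the snapshot (values are immutable bytes, so a shallow copy suffices).
        state = dict(initial)
        start = time.perf_counter()
        apply_ledger(state, txs)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    initial = make_initial_state(10_000)
    txs = [[f"key{i}" for i in range(j, j + 10)] for j in range(0, 1_000, 10)]
    t = benchmark(initial, txs, 20)
    print(f"mean {statistics.mean(t) * 1e3:.3f} ms, stdev {statistics.stdev(t) * 1e3:.3f} ms")
```

In stellar-core itself, "resetting" would mean restoring the database and bucket list to the pre-ledger snapshot. That part is out of scope for this sketch.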

MonsieurNicolas commented 4 weeks ago

That's a good idea, but I'm not sure which spec this should be. I'm not sure we should hinder ourselves too much, given that validators should have much more reasonable specs.

I think the issue is that people's laptops are likely much faster at a lot of things than the typical validator in AWS, and in general we need a way for everyone to normalize their numbers so that we can have sane conversations.

The actual Docker args cannot be hardcoded (at least the CPU limit seems to be relative to the host CPU), but we can and should document the absolute numbers in terms of IOPS and core performance. For core performance, we should probably pick some raw number that is easy to benchmark, like X sha256/second, so that people can easily tweak their setup to get close enough to that performance. Note that we may already have an instructions/second "basic validator" number, as we need something like that in the context of calibration. For IOPS, at some point we'll shoot for 1M IOPS, but for now a conservative number should be more like 100k IOPS.
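One way to produce an "X sha256/second" figure is a single-core microbenchmark like the Python sketch below. `BASELINE_SHA256_PER_SEC` is a made-up placeholder, not a published stellar-core number. Python's per-call overhead also adds to the cost of each hash, so the result is only comparable across machines running the same script and interpreter version. A native implementation would track raw core performance more closely.

```python
import hashlib
import time

# Placeholder only: the real baseline would be measured on reference hardware and documented.
BASELINE_SHA256_PER_SEC = 1_000_000.0

def sha256_per_second(duration_s: float = 1.0, payload_size: int = 1024) -> float:
    """Single-threaded SHA-256 throughput (digests/second) over a fixed-size payload."""
    payload = b"\x00" * payload_size
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # Hash in batches so the clock check is a small fraction of the measured time.
        for _ in range(1000):
            hashlib.sha256(payload).digest()
        count += 1000
    return count / (time.perf_counter() - start)

def cpu_ratio_to_baseline(measured: float, baseline: float = BASELINE_SHA256_PER_SEC) -> float:
    """Greater than 1.0 means this machine is faster than the baseline on this microbenchmark."""
    return measured / baseline

if __name__ == "__main__":
    rate = sha256_per_second(2.0)
    print(f"{rate:,.0f} sha256/s, {cpu_ratio_to_baseline(rate):.2f}x baseline")
```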

Basically all the IO happens based on the footprint and it's not very useful to do anything about the ledger entries on the host side. A big issue about this benchmark so far has been variance, I would work on eliminating it as much as possible before trying anything fancy.

Yeah, I don't think you actually need to modify entries within the contract, but I am pretty sure you need to call the put host function (how entries end up in the "modified" list); otherwise we'd allow contracts to rewrite entries they don't own.

dmkozh commented 3 weeks ago

but I am pretty sure you need to call the put host function (how entries end up in the "modified" list)

Nope, currently we actually overwrite all the RW (read-write) entries unconditionally; the security guarantee comes from the host not allowing contracts to modify entries they don't own. We might start filtering out no-op changes eventually, but I'm not sure that's a useful optimization for real traffic.
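The Python sketch below contrasts the current behaviour with possible future no-op filtering. Today, every read-write (RW) footprint entry is written back unconditionally. The sketch assumes the host returns a final value for every RW key, with `None` meaning the entry was deleted. `write_back` and its parameters are hypothetical. The ownership check that provides the actual security guarantee happens earlier, inside the host, and is not modeled here.

```python
from typing import Dict, Optional

def write_back(ledger: Dict[str, bytes],
               final_values: Dict[str, Optional[bytes]],
               filter_noops: bool = False) -> int:
    """Write the host's final value for every RW footprint key; return the number of writes."""
    writes = 0
    for key, new_value in final_values.items():
        old_value = ledger.get(key)
        if filter_noops and new_value == old_value:
            # Hypothetical optimization: skip entries the transaction left unchanged.
            continue
        if new_value is None:
            ledger.pop(key, None)  # entry was deleted during execution
        else:
            ledger[key] = new_value
        writes += 1
    return writes
```

With `filter_noops=False`, every RW entry costs a write even if the contract never touched it. This is why a benchmark contract can simply pass entries through and still exercise the full write path.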

sisuresh commented 4 days ago

@MonsieurNicolas you mentioned using --cpu-period=100000 --cpu-quota=40000 as an example of how to limit resources. If the Docker image is being run on one of our dev machines (same hardware as our validators), is there any reason to limit CPU? I don't think so. I couldn't get the IOPS-limiting args to work on my laptop, which is why I'm using a dev machine.
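The example flags translate into a CPU fraction. `--cpu-quota` is the CPU time in microseconds the container may use per `--cpu-period`, so 40000/100000 gives it 0.4 of one host core. Because that fraction is relative to the host's core, the same flags throttle a fast laptop and a slower server to different absolute speeds. Below is a small hypothetical Python helper (not part of stellar-core) that builds these flags. It also adds the per-device IOPS limits that `docker run` accepts (`--device-read-iops` / `--device-write-iops`, each taking a `DEVICE:RATE` pair). Docker enforces those IOPS limits per block device, so they may have no visible effect when the path is not the device actually backing the container's storage, or when Docker runs inside a VM, as it often does on laptops. That is one possible explanation, not a confirmed one.

```python
def docker_throttle_args(cpu_fraction: float, iops: int, device: str,
                         cpu_period_us: int = 100_000) -> list:
    """Build `docker run` flags limiting CPU time and per-device read/write IOPS."""
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    if iops <= 0:
        raise ValueError("iops must be positive")
    # quota / period = fraction of one host core the container may use.
    quota_us = int(round(cpu_fraction * cpu_period_us))
    return [
        f"--cpu-period={cpu_period_us}",
        f"--cpu-quota={quota_us}",
        f"--device-read-iops={device}:{iops}",
        f"--device-write-iops={device}:{iops}",
    ]

if __name__ == "__main__":
    # /dev/nvme0n1 is an example device path; use the one backing Docker's storage.
    print(" ".join(docker_throttle_args(0.4, 100_000, "/dev/nvme0n1")))
```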

MonsieurNicolas commented 3 days ago

The reason for doing something like throttling is that we want to be able both to normalize tests (so that people can try things out on their laptop or any hosted environment) and to get a reasonable estimate of performance for both "current proposed" and "future safe" on baseline hardware.

CPU line performance and IOPS are both problematic: if you test on hardware with better capabilities than the baseline, you get a false sense of performance, so we need to mitigate this.

That said, I don't think throttling is the only acceptable solution.

For example, if you can estimate the ratio, it may be possible to add artificial work: if you know that your machine is 3x faster than the baseline, you can scale the amount of work that transactions perform by 3x (or add special "padding" synthetic work in a few places for both CPU and IO). I think we need to come up with simple synthetic benchmarks for CPU and IO that allow us to compute the ratio to the baseline. An IO benchmark could be as simple as generating (deterministically) a large bucket list (on the order of GBs) and counting the number of random ledger entries that can be read over a fixed period, like 5 minutes, with no caching enabled (neither in the application nor for disk access).
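The Python sketch below approximates the proposed IO benchmark. It deterministically generates a large file as a stand-in for a bucket list, then counts random fixed-size reads completed within a time window. The result is turned into the scaling factor described above. It measures raw random-read throughput rather than real ledger-entry lookups, and all names are hypothetical. It assumes a Unix-like OS (`os.pread`). It uses `posix_fadvise(POSIX_FADV_DONTNEED)` where available, but that only asks the kernel to drop cached pages. A stricter version would open the file with `O_DIRECT` and aligned buffers to bypass the page cache entirely.

```python
import os
import random
import time

def generate_file(path: str, size_bytes: int, seed: int = 1) -> None:
    """Deterministically write size_bytes of pseudo-random data (stand-in for a bucket list)."""
    rng = random.Random(seed)
    chunk = 1 << 20
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(rng.randbytes(n))
            remaining -= n

def random_reads_in(path: str, duration_s: float, read_size: int = 4096, seed: int = 2) -> int:
    """Count random reads of read_size bytes completed within duration_s."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Advisory only: ask the kernel to evict this file's cached pages before measuring.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        count = 0
        deadline = time.perf_counter() + duration_s
        while time.perf_counter() < deadline:
            # On a GB-sized file, repeated offsets (which could hit cache) are rare.
            offset = rng.randrange(0, max(1, size - read_size + 1))
            os.pread(fd, read_size, offset)
            count += 1
        return count
    finally:
        os.close(fd)

def work_scale_factor(local_rate: float, baseline_rate: float) -> float:
    """E.g. 3.0 if this machine does 3x the baseline's reads: scale synthetic work by this."""
    return local_rate / baseline_rate
```

For a real run, the file would be several GB and `duration_s` would be on the order of 300 seconds. `baseline_rate` would come from the same script run on the documented reference hardware.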