Open MonsieurNicolas opened 1 month ago
we should document how to build a "reference min-spec node" that can be used to run those benchmarks. I
That's a good idea, but I'm not sure which spec this should be. I'm not sure we should hinder ourselves too much, given that validators should have much more reasonable specs.
should get a new method that performs more interesting work
There are some issues with this contract, but I'm not sure they're about the work not being 'interesting' enough. I don't think randomization on the contract side is necessary. I'm also not even sure if we should modify any entries on the contract side at all, it might be better to just pass them through (at least for now; eventually we might start filtering out no-op writes). Basically all the IO happens based on the footprint and it's not very useful to do anything about the ledger entries on the host side. A big issue about this benchmark so far has been variance, I would work on eliminating it as much as possible before trying anything fancy.
ledger state is generated by adding contract data that will correspond to all IDs reached during a run
Yeah, I've been thinking about BL emulation as a next step. Appending directly to the buckets is an interesting idea, I didn't know that's possible. I've no clue as to how much that will change the results though.
basically, ApplyLoad::benchmark() should always apply transactions over the same state (so same ledger number), by resetting its state after applying a ledger
That's a great idea, thanks.
That's a good idea, but I'm not sure which spec this should be. I'm not sure we should hinder ourselves too much, given that validators should have much more reasonable specs.
I think the issue is that people's laptops are likely much faster at a lot of things than the typical validator in AWS and in general we need a way for everyone to normalize numbers so that we can have sane conversations.
The actual docker args cannot be hardcoded (at least CPU seems to be relative to the host CPU), but we can and should document the absolute numbers in terms of IOPS and core performance (probably look at some raw number that's easy to benchmark, like X sha256/second, so that people can easily tweak their setup to get close enough to that performance). Note that we may already have some instructions/second "basic validator" number, as we need something like that in the context of calibration. For IOPS: at some point we'll shoot for 1M IOPS, but for now a conservative number should be more like 100k IOPS.
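As an illustration of the "X sha256/second" idea, here is a minimal single-core sketch. It is not part of stellar-core; the function name, the 32-byte input, and the batch size are assumptions made for the example, and the number it reports is only meaningful relative to the same script run on a baseline machine.

```python
import hashlib
import time


def sha256_per_second(duration_s: float = 1.0) -> float:
    """Return single-threaded SHA-256 hashes per second over roughly duration_s."""
    data = b"\x00" * 32  # assumed input size; each digest is fed into the next hash
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # Hash in batches so the clock check does not dominate the loop.
        for _ in range(1000):
            data = hashlib.sha256(data).digest()
        count += 1000
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    print(f"{sha256_per_second():.0f} sha256/s")
```

Running this inside and outside a throttled container gives two numbers whose ratio shows how close the setup is to the documented target.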
Basically all the IO happens based on the footprint and it's not very useful to do anything about the ledger entries on the host side. A big issue about this benchmark so far has been variance, I would work on eliminating it as much as possible before trying anything fancy.
yeah I don't think you actually need to modify entries within the contract, but I am pretty sure you need to call the put host function (how entries end up in the "modified" list) -- otherwise we'd allow contracts to rewrite entries they don't own.
but I am pretty sure you need to call the put host function (how entries end up in the "modified" list)
Nope, currently we actually overwrite all the RW entries unconditionally, the security guarantee comes from the host not allowing contracts to modify entries they don't own. We might start filtering out no-op changes eventually, but I'm not sure if that's a useful optimization for the real traffic.
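To make the difference concrete, here is a toy model of the two behaviors discussed: overwriting every read-write footprint entry versus filtering out no-op changes. This is not the host or stellar-core code; the function name and the dict-based representation of ledger entries are assumptions for illustration only.

```python
def entries_to_write(before: dict, after: dict, filter_noops: bool = False) -> dict:
    """Return the read-write footprint entries that would be written back.

    before/after map each RW footprint key to its value before and after the call.
    """
    if not filter_noops:
        # Behavior described above: every RW entry is overwritten unconditionally.
        return dict(after)
    # Possible future optimization: skip entries whose value did not change.
    return {k: v for k, v in after.items() if before.get(k) != v}


before = {"a": 1, "b": 2}
after = {"a": 1, "b": 3}
print(entries_to_write(before, after))                     # {'a': 1, 'b': 3}
print(entries_to_write(before, after, filter_noops=True))  # {'b': 3}
```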
@MonsieurNicolas you mentioned using --cpu-period=100000 --cpu-quota=40000 to limit resources as an example. If the docker image is being run on one of our dev machines (same hardware as our validators), is there any reason to limit CPU? I don't think so. I couldn't get the iops limiting args to work on my laptop, which is why I'm using a dev machine.
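For reference, the arithmetic behind those flags: the quota divided by the period is the amount of CPU time the container may use per period, expressed in cores. Dividing that by the number of cores it is pinned to gives the fraction of their combined capacity. A small sketch of that calculation (the helper name is hypothetical, and the four-core pinning corresponds to --cpuset-cpus=0-3):

```python
def cpu_fraction(cpu_period_us: int, cpu_quota_us: int, cpuset_size: int) -> float:
    """Fraction of the pinned cores' combined capacity that the container may use."""
    cores_worth = cpu_quota_us / cpu_period_us  # 40000 / 100000 = 0.4 of one core
    return cores_worth / cpuset_size


# --cpuset-cpus=0-3 --cpu-period=100000 --cpu-quota=40000
print(cpu_fraction(100_000, 40_000, 4))  # 0.1
```

Because the result is relative to the host's cores, the same flags yield different absolute performance on different machines, which is why they cannot simply be hardcoded.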
Reason for doing something like throttling is that we want to be able to both normalize tests (so that people can try things out on their laptop or any hosted environment) and to get a reasonable estimate of performance for both "current proposed" and "future safe" on baseline hardware.
CPU line performance and IOPS are both problematic: if you test on hardware that has better capabilities than baseline, it will give a false sense of performance, so we need to mitigate for this.
I think that throttling is not the only acceptable solution.
For example, if you can estimate the ratio, it may be possible to add artificial work. If you know that your machine is 3x faster than the baseline, you can scale the amount of work that transactions perform by 3x (or add special "padding" synthetic work in a few places for both CPU and IO). I think we need to come up with simple synthetic benchmarks for CPU and IO that allow computing the ratio to baseline. An IO benchmark test could be as simple as generating (deterministically) a large bucket list (in the order of GB) and counting the number of random ledger entries that can be read over a fixed period of time, like 5 minutes, without any cache enabled (in application and disk access).
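A rough sketch of what such an IO benchmark and ratio computation could look like is below. It uses a flat file of fixed-size records as a stand-in for a bucket list, and all names, the 4 KiB record size, and the tiny file size are assumptions for illustration. It also does not bypass the OS page cache, so a real version would need direct IO or a cache drop to match the "without any cache" requirement. The baseline value is the conservative 100k IOPS figure mentioned earlier in the thread.

```python
import os
import random
import tempfile
import time

RECORD_SIZE = 4096  # assumed size of one read, standing in for a ledger entry
BASELINE_READS_PER_S = 100_000  # conservative figure quoted in this thread


def make_test_file(path: str, num_records: int, seed: int = 0) -> None:
    """Deterministically fill a file with num_records pseudo-random records."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(num_records):
            f.write(rng.getrandbits(8 * RECORD_SIZE).to_bytes(RECORD_SIZE, "little"))


def random_reads_per_second(path: str, num_records: int, duration_s: float, seed: int = 1) -> float:
    """Count random record reads completed in duration_s (page cache NOT bypassed)."""
    rng = random.Random(seed)
    reads = 0
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        while time.perf_counter() - start < duration_s:
            f.seek(rng.randrange(num_records) * RECORD_SIZE)
            f.read(RECORD_SIZE)
            reads += 1
        elapsed = time.perf_counter() - start
    return reads / elapsed


def work_scale(measured: float, baseline: float) -> float:
    """Factor to multiply synthetic work by: 3x faster than baseline gives 3.0."""
    return measured / baseline


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        make_test_file(path, num_records=1024)
        rate = random_reads_per_second(path, 1024, duration_s=0.2)
        print(f"{rate:.0f} reads/s, scale {work_scale(rate, BASELINE_READS_PER_S):.2f}x")
```

The same ratio approach would apply to a CPU benchmark: measure, divide by the documented baseline, and scale the per-transaction work by the result.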
Tagging as a discussion for now.
Right now it's fairly difficult to validate proposed ledger wide limits, yet having a good understanding of their impact can really help with
Here is a high-level way we could iterate on benchmarking capability in this context:
--device-read-iops=1000 --device-write-iops=1000 for iops; and things like --cpuset-cpus=0-3 --cpu-period=100000 --cpu-quota=40000 for CPU (as to throttle CPU performance to the desired number -- in my example 10% of the host performance)
apply-load (do_work at the moment) should get a new method that performs more interesting work.
seed: int (seed for prng), other args related to the amount of work (amount of cpu work, number of reads, number of writes, bytes per read or write, bytes of events to generate, key padding size).
apply-load (METADATA_OUTPUT_STREAM or METADATA_DEBUG_LEDGERS enabled) as that method emits events
contractID | zeros[padding_size] | counter (so that they can be sorted by counter)
(counter is generated by a prng seeded with seed)
ApplyLoad ledger state is generated by adding contract data that will correspond to all IDs reached during a run
counter:0..N that picks a random bucket and simply appends to that bucket (as keys are sorted). Goal is to create a fairly realistic distribution (in terms of bucket size) -- an exponential distribution is probably good enough.
ApplyLoad works so that ledger state always starts from the generated snapshot
basically, ApplyLoad::benchmark() should always apply transactions over the same state (so same ledger number), by resetting its state after applying a ledger
CATCHUP_WAIT_MERGES_TX_APPLY_FOR_TESTING does when running "catchup" so that benchmark step boundaries are clean
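The key layout and state generation described above can be sketched roughly as follows. This is a toy model, not stellar-core or contract code. The 8-byte big-endian counter encoding, the doubling of weight per level, and all function names are assumptions chosen so that byte order matches counter order and so that bucket sizes grow roughly exponentially.

```python
import random
import struct


def make_key(contract_id: bytes, padding_size: int, counter: int) -> bytes:
    """contractID | zeros[padding_size] | counter.

    The counter is encoded as a big-endian u64 (an assumption) so that
    byte-wise ordering of keys matches numeric ordering of counters.
    """
    return contract_id + b"\x00" * padding_size + struct.pack(">Q", counter)


def touched_counters(seed: int, num_keys: int, n: int) -> list:
    """Counters one call would touch: drawn from 0..n-1 by a PRNG seeded with seed."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(num_keys)]


def fill_buckets(contract_id: bytes, padding_size: int, n: int, num_levels: int, seed: int = 0) -> list:
    """Append keys for counters 0..n-1 to randomly chosen buckets.

    Level i is picked with weight 2**i (an arbitrary choice), so deeper
    buckets end up larger. Keys arrive in increasing counter order, so a
    plain append keeps every bucket sorted.
    """
    rng = random.Random(seed)
    weights = [2 ** i for i in range(num_levels)]
    buckets = [[] for _ in range(num_levels)]
    for counter in range(n):
        level = rng.choices(range(num_levels), weights=weights)[0]
        buckets[level].append(make_key(contract_id, padding_size, counter))
    return buckets


if __name__ == "__main__":
    cid = b"\x01" * 32  # placeholder contract ID
    buckets = fill_buckets(cid, padding_size=16, n=10_000, num_levels=6)
    print([len(b) for b in buckets])
    print(touched_counters(seed=42, num_keys=5, n=10_000))
```

Because the same seed always yields the same counters, every key a run can touch is known ahead of time and can be pre-populated in the generated snapshot.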