jszwedko closed this issue 3 years ago
An alternative to this might be to use a different measurement that is less sensitive to noise:
In the cloud environment, the idea of comparing different runs collected at different times seems to have little chance of succeeding.
I think the only more or less accurate way would be to clone both master and the PR code in the same run, bench both, and compare the results. This should eliminate as much noise related to the code being run on-different-nodes/with-different-neighbors as possible, but then it's still susceptible to noisy neighbors appearing/disappearing during the benches.
I've seen the ACM ICPC judge servers tweaked, from the physical hardware through the kernel and user space, to run with almost no noise for accurate and reliable results - I'd say ideally, we'd want something on a similar level for our benchmarking environment. I recall there was an article with some tips and tricks on how to achieve it; I'll try to find it in case we decide to go this route.
UPD: found the article I had in mind, but it's not in English :laughing:
Notes so far:
I tried out running benchmarks twice for a given commit to get an idea of noise (here). I only ran it once so far, but, as somewhat expected, there is still substantial noise (-10% to +10%, with a couple of outliers of 13% and 20%). I'll plan to run this some more to try to get a baseline of the noise we might expect with this strategy. It does have the advantage of simplicity if we are ok with setting noise thresholds high enough that we may miss some smaller performance degradations in CI (but those could later be caught with trend analysis).
I tried out criterion-cycles-per-byte in my AWS Workspace just to get an idea of it. I was still seeing a surprising amount of noise though. I'll aim to try this out in Github Actions to see what I see there.
I tried out Bruce's criterion-linux-perf to see if I could count CPU instructions in Linux, at least, as, theoretically, this should be very consistent (or even exactly the same?). However, I ran into issues counting the hardware instructions in my AWS Workspace which led me to realize (with Bruce's help) this probably wouldn't be tractable in virtualized environments, which usually restrict access to hardware. I plan to try it in Github Actions just to make sure that it doesn't work there, but I'm fairly confident that it won't. We can actually make use of it in AWS with dedicated hosts, which we could consider using as benchmark runners, but these end up being fairly pricey, on the order of $2-5k / month on-demand.
Current plan:
- Try out criterion-cycles-per-byte in Github Actions to get an idea of the noise we'd see there with that measurement.
- Try out criterion-linux-perf in Github Actions just to verify that it won't run on GA's runners. Verify that it would run on an AWS dedicated host. Get an estimated cost for a dedicated AWS host to run these.

Won't CPU counters still be susceptible to CPU cores reshuffling among the tasks? The cloud environment is shared, but we can work around that with dedicated hosts, as you suggested. However, is the situation better on a dedicated host? It might be that even running on physical hardware with a clean OS install, running benches still yields large amounts of noise. There are two reasons for this: CPU frequency scaling (turbo boost), and other tasks being scheduled onto the same cores.
Both can be solved with physical hardware though. Turbo can be turned off in the BIOS, and it's possible to tell Linux to reserve some CPU cores so that no tasks run on them. We can then run our benches explicitly on that reserved set of cores, eliminating any external noise.
I think it's worth trying benches locally (in the simplest scenario, on your local workstation) to estimate what noise levels to expect from physical hardware without any tweaks. Going to the efforts I described above might not be worth it for us if untweaked physical hardware yields a low enough noise level.
Won't CPU counters still be susceptible to CPU cores reshuffling among the tasks?
Replying to myself: https://github.com/bruceg/criterion-linux-perf uses perf_event_open apparently, and it can scope perf events to the process - so we can trust the kernel to provide us with the correct data.
I did some more testing today, primarily on a dedicated AWS host (c5).
Some more findings:
I was able to access perf hardware events on the dedicated host.
Counting instructions using criterion-linux-perf is very consistent, as expected, but, with Ana's input, I'm starting to feel like that measurement isn't what we are really interested in. Time / throughput is a better measure.
Some things I did to make the environment more consistent:
- CPU isolation (isolcpus=1 for grub)
- Limiting C-states (intel_idle.max_cstate=1 for grub)
- Using taskset to run the benchmarks on just one CPU

I didn't compare/contrast to see which ones actually had an effect though.
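As a concrete sketch of those tweaks (the core number and grub workflow here are illustrative assumptions, not necessarily the exact setup used):

```shell
# /etc/default/grub -- keep CPU 1 out of the general scheduler and cap C-states:
#   GRUB_CMDLINE_LINUX="isolcpus=1 intel_idle.max_cstate=1"
# Regenerate the grub config and reboot for the kernel parameters to take effect:
sudo update-grub    # Debian/Ubuntu; grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family
sudo reboot

# After reboot, pin the benchmark process to the isolated CPU:
taskset -c 1 cargo bench --no-default-features --features benches
```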
Example benchmark suite run on the dedicated host with these changes:
name time time change throughput throughput change change
partitioned_batching/partitioned_batching_none_2097152 9.5431 ms +1.6168% 999.33 MiB/s -1.5911% none
partitioned_batching/batching_none_2097152 5.0203 ms +1.7071% 1.8551 GiB/s -1.6784% none
partitioned_batching/partitioned_batching_none_512000 9.5116 ms -2.2736% 1002.6 MiB/s +2.3265% none
partitioned_batching/batching_none_512000 5.2054 ms +0.7299% 1.7892 GiB/s -0.7246% none
partitioned_batching/partitioned_batching_gzip(6)_2097152 109.29 ms -0.2260% 87.262 MiB/s +0.2265% none
partitioned_batching/batching_gzip(6)_2097152 104.47 ms +0.1496% 91.288 MiB/s -0.1494% none
partitioned_batching/partitioned_batching_gzip(6)_512000 109.59 ms -0.3178% 87.019 MiB/s +0.3188% none
partitioned_batching/batching_gzip(6)_512000 104.80 ms -0.0684% 90.996 MiB/s +0.0685% none
buffers/in-memory 36.169 ms +2.3278% 26.367 MiB/s -2.2749% regressed
buffers/on-disk 86.998 ms +1.5791% 10.962 MiB/s -1.5546% regressed
create and insert single-level 712.31 ns +9.6315% regressed
iterate all fields single-level 205.26 ns -0.2483% none
create and insert nested-keys 1.1669 us +6.6561% regressed
iterate all fields nested-keys 541.32 ns -0.4195% none
create and insert array 1.1763 us +0.9123% none
iterate all fields array 515.43 ns -0.3937% none
files/files_without_partitions 13.282 ms -0.0233% 71.803 MiB/s +0.0233% none
http/compression/none 1.0061 s +0.0066% 97.067 KiB/s -0.0066% none
http/compression/gzip(6) 1.0065 s +0.0079% 97.026 KiB/s -0.0079% none
isolated_buffers/channels/futures01 19.753 ms +1.8233% 96.562 MiB/s -1.7906% regressed
isolated_buffers/channels/tokio 20.650 ms +1.9936% 92.366 MiB/s -1.9547% regressed
isolated_buffers/leveldb/writing 30.634 ms +1.4034% 62.263 MiB/s -1.3840% regressed
isolated_buffers/leveldb/reading 192.43 ms +2.3015% 9.9118 MiB/s -2.2498% regressed
isolated_buffers/leveldb/both 223.10 ms +2.5604% 8.5493 MiB/s -2.4965% regressed
from_string/simple_string 28.626 ns -0.8216% 433.09 MiB/s +0.8284% none
from_string/foo.bar.baz.bat[0] 25.550 ns -1.0600% 671.87 MiB/s +1.0713% none
from_string/foo[0].bar[0].baz[0] 25.351 ns -1.0171% 752.36 MiB/s +1.0276% none
from_string/foo[0].bar[0][0].baz 25.405 ns -0.9134% 750.77 MiB/s +0.9219% none
from_string/foo[0] 28.562 ns -0.6366% 200.34 MiB/s +0.6407% none
from_string/"boo\\"p" 28.676 ns -0.9025% 266.05 MiB/s +0.9107% none
from_string/p4th_wi7h.numb3r5 25.379 ns -1.6139% 638.81 MiB/s +1.6404% none
from_string/"boop" 28.604 ns -0.4653% 200.04 MiB/s +0.4675% none
from_string/"boop"."snoot" 28.582 ns -1.3965% 467.13 MiB/s +1.4162% none
from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 28.864 ns -1.0382% 2.8072 GiB/s +1.0491% none
to_string/simple_string 112.73 ns +0.3256% 109.98 MiB/s -0.3246% none
to_string/foo.bar.baz.bat[0] 113.10 ns -1.0173% 151.77 MiB/s +1.0277% none
to_string/foo[0].bar[0].baz[0] 113.21 ns +0.5080% 168.48 MiB/s -0.5055% none
to_string/foo[0].bar[0][0].baz 113.20 ns +0.2344% 168.49 MiB/s -0.2338% none
to_string/foo[0] 134.86 ns -0.3630% 42.430 MiB/s +0.3643% none
to_string/"boo\\"p" 112.69 ns -0.0072% 67.701 MiB/s +0.0072% none
to_string/p4th_wi7h.numb3r5 113.19 ns +0.3556% 143.24 MiB/s -0.3543% none
to_string/"boop" 134.77 ns -0.6546% 42.458 MiB/s +0.6589% none
to_string/"boop"."snoot" 112.66 ns -0.7355% 118.51 MiB/s +0.7410% none
to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 114.24 ns +0.4230% 726.28 MiB/s -0.4212% none
serialize/simple_string 175.93 ns -0.3258% 70.470 MiB/s +0.3269% none
serialize/foo.bar.baz.bat[0] 178.30 ns +1.2346% 96.276 MiB/s -1.2195% none
serialize/foo[0].bar[0].baz[0] 179.60 ns -0.0472% 106.20 MiB/s +0.0472% none
serialize/foo[0].bar[0][0].baz 179.01 ns +0.3964% 106.55 MiB/s -0.3948% none
serialize/foo[0] 197.77 ns -1.7488% 28.932 MiB/s +1.7800% none
serialize/"boo\\"p" 200.67 ns -0.9922% 38.020 MiB/s +1.0021% none
serialize/p4th_wi7h.numb3r5 178.06 ns +0.5377% 91.049 MiB/s -0.5348% none
serialize/"boop" 209.14 ns -0.8693% 27.360 MiB/s +0.8769% none
serialize/"boop"."snoot" 210.06 ns +0.1330% 63.560 MiB/s -0.1328% none
serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 266.34 ns +0.0571% 311.52 MiB/s -0.0571% none
deserialize/simple_string 523.88 ns +0.9933% 23.665 MiB/s -0.9836% none
deserialize/foo.bar.baz.bat[0] 1.2821 us +1.7879% 13.389 MiB/s -1.7565% regressed
deserialize/foo[0].bar[0].baz[0] 1.2671 us +1.8442% 15.052 MiB/s -1.8108% regressed
deserialize/foo[0].bar[0][0].baz 1.2641 us +1.9227% 15.088 MiB/s -1.8864% regressed
deserialize/foo[0] 581.29 ns +0.9371% 9.8438 MiB/s -0.9284% none
deserialize/"boo\\"p" 654.89 ns +2.0462% 11.650 MiB/s -2.0052% regressed
deserialize/p4th_wi7h.numb3r5 788.81 ns +0.7398% 20.553 MiB/s -0.7344% none
deserialize/"boop" 620.27 ns +2.4946% 9.2251 MiB/s -2.4339% regressed
deserialize/"boop"."snoot" 999.42 ns +2.2332% 13.359 MiB/s -2.1845% regressed
deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 3.1361 us +1.2704% 26.456 MiB/s -1.2545% none
lua_add_fields/native 1.0733 us +1.1452% 931.69 Kelem/s -1.1322% none
lua_add_fields/v1 3.2298 us +0.9373% 309.62 Kelem/s -0.9286% none
lua_add_fields/v2 4.2890 us +0.9080% 233.15 Kelem/s -0.8998% none
lua_field_filter/native 5.4191 us +1.0746% 1.8453 Melem/s -1.0632% none
lua_field_filter/v1 26.601 us +3.2960% 375.93 Kelem/s -3.1908% regressed
lua_field_filter/v2 187.66 us +1.1531% 53.287 Kelem/s -1.1400% regressed
regex/regex 69.067 us +0.9482% 21.665 MiB/s -0.9393% none
elasticsearch_indexes/dynamic 722.21 ns -0.3143% none
elasticsearch_indexes/static 171.63 ns +5.8829% none
pipe/pipe_simple 36.471 ms +2.1015% 26.149 MiB/s -2.0582% regressed
pipe/pipe_small_lines 28.588 ms +2.6563% 341.59 KiB/s -2.5875% regressed
pipe/pipe_big_lines 163.64 ms +0.0761% 116.56 MiB/s -0.0760% none
pipe/pipe_multiple_writers 37.640 ms +2.3016% 2.5337 MiB/s -2.2498% regressed
interconnected/interconnected 102.49 ms +1.3589% 18.611 MiB/s -1.3407% regressed
transforms/transforms 55.955 ms +0.8562% 18.748 MiB/s -0.8489% none
complex/complex 1.9195 s +0.7190% none
Regression detected. Note that any regressions should be verified.
Here we can see the noise is much lower compared to running in GA. I think the benchmarks that do continue to show noise may actually need some tweaks themselves to reduce it. For example, Mike noted this morning that the topology benchmarks (pipe/*) are creating their inputs in the benchmark iteration itself when they should be created outside of it. As Mike noted, the high-level benchmarks that engage in network activity are also likely to be noisier in general, so we can tune criterion thresholds to require larger changes for them to flag while still catching smaller regressions in the more focused benchmarks.
The AWS dedicated hosts are fairly expensive but they are quite large; the one I was running (c5, at $2423.52 / month on-demand) had 36 cores, which we could partition into, probably, 4-6 instances. Ana pointed out some other providers that offer entire physical hosts for much cheaper ($100-$200/mo): Hetzner and OVH.
My feelings currently are:
Some reference materials I found today:
criterion does warm up CPUs, so hopefully we wouldn't hit CPU scaling issues.

Another random thought is that we could defer performance analysis to releases and just do a before / after comparison with the previous release. If we notice a difference, we can use git bisect to find where regressions were introduced. This would let us use the more expensive hardware less frequently, but may mean that fixes are harder as we wouldn't notice regressions until later. Also, people running nightlies may experience regressions.
Nice work! I would prefer to run the benchmarks with pull requests. I am not concerned about the cost you quoted, since getting ahead of performance regressions at the PR stage will be much cheaper overall. A reserved instance would also reduce the cost, but I would prefer to start with on-demand until we have more confidence in the setup. So do what you need to make these accurate and useful.
Some more findings:
I did have to disable address randomization (ASLR) to get more consistent benchmark results, which I did via setarch $(uname -m) -R ..., along with the aforementioned changes of CPU isolation, running the benchmarks on a single CPU, running on a dedicated AWS host, etc.
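A quick way to confirm that setarch -R is actually disabling ASLR (a sketch, assuming a Linux host with util-linux's setarch): the first mapping in /proc/self/maps jumps around between runs with ASLR on, and is identical with it off.

```shell
# With ASLR on (the default), these two addresses normally differ:
sh -c 'head -n1 /proc/self/maps'
sh -c 'head -n1 /proc/self/maps'

# Under setarch -R, address randomization is disabled for the child,
# so the first mapping is identical on every run:
setarch "$(uname -m)" -R sh -c 'head -n1 /proc/self/maps'
setarch "$(uname -m)" -R sh -c 'head -n1 /proc/self/maps'
```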
To verify this, I chose a benchmark that doesn't have a lot of inherent noise and ran it 100 times:
for i in $(seq 1 100) ; do setarch `uname -m` -R taskset -c 2 cargo bench --no-default-features --features "benches remap-benches" --bench remap downcase | tee -a /tmp/criterion.out.2 ; done
Disabling ASLR took this from:
32 Change within noise threshold.
6 No change in performance detected.
32 Performance has improved.
30 Performance has regressed.
To
24 Change within noise threshold.
74 No change in performance detected.
1 Performance has improved.
1 Performance has regressed.
I believe the ones marked as regressed/improved were actually caused by me accidentally running a job on the same core (on a different hyperthread), as the timing lines up.
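For reference, tallies like the ones above can be produced by grepping criterion's summary lines out of the accumulated log; a small sketch (tally_verdicts is a hypothetical helper name; the verdict strings are criterion's standard summary messages):

```shell
# Count criterion's per-benchmark verdict lines from a log read on stdin.
tally_verdicts() {
  grep -oE 'Change within noise threshold|No change in performance detected|Performance has (improved|regressed)' \
    | sort | uniq -c | sort -rn
}

# Usage, with the log accumulated by the loop above:
#   tally_verdicts < /tmp/criterion.out.2
```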
I found that Github Actions' self-hosted runners actually only run one job at a time (https://github.community/t/parallelism-in-self-hosted-runners/17000/2) which I verified by noting that our Linux unit test job seems to only have had one executing at the time I looked (the others were queued). I think this means we could use a dedicated AWS Host that we partition into, say, 6 instances to allow 6 simultaneous, CPU isolated, benchmark runs.
Next steps:
Assuming the shared host test goes well, we can put together some Terraform config for spinning one up, configuring it to reduce benchmark noise, and installing / running the Github Actions worker to start scheduling jobs on it. The CI workflow itself will need some adjustments to use taskset as part of its execution.
If running multiple benchmark runs on the same AWS dedicated host, in different instances, proves to cause interference, we can always consider using OVH or Hetzner to provision smaller dedicated hosts that shouldn't share any resources.
Disabling ASLR seems to generally have had a good effect. Here are two full, no code change, benchmark runs compared:
name time time change throughput throughput change change
partitioned_batching/partitioned_batching_none_2097152 9.5418 ms -0.6307% 999.47 MiB/s +0.6347% none
partitioned_batching/batching_none_2097152 5.1245 ms +0.6204% 1.8174 GiB/s -0.6166% none
partitioned_batching/partitioned_batching_none_512000 9.8532 ms +1.4346% 967.88 MiB/s -1.4143% none
partitioned_batching/batching_none_512000 5.3224 ms -0.6772% 1.7498 GiB/s +0.6818% none
partitioned_batching/partitioned_batching_gzip(6)_2097152 108.76 ms -0.0607% 87.689 MiB/s +0.0607% none
partitioned_batching/batching_gzip(6)_2097152 104.01 ms +0.1758% 91.692 MiB/s -0.1754% none
partitioned_batching/partitioned_batching_gzip(6)_512000 109.28 ms -0.1229% 87.265 MiB/s +0.1230% none
partitioned_batching/batching_gzip(6)_512000 104.40 ms +0.0481% 91.347 MiB/s -0.0480% none
buffers/in-memory 35.603 ms +0.4614% 26.786 MiB/s -0.4592% none
buffers/on-disk 88.006 ms +0.0559% 10.836 MiB/s -0.0558% none
create and insert single-level 655.84 ns +0.0239% none
iterate all fields single-level 210.16 ns +0.3099% none
create and insert nested-keys 1.1058 us -0.1336% none
iterate all fields nested-keys 544.33 ns -0.1342% none
create and insert array 1.1426 us +0.9559% none
iterate all fields array 506.95 ns -0.0390% none
files/files_without_partitions 13.334 ms -0.0475% 71.524 MiB/s +0.0475% none
http/compression/none 1.0059 s +0.0154% 97.086 KiB/s -0.0154% none
http/compression/gzip(6) 1.0062 s +0.0032% 97.051 KiB/s -0.0032% none
isolated_buffers/channels/futures01 19.598 ms -0.0435% 97.324 MiB/s +0.0436% none
isolated_buffers/channels/tokio 20.541 ms +0.1982% 92.855 MiB/s -0.1978% none
isolated_buffers/leveldb/writing 30.338 ms +0.5288% 62.869 MiB/s -0.5260% none
isolated_buffers/leveldb/reading 191.59 ms -0.1147% 9.9552 MiB/s +0.1148% none
isolated_buffers/leveldb/both 221.98 ms +0.2083% 8.5924 MiB/s -0.2078% none
from_string/simple_string 28.946 ns -2.3315% 428.31 MiB/s +2.3872% none
from_string/foo.bar.baz.bat[0] 25.563 ns -0.0205% 671.52 MiB/s +0.0205% none
from_string/foo[0].bar[0].baz[0] 25.599 ns -0.1995% 745.09 MiB/s +0.1999% none
from_string/foo[0].bar[0][0].baz 25.591 ns -1.3118% 745.32 MiB/s +1.3292% none
from_string/foo[0] 28.858 ns -0.1480% 198.28 MiB/s +0.1482% none
from_string/"boo\\"p" 28.759 ns +0.4753% 265.29 MiB/s -0.4730% none
from_string/p4th_wi7h.numb3r5 25.574 ns +0.0650% 633.95 MiB/s -0.0650% none
from_string/"boop" 28.943 ns +0.5304% 197.70 MiB/s -0.5276% none
from_string/"boop"."snoot" 28.909 ns -2.8687% 461.84 MiB/s +2.9534% none
from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 29.306 ns +0.8719% 2.7648 GiB/s -0.8644% none
to_string/simple_string 114.83 ns +0.4662% 107.97 MiB/s -0.4641% none
to_string/foo.bar.baz.bat[0] 114.27 ns +0.0124% 150.23 MiB/s -0.0124% none
to_string/foo[0].bar[0].baz[0] 114.25 ns -0.4903% 166.95 MiB/s +0.4927% none
to_string/foo[0].bar[0][0].baz 114.21 ns -0.3538% 167.00 MiB/s +0.3550% none
to_string/foo[0] 138.26 ns +0.0657% 41.387 MiB/s -0.0657% none
to_string/"boo\\"p" 114.72 ns -0.1715% 66.503 MiB/s +0.1718% none
to_string/p4th_wi7h.numb3r5 114.10 ns +0.3189% 142.09 MiB/s -0.3179% none
to_string/"boop" 138.12 ns -0.5676% 41.428 MiB/s +0.5709% none
to_string/"boop"."snoot" 114.73 ns -0.4797% 116.37 MiB/s +0.4820% none
to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 117.14 ns -0.1612% 708.31 MiB/s +0.1614% none
serialize/simple_string 177.10 ns -0.0765% 70.004 MiB/s +0.0766% none
serialize/foo.bar.baz.bat[0] 178.65 ns +0.9628% 96.088 MiB/s -0.9537% none
serialize/foo[0].bar[0].baz[0] 180.15 ns +0.0111% 105.88 MiB/s -0.0111% none
serialize/foo[0].bar[0][0].baz 179.62 ns +0.2285% 106.19 MiB/s -0.2280% none
serialize/foo[0] 198.66 ns -0.3313% 28.802 MiB/s +0.3324% none
serialize/"boo\\"p" 199.11 ns +0.5883% 38.317 MiB/s -0.5848% none
serialize/p4th_wi7h.numb3r5 178.64 ns +1.8516% 90.756 MiB/s -1.8179% none
serialize/"boop" 208.80 ns +0.6402% 27.405 MiB/s -0.6361% none
serialize/"boop"."snoot" 206.88 ns -0.5032% 64.538 MiB/s +0.5058% none
serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 267.86 ns +0.4064% 309.75 MiB/s -0.4048% none
deserialize/simple_string 528.91 ns +0.0766% 23.440 MiB/s -0.0766% none
deserialize/foo.bar.baz.bat[0] 1.2647 us -0.0156% 13.573 MiB/s +0.0156% none
deserialize/foo[0].bar[0].baz[0] 1.2520 us +0.2855% 15.235 MiB/s -0.2847% none
deserialize/foo[0].bar[0][0].baz 1.2593 us +0.4458% 15.147 MiB/s -0.4439% none
deserialize/foo[0] 582.92 ns +0.2717% 9.8162 MiB/s -0.2710% none
deserialize/"boo\\"p" 646.33 ns +0.0728% 11.804 MiB/s -0.0727% none
deserialize/p4th_wi7h.numb3r5 789.62 ns +0.0057% 20.532 MiB/s -0.0057% none
deserialize/"boop" 610.42 ns +0.1102% 9.3740 MiB/s -0.1101% none
deserialize/"boop"."snoot" 985.34 ns +0.8241% 13.550 MiB/s -0.8174% none
deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 3.1338 us -0.3207% 26.476 MiB/s +0.3217% none
lua_add_fields/native 1.0536 us +0.5141% 949.16 Kelem/s -0.5115% none
lua_add_fields/v1 3.4836 us -1.5368% 287.06 Kelem/s +1.5607% improved
lua_add_fields/v2 4.4344 us -0.0176% 225.51 Kelem/s +0.0176% none
lua_field_filter/native 5.5279 us -0.1229% 1.8090 Melem/s +0.1231% none
lua_field_filter/v1 26.746 us -0.6117% 373.89 Kelem/s +0.6155% none
lua_field_filter/v2 184.10 us -0.4372% 54.318 Kelem/s +0.4391% none
regex/regex 65.307 us -0.6500% 23.043 MiB/s +0.6543% none
elasticsearch_indexes/dynamic 713.13 ns -0.2867% none
elasticsearch_indexes/static 167.34 ns +4.6310% none
pipe/pipe_simple 35.648 ms +0.1728% 26.753 MiB/s -0.1725% none
pipe/pipe_small_lines 27.886 ms -0.2368% 350.20 KiB/s +0.2374% none
pipe/pipe_big_lines 164.34 ms +0.1776% 116.06 MiB/s -0.1772% none
pipe/pipe_multiple_writers 36.800 ms +0.2054% 2.5915 MiB/s -0.2050% none
interconnected/interconnected 100.50 ms -0.0154% 18.978 MiB/s +0.0154% none
transforms/transforms 55.881 ms -0.0232% 18.773 MiB/s +0.0232% none
complex/complex 1.9204 s +0.0851% none
remap: add fields with remap 2.2485 us +0.4724% none
remap: add fields with add_fields 1.9589 us +0.0983% none
remap: parse JSON with remap 1.6926 us +0.0050% none
remap: parse JSON with json_parser 1.0709 us +0.5198% none
remap: coerce with remap 4.1799 us +0.0447% none
remap: coerce with coercer 1.8894 us +0.1248% none
upcase: literal_value 172.59 ns +0.0166% none
downcase: literal_value 175.44 ns +0.0158% none
parse_json: literal_value 300.81 ns +0.1700% none
parse_json: invalid_json_with_default 660.22 ns +0.1356% none
I'm going to run this repeatedly overnight to see how consistent this effect is.
Here are the change results over 20 runs (first column is count):
20 buffers/in-memory none
20 buffers/on-disk none
20 complex/complex none
1 create and insert array improved
18 create and insert array none
1 create and insert array regressed
20 create and insert nested-keys none
20 create and insert single-level none
20 deserialize/"boo\\"p" none
20 deserialize/"boop" none
20 deserialize/"boop"."snoot" none
20 deserialize/foo[0].bar[0][0].baz none
20 deserialize/foo[0].bar[0].baz[0] none
1 deserialize/foo[0] improved
19 deserialize/foo[0] none
20 deserialize/foo.bar.baz.bat[0] none
20 deserialize/p4th_wi7h.numb3r5 none
20 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 deserialize/simple_string none
20 downcase: literal_value none
20 elasticsearch_indexes/dynamic none
20 elasticsearch_indexes/static none
20 files/files_without_partitions none
20 from_string/"boo\\"p" none
20 from_string/"boop" none
20 from_string/"boop"."snoot" none
20 from_string/foo[0].bar[0][0].baz none
1 from_string/foo[0].bar[0].baz[0] improved
18 from_string/foo[0].bar[0].baz[0] none
1 from_string/foo[0].bar[0].baz[0] regressed
20 from_string/foo[0] none
20 from_string/foo.bar.baz.bat[0] none
20 from_string/p4th_wi7h.numb3r5 none
20 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 from_string/simple_string none
20 http/compression/gzip(6) none
20 http/compression/none none
20 interconnected/interconnected none
20 isolated_buffers/channels/futures01 none
20 isolated_buffers/channels/tokio none
20 isolated_buffers/leveldb/both none
20 isolated_buffers/leveldb/reading none
20 isolated_buffers/leveldb/writing none
20 iterate all fields array none
20 iterate all fields nested-keys none
20 iterate all fields single-level none
20 lua_add_fields/native none
19 lua_add_fields/v1 none
1 lua_add_fields/v1 regressed
19 lua_add_fields/v2 none
1 lua_add_fields/v2 regressed
20 lua_field_filter/native none
1 lua_field_filter/v1 improved
18 lua_field_filter/v1 none
1 lua_field_filter/v1 regressed
6 lua_field_filter/v2 improved
8 lua_field_filter/v2 none
6 lua_field_filter/v2 regressed
20 parse_json: invalid_json_with_default none
2 parse_json: literal_value improved
16 parse_json: literal_value none
2 parse_json: literal_value regressed
20 partitioned_batching/batching_gzip(6)_2097152 none
20 partitioned_batching/batching_gzip(6)_512000 none
3 partitioned_batching/batching_none_2097152 improved
15 partitioned_batching/batching_none_2097152 none
2 partitioned_batching/batching_none_2097152 regressed
3 partitioned_batching/batching_none_512000 improved
17 partitioned_batching/batching_none_512000 none
20 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
20 partitioned_batching/partitioned_batching_gzip(6)_512000 none
20 partitioned_batching/partitioned_batching_none_2097152 none
20 partitioned_batching/partitioned_batching_none_512000 none
20 pipe/pipe_big_lines none
20 pipe/pipe_multiple_writers none
20 pipe/pipe_simple none
20 pipe/pipe_small_lines none
5 regex/regex improved
9 regex/regex none
6 regex/regex regressed
20 remap: add fields with add_fields none
1 remap: add fields with remap improved
18 remap: add fields with remap none
1 remap: add fields with remap regressed
20 remap: coerce with coercer none
20 remap: coerce with remap none
20 remap: parse JSON with json_parser none
20 remap: parse JSON with remap none
20 serialize/"boo\\"p" none
20 serialize/"boop" none
20 serialize/"boop"."snoot" none
20 serialize/foo[0].bar[0][0].baz none
20 serialize/foo[0].bar[0].baz[0] none
20 serialize/foo[0] none
20 serialize/foo.bar.baz.bat[0] none
20 serialize/p4th_wi7h.numb3r5 none
20 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 serialize/simple_string none
20 to_string/"boo\\"p" none
20 to_string/"boop" none
19 to_string/"boop"."snoot" none
1 to_string/"boop"."snoot" regressed
19 to_string/foo[0].bar[0][0].baz none
1 to_string/foo[0].bar[0][0].baz regressed
1 to_string/foo[0].bar[0].baz[0] improved
19 to_string/foo[0].bar[0].baz[0] none
20 to_string/foo[0] none
1 to_string/foo.bar.baz.bat[0] improved
19 to_string/foo.bar.baz.bat[0] none
19 to_string/p4th_wi7h.numb3r5 none
1 to_string/p4th_wi7h.numb3r5 regressed
20 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 to_string/simple_string none
20 transforms/transforms none
20 upcase: literal_value none
Or just the improved/regressed (out of 20 runs):
1 create and insert array improved
1 create and insert array regressed
1 deserialize/foo[0] improved
1 from_string/foo[0].bar[0].baz[0] improved
1 from_string/foo[0].bar[0].baz[0] regressed
1 lua_add_fields/v1 regressed
1 lua_add_fields/v2 regressed
1 lua_field_filter/v1 improved
1 lua_field_filter/v1 regressed
6 lua_field_filter/v2 improved
6 lua_field_filter/v2 regressed
2 parse_json: literal_value improved
2 parse_json: literal_value regressed
3 partitioned_batching/batching_none_2097152 improved
2 partitioned_batching/batching_none_2097152 regressed
3 partitioned_batching/batching_none_512000 improved
5 regex/regex improved
6 regex/regex regressed
1 remap: add fields with remap improved
1 remap: add fields with remap regressed
1 to_string/"boop"."snoot" regressed
1 to_string/foo[0].bar[0][0].baz regressed
1 to_string/foo[0].bar[0].baz[0] improved
1 to_string/foo.bar.baz.bat[0] improved
1 to_string/p4th_wi7h.numb3r5 regressed
Note that if we see a regression or improvement, we typically expect to see the reverse when it returns to the baseline unless the first baseline was the anomalous one.
The partitioned batching, lua_field_filter/v2, and regex ones still seem a bit noisy. We will likely want to dig into those a bit more. Somewhat surprisingly, the high-level topology and disk buffer benchmarks were consistent (likely with substantial stddev though).
Observed improvements / regressions:
create and insert array +1.1537% regressed
create and insert array -1.3325% improved
deserialize/foo[0] -1.2242% improved
from_string/foo[0].bar[0].baz[0] -4.8200% improved
from_string/foo[0].bar[0].baz[0] +5.1245% regressed
lua_add_fields/v1 +1.2977% regressed
lua_add_fields/v2 +1.3488% regressed
lua_field_filter/v1 -1.4196% improved
lua_field_filter/v1 +1.6972% regressed
lua_field_filter/v2 +1.0509% regressed
lua_field_filter/v2 +1.0938% regressed
lua_field_filter/v2 -1.1647% improved
lua_field_filter/v2 +1.3569% regressed
lua_field_filter/v2 +1.7262% regressed
lua_field_filter/v2 -1.8848% improved
lua_field_filter/v2 -2.0932% improved
lua_field_filter/v2 -2.1620% improved
lua_field_filter/v2 +2.1798% regressed
lua_field_filter/v2 -2.2583% improved
lua_field_filter/v2 +3.1216% regressed
lua_field_filter/v2 -3.6570% improved
parse_json: literal_value -1.2808% improved
parse_json: literal_value +1.2854% regressed
parse_json: literal_value -1.5913% improved
parse_json: literal_value +1.6182% regressed
partitioned_batching/batching_none_2097152 -5.0137% improved
partitioned_batching/batching_none_2097152 +5.5792% regressed
partitioned_batching/batching_none_2097152 -6.4220% improved
partitioned_batching/batching_none_2097152 -6.6038% improved
partitioned_batching/batching_none_2097152 +7.2048% regressed
partitioned_batching/batching_none_512000 -3.7311% improved
partitioned_batching/batching_none_512000 -4.5033% improved
partitioned_batching/batching_none_512000 -5.7850% improved
regex/regex +1.8791% regressed
regex/regex -1.9155% improved
regex/regex +2.0136% regressed
regex/regex -2.1610% improved
regex/regex +2.2352% regressed
regex/regex -2.4888% improved
regex/regex -2.6008% improved
regex/regex +2.7651% regressed
regex/regex -2.9066% improved
regex/regex +3.3316% regressed
regex/regex +6.0097% regressed
remap: add fields with remap -1.4700% improved
remap: add fields with remap +1.5678% regressed
to_string/"boop"."snoot" +1.2658% regressed
to_string/foo[0].bar[0][0].baz +1.9819% regressed
to_string/foo[0].bar[0].baz[0] -1.9835% improved
to_string/foo.bar.baz.bat[0] -2.1058% improved
to_string/p4th_wi7h.numb3r5 +3.7470% regressed
My recommendation is to try to take a closer look at the noise to see if there is a change we could make to reduce it, otherwise set the noise thresholds for partitioned batching to 10% and regex + lua_field_filter to 5%.
I'm going to try running multiple instances on the same dedicated host to ensure they won't be stepping on each other.
More updates:
I spun up a new c5 dedicated host and provisioned 8 c5.2xlarge instances on it to run benchmarks in parallel.
In addition to the previously mentioned consistency changes, I also disabled hyperthreading (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-optimize-cpu.html) to give each instance 4 dedicated cores. I was concerned that vCPUs might be partitioned across cores such that a given instance might share a core with another, but I didn't verify whether this actually happens. I had noted that, when I previously provisioned the large instance, CPU assignment seemed randomly distributed across cores rather than adjacent vCPUs (for example CPU0 and CPU1) being on the same core. In any event, this should remove another variable.
I again ran the downcase remap benchmark, due to its lack of inherent noise, 50 times on each of the 8 instances, in parallel:
parallel --jobs 0 --files --tmpdir 1 --tag --nonall --sshloginfile /tmp/hosts.txt --linebuffer 'cd vector ; for i in $(seq 1 50) ; do setarch x86_64 -R taskset -c 0 cargo bench --no-default-features --features "benches remap-benches" --bench remap downcase ; done'
CPU 0 was the isolated CPU on each instance. Note to self: there is a --no-run flag for cargo bench that we can use to compile the benchmarks using all available CPUs before running them on only one.
I got the following counts for criterion's detection:
80 Change within noise threshold.
320 No change in performance detected.
I'm optimistic that this shows there is no CPU interference between the instances.
I'm running a full benchmark run now to see how that looks. After that I'll set it up to run repeatedly overnight again.
I started putting some terraform + ansible config together simply as a way to facilitate my testing, but should be useful when we need to codify the benchmarking CI infrastructure assuming this approach continues panning out well.
I also noted that the partitioned batch benchmark that above is noted as showing noise is also the first benchmark that runs. It might be worth checking if changing the order has any effect.
EDIT: I forgot to use taskset -c 0, so these weren't running on the isolated CPU.
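Since forgetting the pinning is easy (as just happened), a quick self-check before kicking off a long run might help; this sketch reads the affinity mask a pinned command actually gets:

```shell
# Sketch: confirm a command launched under `taskset -c 0` really is restricted
# to CPU 0 by reading its own affinity mask from /proc (Linux only).
pinned=$(taskset -c 0 grep Cpus_allowed_list /proc/self/status | awk '{ print $2 }')
echo "pinned to: $pinned"
```

If the output is anything other than `0`, the pinning didn't take.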
One set of benchmarks (counts):
8 buffers/in-memory none
8 buffers/on-disk none
8 complex/complex none
7 create and insert array none
1 create and insert array regressed
8 create and insert nested-keys none
7 create and insert single-level none
1 create and insert single-level regressed
8 deserialize/"boo\\"p" none
8 deserialize/"boop" none
8 deserialize/"boop"."snoot" none
8 deserialize/foo[0].bar[0][0].baz none
8 deserialize/foo[0].bar[0].baz[0] none
1 deserialize/foo[0] improved
7 deserialize/foo[0] none
8 deserialize/foo.bar.baz.bat[0] none
8 deserialize/p4th_wi7h.numb3r5 none
8 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 deserialize/simple_string none
5 downcase: literal_value none
3 downcase: literal_value regressed
8 elasticsearch_indexes/dynamic none
2 elasticsearch_indexes/static improved
5 elasticsearch_indexes/static none
1 elasticsearch_indexes/static regressed
8 files/files_without_partitions none
8 from_string/"boo\\"p" none
8 from_string/"boop" none
8 from_string/"boop"."snoot" none
8 from_string/foo[0].bar[0][0].baz none
8 from_string/foo[0].bar[0].baz[0] none
8 from_string/foo[0] none
8 from_string/foo.bar.baz.bat[0] none
8 from_string/p4th_wi7h.numb3r5 none
8 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 from_string/simple_string none
8 http/compression/gzip(6) none
8 http/compression/none none
1 interconnected/interconnected improved
7 interconnected/interconnected none
8 isolated_buffers/channels/futures01 none
8 isolated_buffers/channels/tokio none
8 isolated_buffers/leveldb/both none
8 isolated_buffers/leveldb/reading none
8 isolated_buffers/leveldb/writing none
8 iterate all fields array none
8 iterate all fields nested-keys none
8 iterate all fields single-level none
8 lua_add_fields/native none
8 lua_add_fields/v1 none
8 lua_add_fields/v2 none
1 lua_field_filter/native improved
7 lua_field_filter/native none
8 lua_field_filter/v1 none
3 lua_field_filter/v2 improved
1 lua_field_filter/v2 none
4 lua_field_filter/v2 regressed
1 parse_json: invalid_json_with_default improved
7 parse_json: invalid_json_with_default none
8 parse_json: literal_value none
8 partitioned_batching/batching_gzip(6)_2097152 none
8 partitioned_batching/batching_gzip(6)_512000 none
1 partitioned_batching/batching_none_2097152 improved
7 partitioned_batching/batching_none_2097152 none
8 partitioned_batching/batching_none_512000 none
8 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
8 partitioned_batching/partitioned_batching_gzip(6)_512000 none
8 partitioned_batching/partitioned_batching_none_2097152 none
8 partitioned_batching/partitioned_batching_none_512000 none
8 pipe/pipe_big_lines none
8 pipe/pipe_multiple_writers none
8 pipe/pipe_simple none
8 pipe/pipe_small_lines none
2 regex/regex improved
4 regex/regex none
2 regex/regex regressed
8 remap: add fields with add_fields none
8 remap: add fields with remap none
8 remap: coerce with coercer none
8 remap: coerce with remap none
1 remap: parse JSON with json_parser improved
7 remap: parse JSON with json_parser none
8 remap: parse JSON with remap none
8 serialize/"boo\\"p" none
8 serialize/"boop" none
8 serialize/"boop"."snoot" none
8 serialize/foo[0].bar[0][0].baz none
8 serialize/foo[0].bar[0].baz[0] none
8 serialize/foo[0] none
8 serialize/foo.bar.baz.bat[0] none
8 serialize/p4th_wi7h.numb3r5 none
8 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 serialize/simple_string none
8 to_string/"boo\\"p" none
8 to_string/"boop" none
8 to_string/"boop"."snoot" none
8 to_string/foo[0].bar[0][0].baz none
8 to_string/foo[0].bar[0].baz[0] none
8 to_string/foo[0] none
8 to_string/foo.bar.baz.bat[0] none
8 to_string/p4th_wi7h.numb3r5 none
8 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 to_string/simple_string none
8 transforms/transforms none
2 upcase: literal_value improved
6 upcase: literal_value none
Time changed for the ones showing regressions or improvements:
create and insert array +1.1396% regressed
create and insert single-level +1.2012% regressed
deserialize/foo[0] -1.3533% improved
downcase: literal_value +1.2737% regressed
downcase: literal_value +2.0112% regressed
downcase: literal_value +2.0397% regressed
elasticsearch_indexes/static -10.050% improved
elasticsearch_indexes/static +11.517% regressed
elasticsearch_indexes/static -6.4488% improved
interconnected/interconnected -4.4868% improved
lua_field_filter/native -2.6885% improved
lua_field_filter/v2 -1.2712% improved
lua_field_filter/v2 +1.5681% regressed
lua_field_filter/v2 -1.6623% improved
lua_field_filter/v2 -1.8654% improved
lua_field_filter/v2 +2.8081% regressed
lua_field_filter/v2 +2.8288% regressed
lua_field_filter/v2 +3.6367% regressed
parse_json: invalid_json_with_default -1.0829% improved
regex/regex -1.7539% improved
regex/regex +1.7707% regressed
regex/regex +2.1820% regressed
regex/regex -3.9821% improved
remap: parse JSON with json_parser -3.9046% improved
upcase: literal_value -1.5229% improved
upcase: literal_value -1.8613% improved
Here we see the same ones we saw before on an isolated instance, but some surprising new ones as well, including downcase and upcase.
I'm setting it up to run overnight again.
Bah, I just realized I missed specifying to run on the isolated CPU for those, so they can be ignored. I'll use it in the overnight ones, though.
Command:
parallel --jobs 0 --tag --nonall --sshloginfile /tmp/hosts.txt 'cd vector ; rm -f ~/benches.out ; nohup bash -c "for i in \$(seq 1 50) ; do taskset -c 0 setarch x86_64 -R cargo bench --no-default-features --features \"benches remap-benches\" 2>&1 | tee -a ~/benches.out ; done" &'
For some reason it died after only 2 runs. Here's what I saw for benchmarks that showed changes, though:
buffers/in-memory -1.2522% improved
buffers/on-disk -2.3786% improved
buffers/on-disk -3.2696% improved
create and insert array +1.0353% regressed
create and insert array -1.1738% improved
create and insert array +1.2479% regressed
create and insert array -1.2792% improved
create and insert array -1.3966% improved
create and insert array +1.8406% regressed
create and insert single-level -1.0600% improved
create and insert single-level +1.0981% regressed
create and insert single-level -1.4655% improved
deserialize/"boo\\"p" -2.2670% improved
downcase: literal_value +1.4512% regressed
downcase: literal_value -1.8926% improved
elasticsearch_indexes/static -11.430% improved
elasticsearch_indexes/static +9.7197% regressed
files/files_without_partitions -2.2181% improved
files/files_without_partitions +2.4195% regressed
files/files_without_partitions -4.4782% improved
isolated_buffers/leveldb/writing -1.3121% improved
lua_field_filter/v2 +1.2010% regressed
lua_field_filter/v2 -1.2824% improved
lua_field_filter/v2 +1.3524% regressed
lua_field_filter/v2 +1.4162% regressed
lua_field_filter/v2 -1.6233% improved
lua_field_filter/v2 +2.8390% regressed
pipe/pipe_multiple_writers +3.0693% regressed
pipe/pipe_simple +1.2415% regressed
upcase: literal_value +1.3713% regressed
upcase: literal_value +1.4149% regressed
upcase: literal_value +2.9005% regressed
Still some surprising ones in there, like upcase and downcase, that make me think maybe there is some interference. I'm going to set it up again today to execute more runs.
Running in tmux this time:
parallel --jobs 0 --tag --nonall --sshloginfile /tmp/hosts.txt 'cd vector ; rm -f ~/benches.out ; tmux new-session -d -s "benchmarks" "for i in \$(seq 1 50) ; do taskset -c 0 setarch x86_64 -R cargo bench --no-default-features --features \"benches remap-benches\" 2>&1 | tee -a ~/benches.out ; done"'
With approximately 10 runs completed on each of the 8 instances, I'm seeing:
1 buffers/in-memory improved
1 create and insert nested-keys regressed
1 create and insert single-level improved
1 deserialize/foo[0].bar[0].baz[0] improved
1 elasticsearch_indexes/dynamic improved
1 from_string/foo.bar.baz.bat[0] regressed
1 interconnected/interconnected regressed
1 isolated_buffers/channels/futures01 regressed
1 isolated_buffers/leveldb/writing improved
1 lua_add_fields/v2 improved
1 lua_add_fields/v2 regressed
1 lua_field_filter/native improved
1 parse_json: invalid_json_with_default improved
1 parse_json: literal_value improved
1 partitioned_batching/batching_none_512000 regressed
1 pipe/pipe_multiple_writers improved
2 buffers/in-memory regressed
2 isolated_buffers/leveldb/writing regressed
2 iterate all fields single-level improved
2 iterate all fields single-level regressed
2 lua_field_filter/v1 regressed
2 partitioned_batching/batching_none_2097152 improved
2 partitioned_batching/batching_none_2097152 regressed
2 partitioned_batching/batching_none_512000 improved
2 pipe/pipe_simple improved
3 pipe/pipe_multiple_writers regressed
3 remap: coerce with coercer regressed
4 create and insert nested-keys improved
4 lua_add_fields/v1 improved
5 buffers/on-disk regressed
5 lua_add_fields/v1 regressed
6 deserialize/"boop"."snoot" improved
7 buffers/on-disk improved
7 deserialize/"boo\\"p" improved
7 deserialize/p4th_wi7h.numb3r5 improved
7 deserialize/simple_string improved
7 downcase: literal_value regressed
7 lua_add_fields/native regressed
8 deserialize/"boop" improved
8 deserialize/foo[0] improved
8 downcase: literal_value improved
8 elasticsearch_indexes/static improved
8 parse_json: literal_value regressed
9 create and insert array improved
9 create and insert array regressed
9 elasticsearch_indexes/static regressed
10 parse_json: invalid_json_with_default regressed
11 files/files_without_partitions improved
11 upcase: literal_value improved
12 lua_field_filter/v2 regressed
12 upcase: literal_value regressed
13 files/files_without_partitions regressed
15 regex/regex regressed
16 lua_field_filter/v2 improved
17 regex/regex improved
40 regex/regex none
44 lua_field_filter/v2 none
49 upcase: literal_value none
55 elasticsearch_indexes/static none
56 files/files_without_partitions none
57 downcase: literal_value none
61 parse_json: invalid_json_with_default none
62 create and insert array none
63 lua_add_fields/v1 none
63 parse_json: literal_value none
64 deserialize/"boop" none
64 deserialize/foo[0] none
65 deserialize/"boo\\"p" none
65 deserialize/p4th_wi7h.numb3r5 none
65 deserialize/simple_string none
65 lua_add_fields/native none
66 deserialize/"boop"."snoot" none
68 buffers/on-disk none
68 pipe/pipe_multiple_writers none
69 remap: coerce with coercer none
70 lua_add_fields/v2 none
70 lua_field_filter/v1 none
70 pipe/pipe_simple none
71 deserialize/foo[0].bar[0].baz[0] none
71 elasticsearch_indexes/dynamic none
71 interconnected/interconnected none
71 lua_field_filter/native none
72 complex/complex none
72 deserialize/foo[0].bar[0][0].baz none
72 deserialize/foo.bar.baz.bat[0] none
72 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
72 pipe/pipe_big_lines none
72 pipe/pipe_small_lines none
72 remap: add fields with add_fields none
72 remap: add fields with remap none
72 remap: coerce with remap none
72 remap: parse JSON with json_parser none
72 remap: parse JSON with remap none
72 serialize/"boo\\"p" none
72 serialize/"boop" none
72 serialize/"boop"."snoot" none
72 serialize/foo[0].bar[0][0].baz none
72 serialize/foo[0] none
72 serialize/p4th_wi7h.numb3r5 none
72 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
72 transforms/transforms none
73 serialize/foo[0].bar[0].baz[0] none
75 create and insert nested-keys none
76 iterate all fields single-level none
76 partitioned_batching/batching_none_2097152 none
77 buffers/in-memory none
77 isolated_buffers/leveldb/writing none
77 partitioned_batching/batching_none_512000 none
78 serialize/foo.bar.baz.bat[0] none
79 create and insert single-level none
79 from_string/foo.bar.baz.bat[0] none
79 isolated_buffers/channels/futures01 none
79 serialize/simple_string none
79 to_string/"boo\\"p" none
79 to_string/"boop" none
79 to_string/"boop"."snoot" none
79 to_string/foo[0] none
79 to_string/p4th_wi7h.numb3r5 none
79 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
80 from_string/"boo\\"p" none
80 from_string/"boop" none
80 from_string/"boop"."snoot" none
80 from_string/foo[0].bar[0][0].baz none
80 from_string/foo[0].bar[0].baz[0] none
80 from_string/foo[0] none
80 from_string/p4th_wi7h.numb3r5 none
80 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
80 from_string/simple_string none
80 http/compression/gzip(6) none
80 http/compression/none none
80 isolated_buffers/channels/tokio none
80 isolated_buffers/leveldb/both none
80 isolated_buffers/leveldb/reading none
80 iterate all fields array none
80 iterate all fields nested-keys none
80 partitioned_batching/batching_gzip(6)_2097152 none
80 partitioned_batching/batching_gzip(6)_512000 none
80 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
80 partitioned_batching/partitioned_batching_gzip(6)_512000 none
80 partitioned_batching/partitioned_batching_none_2097152 none
80 partitioned_batching/partitioned_batching_none_512000 none
80 to_string/foo[0].bar[0][0].baz none
80 to_string/foo[0].bar[0].baz[0] none
80 to_string/foo.bar.baz.bat[0] none
80 to_string/simple_string none
Noise for ones showing changes:
buffers/in-memory -1.3711% improved
buffers/in-memory +1.5527% regressed
buffers/in-memory +1.8231% regressed
buffers/on-disk -1.5213% improved
buffers/on-disk -1.5325% improved
buffers/on-disk +1.7111% regressed
buffers/on-disk +1.9306% regressed
buffers/on-disk -2.0263% improved
buffers/on-disk +2.0283% regressed
buffers/on-disk -2.5212% improved
buffers/on-disk -2.6512% improved
buffers/on-disk -2.9310% improved
buffers/on-disk +3.2889% regressed
buffers/on-disk +4.0243% regressed
buffers/on-disk -4.2677% improved
create and insert array +1.0349% regressed
create and insert array -1.0901% improved
create and insert array -1.0954% improved
create and insert array +1.1982% regressed
create and insert array +1.2649% regressed
create and insert array -1.2713% improved
create and insert array +1.2844% regressed
create and insert array -1.4027% improved
create and insert array +1.4793% regressed
create and insert array +1.6641% regressed
create and insert array -1.6796% improved
create and insert array -2.0858% improved
create and insert array -2.5681% improved
create and insert array +2.7420% regressed
create and insert array -3.0450% improved
create and insert array -3.2899% improved
create and insert array +3.7951% regressed
create and insert array +4.3070% regressed
create and insert nested-keys -1.1215% improved
create and insert nested-keys +1.3393% regressed
create and insert nested-keys -1.4461% improved
create and insert nested-keys -1.5290% improved
create and insert nested-keys -1.7110% improved
create and insert single-level -1.0708% improved
deserialize/"boo\\"p" -1.8181% improved
deserialize/"boo\\"p" -1.8871% improved
deserialize/"boo\\"p" -2.0350% improved
deserialize/"boop" -2.0576% improved
deserialize/"boo\\"p" -2.0636% improved
deserialize/"boo\\"p" -2.0758% improved
deserialize/"boop" -2.0881% improved
deserialize/"boop" -2.1282% improved
deserialize/"boop" -2.1338% improved
deserialize/"boo\\"p" -2.2210% improved
deserialize/"boop" -2.2403% improved
deserialize/"boo\\"p" -2.2641% improved
deserialize/"boop" -2.4284% improved
deserialize/"boop" -2.5023% improved
deserialize/"boop" -2.6505% improved
deserialize/"boop"."snoot" -1.2351% improved
deserialize/"boop"."snoot" -1.5345% improved
deserialize/"boop"."snoot" -1.6256% improved
deserialize/"boop"."snoot" -1.6539% improved
deserialize/"boop"."snoot" -1.6968% improved
deserialize/"boop"."snoot" -2.9652% improved
deserialize/foo[0] -2.2525% improved
deserialize/foo[0] -2.3868% improved
deserialize/foo[0] -2.5564% improved
deserialize/foo[0] -2.5580% improved
deserialize/foo[0] -2.5777% improved
deserialize/foo[0] -2.5833% improved
deserialize/foo[0] -2.5848% improved
deserialize/foo[0] -2.6286% improved
deserialize/foo[0].bar[0].baz[0] -1.3599% improved
deserialize/p4th_wi7h.numb3r5 -1.4786% improved
deserialize/p4th_wi7h.numb3r5 -1.5360% improved
deserialize/p4th_wi7h.numb3r5 -1.6322% improved
deserialize/p4th_wi7h.numb3r5 -1.6541% improved
deserialize/p4th_wi7h.numb3r5 -1.6701% improved
deserialize/p4th_wi7h.numb3r5 -1.6793% improved
deserialize/p4th_wi7h.numb3r5 -1.7310% improved
deserialize/simple_string -2.7546% improved
deserialize/simple_string -2.7697% improved
deserialize/simple_string -2.7820% improved
deserialize/simple_string -2.8296% improved
deserialize/simple_string -2.8673% improved
deserialize/simple_string -2.8758% improved
deserialize/simple_string -2.8874% improved
downcase: literal_value -1.0828% improved
downcase: literal_value +1.1394% regressed
downcase: literal_value +1.2959% regressed
downcase: literal_value -1.3090% improved
downcase: literal_value -1.3122% improved
downcase: literal_value +1.3147% regressed
downcase: literal_value +1.3794% regressed
downcase: literal_value -1.3832% improved
downcase: literal_value -1.4240% improved
downcase: literal_value -2.0778% improved
downcase: literal_value -2.0806% improved
downcase: literal_value +2.1245% regressed
downcase: literal_value +2.1883% regressed
downcase: literal_value +2.6215% regressed
downcase: literal_value -2.6484% improved
elasticsearch_indexes/dynamic -1.5571% improved
elasticsearch_indexes/static +11.589% regressed
elasticsearch_indexes/static -11.714% improved
elasticsearch_indexes/static +13.580% regressed
elasticsearch_indexes/static +14.110% regressed
elasticsearch_indexes/static -16.122% improved
elasticsearch_indexes/static -16.453% improved
elasticsearch_indexes/static +17.450% regressed
elasticsearch_indexes/static -18.267% improved
elasticsearch_indexes/static +18.770% regressed
elasticsearch_indexes/static +5.5549% regressed
elasticsearch_indexes/static -6.1160% improved
elasticsearch_indexes/static +6.7624% regressed
elasticsearch_indexes/static +7.0514% regressed
elasticsearch_indexes/static -7.2085% improved
elasticsearch_indexes/static -8.1175% improved
elasticsearch_indexes/static -8.5284% improved
elasticsearch_indexes/static +8.9102% regressed
files/files_without_partitions +1.4317% regressed
files/files_without_partitions +1.5974% regressed
files/files_without_partitions -1.6904% improved
files/files_without_partitions +1.7598% regressed
files/files_without_partitions +1.8017% regressed
files/files_without_partitions -1.8675% improved
files/files_without_partitions +1.8964% regressed
files/files_without_partitions +1.9215% regressed
files/files_without_partitions -1.9583% improved
files/files_without_partitions +1.9855% regressed
files/files_without_partitions +1.9983% regressed
files/files_without_partitions -2.0847% improved
files/files_without_partitions +2.0872% regressed
files/files_without_partitions -2.1192% improved
files/files_without_partitions -2.1602% improved
files/files_without_partitions -2.2128% improved
files/files_without_partitions +2.2470% regressed
files/files_without_partitions -2.3917% improved
files/files_without_partitions +2.4962% regressed
files/files_without_partitions -3.1584% improved
files/files_without_partitions +3.3841% regressed
files/files_without_partitions -3.7645% improved
files/files_without_partitions -3.8745% improved
files/files_without_partitions +4.5545% regressed
from_string/foo.bar.baz.bat[0] +1.8566% regressed
interconnected/interconnected +1.2691% regressed
isolated_buffers/channels/futures01 +1.1189% regressed
isolated_buffers/leveldb/writing -1.1027% improved
isolated_buffers/leveldb/writing +1.1437% regressed
isolated_buffers/leveldb/writing +1.3786% regressed
iterate all fields single-level +2.4609% regressed
iterate all fields single-level -2.9326% improved
iterate all fields single-level -3.4496% improved
iterate all fields single-level +3.6805% regressed
lua_add_fields/native +3.2675% regressed
lua_add_fields/native +3.7453% regressed
lua_add_fields/native +3.8168% regressed
lua_add_fields/native +3.9488% regressed
lua_add_fields/native +4.1162% regressed
lua_add_fields/native +4.3567% regressed
lua_add_fields/native +4.6620% regressed
lua_add_fields/v1 +1.2655% regressed
lua_add_fields/v1 +1.2728% regressed
lua_add_fields/v1 -1.2986% improved
lua_add_fields/v1 +1.3715% regressed
lua_add_fields/v1 -1.4866% improved
lua_add_fields/v1 -1.5394% improved
lua_add_fields/v1 -1.5440% improved
lua_add_fields/v1 +1.5839% regressed
lua_add_fields/v1 +1.8893% regressed
lua_add_fields/v2 -1.2108% improved
lua_add_fields/v2 +1.3122% regressed
lua_field_filter/native -2.1350% improved
lua_field_filter/v1 +1.5440% regressed
lua_field_filter/v1 +1.5729% regressed
lua_field_filter/v2 -1.0960% improved
lua_field_filter/v2 +1.2496% regressed
lua_field_filter/v2 +1.3072% regressed
lua_field_filter/v2 -1.3363% improved
lua_field_filter/v2 +1.5011% regressed
lua_field_filter/v2 +1.5103% regressed
lua_field_filter/v2 -1.5331% improved
lua_field_filter/v2 +1.5486% regressed
lua_field_filter/v2 -1.5844% improved
lua_field_filter/v2 -1.7806% improved
lua_field_filter/v2 -1.7922% improved
lua_field_filter/v2 -1.8646% improved
lua_field_filter/v2 -1.9035% improved
lua_field_filter/v2 -1.9589% improved
lua_field_filter/v2 -2.0118% improved
lua_field_filter/v2 +2.0366% regressed
lua_field_filter/v2 +2.0472% regressed
lua_field_filter/v2 +2.2891% regressed
lua_field_filter/v2 -2.3003% improved
lua_field_filter/v2 -2.3495% improved
lua_field_filter/v2 -2.3962% improved
lua_field_filter/v2 -2.5001% improved
lua_field_filter/v2 -2.5706% improved
lua_field_filter/v2 -2.6490% improved
lua_field_filter/v2 +2.7958% regressed
lua_field_filter/v2 +3.0666% regressed
lua_field_filter/v2 +3.2312% regressed
lua_field_filter/v2 +4.1520% regressed
parse_json: invalid_json_with_default +1.1281% regressed
parse_json: invalid_json_with_default +1.5002% regressed
parse_json: invalid_json_with_default -1.5975% improved
parse_json: invalid_json_with_default +1.8601% regressed
parse_json: invalid_json_with_default +1.9882% regressed
parse_json: invalid_json_with_default +2.0861% regressed
parse_json: invalid_json_with_default +2.1100% regressed
parse_json: invalid_json_with_default +2.1817% regressed
parse_json: invalid_json_with_default +2.5531% regressed
parse_json: invalid_json_with_default +2.6278% regressed
parse_json: invalid_json_with_default +3.9368% regressed
parse_json: literal_value -1.0954% improved
parse_json: literal_value +4.1013% regressed
parse_json: literal_value +4.1126% regressed
parse_json: literal_value +4.1406% regressed
parse_json: literal_value +4.1497% regressed
parse_json: literal_value +4.1513% regressed
parse_json: literal_value +4.2276% regressed
parse_json: literal_value +4.2758% regressed
parse_json: literal_value +5.0644% regressed
pipe/pipe_multiple_writers +1.9539% regressed
pipe/pipe_multiple_writers +1.9554% regressed
pipe/pipe_multiple_writers +2.4000% regressed
pipe/pipe_multiple_writers -3.6813% improved
pipe/pipe_simple -1.2423% improved
pipe/pipe_simple -1.2522% improved
regex/regex +1.7689% regressed
regex/regex -1.8119% improved
regex/regex +1.8750% regressed
regex/regex +2.0251% regressed
regex/regex -2.0363% improved
regex/regex -2.1759% improved
regex/regex +2.1882% regressed
regex/regex -2.1927% improved
regex/regex -2.2076% improved
regex/regex -2.3017% improved
regex/regex -2.3748% improved
regex/regex -2.4218% improved
regex/regex -2.4537% improved
regex/regex -2.4622% improved
regex/regex -2.7060% improved
regex/regex +2.7374% regressed
regex/regex +2.7694% regressed
regex/regex -2.8921% improved
regex/regex +2.9263% regressed
regex/regex -3.1635% improved
regex/regex +3.2247% regressed
regex/regex -3.3485% improved
regex/regex +3.4294% regressed
regex/regex +3.4924% regressed
regex/regex +3.7009% regressed
regex/regex +3.8341% regressed
regex/regex -3.8743% improved
regex/regex -4.0575% improved
regex/regex +4.1375% regressed
regex/regex -5.4246% improved
regex/regex +5.8705% regressed
regex/regex +5.8809% regressed
remap: coerce with coercer +1.4079% regressed
remap: coerce with coercer +1.5097% regressed
remap: coerce with coercer +1.5639% regressed
upcase: literal_value -1.3672% improved
upcase: literal_value +1.3979% regressed
upcase: literal_value -1.4281% improved
upcase: literal_value -1.4405% improved
upcase: literal_value -1.4600% improved
upcase: literal_value -1.5317% improved
upcase: literal_value +1.5466% regressed
upcase: literal_value +1.5896% regressed
upcase: literal_value +1.6005% regressed
upcase: literal_value +1.7872% regressed
upcase: literal_value -1.8252% improved
upcase: literal_value -1.9470% improved
upcase: literal_value +2.3237% regressed
upcase: literal_value -2.3707% improved
upcase: literal_value +2.4424% regressed
upcase: literal_value -2.5006% improved
upcase: literal_value +2.5185% regressed
upcase: literal_value +2.7054% regressed
upcase: literal_value -2.7090% improved
upcase: literal_value +3.4441% regressed
upcase: literal_value -3.7689% improved
upcase: literal_value +4.0006% regressed
upcase: literal_value +4.0665% regressed
This is still more noise than I would have expected, especially for some of the remap and lookup benchmarks which should only exercise the CPU / memory as far as I can tell.
Even without any more changes, it does make me think we could probably use a noise threshold of 5% for most benchmarks.
I'll let it keep running to gather more data.
More updates:
It finished executing the 50 parallel runs * 8 runners = 400 runs.
Of the ones showing changes greater than +/- 5%, I saw the following counts:
1 buffers/on-disk improved
1 files/files_without_partitions improved
1 from_string/"boop"."snoot" regressed
1 from_string/foo[0] regressed
1 parse_json: literal_value regressed
1 upcase: literal_value improved
2 from_string/"boop"."snoot" improved
2 lua_field_filter/v2 improved
3 upcase: literal_value regressed
6 regex/regex improved
8 regex/regex regressed
59 elasticsearch_indexes/static improved
64 elasticsearch_indexes/static regressed
(Again, we would expect to see regressions paired with improvements if there was really no change.)
The elasticsearch_indexes/static and regex benchmarks stand out as likely having a large amount of noise inherent to the benchmark itself. I'll take a closer look at these two. For now, I'll mark the elasticsearch one as allowing +/- 25% and regex as allowing +/- 10%, which covers the maximum noise I saw. This should result in fairly few false positives until we reduce their noise.
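As a downstream guard, the same thresholds could also be applied when post-processing results. Here's a hypothetical filter (the names and limits are the ones chosen above; the input format "<bench> <signed percent>" with no spaces in the bench name is an assumption for the sketch):

```shell
# Hypothetical sketch: flag changes that exceed per-benchmark noise thresholds.
# Default band is +/- 5%; the two noisy benches get the wider bands chosen above.
flag_changes() {
  awk '
    BEGIN { thr["elasticsearch_indexes/static"] = 25; thr["regex/regex"] = 10 }
    {
      pct = $NF; gsub(/[+%-]/, "", pct)          # strip sign and percent sign
      limit = ($1 in thr) ? thr[$1] : 5
      if (pct + 0 > limit) print $1, "outside noise threshold"
    }
  '
}
```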
I've uploaded the complete set of benchmark results in case we need to analyze them later. Each file in the archive is a separate runner.
For additional reference, this is the full count of changes I saw regardless of magnitude (default noise threshold is 1%):
1 create and insert single-level regressed
1 deserialize/foo[0].bar[0].baz[0] improved
1 elasticsearch_indexes/dynamic regressed
1 from_string/"boop"."snoot" regressed
1 from_string/foo[0] regressed
1 from_string/simple_string regressed
1 isolated_buffers/channels/futures01 improved
1 isolated_buffers/channels/tokio improved
1 isolated_buffers/channels/tokio regressed
1 isolated_buffers/leveldb/reading regressed
1 lua_add_fields/v2 improved
1 lua_add_fields/v2 regressed
1 parse_json: literal_value improved
1 pipe/pipe_big_lines improved
1 remap: add fields with remap regressed
1 transforms/transforms regressed
2 buffers/in-memory regressed
2 from_string/"boop"."snoot" improved
2 from_string/foo.bar.baz.bat[0] improved
2 interconnected/interconnected improved
2 interconnected/interconnected regressed
2 pipe/pipe_big_lines regressed
2 pipe/pipe_simple regressed
2 pipe/pipe_small_lines improved
2 pipe/pipe_small_lines regressed
2 transforms/transforms improved
3 create and insert single-level improved
3 from_string/foo.bar.baz.bat[0] regressed
3 isolated_buffers/channels/futures01 regressed
3 iterate all fields single-level improved
3 iterate all fields single-level regressed
3 remap: coerce with coercer regressed
4 buffers/in-memory improved
4 elasticsearch_indexes/dynamic improved
4 lua_field_filter/v1 improved
5 lua_field_filter/native improved
6 deserialize/"boop"."snoot" improved
6 lua_field_filter/native regressed
6 pipe/pipe_simple improved
7 deserialize/"boo\\"p" improved
7 deserialize/p4th_wi7h.numb3r5 improved
7 deserialize/simple_string improved
7 lua_add_fields/native regressed
8 deserialize/"boop" improved
8 deserialize/foo[0] improved
8 lua_field_filter/v1 regressed
8 parse_json: literal_value regressed
9 isolated_buffers/leveldb/writing regressed
11 create and insert nested-keys regressed
12 parse_json: invalid_json_with_default improved
14 create and insert nested-keys improved
14 isolated_buffers/leveldb/writing improved
14 pipe/pipe_multiple_writers regressed
18 lua_add_fields/v1 regressed
19 parse_json: invalid_json_with_default regressed
19 pipe/pipe_multiple_writers improved
23 lua_add_fields/v1 improved
26 buffers/on-disk improved
28 buffers/on-disk regressed
37 downcase: literal_value regressed
38 downcase: literal_value improved
41 files/files_without_partitions improved
47 files/files_without_partitions regressed
52 create and insert array improved
52 create and insert array regressed
61 elasticsearch_indexes/static improved
62 upcase: literal_value regressed
64 upcase: literal_value improved
66 elasticsearch_indexes/static regressed
78 regex/regex improved
83 regex/regex regressed
93 lua_field_filter/v2 improved
99 lua_field_filter/v2 regressed
This may point to a smaller level of noise inherent in some of the benchmarks that show a lot of changes.
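One way to quantify that (a sketch, assuming the "<count> <bench> <verdict>" format above with no spaces in the bench name) is to sum the improved and regressed counts per benchmark, since on identical code both directions are pure noise:

```shell
# Sketch: rank benchmarks by total spurious-change count across repeated runs
# of the same commit. Higher totals indicate more noise inherent to the bench.
rank_noise() {
  awk '$3 != "none" { total[$2] += $1 } END { for (b in total) print total[b], b }' \
    | sort -rn
}
```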
I'm going to close this issue as done for this pass. I'm sure we'll keep improving things here, but it's in a pretty good state now, with custom benchmark runners set up and noise thresholds configured.
In order to be able to more accurately compare benchmarks over time and between PRs and master, we'd like to ensure that the environment the benches run in is as similar as possible, to avoid measurement changes caused by differences in the benchmark environment. Some options: