jszwedko closed this issue 3 years ago
An alternative to this might be to use a different measurement that is less sensitive to noise:
In the cloud environment, the idea of comparing different runs collected at different times seems to have little chance of succeeding.
I think the only more or less accurate way would be to clone both master and the PR code in the same run, bench both, and compare the results. This should eliminate as much noise related to the code being run on-different-nodes/with-different-neighbors as possible, but then it's still susceptible to noisy neighbors appearing/disappearing during the benches.
I've seen the ACM ICPC judge servers tweaked, from the physical hardware through the kernel and user space, to run with almost no noise for accurate and reliable results - I'd say ideally, we'd want something on a similar level for our benchmarking environment. I recall there was an article with some tips and tricks on how to achieve it; I'll try to find it in case we decide to go this route.
UPD: found the article I had in mind, but it's not in English :laughing:
Notes so far:
I tried out running benchmarks twice for a given commit to get an idea of noise (here). I only ran it once so far, but, as somewhat expected, there is still substantial noise (-10% to +10%, with a couple of outliers of 13% and 20%). I'll plan to run this some more to try to get a baseline of the noise we might expect with this strategy. It does have the advantage of simplicity if we are ok with setting noise thresholds high enough that we may miss some smaller performance degradations in CI (but those could later be caught with trend analysis).
I tried out criterion-cycles-per-byte in my AWS Workspace just to get an idea of it. I was still seeing a surprising amount of noise though. I'll aim to try this out in Github Actions to see what I see there.
I tried out Bruce's criterion-linux-perf to see if I could count CPU instructions in Linux, at least, as, theoretically, this should be very consistent (or even exactly the same?). However, I ran into issues counting the hardware instructions in my AWS Workspace which led me to realize (with Bruce's help) this probably wouldn't be tractable in virtualized environments, which usually restrict access to hardware. I plan to try it in Github Actions just to make sure that it doesn't work there, but I'm fairly confident that it won't. We can actually make use of it in AWS with dedicated hosts, which we could consider using as benchmark runners, but these end up being fairly pricey, on the order of $2-5k / month on-demand.
Current plan:
- Try out criterion-cycles-per-byte in Github Actions to get an idea of the noise we'd see there with that measurement.
- Try out criterion-linux-perf in Github Actions just to verify that it won't run on GA's runners. Verify that it would run on an AWS dedicated host. Get an estimated cost for a dedicated AWS host to run these.

Won't CPU counters still be susceptible to CPU cores reshuffling among the tasks? The cloud environment is shared, but we can work around that with dedicated hosts, as you suggested. However, is the situation better on a dedicated host? It might be that even running on physical hardware with a clean OS install, running benches still yields large amounts of noise. There are two reasons for this: CPU frequency scaling (turbo boost), and other tasks being scheduled onto the same cores.
Both can be solved with physical hardware though. Turbo can be turned off in the BIOS, and it's possible to tell Linux to reserve some CPU cores so that no tasks run on them. We can then run our benches explicitly on that reserved set of cores, eliminating any external noise.
I think it's worth trying benches locally (in the simplest scenario, on your local workstation) to estimate what noise levels to expect from physical hardware without any tweaks. Going to the efforts I described above might not be worth it for us if untweaked physical hardware yields a low enough noise level.
Won't CPU counters still be susceptible to CPU cores reshuffling among the tasks?
Replying to myself: https://github.com/bruceg/criterion-linux-perf uses perf_event_open apparently, and it can scope perf events to the process - so we can trust the kernel to provide us with the correct data.
I did some more testing today, primarily on a dedicated AWS host (c5).
Some more findings:
I was able to access perf hardware events on the dedicated host.
Counting instructions using criterion-linux-perf is very consistent, as expected, but, with Ana's input, I'm starting to feel like that measurement isn't what we are really interested in. Time / throughput is a better measure.
Some things I did to make the environment more consistent:
- CPU isolation (isolcpus=1 for grub)
- Limiting C-states (intel_idle.max_cstate=1 for grub)
- Using taskset to run the benchmarks on just one CPU

I didn't compare/contrast to see which ones actually had an effect though.
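As a concrete sketch of those tweaks (the core number and grub workflow here are illustrative assumptions, not necessarily the exact setup used):

```shell
# /etc/default/grub -- keep CPU 1 out of the general scheduler and cap C-states:
#   GRUB_CMDLINE_LINUX="isolcpus=1 intel_idle.max_cstate=1"
# Regenerate the grub config and reboot for the kernel parameters to take effect:
sudo update-grub    # Debian/Ubuntu; grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family
sudo reboot

# After reboot, pin the benchmark process to the isolated CPU:
taskset -c 1 cargo bench --no-default-features --features benches
```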
Example benchmark suite run on the dedicated host with these changes:
name time time change throughput throughput change change
partitioned_batching/partitioned_batching_none_2097152 9.5431 ms +1.6168% 999.33 MiB/s -1.5911% none
partitioned_batching/batching_none_2097152 5.0203 ms +1.7071% 1.8551 GiB/s -1.6784% none
partitioned_batching/partitioned_batching_none_512000 9.5116 ms -2.2736% 1002.6 MiB/s +2.3265% none
partitioned_batching/batching_none_512000 5.2054 ms +0.7299% 1.7892 GiB/s -0.7246% none
partitioned_batching/partitioned_batching_gzip(6)_2097152 109.29 ms -0.2260% 87.262 MiB/s +0.2265% none
partitioned_batching/batching_gzip(6)_2097152 104.47 ms +0.1496% 91.288 MiB/s -0.1494% none
partitioned_batching/partitioned_batching_gzip(6)_512000 109.59 ms -0.3178% 87.019 MiB/s +0.3188% none
partitioned_batching/batching_gzip(6)_512000 104.80 ms -0.0684% 90.996 MiB/s +0.0685% none
buffers/in-memory 36.169 ms +2.3278% 26.367 MiB/s -2.2749% regressed
buffers/on-disk 86.998 ms +1.5791% 10.962 MiB/s -1.5546% regressed
create and insert single-level 712.31 ns +9.6315% regressed
iterate all fields single-level 205.26 ns -0.2483% none
create and insert nested-keys 1.1669 us +6.6561% regressed
iterate all fields nested-keys 541.32 ns -0.4195% none
create and insert array 1.1763 us +0.9123% none
iterate all fields array 515.43 ns -0.3937% none
files/files_without_partitions 13.282 ms -0.0233% 71.803 MiB/s +0.0233% none
http/compression/none 1.0061 s +0.0066% 97.067 KiB/s -0.0066% none
http/compression/gzip(6) 1.0065 s +0.0079% 97.026 KiB/s -0.0079% none
isolated_buffers/channels/futures01 19.753 ms +1.8233% 96.562 MiB/s -1.7906% regressed
isolated_buffers/channels/tokio 20.650 ms +1.9936% 92.366 MiB/s -1.9547% regressed
isolated_buffers/leveldb/writing 30.634 ms +1.4034% 62.263 MiB/s -1.3840% regressed
isolated_buffers/leveldb/reading 192.43 ms +2.3015% 9.9118 MiB/s -2.2498% regressed
isolated_buffers/leveldb/both 223.10 ms +2.5604% 8.5493 MiB/s -2.4965% regressed
from_string/simple_string 28.626 ns -0.8216% 433.09 MiB/s +0.8284% none
from_string/foo.bar.baz.bat[0] 25.550 ns -1.0600% 671.87 MiB/s +1.0713% none
from_string/foo[0].bar[0].baz[0] 25.351 ns -1.0171% 752.36 MiB/s +1.0276% none
from_string/foo[0].bar[0][0].baz 25.405 ns -0.9134% 750.77 MiB/s +0.9219% none
from_string/foo[0] 28.562 ns -0.6366% 200.34 MiB/s +0.6407% none
from_string/"boo\\"p" 28.676 ns -0.9025% 266.05 MiB/s +0.9107% none
from_string/p4th_wi7h.numb3r5 25.379 ns -1.6139% 638.81 MiB/s +1.6404% none
from_string/"boop" 28.604 ns -0.4653% 200.04 MiB/s +0.4675% none
from_string/"boop"."snoot" 28.582 ns -1.3965% 467.13 MiB/s +1.4162% none
from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 28.864 ns -1.0382% 2.8072 GiB/s +1.0491% none
to_string/simple_string 112.73 ns +0.3256% 109.98 MiB/s -0.3246% none
to_string/foo.bar.baz.bat[0] 113.10 ns -1.0173% 151.77 MiB/s +1.0277% none
to_string/foo[0].bar[0].baz[0] 113.21 ns +0.5080% 168.48 MiB/s -0.5055% none
to_string/foo[0].bar[0][0].baz 113.20 ns +0.2344% 168.49 MiB/s -0.2338% none
to_string/foo[0] 134.86 ns -0.3630% 42.430 MiB/s +0.3643% none
to_string/"boo\\"p" 112.69 ns -0.0072% 67.701 MiB/s +0.0072% none
to_string/p4th_wi7h.numb3r5 113.19 ns +0.3556% 143.24 MiB/s -0.3543% none
to_string/"boop" 134.77 ns -0.6546% 42.458 MiB/s +0.6589% none
to_string/"boop"."snoot" 112.66 ns -0.7355% 118.51 MiB/s +0.7410% none
to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 114.24 ns +0.4230% 726.28 MiB/s -0.4212% none
serialize/simple_string 175.93 ns -0.3258% 70.470 MiB/s +0.3269% none
serialize/foo.bar.baz.bat[0] 178.30 ns +1.2346% 96.276 MiB/s -1.2195% none
serialize/foo[0].bar[0].baz[0] 179.60 ns -0.0472% 106.20 MiB/s +0.0472% none
serialize/foo[0].bar[0][0].baz 179.01 ns +0.3964% 106.55 MiB/s -0.3948% none
serialize/foo[0] 197.77 ns -1.7488% 28.932 MiB/s +1.7800% none
serialize/"boo\\"p" 200.67 ns -0.9922% 38.020 MiB/s +1.0021% none
serialize/p4th_wi7h.numb3r5 178.06 ns +0.5377% 91.049 MiB/s -0.5348% none
serialize/"boop" 209.14 ns -0.8693% 27.360 MiB/s +0.8769% none
serialize/"boop"."snoot" 210.06 ns +0.1330% 63.560 MiB/s -0.1328% none
serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 266.34 ns +0.0571% 311.52 MiB/s -0.0571% none
deserialize/simple_string 523.88 ns +0.9933% 23.665 MiB/s -0.9836% none
deserialize/foo.bar.baz.bat[0] 1.2821 us +1.7879% 13.389 MiB/s -1.7565% regressed
deserialize/foo[0].bar[0].baz[0] 1.2671 us +1.8442% 15.052 MiB/s -1.8108% regressed
deserialize/foo[0].bar[0][0].baz 1.2641 us +1.9227% 15.088 MiB/s -1.8864% regressed
deserialize/foo[0] 581.29 ns +0.9371% 9.8438 MiB/s -0.9284% none
deserialize/"boo\\"p" 654.89 ns +2.0462% 11.650 MiB/s -2.0052% regressed
deserialize/p4th_wi7h.numb3r5 788.81 ns +0.7398% 20.553 MiB/s -0.7344% none
deserialize/"boop" 620.27 ns +2.4946% 9.2251 MiB/s -2.4339% regressed
deserialize/"boop"."snoot" 999.42 ns +2.2332% 13.359 MiB/s -2.1845% regressed
deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 3.1361 us +1.2704% 26.456 MiB/s -1.2545% none
lua_add_fields/native 1.0733 us +1.1452% 931.69 Kelem/s -1.1322% none
lua_add_fields/v1 3.2298 us +0.9373% 309.62 Kelem/s -0.9286% none
lua_add_fields/v2 4.2890 us +0.9080% 233.15 Kelem/s -0.8998% none
lua_field_filter/native 5.4191 us +1.0746% 1.8453 Melem/s -1.0632% none
lua_field_filter/v1 26.601 us +3.2960% 375.93 Kelem/s -3.1908% regressed
lua_field_filter/v2 187.66 us +1.1531% 53.287 Kelem/s -1.1400% regressed
regex/regex 69.067 us +0.9482% 21.665 MiB/s -0.9393% none
elasticsearch_indexes/dynamic 722.21 ns -0.3143% none
elasticsearch_indexes/static 171.63 ns +5.8829% none
pipe/pipe_simple 36.471 ms +2.1015% 26.149 MiB/s -2.0582% regressed
pipe/pipe_small_lines 28.588 ms +2.6563% 341.59 KiB/s -2.5875% regressed
pipe/pipe_big_lines 163.64 ms +0.0761% 116.56 MiB/s -0.0760% none
pipe/pipe_multiple_writers 37.640 ms +2.3016% 2.5337 MiB/s -2.2498% regressed
interconnected/interconnected 102.49 ms +1.3589% 18.611 MiB/s -1.3407% regressed
transforms/transforms 55.955 ms +0.8562% 18.748 MiB/s -0.8489% none
complex/complex 1.9195 s +0.7190% none
Regression detected. Note that any regressions should be verified.
Here we can see the noise is much lower compared to running in GA. I think the benchmarks that do continue to show noise may actually need some tweaks themselves to reduce it. For example, Mike noted this morning that the topology benchmarks (pipe/*) are creating their inputs in the benchmark iteration itself when they should be created outside of it. As Mike noted, the high-level benchmarks that engage in network activity are also likely to be noisier in general, so we can tune criterion thresholds to require larger changes for them to flag while still catching smaller regressions in the more focused benchmarks.
The AWS dedicated hosts are fairly expensive but they are quite large; the one I was running (c5, at $2423.52 / month on-demand) had 36 cores, which we could partition into, probably, 4-6 instances. Ana pointed out some other providers that offer entire physical hosts for much cheaper ($100-$200/mo): Hetzner and OVH.
My feelings currently are:
Some reference materials I found today:
criterion does warm up CPUs, so hopefully we wouldn't hit CPU scaling issues.

Another random thought is that we could defer performance analysis to releases and just do a before / after comparison with the previous release. If we notice a difference, we can use git bisect to find where regressions were introduced. This would let us use the more expensive hardware less frequently, but may mean that fixes are harder as we wouldn't notice regressions until later. Also, people running nightlies may experience regressions.
Nice work! I would prefer to run the benchmarks with pull requests. I am not concerned about the cost you quoted, since getting ahead of performance regressions at the PR stage will be much cheaper overall. A reserved instance would also reduce the cost, but I would prefer to start with on-demand until we have more confidence in the setup. So do what you need to make these accurate and useful.
Some more findings:
I did have to disable address randomization (ASLR) to get more consistent benchmark results, which I did via setarch $(uname -m) -R ..., along with the aforementioned changes of CPU isolation, running the benchmarks on a single CPU, running on a dedicated AWS host, etc.
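A quick way to confirm that setarch -R is actually disabling ASLR (a sketch, assuming a Linux host with util-linux's setarch): the first mapping in /proc/self/maps jumps around between runs with ASLR on, and is identical with it off.

```shell
# With ASLR on (the default), these two addresses normally differ:
sh -c 'head -n1 /proc/self/maps'
sh -c 'head -n1 /proc/self/maps'

# Under setarch -R, address randomization is disabled for the child,
# so the first mapping is identical on every run:
setarch "$(uname -m)" -R sh -c 'head -n1 /proc/self/maps'
setarch "$(uname -m)" -R sh -c 'head -n1 /proc/self/maps'
```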
To verify this, I chose a benchmark that doesn't have a lot of inherent noise and ran it 100 times:
for i in $(seq 1 100) ; do setarch `uname -m` -R taskset -c 2 cargo bench --no-default-features --features "benches remap-benches" --bench remap downcase | tee -a /tmp/criterion.out.2 ; done
Disabling ASLR took this from:
32 Change within noise threshold.
6 No change in performance detected.
32 Performance has improved.
30 Performance has regressed.
To
24 Change within noise threshold.
74 No change in performance detected.
1 Performance has improved.
1 Performance has regressed.
I believe the ones marked as regressed/improved were actually caused by me accidentally running a job on the same core (on a different hyperthread), as the timing lines up.
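For reference, tallies like the ones above can be produced by grepping criterion's summary lines out of the accumulated log; a small sketch (tally_verdicts is a hypothetical helper name; the verdict strings are criterion's standard summary messages):

```shell
# Count criterion's per-benchmark verdict lines from a log read on stdin.
tally_verdicts() {
  grep -oE 'Change within noise threshold|No change in performance detected|Performance has (improved|regressed)' \
    | sort | uniq -c | sort -rn
}

# Usage, with the log accumulated by the loop above:
#   tally_verdicts < /tmp/criterion.out.2
```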
I found that Github Actions' self-hosted runners actually only run one job at a time (https://github.community/t/parallelism-in-self-hosted-runners/17000/2) which I verified by noting that our Linux unit test job seems to only have had one executing at the time I looked (the others were queued). I think this means we could use a dedicated AWS Host that we partition into, say, 6 instances to allow 6 simultaneous, CPU isolated, benchmark runs.
Next steps:
Assuming the shared host test goes well, we can put together some Terraform config for spinning one up, configuring it to reduce benchmark noise, and installing / running the Github Actions worker to start scheduling jobs on it. The CI workflow itself will need some adjustments to use taskset as part of its execution.
If running multiple benchmark runs on the same AWS dedicated host, in different instances, proves to cause interference, we can always consider using OVH or Hetzner to provision smaller dedicated hosts that shouldn't share any resources.
Disabling ASLR seems to generally have had a good effect. Here are two full, no code change, benchmark runs compared:
name time time change throughput throughput change change
partitioned_batching/partitioned_batching_none_2097152 9.5418 ms -0.6307% 999.47 MiB/s +0.6347% none
partitioned_batching/batching_none_2097152 5.1245 ms +0.6204% 1.8174 GiB/s -0.6166% none
partitioned_batching/partitioned_batching_none_512000 9.8532 ms +1.4346% 967.88 MiB/s -1.4143% none
partitioned_batching/batching_none_512000 5.3224 ms -0.6772% 1.7498 GiB/s +0.6818% none
partitioned_batching/partitioned_batching_gzip(6)_2097152 108.76 ms -0.0607% 87.689 MiB/s +0.0607% none
partitioned_batching/batching_gzip(6)_2097152 104.01 ms +0.1758% 91.692 MiB/s -0.1754% none
partitioned_batching/partitioned_batching_gzip(6)_512000 109.28 ms -0.1229% 87.265 MiB/s +0.1230% none
partitioned_batching/batching_gzip(6)_512000 104.40 ms +0.0481% 91.347 MiB/s -0.0480% none
buffers/in-memory 35.603 ms +0.4614% 26.786 MiB/s -0.4592% none
buffers/on-disk 88.006 ms +0.0559% 10.836 MiB/s -0.0558% none
create and insert single-level 655.84 ns +0.0239% none
iterate all fields single-level 210.16 ns +0.3099% none
create and insert nested-keys 1.1058 us -0.1336% none
iterate all fields nested-keys 544.33 ns -0.1342% none
create and insert array 1.1426 us +0.9559% none
iterate all fields array 506.95 ns -0.0390% none
files/files_without_partitions 13.334 ms -0.0475% 71.524 MiB/s +0.0475% none
http/compression/none 1.0059 s +0.0154% 97.086 KiB/s -0.0154% none
http/compression/gzip(6) 1.0062 s +0.0032% 97.051 KiB/s -0.0032% none
isolated_buffers/channels/futures01 19.598 ms -0.0435% 97.324 MiB/s +0.0436% none
isolated_buffers/channels/tokio 20.541 ms +0.1982% 92.855 MiB/s -0.1978% none
isolated_buffers/leveldb/writing 30.338 ms +0.5288% 62.869 MiB/s -0.5260% none
isolated_buffers/leveldb/reading 191.59 ms -0.1147% 9.9552 MiB/s +0.1148% none
isolated_buffers/leveldb/both 221.98 ms +0.2083% 8.5924 MiB/s -0.2078% none
from_string/simple_string 28.946 ns -2.3315% 428.31 MiB/s +2.3872% none
from_string/foo.bar.baz.bat[0] 25.563 ns -0.0205% 671.52 MiB/s +0.0205% none
from_string/foo[0].bar[0].baz[0] 25.599 ns -0.1995% 745.09 MiB/s +0.1999% none
from_string/foo[0].bar[0][0].baz 25.591 ns -1.3118% 745.32 MiB/s +1.3292% none
from_string/foo[0] 28.858 ns -0.1480% 198.28 MiB/s +0.1482% none
from_string/"boo\\"p" 28.759 ns +0.4753% 265.29 MiB/s -0.4730% none
from_string/p4th_wi7h.numb3r5 25.574 ns +0.0650% 633.95 MiB/s -0.0650% none
from_string/"boop" 28.943 ns +0.5304% 197.70 MiB/s -0.5276% none
from_string/"boop"."snoot" 28.909 ns -2.8687% 461.84 MiB/s +2.9534% none
from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 29.306 ns +0.8719% 2.7648 GiB/s -0.8644% none
to_string/simple_string 114.83 ns +0.4662% 107.97 MiB/s -0.4641% none
to_string/foo.bar.baz.bat[0] 114.27 ns +0.0124% 150.23 MiB/s -0.0124% none
to_string/foo[0].bar[0].baz[0] 114.25 ns -0.4903% 166.95 MiB/s +0.4927% none
to_string/foo[0].bar[0][0].baz 114.21 ns -0.3538% 167.00 MiB/s +0.3550% none
to_string/foo[0] 138.26 ns +0.0657% 41.387 MiB/s -0.0657% none
to_string/"boo\\"p" 114.72 ns -0.1715% 66.503 MiB/s +0.1718% none
to_string/p4th_wi7h.numb3r5 114.10 ns +0.3189% 142.09 MiB/s -0.3179% none
to_string/"boop" 138.12 ns -0.5676% 41.428 MiB/s +0.5709% none
to_string/"boop"."snoot" 114.73 ns -0.4797% 116.37 MiB/s +0.4820% none
to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 117.14 ns -0.1612% 708.31 MiB/s +0.1614% none
serialize/simple_string 177.10 ns -0.0765% 70.004 MiB/s +0.0766% none
serialize/foo.bar.baz.bat[0] 178.65 ns +0.9628% 96.088 MiB/s -0.9537% none
serialize/foo[0].bar[0].baz[0] 180.15 ns +0.0111% 105.88 MiB/s -0.0111% none
serialize/foo[0].bar[0][0].baz 179.62 ns +0.2285% 106.19 MiB/s -0.2280% none
serialize/foo[0] 198.66 ns -0.3313% 28.802 MiB/s +0.3324% none
serialize/"boo\\"p" 199.11 ns +0.5883% 38.317 MiB/s -0.5848% none
serialize/p4th_wi7h.numb3r5 178.64 ns +1.8516% 90.756 MiB/s -1.8179% none
serialize/"boop" 208.80 ns +0.6402% 27.405 MiB/s -0.6361% none
serialize/"boop"."snoot" 206.88 ns -0.5032% 64.538 MiB/s +0.5058% none
serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 267.86 ns +0.4064% 309.75 MiB/s -0.4048% none
deserialize/simple_string 528.91 ns +0.0766% 23.440 MiB/s -0.0766% none
deserialize/foo.bar.baz.bat[0] 1.2647 us -0.0156% 13.573 MiB/s +0.0156% none
deserialize/foo[0].bar[0].baz[0] 1.2520 us +0.2855% 15.235 MiB/s -0.2847% none
deserialize/foo[0].bar[0][0].baz 1.2593 us +0.4458% 15.147 MiB/s -0.4439% none
deserialize/foo[0] 582.92 ns +0.2717% 9.8162 MiB/s -0.2710% none
deserialize/"boo\\"p" 646.33 ns +0.0728% 11.804 MiB/s -0.0727% none
deserialize/p4th_wi7h.numb3r5 789.62 ns +0.0057% 20.532 MiB/s -0.0057% none
deserialize/"boop" 610.42 ns +0.1102% 9.3740 MiB/s -0.1101% none
deserialize/"boop"."snoot" 985.34 ns +0.8241% 13.550 MiB/s -0.8174% none
deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] 3.1338 us -0.3207% 26.476 MiB/s +0.3217% none
lua_add_fields/native 1.0536 us +0.5141% 949.16 Kelem/s -0.5115% none
lua_add_fields/v1 3.4836 us -1.5368% 287.06 Kelem/s +1.5607% improved
lua_add_fields/v2 4.4344 us -0.0176% 225.51 Kelem/s +0.0176% none
lua_field_filter/native 5.5279 us -0.1229% 1.8090 Melem/s +0.1231% none
lua_field_filter/v1 26.746 us -0.6117% 373.89 Kelem/s +0.6155% none
lua_field_filter/v2 184.10 us -0.4372% 54.318 Kelem/s +0.4391% none
regex/regex 65.307 us -0.6500% 23.043 MiB/s +0.6543% none
elasticsearch_indexes/dynamic 713.13 ns -0.2867% none
elasticsearch_indexes/static 167.34 ns +4.6310% none
pipe/pipe_simple 35.648 ms +0.1728% 26.753 MiB/s -0.1725% none
pipe/pipe_small_lines 27.886 ms -0.2368% 350.20 KiB/s +0.2374% none
pipe/pipe_big_lines 164.34 ms +0.1776% 116.06 MiB/s -0.1772% none
pipe/pipe_multiple_writers 36.800 ms +0.2054% 2.5915 MiB/s -0.2050% none
interconnected/interconnected 100.50 ms -0.0154% 18.978 MiB/s +0.0154% none
transforms/transforms 55.881 ms -0.0232% 18.773 MiB/s +0.0232% none
complex/complex 1.9204 s +0.0851% none
remap: add fields with remap 2.2485 us +0.4724% none
remap: add fields with add_fields 1.9589 us +0.0983% none
remap: parse JSON with remap 1.6926 us +0.0050% none
remap: parse JSON with json_parser 1.0709 us +0.5198% none
remap: coerce with remap 4.1799 us +0.0447% none
remap: coerce with coercer 1.8894 us +0.1248% none
upcase: literal_value 172.59 ns +0.0166% none
downcase: literal_value 175.44 ns +0.0158% none
parse_json: literal_value 300.81 ns +0.1700% none
parse_json: invalid_json_with_default 660.22 ns +0.1356% none
I'm going to run this repeatedly overnight to see how consistent this effect is.
Here are the change results over 20 runs (first column is count):
20 buffers/in-memory none
20 buffers/on-disk none
20 complex/complex none
1 create and insert array improved
18 create and insert array none
1 create and insert array regressed
20 create and insert nested-keys none
20 create and insert single-level none
20 deserialize/"boo\\"p" none
20 deserialize/"boop" none
20 deserialize/"boop"."snoot" none
20 deserialize/foo[0].bar[0][0].baz none
20 deserialize/foo[0].bar[0].baz[0] none
1 deserialize/foo[0] improved
19 deserialize/foo[0] none
20 deserialize/foo.bar.baz.bat[0] none
20 deserialize/p4th_wi7h.numb3r5 none
20 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 deserialize/simple_string none
20 downcase: literal_value none
20 elasticsearch_indexes/dynamic none
20 elasticsearch_indexes/static none
20 files/files_without_partitions none
20 from_string/"boo\\"p" none
20 from_string/"boop" none
20 from_string/"boop"."snoot" none
20 from_string/foo[0].bar[0][0].baz none
1 from_string/foo[0].bar[0].baz[0] improved
18 from_string/foo[0].bar[0].baz[0] none
1 from_string/foo[0].bar[0].baz[0] regressed
20 from_string/foo[0] none
20 from_string/foo.bar.baz.bat[0] none
20 from_string/p4th_wi7h.numb3r5 none
20 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 from_string/simple_string none
20 http/compression/gzip(6) none
20 http/compression/none none
20 interconnected/interconnected none
20 isolated_buffers/channels/futures01 none
20 isolated_buffers/channels/tokio none
20 isolated_buffers/leveldb/both none
20 isolated_buffers/leveldb/reading none
20 isolated_buffers/leveldb/writing none
20 iterate all fields array none
20 iterate all fields nested-keys none
20 iterate all fields single-level none
20 lua_add_fields/native none
19 lua_add_fields/v1 none
1 lua_add_fields/v1 regressed
19 lua_add_fields/v2 none
1 lua_add_fields/v2 regressed
20 lua_field_filter/native none
1 lua_field_filter/v1 improved
18 lua_field_filter/v1 none
1 lua_field_filter/v1 regressed
6 lua_field_filter/v2 improved
8 lua_field_filter/v2 none
6 lua_field_filter/v2 regressed
20 parse_json: invalid_json_with_default none
2 parse_json: literal_value improved
16 parse_json: literal_value none
2 parse_json: literal_value regressed
20 partitioned_batching/batching_gzip(6)_2097152 none
20 partitioned_batching/batching_gzip(6)_512000 none
3 partitioned_batching/batching_none_2097152 improved
15 partitioned_batching/batching_none_2097152 none
2 partitioned_batching/batching_none_2097152 regressed
3 partitioned_batching/batching_none_512000 improved
17 partitioned_batching/batching_none_512000 none
20 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
20 partitioned_batching/partitioned_batching_gzip(6)_512000 none
20 partitioned_batching/partitioned_batching_none_2097152 none
20 partitioned_batching/partitioned_batching_none_512000 none
20 pipe/pipe_big_lines none
20 pipe/pipe_multiple_writers none
20 pipe/pipe_simple none
20 pipe/pipe_small_lines none
5 regex/regex improved
9 regex/regex none
6 regex/regex regressed
20 remap: add fields with add_fields none
1 remap: add fields with remap improved
18 remap: add fields with remap none
1 remap: add fields with remap regressed
20 remap: coerce with coercer none
20 remap: coerce with remap none
20 remap: parse JSON with json_parser none
20 remap: parse JSON with remap none
20 serialize/"boo\\"p" none
20 serialize/"boop" none
20 serialize/"boop"."snoot" none
20 serialize/foo[0].bar[0][0].baz none
20 serialize/foo[0].bar[0].baz[0] none
20 serialize/foo[0] none
20 serialize/foo.bar.baz.bat[0] none
20 serialize/p4th_wi7h.numb3r5 none
20 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 serialize/simple_string none
20 to_string/"boo\\"p" none
20 to_string/"boop" none
19 to_string/"boop"."snoot" none
1 to_string/"boop"."snoot" regressed
19 to_string/foo[0].bar[0][0].baz none
1 to_string/foo[0].bar[0][0].baz regressed
1 to_string/foo[0].bar[0].baz[0] improved
19 to_string/foo[0].bar[0].baz[0] none
20 to_string/foo[0] none
1 to_string/foo.bar.baz.bat[0] improved
19 to_string/foo.bar.baz.bat[0] none
19 to_string/p4th_wi7h.numb3r5 none
1 to_string/p4th_wi7h.numb3r5 regressed
20 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
20 to_string/simple_string none
20 transforms/transforms none
20 upcase: literal_value none
Or just the improved/regressed (out of 20 runs):
1 create and insert array improved
1 create and insert array regressed
1 deserialize/foo[0] improved
1 from_string/foo[0].bar[0].baz[0] improved
1 from_string/foo[0].bar[0].baz[0] regressed
1 lua_add_fields/v1 regressed
1 lua_add_fields/v2 regressed
1 lua_field_filter/v1 improved
1 lua_field_filter/v1 regressed
6 lua_field_filter/v2 improved
6 lua_field_filter/v2 regressed
2 parse_json: literal_value improved
2 parse_json: literal_value regressed
3 partitioned_batching/batching_none_2097152 improved
2 partitioned_batching/batching_none_2097152 regressed
3 partitioned_batching/batching_none_512000 improved
5 regex/regex improved
6 regex/regex regressed
1 remap: add fields with remap improved
1 remap: add fields with remap regressed
1 to_string/"boop"."snoot" regressed
1 to_string/foo[0].bar[0][0].baz regressed
1 to_string/foo[0].bar[0].baz[0] improved
1 to_string/foo.bar.baz.bat[0] improved
1 to_string/p4th_wi7h.numb3r5 regressed
Note that if we see a regression or improvement, we typically expect to see the reverse when it returns to the baseline unless the first baseline was the anomalous one.
The partitioned batching, lua_field_filter/v2, and regex ones still seem a bit noisy. We will likely want to dig into those a bit more. Somewhat surprisingly, the high-level topology and disk buffer benchmarks were consistent (likely with substantial stddev though).
Observed improvements / regressions:
create and insert array +1.1537% regressed
create and insert array -1.3325% improved
deserialize/foo[0] -1.2242% improved
from_string/foo[0].bar[0].baz[0] -4.8200% improved
from_string/foo[0].bar[0].baz[0] +5.1245% regressed
lua_add_fields/v1 +1.2977% regressed
lua_add_fields/v2 +1.3488% regressed
lua_field_filter/v1 -1.4196% improved
lua_field_filter/v1 +1.6972% regressed
lua_field_filter/v2 +1.0509% regressed
lua_field_filter/v2 +1.0938% regressed
lua_field_filter/v2 -1.1647% improved
lua_field_filter/v2 +1.3569% regressed
lua_field_filter/v2 +1.7262% regressed
lua_field_filter/v2 -1.8848% improved
lua_field_filter/v2 -2.0932% improved
lua_field_filter/v2 -2.1620% improved
lua_field_filter/v2 +2.1798% regressed
lua_field_filter/v2 -2.2583% improved
lua_field_filter/v2 +3.1216% regressed
lua_field_filter/v2 -3.6570% improved
parse_json: literal_value -1.2808% improved
parse_json: literal_value +1.2854% regressed
parse_json: literal_value -1.5913% improved
parse_json: literal_value +1.6182% regressed
partitioned_batching/batching_none_2097152 -5.0137% improved
partitioned_batching/batching_none_2097152 +5.5792% regressed
partitioned_batching/batching_none_2097152 -6.4220% improved
partitioned_batching/batching_none_2097152 -6.6038% improved
partitioned_batching/batching_none_2097152 +7.2048% regressed
partitioned_batching/batching_none_512000 -3.7311% improved
partitioned_batching/batching_none_512000 -4.5033% improved
partitioned_batching/batching_none_512000 -5.7850% improved
regex/regex +1.8791% regressed
regex/regex -1.9155% improved
regex/regex +2.0136% regressed
regex/regex -2.1610% improved
regex/regex +2.2352% regressed
regex/regex -2.4888% improved
regex/regex -2.6008% improved
regex/regex +2.7651% regressed
regex/regex -2.9066% improved
regex/regex +3.3316% regressed
regex/regex +6.0097% regressed
remap: add fields with remap -1.4700% improved
remap: add fields with remap +1.5678% regressed
to_string/"boop"."snoot" +1.2658% regressed
to_string/foo[0].bar[0][0].baz +1.9819% regressed
to_string/foo[0].bar[0].baz[0] -1.9835% improved
to_string/foo.bar.baz.bat[0] -2.1058% improved
to_string/p4th_wi7h.numb3r5 +3.7470% regressed
My recommendation is to try to take a closer look at the noise to see if there is a change we could make to reduce it, otherwise set the noise thresholds for partitioned batching to 10% and regex + lua_field_filter to 5%.
I'm going to try running multiple instances on the same dedicated host to ensure they won't be stepping on each other.
More updates:
I spun up a new c5 dedicated host and provisioned 8 c5.2xlarge instances on it to run benchmarks in parallel.
In addition to the previously mentioned consistency changes, I also disabled hyperthreading (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-optimize-cpu.html) to give each instance 4 dedicated cores. I was concerned that vCPUs might be partitioned across cores such that a given instance might share a core with another, but I didn't verify whether this actually happens. I had noted that, when I previously provisioned the large instance, CPU assignment seemed randomly distributed across cores rather than adjacent vCPUs (for example CPU0 and CPU1) being on the same core. In any event, this should remove another variable.
I again ran the downcase remap benchmark, due to its lack of inherent noise, 50 times on each of the 8 instances, in parallel:
parallel --jobs 0 --files --tmpdir 1 --tag --nonall --sshloginfile /tmp/hosts.txt --linebuffer 'cd vector ; for i in $(seq 1 50) ; do setarch x86_64 -R taskset -c 0 cargo bench --no-default-features --features "benches remap-benches" --bench remap downcase ; done'
CPU 0 was the isolated CPU on each instance. Note to self: there is a --no-run flag for cargo bench that we can use to compile the benchmarks using all available CPUs before running them on only one.
I got the following counts for criterion's detection:
80 Change within noise threshold.
320 No change in performance detected.
I'm optimistic that this shows there is no CPU interference between the instances.
I'm running a full benchmark run now to see how that looks. After that I'll set it up to run repeatedly overnight again.
I started putting some terraform + ansible config together simply as a way to facilitate my testing, but should be useful when we need to codify the benchmarking CI infrastructure assuming this approach continues panning out well.
I also noted that the partitioned batch benchmark that above is noted as showing noise is also the first benchmark that runs. It might be worth checking if changing the order has any effect.
EDIT: I forgot to use taskset -c 0, so these weren't running on the isolated CPU.
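Since forgetting the pinning is easy (as just happened), a quick self-check before kicking off a long run might help; this sketch reads the affinity mask a pinned command actually gets:

```shell
# Sketch: confirm a command launched under `taskset -c 0` really is restricted
# to CPU 0 by reading its own affinity mask from /proc (Linux only).
pinned=$(taskset -c 0 grep Cpus_allowed_list /proc/self/status | awk '{ print $2 }')
echo "pinned to: $pinned"
```

If the output is anything other than `0`, the pinning didn't take.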
One set of benchmarks (counts):
8 buffers/in-memory none
8 buffers/on-disk none
8 complex/complex none
7 create and insert array none
1 create and insert array regressed
8 create and insert nested-keys none
7 create and insert single-level none
1 create and insert single-level regressed
8 deserialize/"boo\\"p" none
8 deserialize/"boop" none
8 deserialize/"boop"."snoot" none
8 deserialize/foo[0].bar[0][0].baz none
8 deserialize/foo[0].bar[0].baz[0] none
1 deserialize/foo[0] improved
7 deserialize/foo[0] none
8 deserialize/foo.bar.baz.bat[0] none
8 deserialize/p4th_wi7h.numb3r5 none
8 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 deserialize/simple_string none
5 downcase: literal_value none
3 downcase: literal_value regressed
8 elasticsearch_indexes/dynamic none
2 elasticsearch_indexes/static improved
5 elasticsearch_indexes/static none
1 elasticsearch_indexes/static regressed
8 files/files_without_partitions none
8 from_string/"boo\\"p" none
8 from_string/"boop" none
8 from_string/"boop"."snoot" none
8 from_string/foo[0].bar[0][0].baz none
8 from_string/foo[0].bar[0].baz[0] none
8 from_string/foo[0] none
8 from_string/foo.bar.baz.bat[0] none
8 from_string/p4th_wi7h.numb3r5 none
8 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 from_string/simple_string none
8 http/compression/gzip(6) none
8 http/compression/none none
1 interconnected/interconnected improved
7 interconnected/interconnected none
8 isolated_buffers/channels/futures01 none
8 isolated_buffers/channels/tokio none
8 isolated_buffers/leveldb/both none
8 isolated_buffers/leveldb/reading none
8 isolated_buffers/leveldb/writing none
8 iterate all fields array none
8 iterate all fields nested-keys none
8 iterate all fields single-level none
8 lua_add_fields/native none
8 lua_add_fields/v1 none
8 lua_add_fields/v2 none
1 lua_field_filter/native improved
7 lua_field_filter/native none
8 lua_field_filter/v1 none
3 lua_field_filter/v2 improved
1 lua_field_filter/v2 none
4 lua_field_filter/v2 regressed
1 parse_json: invalid_json_with_default improved
7 parse_json: invalid_json_with_default none
8 parse_json: literal_value none
8 partitioned_batching/batching_gzip(6)_2097152 none
8 partitioned_batching/batching_gzip(6)_512000 none
1 partitioned_batching/batching_none_2097152 improved
7 partitioned_batching/batching_none_2097152 none
8 partitioned_batching/batching_none_512000 none
8 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
8 partitioned_batching/partitioned_batching_gzip(6)_512000 none
8 partitioned_batching/partitioned_batching_none_2097152 none
8 partitioned_batching/partitioned_batching_none_512000 none
8 pipe/pipe_big_lines none
8 pipe/pipe_multiple_writers none
8 pipe/pipe_simple none
8 pipe/pipe_small_lines none
2 regex/regex improved
4 regex/regex none
2 regex/regex regressed
8 remap: add fields with add_fields none
8 remap: add fields with remap none
8 remap: coerce with coercer none
8 remap: coerce with remap none
1 remap: parse JSON with json_parser improved
7 remap: parse JSON with json_parser none
8 remap: parse JSON with remap none
8 serialize/"boo\\"p" none
8 serialize/"boop" none
8 serialize/"boop"."snoot" none
8 serialize/foo[0].bar[0][0].baz none
8 serialize/foo[0].bar[0].baz[0] none
8 serialize/foo[0] none
8 serialize/foo.bar.baz.bat[0] none
8 serialize/p4th_wi7h.numb3r5 none
8 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 serialize/simple_string none
8 to_string/"boo\\"p" none
8 to_string/"boop" none
8 to_string/"boop"."snoot" none
8 to_string/foo[0].bar[0][0].baz none
8 to_string/foo[0].bar[0].baz[0] none
8 to_string/foo[0] none
8 to_string/foo.bar.baz.bat[0] none
8 to_string/p4th_wi7h.numb3r5 none
8 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
8 to_string/simple_string none
8 transforms/transforms none
2 upcase: literal_value improved
6 upcase: literal_value none
Time changed for the ones showing regressions or improvements:
create and insert array +1.1396% regressed
create and insert single-level +1.2012% regressed
deserialize/foo[0] -1.3533% improved
downcase: literal_value +1.2737% regressed
downcase: literal_value +2.0112% regressed
downcase: literal_value +2.0397% regressed
elasticsearch_indexes/static -10.050% improved
elasticsearch_indexes/static +11.517% regressed
elasticsearch_indexes/static -6.4488% improved
interconnected/interconnected -4.4868% improved
lua_field_filter/native -2.6885% improved
lua_field_filter/v2 -1.2712% improved
lua_field_filter/v2 +1.5681% regressed
lua_field_filter/v2 -1.6623% improved
lua_field_filter/v2 -1.8654% improved
lua_field_filter/v2 +2.8081% regressed
lua_field_filter/v2 +2.8288% regressed
lua_field_filter/v2 +3.6367% regressed
parse_json: invalid_json_with_default -1.0829% improved
regex/regex -1.7539% improved
regex/regex +1.7707% regressed
regex/regex +2.1820% regressed
regex/regex -3.9821% improved
remap: parse JSON with json_parser -3.9046% improved
upcase: literal_value -1.5229% improved
upcase: literal_value -1.8613% improved
Here we see the same ones we saw before on an isolated instance, but some surprising new ones as well, including downcase and upcase.
I'm setting it up to run overnight again.
Bah, I just realized I missed specifying to run on the isolated CPU for those, so they can be ignored. I'll use it in the overnight ones, though.
Command:
parallel --jobs 0 --tag --nonall --sshloginfile /tmp/hosts.txt 'cd vector ; rm -f ~/benches.out ; nohup bash -c "for i in \$(seq 1 50) ; do taskset -c 0 setarch x86_64 -R cargo bench --no-default-features --features \"benches remap-benches\" 2>&1 | tee -a ~/benches.out ; done" &'
For some reason it died after only 2 runs. Here's what I saw for benchmarks that showed changes, though:
buffers/in-memory -1.2522% improved
buffers/on-disk -2.3786% improved
buffers/on-disk -3.2696% improved
create and insert array +1.0353% regressed
create and insert array -1.1738% improved
create and insert array +1.2479% regressed
create and insert array -1.2792% improved
create and insert array -1.3966% improved
create and insert array +1.8406% regressed
create and insert single-level -1.0600% improved
create and insert single-level +1.0981% regressed
create and insert single-level -1.4655% improved
deserialize/"boo\\"p" -2.2670% improved
downcase: literal_value +1.4512% regressed
downcase: literal_value -1.8926% improved
elasticsearch_indexes/static -11.430% improved
elasticsearch_indexes/static +9.7197% regressed
files/files_without_partitions -2.2181% improved
files/files_without_partitions +2.4195% regressed
files/files_without_partitions -4.4782% improved
isolated_buffers/leveldb/writing -1.3121% improved
lua_field_filter/v2 +1.2010% regressed
lua_field_filter/v2 -1.2824% improved
lua_field_filter/v2 +1.3524% regressed
lua_field_filter/v2 +1.4162% regressed
lua_field_filter/v2 -1.6233% improved
lua_field_filter/v2 +2.8390% regressed
pipe/pipe_multiple_writers +3.0693% regressed
pipe/pipe_simple +1.2415% regressed
upcase: literal_value +1.3713% regressed
upcase: literal_value +1.4149% regressed
upcase: literal_value +2.9005% regressed
Still some surprising ones in there, like upcase and downcase, that make me think maybe there is some interference. I'm going to set it up again today to execute more runs.
Running in tmux this time:
parallel --jobs 0 --tag --nonall --sshloginfile /tmp/hosts.txt 'cd vector ; rm -f ~/benches.out ; tmux new-session -d -s "benchmarks" "for i in \$(seq 1 50) ; do taskset -c 0 setarch x86_64 -R cargo bench --no-default-features --features \"benches remap-benches\" 2>&1 | tee -a ~/benches.out ; done"'
With approximately 10 runs completed on each of the 8 instances, I'm seeing:
1 buffers/in-memory improved
1 create and insert nested-keys regressed
1 create and insert single-level improved
1 deserialize/foo[0].bar[0].baz[0] improved
1 elasticsearch_indexes/dynamic improved
1 from_string/foo.bar.baz.bat[0] regressed
1 interconnected/interconnected regressed
1 isolated_buffers/channels/futures01 regressed
1 isolated_buffers/leveldb/writing improved
1 lua_add_fields/v2 improved
1 lua_add_fields/v2 regressed
1 lua_field_filter/native improved
1 parse_json: invalid_json_with_default improved
1 parse_json: literal_value improved
1 partitioned_batching/batching_none_512000 regressed
1 pipe/pipe_multiple_writers improved
2 buffers/in-memory regressed
2 isolated_buffers/leveldb/writing regressed
2 iterate all fields single-level improved
2 iterate all fields single-level regressed
2 lua_field_filter/v1 regressed
2 partitioned_batching/batching_none_2097152 improved
2 partitioned_batching/batching_none_2097152 regressed
2 partitioned_batching/batching_none_512000 improved
2 pipe/pipe_simple improved
3 pipe/pipe_multiple_writers regressed
3 remap: coerce with coercer regressed
4 create and insert nested-keys improved
4 lua_add_fields/v1 improved
5 buffers/on-disk regressed
5 lua_add_fields/v1 regressed
6 deserialize/"boop"."snoot" improved
7 buffers/on-disk improved
7 deserialize/"boo\\"p" improved
7 deserialize/p4th_wi7h.numb3r5 improved
7 deserialize/simple_string improved
7 downcase: literal_value regressed
7 lua_add_fields/native regressed
8 deserialize/"boop" improved
8 deserialize/foo[0] improved
8 downcase: literal_value improved
8 elasticsearch_indexes/static improved
8 parse_json: literal_value regressed
9 create and insert array improved
9 create and insert array regressed
9 elasticsearch_indexes/static regressed
10 parse_json: invalid_json_with_default regressed
11 files/files_without_partitions improved
11 upcase: literal_value improved
12 lua_field_filter/v2 regressed
12 upcase: literal_value regressed
13 files/files_without_partitions regressed
15 regex/regex regressed
16 lua_field_filter/v2 improved
17 regex/regex improved
40 regex/regex none
44 lua_field_filter/v2 none
49 upcase: literal_value none
55 elasticsearch_indexes/static none
56 files/files_without_partitions none
57 downcase: literal_value none
61 parse_json: invalid_json_with_default none
62 create and insert array none
63 lua_add_fields/v1 none
63 parse_json: literal_value none
64 deserialize/"boop" none
64 deserialize/foo[0] none
65 deserialize/"boo\\"p" none
65 deserialize/p4th_wi7h.numb3r5 none
65 deserialize/simple_string none
65 lua_add_fields/native none
66 deserialize/"boop"."snoot" none
68 buffers/on-disk none
68 pipe/pipe_multiple_writers none
69 remap: coerce with coercer none
70 lua_add_fields/v2 none
70 lua_field_filter/v1 none
70 pipe/pipe_simple none
71 deserialize/foo[0].bar[0].baz[0] none
71 elasticsearch_indexes/dynamic none
71 interconnected/interconnected none
71 lua_field_filter/native none
72 complex/complex none
72 deserialize/foo[0].bar[0][0].baz none
72 deserialize/foo.bar.baz.bat[0] none
72 deserialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
72 pipe/pipe_big_lines none
72 pipe/pipe_small_lines none
72 remap: add fields with add_fields none
72 remap: add fields with remap none
72 remap: coerce with remap none
72 remap: parse JSON with json_parser none
72 remap: parse JSON with remap none
72 serialize/"boo\\"p" none
72 serialize/"boop" none
72 serialize/"boop"."snoot" none
72 serialize/foo[0].bar[0][0].baz none
72 serialize/foo[0] none
72 serialize/p4th_wi7h.numb3r5 none
72 serialize/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
72 transforms/transforms none
73 serialize/foo[0].bar[0].baz[0] none
75 create and insert nested-keys none
76 iterate all fields single-level none
76 partitioned_batching/batching_none_2097152 none
77 buffers/in-memory none
77 isolated_buffers/leveldb/writing none
77 partitioned_batching/batching_none_512000 none
78 serialize/foo.bar.baz.bat[0] none
79 create and insert single-level none
79 from_string/foo.bar.baz.bat[0] none
79 isolated_buffers/channels/futures01 none
79 serialize/simple_string none
79 to_string/"boo\\"p" none
79 to_string/"boop" none
79 to_string/"boop"."snoot" none
79 to_string/foo[0] none
79 to_string/p4th_wi7h.numb3r5 none
79 to_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
80 from_string/"boo\\"p" none
80 from_string/"boop" none
80 from_string/"boop"."snoot" none
80 from_string/foo[0].bar[0][0].baz none
80 from_string/foo[0].bar[0].baz[0] none
80 from_string/foo[0] none
80 from_string/p4th_wi7h.numb3r5 none
80 from_string/regular."quoted"."quoted but spaces"."quoted.but.periods".lookup[0].nested_lookup[0][0] none
80 from_string/simple_string none
80 http/compression/gzip(6) none
80 http/compression/none none
80 isolated_buffers/channels/tokio none
80 isolated_buffers/leveldb/both none
80 isolated_buffers/leveldb/reading none
80 iterate all fields array none
80 iterate all fields nested-keys none
80 partitioned_batching/batching_gzip(6)_2097152 none
80 partitioned_batching/batching_gzip(6)_512000 none
80 partitioned_batching/partitioned_batching_gzip(6)_2097152 none
80 partitioned_batching/partitioned_batching_gzip(6)_512000 none
80 partitioned_batching/partitioned_batching_none_2097152 none
80 partitioned_batching/partitioned_batching_none_512000 none
80 to_string/foo[0].bar[0][0].baz none
80 to_string/foo[0].bar[0].baz[0] none
80 to_string/foo.bar.baz.bat[0] none
80 to_string/simple_string none
Noise for ones showing changes:
buffers/in-memory -1.3711% improved
buffers/in-memory +1.5527% regressed
buffers/in-memory +1.8231% regressed
buffers/on-disk -1.5213% improved
buffers/on-disk -1.5325% improved
buffers/on-disk +1.7111% regressed
buffers/on-disk +1.9306% regressed
buffers/on-disk -2.0263% improved
buffers/on-disk +2.0283% regressed
buffers/on-disk -2.5212% improved
buffers/on-disk -2.6512% improved
buffers/on-disk -2.9310% improved
buffers/on-disk +3.2889% regressed
buffers/on-disk +4.0243% regressed
buffers/on-disk -4.2677% improved
create and insert array +1.0349% regressed
create and insert array -1.0901% improved
create and insert array -1.0954% improved
create and insert array +1.1982% regressed
create and insert array +1.2649% regressed
create and insert array -1.2713% improved
create and insert array +1.2844% regressed
create and insert array -1.4027% improved
create and insert array +1.4793% regressed
create and insert array +1.6641% regressed
create and insert array -1.6796% improved
create and insert array -2.0858% improved
create and insert array -2.5681% improved
create and insert array +2.7420% regressed
create and insert array -3.0450% improved
create and insert array -3.2899% improved
create and insert array +3.7951% regressed
create and insert array +4.3070% regressed
create and insert nested-keys -1.1215% improved
create and insert nested-keys +1.3393% regressed
create and insert nested-keys -1.4461% improved
create and insert nested-keys -1.5290% improved
create and insert nested-keys -1.7110% improved
create and insert single-level -1.0708% improved
deserialize/"boo\\"p" -1.8181% improved
deserialize/"boo\\"p" -1.8871% improved
deserialize/"boo\\"p" -2.0350% improved
deserialize/"boop" -2.0576% improved
deserialize/"boo\\"p" -2.0636% improved
deserialize/"boo\\"p" -2.0758% improved
deserialize/"boop" -2.0881% improved
deserialize/"boop" -2.1282% improved
deserialize/"boop" -2.1338% improved
deserialize/"boo\\"p" -2.2210% improved
deserialize/"boop" -2.2403% improved
deserialize/"boo\\"p" -2.2641% improved
deserialize/"boop" -2.4284% improved
deserialize/"boop" -2.5023% improved
deserialize/"boop" -2.6505% improved
deserialize/"boop"."snoot" -1.2351% improved
deserialize/"boop"."snoot" -1.5345% improved
deserialize/"boop"."snoot" -1.6256% improved
deserialize/"boop"."snoot" -1.6539% improved
deserialize/"boop"."snoot" -1.6968% improved
deserialize/"boop"."snoot" -2.9652% improved
deserialize/foo[0] -2.2525% improved
deserialize/foo[0] -2.3868% improved
deserialize/foo[0] -2.5564% improved
deserialize/foo[0] -2.5580% improved
deserialize/foo[0] -2.5777% improved
deserialize/foo[0] -2.5833% improved
deserialize/foo[0] -2.5848% improved
deserialize/foo[0] -2.6286% improved
deserialize/foo[0].bar[0].baz[0] -1.3599% improved
deserialize/p4th_wi7h.numb3r5 -1.4786% improved
deserialize/p4th_wi7h.numb3r5 -1.5360% improved
deserialize/p4th_wi7h.numb3r5 -1.6322% improved
deserialize/p4th_wi7h.numb3r5 -1.6541% improved
deserialize/p4th_wi7h.numb3r5 -1.6701% improved
deserialize/p4th_wi7h.numb3r5 -1.6793% improved
deserialize/p4th_wi7h.numb3r5 -1.7310% improved
deserialize/simple_string -2.7546% improved
deserialize/simple_string -2.7697% improved
deserialize/simple_string -2.7820% improved
deserialize/simple_string -2.8296% improved
deserialize/simple_string -2.8673% improved
deserialize/simple_string -2.8758% improved
deserialize/simple_string -2.8874% improved
downcase: literal_value -1.0828% improved
downcase: literal_value +1.1394% regressed
downcase: literal_value +1.2959% regressed
downcase: literal_value -1.3090% improved
downcase: literal_value -1.3122% improved
downcase: literal_value +1.3147% regressed
downcase: literal_value +1.3794% regressed
downcase: literal_value -1.3832% improved
downcase: literal_value -1.4240% improved
downcase: literal_value -2.0778% improved
downcase: literal_value -2.0806% improved
downcase: literal_value +2.1245% regressed
downcase: literal_value +2.1883% regressed
downcase: literal_value +2.6215% regressed
downcase: literal_value -2.6484% improved
elasticsearch_indexes/dynamic -1.5571% improved
elasticsearch_indexes/static +11.589% regressed
elasticsearch_indexes/static -11.714% improved
elasticsearch_indexes/static +13.580% regressed
elasticsearch_indexes/static +14.110% regressed
elasticsearch_indexes/static -16.122% improved
elasticsearch_indexes/static -16.453% improved
elasticsearch_indexes/static +17.450% regressed
elasticsearch_indexes/static -18.267% improved
elasticsearch_indexes/static +18.770% regressed
elasticsearch_indexes/static +5.5549% regressed
elasticsearch_indexes/static -6.1160% improved
elasticsearch_indexes/static +6.7624% regressed
elasticsearch_indexes/static +7.0514% regressed
elasticsearch_indexes/static -7.2085% improved
elasticsearch_indexes/static -8.1175% improved
elasticsearch_indexes/static -8.5284% improved
elasticsearch_indexes/static +8.9102% regressed
files/files_without_partitions +1.4317% regressed
files/files_without_partitions +1.5974% regressed
files/files_without_partitions -1.6904% improved
files/files_without_partitions +1.7598% regressed
files/files_without_partitions +1.8017% regressed
files/files_without_partitions -1.8675% improved
files/files_without_partitions +1.8964% regressed
files/files_without_partitions +1.9215% regressed
files/files_without_partitions -1.9583% improved
files/files_without_partitions +1.9855% regressed
files/files_without_partitions +1.9983% regressed
files/files_without_partitions -2.0847% improved
files/files_without_partitions +2.0872% regressed
files/files_without_partitions -2.1192% improved
files/files_without_partitions -2.1602% improved
files/files_without_partitions -2.2128% improved
files/files_without_partitions +2.2470% regressed
files/files_without_partitions -2.3917% improved
files/files_without_partitions +2.4962% regressed
files/files_without_partitions -3.1584% improved
files/files_without_partitions +3.3841% regressed
files/files_without_partitions -3.7645% improved
files/files_without_partitions -3.8745% improved
files/files_without_partitions +4.5545% regressed
from_string/foo.bar.baz.bat[0] +1.8566% regressed
interconnected/interconnected +1.2691% regressed
isolated_buffers/channels/futures01 +1.1189% regressed
isolated_buffers/leveldb/writing -1.1027% improved
isolated_buffers/leveldb/writing +1.1437% regressed
isolated_buffers/leveldb/writing +1.3786% regressed
iterate all fields single-level +2.4609% regressed
iterate all fields single-level -2.9326% improved
iterate all fields single-level -3.4496% improved
iterate all fields single-level +3.6805% regressed
lua_add_fields/native +3.2675% regressed
lua_add_fields/native +3.7453% regressed
lua_add_fields/native +3.8168% regressed
lua_add_fields/native +3.9488% regressed
lua_add_fields/native +4.1162% regressed
lua_add_fields/native +4.3567% regressed
lua_add_fields/native +4.6620% regressed
lua_add_fields/v1 +1.2655% regressed
lua_add_fields/v1 +1.2728% regressed
lua_add_fields/v1 -1.2986% improved
lua_add_fields/v1 +1.3715% regressed
lua_add_fields/v1 -1.4866% improved
lua_add_fields/v1 -1.5394% improved
lua_add_fields/v1 -1.5440% improved
lua_add_fields/v1 +1.5839% regressed
lua_add_fields/v1 +1.8893% regressed
lua_add_fields/v2 -1.2108% improved
lua_add_fields/v2 +1.3122% regressed
lua_field_filter/native -2.1350% improved
lua_field_filter/v1 +1.5440% regressed
lua_field_filter/v1 +1.5729% regressed
lua_field_filter/v2 -1.0960% improved
lua_field_filter/v2 +1.2496% regressed
lua_field_filter/v2 +1.3072% regressed
lua_field_filter/v2 -1.3363% improved
lua_field_filter/v2 +1.5011% regressed
lua_field_filter/v2 +1.5103% regressed
lua_field_filter/v2 -1.5331% improved
lua_field_filter/v2 +1.5486% regressed
lua_field_filter/v2 -1.5844% improved
lua_field_filter/v2 -1.7806% improved
lua_field_filter/v2 -1.7922% improved
lua_field_filter/v2 -1.8646% improved
lua_field_filter/v2 -1.9035% improved
lua_field_filter/v2 -1.9589% improved
lua_field_filter/v2 -2.0118% improved
lua_field_filter/v2 +2.0366% regressed
lua_field_filter/v2 +2.0472% regressed
lua_field_filter/v2 +2.2891% regressed
lua_field_filter/v2 -2.3003% improved
lua_field_filter/v2 -2.3495% improved
lua_field_filter/v2 -2.3962% improved
lua_field_filter/v2 -2.5001% improved
lua_field_filter/v2 -2.5706% improved
lua_field_filter/v2 -2.6490% improved
lua_field_filter/v2 +2.7958% regressed
lua_field_filter/v2 +3.0666% regressed
lua_field_filter/v2 +3.2312% regressed
lua_field_filter/v2 +4.1520% regressed
parse_json: invalid_json_with_default +1.1281% regressed
parse_json: invalid_json_with_default +1.5002% regressed
parse_json: invalid_json_with_default -1.5975% improved
parse_json: invalid_json_with_default +1.8601% regressed
parse_json: invalid_json_with_default +1.9882% regressed
parse_json: invalid_json_with_default +2.0861% regressed
parse_json: invalid_json_with_default +2.1100% regressed
parse_json: invalid_json_with_default +2.1817% regressed
parse_json: invalid_json_with_default +2.5531% regressed
parse_json: invalid_json_with_default +2.6278% regressed
parse_json: invalid_json_with_default +3.9368% regressed
parse_json: literal_value -1.0954% improved
parse_json: literal_value +4.1013% regressed
parse_json: literal_value +4.1126% regressed
parse_json: literal_value +4.1406% regressed
parse_json: literal_value +4.1497% regressed
parse_json: literal_value +4.1513% regressed
parse_json: literal_value +4.2276% regressed
parse_json: literal_value +4.2758% regressed
parse_json: literal_value +5.0644% regressed
pipe/pipe_multiple_writers +1.9539% regressed
pipe/pipe_multiple_writers +1.9554% regressed
pipe/pipe_multiple_writers +2.4000% regressed
pipe/pipe_multiple_writers -3.6813% improved
pipe/pipe_simple -1.2423% improved
pipe/pipe_simple -1.2522% improved
regex/regex +1.7689% regressed
regex/regex -1.8119% improved
regex/regex +1.8750% regressed
regex/regex +2.0251% regressed
regex/regex -2.0363% improved
regex/regex -2.1759% improved
regex/regex +2.1882% regressed
regex/regex -2.1927% improved
regex/regex -2.2076% improved
regex/regex -2.3017% improved
regex/regex -2.3748% improved
regex/regex -2.4218% improved
regex/regex -2.4537% improved
regex/regex -2.4622% improved
regex/regex -2.7060% improved
regex/regex +2.7374% regressed
regex/regex +2.7694% regressed
regex/regex -2.8921% improved
regex/regex +2.9263% regressed
regex/regex -3.1635% improved
regex/regex +3.2247% regressed
regex/regex -3.3485% improved
regex/regex +3.4294% regressed
regex/regex +3.4924% regressed
regex/regex +3.7009% regressed
regex/regex +3.8341% regressed
regex/regex -3.8743% improved
regex/regex -4.0575% improved
regex/regex +4.1375% regressed
regex/regex -5.4246% improved
regex/regex +5.8705% regressed
regex/regex +5.8809% regressed
remap: coerce with coercer +1.4079% regressed
remap: coerce with coercer +1.5097% regressed
remap: coerce with coercer +1.5639% regressed
upcase: literal_value -1.3672% improved
upcase: literal_value +1.3979% regressed
upcase: literal_value -1.4281% improved
upcase: literal_value -1.4405% improved
upcase: literal_value -1.4600% improved
upcase: literal_value -1.5317% improved
upcase: literal_value +1.5466% regressed
upcase: literal_value +1.5896% regressed
upcase: literal_value +1.6005% regressed
upcase: literal_value +1.7872% regressed
upcase: literal_value -1.8252% improved
upcase: literal_value -1.9470% improved
upcase: literal_value +2.3237% regressed
upcase: literal_value -2.3707% improved
upcase: literal_value +2.4424% regressed
upcase: literal_value -2.5006% improved
upcase: literal_value +2.5185% regressed
upcase: literal_value +2.7054% regressed
upcase: literal_value -2.7090% improved
upcase: literal_value +3.4441% regressed
upcase: literal_value -3.7689% improved
upcase: literal_value +4.0006% regressed
upcase: literal_value +4.0665% regressed
This is still more noise than I would have expected, especially for some of the remap and lookup benchmarks which should only exercise the CPU / memory as far as I can tell.
Even without any more changes, it does make me think we could probably use a noise threshold of 5% for most benchmarks.
I'll let it keep running to gather more data.
More updates:
It finished executing the 50 parallel runs * 8 runners = 400 runs.
Of the ones showing changes greater than +/- 5%, I saw the following counts:
1 buffers/on-disk improved
1 files/files_without_partitions improved
1 from_string/"boop"."snoot" regressed
1 from_string/foo[0] regressed
1 parse_json: literal_value regressed
1 upcase: literal_value improved
2 from_string/"boop"."snoot" improved
2 lua_field_filter/v2 improved
3 upcase: literal_value regressed
6 regex/regex improved
8 regex/regex regressed
59 elasticsearch_indexes/static improved
64 elasticsearch_indexes/static regressed
(Again, we would expect to see regressions paired with improvements if there was really no change.)
The elasticsearch_indexes/static and regex benchmarks stand out as likely having a large amount of noise inherent to the benchmark itself. I'll take a closer look at these two. For now, I'll mark the elasticsearch one as allowing +/- 25% and regex as allowing +/- 10%, which covers the maximum noise I saw. This should result in fairly few false positives until we reduce their noise.
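As a downstream guard, the same thresholds could also be applied when post-processing results. Here's a hypothetical filter (the names and limits are the ones chosen above; the input format "<bench> <signed percent>" with no spaces in the bench name is an assumption for the sketch):

```shell
# Hypothetical sketch: flag changes that exceed per-benchmark noise thresholds.
# Default band is +/- 5%; the two noisy benches get the wider bands chosen above.
flag_changes() {
  awk '
    BEGIN { thr["elasticsearch_indexes/static"] = 25; thr["regex/regex"] = 10 }
    {
      pct = $NF; gsub(/[+%-]/, "", pct)          # strip sign and percent sign
      limit = ($1 in thr) ? thr[$1] : 5
      if (pct + 0 > limit) print $1, "outside noise threshold"
    }
  '
}
```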
I've uploaded the complete set of benchmark results in case we need to analyze them later. Each file in the archive is a separate runner.
For additional reference, this is the full count of changes I saw regardless of magnitude (default noise threshold is 1%):
1 create and insert single-level regressed
1 deserialize/foo[0].bar[0].baz[0] improved
1 elasticsearch_indexes/dynamic regressed
1 from_string/"boop"."snoot" regressed
1 from_string/foo[0] regressed
1 from_string/simple_string regressed
1 isolated_buffers/channels/futures01 improved
1 isolated_buffers/channels/tokio improved
1 isolated_buffers/channels/tokio regressed
1 isolated_buffers/leveldb/reading regressed
1 lua_add_fields/v2 improved
1 lua_add_fields/v2 regressed
1 parse_json: literal_value improved
1 pipe/pipe_big_lines improved
1 remap: add fields with remap regressed
1 transforms/transforms regressed
2 buffers/in-memory regressed
2 from_string/"boop"."snoot" improved
2 from_string/foo.bar.baz.bat[0] improved
2 interconnected/interconnected improved
2 interconnected/interconnected regressed
2 pipe/pipe_big_lines regressed
2 pipe/pipe_simple regressed
2 pipe/pipe_small_lines improved
2 pipe/pipe_small_lines regressed
2 transforms/transforms improved
3 create and insert single-level improved
3 from_string/foo.bar.baz.bat[0] regressed
3 isolated_buffers/channels/futures01 regressed
3 iterate all fields single-level improved
3 iterate all fields single-level regressed
3 remap: coerce with coercer regressed
4 buffers/in-memory improved
4 elasticsearch_indexes/dynamic improved
4 lua_field_filter/v1 improved
5 lua_field_filter/native improved
6 deserialize/"boop"."snoot" improved
6 lua_field_filter/native regressed
6 pipe/pipe_simple improved
7 deserialize/"boo\\"p" improved
7 deserialize/p4th_wi7h.numb3r5 improved
7 deserialize/simple_string improved
7 lua_add_fields/native regressed
8 deserialize/"boop" improved
8 deserialize/foo[0] improved
8 lua_field_filter/v1 regressed
8 parse_json: literal_value regressed
9 isolated_buffers/leveldb/writing regressed
11 create and insert nested-keys regressed
12 parse_json: invalid_json_with_default improved
14 create and insert nested-keys improved
14 isolated_buffers/leveldb/writing improved
14 pipe/pipe_multiple_writers regressed
18 lua_add_fields/v1 regressed
19 parse_json: invalid_json_with_default regressed
19 pipe/pipe_multiple_writers improved
23 lua_add_fields/v1 improved
26 buffers/on-disk improved
28 buffers/on-disk regressed
37 downcase: literal_value regressed
38 downcase: literal_value improved
41 files/files_without_partitions improved
47 files/files_without_partitions regressed
52 create and insert array improved
52 create and insert array regressed
61 elasticsearch_indexes/static improved
62 upcase: literal_value regressed
64 upcase: literal_value improved
66 elasticsearch_indexes/static regressed
78 regex/regex improved
83 regex/regex regressed
93 lua_field_filter/v2 improved
99 lua_field_filter/v2 regressed
This may point to a smaller level of noise inherent in some of the benchmarks that show a lot of changes.
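One way to quantify that (a sketch, assuming the "<count> <bench> <verdict>" format above with no spaces in the bench name) is to sum the improved and regressed counts per benchmark, since on identical code both directions are pure noise:

```shell
# Sketch: rank benchmarks by total spurious-change count across repeated runs
# of the same commit. Higher totals indicate more noise inherent to the bench.
rank_noise() {
  awk '$3 != "none" { total[$2] += $1 } END { for (b in total) print total[b], b }' \
    | sort -rn
}
```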
I'm going to close this issue as done for this pass. I'm sure we'll keep improving things here, but it's in a pretty good state now, with custom benchmark runners set up and noise thresholds configured.
In order to be able to more accurately compare benchmarks over time and between PRs and master, we'd like to ensure that the environment the benches run in is as similar as possible, to avoid measurement changes caused by differences in the benchmark environment. Some options: