Open heckj opened 3 months ago
huh, lemme try running this on x86_64 when i get a chance
my results are quite different. with the exception of disk2d, most of the benchmarks show a modest improvement. i don’t know what’s going on with the 99th percentiles though. maybe it was run in a noisier environment.
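To illustrate why a p99 value can swing widely while the median barely moves, here is a minimal sketch of a nearest-rank percentile over raw samples. The `percentile` helper and the sample values are hypothetical, and the benchmark tool may well compute its percentiles differently.

```swift
/// Nearest-rank percentile of `samples` for an integer `p` in 0...100.
/// Hypothetical helper for illustration only; not the benchmark tool's implementation.
func percentile(_ samples: [Int], _ p: Int) -> Int {
    precondition(!samples.isEmpty && (0...100).contains(p))
    let sorted = samples.sorted()
    // Rank is ceil(p / 100 * n), computed with integer arithmetic.
    let rank = (p * sorted.count + 99) / 100
    return sorted[max(rank - 1, 0)]
}

// 100 made-up timings: one quiet run, and one run with 2 slow outliers.
let quiet = Array(repeating: 10, count: 100)
let noisy = Array(repeating: 10, count: 98) + [40, 40]

print(percentile(quiet, 50), percentile(quiet, 99))  // 10 10
print(percentile(noisy, 50), percentile(noisy, 99))  // 10 40
```

Two slow iterations out of a hundred are enough to quadruple p99 here while p50 is unchanged, which is consistent with a noisy environment being one possible explanation.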
Host '832f7bfa3820' with 12 'x86_64' processors with 30 GB memory, running:
#35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2
| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 1207 | 1258 | 1261 | 1273 | 1309 | 1559 | 40876 | 608656 |
| Current_run | 1084 | 1135 | 1142 | 1156 | 1255 | 4287 | 51909 | 588363 |
| Δ | -123 | -123 | -119 | -117 | -54 | 2728 | 11033 | -20293 |
| Improvement % | 10 | 10 | 9 | 9 | 4 | -175 | -27 | -20293 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 828 | 795 | 793 | 786 | 764 | 642 | 24 | 608656 |
| Current_run | 923 | 881 | 876 | 865 | 797 | 233 | 19 | 588363 |
| Δ | 95 | 86 | 83 | 79 | 33 | -409 | -5 | -20293 |
| Improvement % | 11 | 11 | 10 | 10 | 4 | -64 | -21 | -20293 |

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 1237 | 1257 | 1261 | 1274 | 1337 | 4057 | 53729 | 576524 |
| Current_run | 1092 | 1136 | 1143 | 1153 | 1223 | 3831 | 44478 | 626635 |
| Δ | -145 | -121 | -118 | -121 | -114 | -226 | -9251 | 50111 |
| Improvement % | 12 | 10 | 9 | 9 | 9 | 6 | 17 | 50111 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 808 | 796 | 793 | 785 | 748 | 247 | 19 | 576524 |
| Current_run | 916 | 881 | 875 | 867 | 818 | 261 | 22 | 626635 |
| Δ | 108 | 85 | 82 | 82 | 70 | 14 | 3 | 50111 |
| Improvement % | 13 | 11 | 10 | 10 | 9 | 6 | 16 | 50111 |

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 1237 | 1260 | 1264 | 1281 | 1314 | 4017 | 74913 | 589676 |
| Current_run | 1110 | 1140 | 1148 | 1169 | 1251 | 4219 | 95476 | 602537 |
| Δ | -127 | -120 | -116 | -112 | -63 | 202 | 20563 | 12861 |
| Improvement % | 10 | 10 | 9 | 9 | 5 | -5 | -27 | 12861 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 808 | 794 | 792 | 781 | 761 | 249 | 13 | 589676 |
| Current_run | 901 | 878 | 871 | 856 | 800 | 237 | 10 | 602537 |
| Δ | 93 | 84 | 79 | 75 | 39 | -12 | -3 | 12861 |
| Improvement % | 12 | 11 | 10 | 10 | 5 | -5 | -23 | 12861 |

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 10 | 10 | 10 | 10 | 10 | 14 | 78 | 94960 |
| Current_run | 10 | 10 | 10 | 10 | 10 | 38 | 125 | 91262 |
| Δ | 0 | 0 | 0 | 0 | 0 | 24 | 47 | -3698 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | -171 | -60 | -3698 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 102 | 102 | 101 | 100 | 98 | 70 | 13 | 94960 |
| Current_run | 104 | 103 | 103 | 101 | 98 | 26 | 8 | 91262 |
| Δ | 2 | 1 | 2 | 1 | 0 | -44 | -5 | -3698 |
| Improvement % | 2 | 1 | 2 | 1 | 0 | -63 | -38 | -3698 |

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 1239 | 1261 | 1264 | 1275 | 1304 | 1489 | 40901 | 614952 |
| Current_run | 1110 | 1142 | 1149 | 1160 | 1230 | 3821 | 41111 | 638164 |
| Δ | -129 | -119 | -115 | -115 | -74 | 2332 | 210 | 23212 |
| Improvement % | 10 | 9 | 9 | 9 | 6 | -157 | -1 | 23212 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 807 | 793 | 792 | 784 | 767 | 672 | 24 | 614952 |
| Current_run | 901 | 876 | 870 | 862 | 813 | 262 | 24 | 638164 |
| Δ | 94 | 83 | 78 | 78 | 46 | -410 | 0 | 23212 |
| Improvement % | 12 | 10 | 10 | 10 | 6 | -61 | 0 | 23212 |

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 10 | 10 | 10 | 10 | 11 | 29 | 128 | 90948 |
| Current_run | 10 | 10 | 10 | 10 | 10 | 37 | 115 | 89509 |
| Δ | 0 | 0 | 0 | 0 | -1 | 8 | -13 | -1439 |
| Improvement % | 0 | 0 | 0 | 0 | 9 | -28 | 10 | -1439 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 100 | 99 | 99 | 98 | 95 | 35 | 8 | 90948 |
| Current_run | 102 | 101 | 101 | 99 | 96 | 27 | 9 | 89509 |
| Δ | 2 | 2 | 2 | 1 | 1 | -8 | 1 | -1439 |
| Improvement % | 2 | 2 | 2 | 1 | 1 | -23 | 12 | -1439 |

| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 11 | 11 | 11 | 12 | 12 | 17 | 17 | 88 |
| Current_run | 16 | 16 | 16 | 16 | 16 | 18 | 18 | 63 |
| Δ | 5 | 5 | 5 | 4 | 4 | 1 | 1 | -25 |
| Improvement % | -45 | -45 | -45 | -33 | -33 | -6 | -6 | -25 |

| Throughput (# / s) (#) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 94 | 91 | 90 | 87 | 84 | 59 | 59 | 88 |
| Current_run | 64 | 63 | 62 | 62 | 61 | 56 | 56 | 63 |
| Δ | -30 | -28 | -28 | -25 | -23 | -3 | -3 | -25 |
| Improvement % | -32 | -31 | -31 | -29 | -27 | -5 | -5 | -25 |

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 10 | 10 | 10 | 10 | 10 | 38 | 102 | 88539 |
| Current_run | 10 | 10 | 10 | 10 | 11 | 37 | 104 | 86683 |
| Δ | 0 | 0 | 0 | 0 | 1 | -1 | 2 | -1856 |
| Improvement % | 0 | 0 | 0 | 0 | -10 | 3 | -2 | -1856 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 102 | 101 | 101 | 100 | 97 | 27 | 10 | 88539 |
| Current_run | 103 | 100 | 99 | 97 | 93 | 27 | 10 | 86683 |
| Δ | 1 | -1 | -2 | -3 | -4 | 0 | 0 | -1856 |
| Improvement % | 1 | -1 | -2 | -3 | -4 | 0 | 0 | -1856 |

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 10 | 10 | 10 | 10 | 10 | 28 | 103 | 93008 |
| Current_run | 10 | 10 | 10 | 10 | 10 | 30 | 95 | 93813 |
| Δ | 0 | 0 | 0 | 0 | 0 | 2 | -8 | 805 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | -7 | 8 | 805 |

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08.x86_64 | 102 | 102 | 101 | 100 | 98 | 36 | 10 | 93008 |
| Current_run | 103 | 103 | 102 | 101 | 99 | 33 | 11 | 93813 |
| Δ | 1 | 1 | 1 | 1 | 1 | -3 | 1 | 805 |
| Improvement % | 1 | 1 | 1 | 1 | 1 | -8 | 10 | 805 |

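For reading the Δ and Improvement % rows, the following is a minimal sketch of how those values appear to be derived. The formula is inferred from the numbers in the tables, not taken from the benchmark tool's source, and `Polarity` and `improvementPercent` are hypothetical names. The Samples column looks like an exception: its Improvement % cell simply repeats Δ.

```swift
/// Whether a smaller or a larger value counts as better for a metric.
enum Polarity { case lowerIsBetter, higherIsBetter }

/// Percentage improvement of `current` over `baseline`, rounded to the nearest integer.
/// Assumption: inferred from the table values; the tool's real implementation may differ.
func improvementPercent(baseline: Double, current: Double, polarity: Polarity) -> Int {
    let delta = current - baseline
    // For time, a negative delta is an improvement; for throughput, a positive one is.
    let signed = polarity == .lowerIsBetter ? -delta : delta
    return Int((100 * signed / baseline).rounded())
}

// Values from the first pair of tables (wall clock in ns, throughput in K/s).
print(improvementPercent(baseline: 1207, current: 1084, polarity: .lowerIsBetter))  // 10
print(improvementPercent(baseline: 1559, current: 4287, polarity: .lowerIsBetter))  // -175
print(improvementPercent(baseline: 642, current: 233, polarity: .higherIsBetter))   // -64
```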
Well that's positive at least! Just to check - did you make your own baseline to compare against, or did you compare against the built-in one, because I generated that on an M1, so I wouldn't expect it to be terribly accurate for Intel?
In any case, I thought I'd do a little profiling using the generate-noise (what had previously been in TestNoise) executable and see what hot spots existed, comparing the two from that perspective. I'm not experienced at optimizing at this level, so this'll be a good learning experience for me.
yes, i had created a second baseline locally named bdb4ef08.x86_64
Well, that's reassuring then. It shows the SIMD stuff is actually adding value (at least a little) on Intel, even if it seems to be a pretty annoying regression on Arm. I'll keep looking.
Image showing a comparison of generate-noise: SIMD version (the one that's slightly slower) on top, original code below.
11 vs 15 ms for the `Math.multi` vs `SIMD3 &*` is the biggest thing that stands out to me, but almost seems counterintuitive. And since this is with Instruments, I've no explanation (or path to understanding at the moment) why the same code-path is notably faster on Intel hardware, but slower on ARM64.
hmm. maybe extract one of the functions that uses `Math.mult` and look at its Godbolt to see if anything stands out? if i remember correctly during the Swift 3 days, SIMD frequently performed worse than scalar multiplication because LLVM was constantly packing and unpacking the scalars to and from the SIMD registers. but then again, these are not the Swift 3 days anymore so maybe it’s a completely different issue.
barring that, we could ask for help from a numerics expert but i’m not sure if calling in the cavalry is warranted yet
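One way to set up the Godbolt comparison suggested above is to isolate a scalar multiply and a `SIMD3` multiply in functions that cannot be inlined, so each gets its own block of assembly. This is a minimal sketch, assuming 32-bit integer lanes; `multiplyScalar` and `multiplySIMD` are hypothetical stand-ins, not the library's actual `Math.mult`.

```swift
// Hypothetical stand-ins for inspection in Compiler Explorer.
// Assumption: Int32 lanes with wrapping multiplication; the real code may use other types.

/// Scalar version: three independent wrapping multiplies on a tuple.
@inline(never)
func multiplyScalar(_ a: (Int32, Int32, Int32), _ b: (Int32, Int32, Int32)) -> (Int32, Int32, Int32) {
    return (a.0 &* b.0, a.1 &* b.1, a.2 &* b.2)
}

/// SIMD version: a single element-wise wrapping multiply.
@inline(never)
func multiplySIMD(_ a: SIMD3<Int32>, _ b: SIMD3<Int32>) -> SIMD3<Int32> {
    return a &* b
}

print(multiplyScalar((1, 2, 3), (4, 5, 6)))                             // (4, 10, 18)
print(multiplySIMD(SIMD3(1, 2, 3), SIMD3(4, 5, 6)) == SIMD3(4, 10, 18)) // true
```

Compiling both functions with `-O` for x86_64 and for arm64 and comparing the generated bodies may show whether the SIMD path spends extra instructions moving scalars in and out of vector registers on one architecture.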
I'll give it a look after a bit. Mostly I'm just confounded by this unexpected result. Godbolt seems like a really good idea; I'll give it a look.
Since we were talking about this, I took the time to set it up - but after all the conversions, it turns out that's only HURT performance (based on benchmark comparison).
swift package benchmark baseline compare bdb4ef08 --format markdown
:Comparing results between 'bdb4ef08' and 'Current_run'
ExternalBenchmarks
cell2d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 583 | 583 | 584 | 625 | 625 | 38292 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 54416 | 923477 |
| Δ | 250 | 209 | 209 | 249 | 209 | 250 | 16124 | -125099 |
| Improvement % | -55 | -36 | -36 | -43 | -33 | -40 | -42 | -125099 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1716 | 1716 | 1713 | 1601 | 1601 | 26 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 18 | 923477 |
| Δ | -771 | -452 | -452 | -512 | -402 | -457 | -8 | -125099 |
| Improvement % | -35 | -26 | -26 | -30 | -25 | -29 | -31 | -125099 |

cell3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 542 | 583 | 583 | 584 | 625 | 32667 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 51083 | 922510 |
| Δ | 250 | 250 | 209 | 250 | 250 | 250 | 18416 | -126066 |
| Improvement % | -55 | -46 | -36 | -43 | -43 | -40 | -56 | -126066 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1845 | 1716 | 1716 | 1713 | 1601 | 31 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 20 | 922510 |
| Δ | -771 | -581 | -452 | -515 | -514 | -457 | -11 | -126066 |
| Improvement % | -35 | -31 | -26 | -30 | -30 | -29 | -35 | -126066 |

cell_tiling3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 583 | 583 | 584 | 625 | 666 | 66667 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 51625 | 916598 |
| Δ | 250 | 209 | 209 | 249 | 209 | 209 | -15042 | -131978 |
| Improvement % | -55 | -36 | -36 | -43 | -33 | -31 | 23 | -131978 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1716 | 1716 | 1713 | 1601 | 1502 | 15 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 19 | 916598 |
| Δ | -771 | -452 | -452 | -512 | -402 | -358 | 4 | -131978 |
| Improvement % | -35 | -26 | -26 | -30 | -25 | -24 | 27 | -131978 |

classic3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6750 | 6875 | 6919 | 7127 | 7211 | 7543 | 72959 | 138634 |
| Current_run | 9875 | 10047 | 10087 | 10087 | 10127 | 10295 | 60750 | 95852 |
| Δ | 3125 | 3172 | 3168 | 2960 | 2916 | 2752 | -12209 | -42782 |
| Improvement % | -46 | -46 | -46 | -42 | -40 | -36 | 17 | -42782 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 148 | 146 | 145 | 140 | 139 | 133 | 14 | 138634 |
| Current_run | 101 | 100 | 99 | 99 | 99 | 97 | 16 | 95852 |
| Δ | -47 | -46 | -46 | -41 | -40 | -36 | 2 | -42782 |
| Improvement % | -32 | -32 | -32 | -29 | -29 | -27 | 14 | -42782 |

classic_tiling3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 708 | 792 | 833 | 833 | 834 | 916 | 35958 | 1009758 |
| Current_run | 1083 | 1208 | 1208 | 1209 | 1250 | 1292 | 47791 | 669209 |
| Δ | 375 | 416 | 375 | 376 | 416 | 376 | 11833 | -340549 |
| Improvement % | -53 | -53 | -45 | -45 | -50 | -41 | -33 | -340549 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 1412 | 1264 | 1201 | 1201 | 1199 | 1093 | 28 | 1009758 |
| Current_run | 923 | 828 | 828 | 827 | 800 | 774 | 21 | 669209 |
| Δ | -489 | -436 | -373 | -374 | -399 | -319 | -7 | -340549 |
| Improvement % | -35 | -34 | -31 | -31 | -33 | -29 | -25 | -340549 |

classic_tiling_fbm3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6833 | 6959 | 7003 | 7003 | 7087 | 7583 | 47125 | 138651 |
| Current_run | 9958 | 10127 | 10167 | 10167 | 10215 | 10335 | 57084 | 95148 |
| Δ | 3125 | 3168 | 3164 | 3164 | 3128 | 2752 | 9959 | -43503 |
| Improvement % | -46 | -46 | -45 | -45 | -44 | -36 | -21 | -43503 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 146 | 144 | 143 | 143 | 141 | 132 | 21 | 138651 |
| Current_run | 100 | 99 | 98 | 98 | 98 | 97 | 18 | 95148 |
| Δ | -46 | -45 | -45 | -45 | -43 | -35 | -3 | -43503 |
| Improvement % | -32 | -31 | -31 | -31 | -30 | -27 | -14 | -43503 |

disk2d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 9314 | 9339 | 9347 | 9388 | 9486 | 9609 | 9630 | 107 |
| Current_run | 24508 | 24576 | 24707 | 24969 | 30228 | 43058 | 43058 | 39 |
| Δ | 15194 | 15237 | 15360 | 15581 | 20742 | 33449 | 33428 | -68 |
| Improvement % | -163 | -163 | -164 | -166 | -219 | -348 | -347 | -68 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (#) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 107 | 107 | 107 | 107 | 105 | 104 | 104 | 107 |
| Current_run | 41 | 41 | 40 | 40 | 33 | 23 | 23 | 39 |
| Δ | -66 | -66 | -67 | -67 | -72 | -81 | -81 | -68 |
| Improvement % | -62 | -62 | -63 | -63 | -69 | -78 | -78 | -68 |

gradient2d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6750 | 6875 | 6919 | 6959 | 7003 | 7503 | 42417 | 140146 |
| Current_run | 9916 | 10047 | 10087 | 10127 | 10167 | 10295 | 96958 | 95770 |
| Δ | 3166 | 3172 | 3168 | 3168 | 3164 | 2792 | 54541 | -44376 |
| Improvement % | -47 | -46 | -46 | -46 | -45 | -37 | -129 | -44376 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 148 | 146 | 145 | 144 | 143 | 133 | 24 | 140146 |
| Current_run | 101 | 100 | 99 | 99 | 98 | 97 | 10 | 95770 |
| Δ | -47 | -46 | -46 | -45 | -45 | -36 | -14 | -44376 |
| Improvement % | -32 | -32 | -32 | -31 | -31 | -27 | -58 | -44376 |

gradient3d metrics
Time (wall clock): results within specified thresholds, fold down for details.

| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6791 | 6919 | 6919 | 6959 | 7003 | 7503 | 44709 | 139935 |
| Current_run | 9916 | 10047 | 10087 | 10127 | 10167 | 10295 | 75500 | 95903 |
| Δ | 3125 | 3128 | 3168 | 3168 | 3164 | 2792 | 30791 | -44032 |
| Improvement % | -46 | -45 | -46 | -46 | -45 | -37 | -69 | -44032 |

Throughput (# / s): results within specified thresholds, fold down for details.

| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---:|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 147 | 145 | 145 | 144 | 143 | 133 | 22 | 139935 |
| Current_run | 101 | 100 | 99 | 99 | 98 | 97 | 13 | 95903 |
| Δ | -46 | -45 | -46 | -45 | -45 | -36 | -9 | -44032 |
| Improvement % | -31 | -31 | -32 | -31 | -31 | -27 | -41 | -44032 |