Changes:

- Introduce Overflow Trackers, with features to select the desired variant.
- Introduce Displacements, conditional on the Overflow Tracker variant tracking removals.
- Adjust insertion/removal of items in `RawTable` to properly track overflow and displacement.
- Adjust `find` in `RawTable` to short-circuit the probe sequence when overflow tracking ensures there is no need to probe further.

Of note: group alignment is now enforced.
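As a purely illustrative sketch of what the benchmarked variants could look like (the trait and type names below are hypothetical, not the ones used in the patch): `counter-u8` keeps a saturating per-group count of overflowed elements and can undo removals, while `bloom-1-u8` keeps a tiny 1-hash Bloom filter that only ever accumulates.

```rust
/// Hypothetical per-group overflow tracker interface.
trait OverflowTracker {
    /// Called when an inserted element's probe sequence moves past this group.
    fn record_overflow(&mut self, hash: u64);
    /// Called on removal of a displaced element (removal-tracking variants only).
    fn remove_overflow(&mut self, hash: u64);
    /// If this returns false, a probe sequence may stop at this group.
    fn may_have_overflowed(&self, hash: u64) -> bool;
}

/// `counter-u8`-style: saturating count of elements that overflowed past this group.
#[derive(Default)]
struct CounterU8(u8);

impl OverflowTracker for CounterU8 {
    fn record_overflow(&mut self, _hash: u64) {
        self.0 = self.0.saturating_add(1);
    }
    fn remove_overflow(&mut self, _hash: u64) {
        // Once saturated, the count is no longer exact and must stay pinned.
        if self.0 != u8::MAX {
            self.0 = self.0.saturating_sub(1);
        }
    }
    fn may_have_overflowed(&self, _hash: u64) -> bool {
        self.0 != 0
    }
}

/// `bloom-1-u8`-style: one bit per hash; accumulates only, so removals are no-ops.
#[derive(Default)]
struct Bloom1U8(u8);

impl OverflowTracker for Bloom1U8 {
    fn record_overflow(&mut self, hash: u64) {
        self.0 |= 1 << (hash % 8);
    }
    fn remove_overflow(&mut self, _hash: u64) {}
    fn may_have_overflowed(&self, hash: u64) -> bool {
        self.0 & (1 << (hash % 8)) != 0
    }
}
```

The `hybrid` and `bloom-1-u16` variants would be variations on the same theme (a wider filter, or a counter falling back to a filter once saturated).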
Motivation:
- Overflow tracking allows cutting a probe sequence short, which may be beneficial.
- Implementing a multitude of variants makes it easier to test and benchmark them all, and thus to pick the right one... or none at all.
- Groups are now forcibly aligned because overflow tracking is performed on a per-group basis and does not work with "floating" groups.
Design:
Overflow trackers and displacements are tacked on at the end of the allocation, and accesses to them are kept to a minimum so that their performance impact is as small as possible.

In particular:

- An element which does not overflow on insertion triggers no write to any overflow tracker, nor to its displacement.
- The displacement is read on removal only if removals are tracked.
- Overflow trackers are written to on removal only if removals are tracked and the displacement is non-zero.

This follows the "You Don't Pay For What You Don't Use" philosophy, and keeps the impact as small as it can be.
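A minimal sketch of those write-avoidance rules, with hypothetical names and a `counter-u8`-style tracker (the patch's actual layout places these arrays at the end of the table's single allocation):

```rust
struct OverflowState {
    /// One saturating overflow counter per group.
    trackers: Vec<u8>,
    /// Per-slot displacement (number of groups probed past); only allocated
    /// when the selected variant tracks removals.
    displacements: Option<Vec<u8>>,
}

impl OverflowState {
    /// An element inserted into its home group (displacement 0) triggers no
    /// write at all: the fast path is untouched.
    fn record_insert(&mut self, slot: usize, home_group: usize, displacement: u8) {
        if displacement == 0 {
            return;
        }
        let n = self.trackers.len();
        for i in 0..displacement as usize {
            let g = (home_group + i) % n;
            self.trackers[g] = self.trackers[g].saturating_add(1);
        }
        if let Some(d) = &mut self.displacements {
            d[slot] = displacement;
        }
    }

    /// The displacement is read only if removals are tracked, and the
    /// trackers are written only if that displacement is non-zero.
    fn record_remove(&mut self, slot: usize, home_group: usize) {
        let d = match &mut self.displacements {
            Some(d) => d,
            None => return, // removals not tracked: no reads, no writes
        };
        let displacement = d[slot];
        if displacement == 0 {
            return; // element never overflowed: trackers untouched
        }
        d[slot] = 0;
        let n = self.trackers.len();
        for i in 0..displacement as usize {
            let g = (home_group + i) % n;
            // A saturated counter can no longer be decremented soundly.
            if self.trackers[g] != u8::MAX {
                self.trackers[g] -= 1;
            }
        }
    }
}
```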
Benchmarks:
Methodology: each variant was benchmarked 3 times, and for each benchmark the best result was picked. All results were then normalized against current master for ease of comparison.
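For reference, the normalization described above amounts to the following arithmetic (the run data here is made up; the exact benchmark harness invocation is not specified):

```rust
/// Best-of-N runs for a variant, expressed as a signed % delta against the
/// best-of-N runs for master (0.0 means identical to master).
fn normalized_delta(master_runs: &[f64], variant_runs: &[f64]) -> f64 {
    let best = |runs: &[f64]| runs.iter().cloned().fold(f64::INFINITY, f64::min);
    (best(variant_runs) / best(master_runs) - 1.0) * 100.0
}
```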
| Benchmark | master | none | bloom-1-u8 | bloom-1-u16 | counter-u8 | hybrid |
|---|---|---|---|---|---|---|
| clone_from_large | 100% (+/-19.77%) | +0.00% (+/-0.20%) | +0.17% (+/-0.10%) | +0.00% (+/-0.20%) | -0.94% (+/-0.00%) | +1.18% (+/-0.18%) |
| clone_from_small | 100% (+/-6.82%) | +0.00% (+/-0.07%) | +2.27% (+/-0.20%) | +2.27% (+/-0.04%) | +0.00% (+/-0.25%) | +0.00% (+/-0.05%) |
| clone_large | 100% (+/-8.86%) | +0.00% (+/-0.09%) | +1.24% (+/-0.14%) | -0.66% (+/-0.07%) | -0.86% (+/-0.09%) | -1.04% (+/-0.07%) |
| clone_small | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +3.64% (+/-0.05%) | +1.82% (+/-0.07%) | +0.00% (+/-0.07%) | +1.82% (+/-0.04%) |
| grow_insert_ahash_highbits | 100% (+/-4.54%) | +0.00% (+/-0.05%) | +0.24% (+/-0.03%) | -0.65% (+/-0.00%) | -0.51% (+/-0.05%) | +2.29% (+/-0.00%) |
| grow_insert_ahash_random | 100% (+/-0.02%) | +0.00% (+/-0.00%) | +2.83% (+/-0.00%) | +0.88% (+/-0.00%) | +0.53% (+/-0.00%) | +1.58% (+/-0.00%) |
| grow_insert_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.85% (+/-0.05%) | +0.22% (+/-0.00%) | +1.46% (+/-0.00%) | +4.13% (+/-0.00%) |
| grow_insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.81% (+/-0.00%) | +1.54% (+/-0.00%) | +0.14% (+/-0.00%) | +0.93% (+/-0.00%) |
| grow_insert_std_random | 100% (+/-1.61%) | +0.00% (+/-0.02%) | +4.05% (+/-0.00%) | +2.37% (+/-0.00%) | +3.96% (+/-0.00%) | +3.10% (+/-0.00%) |
| grow_insert_std_serial | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +4.50% (+/-0.00%) | +3.71% (+/-0.00%) | +1.83% (+/-0.00%) | +5.21% (+/-0.00%) |
| insert_ahash_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.64% (+/-0.00%) | +1.21% (+/-0.00%) | +2.07% (+/-0.00%) | +1.45% (+/-0.00%) |
| insert_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +6.36% (+/-0.00%) | +0.48% (+/-0.00%) | +0.62% (+/-0.00%) | +0.38% (+/-0.00%) |
| insert_ahash_serial | 100% (+/-3.56%) | +0.00% (+/-0.04%) | +5.62% (+/-0.00%) | +5.34% (+/-0.00%) | -0.12% (+/-0.00%) | +0.20% (+/-0.00%) |
| insert_erase_ahash_highbits | 100% (+/-4.64%) | +0.00% (+/-0.05%) | +2.98% (+/-0.05%) | +3.52% (+/-0.00%) | +3.19% (+/-0.04%) | +7.18% (+/-0.00%) |
| insert_erase_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.59% (+/-0.00%) | +3.44% (+/-0.00%) | +2.80% (+/-0.00%) | +4.72% (+/-0.03%) |
| insert_erase_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.50% (+/-0.06%) | +0.83% (+/-0.00%) | +5.17% (+/-0.00%) | +3.54% (+/-0.02%) |
| insert_erase_std_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.06% (+/-0.00%) | +2.07% (+/-0.00%) | +0.14% (+/-0.00%) | +0.40% (+/-0.03%) |
| insert_erase_std_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | -0.06% (+/-0.00%) | +0.84% (+/-0.00%) | -1.83% (+/-0.00%) | +0.95% (+/-0.00%) |
| insert_erase_std_serial | 100% (+/-1.97%) | +0.00% (+/-0.02%) | +4.26% (+/-0.00%) | +4.75% (+/-0.00%) | -0.75% (+/-0.00%) | +2.14% (+/-0.00%) |
| insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.35% (+/-0.00%) | -0.69% (+/-0.00%) | -1.61% (+/-0.04%) | -1.21% (+/-0.00%) |
| insert_std_random | 100% (+/-0.00%) | +0.00% (+/-0.00%) | -2.34% (+/-0.00%) | -0.57% (+/-0.00%) | -0.69% (+/-0.00%) | +0.45% (+/-0.00%) |
| insert_std_serial | 100% (+/-2.18%) | +0.00% (+/-0.02%) | -2.24% (+/-0.00%) | -2.86% (+/-0.05%) | +0.69% (+/-0.00%) | +1.62% (+/-0.00%) |
| iter_ahash_highbits | 100% (+/-10.23%) | +0.00% (+/-0.10%) | +3.41% (+/-0.12%) | -1.46% (+/-0.07%) | -0.32% (+/-0.11%) | -0.97% (+/-0.06%) |
| iter_ahash_random | 100% (+/-3.57%) | +0.00% (+/-0.04%) | +1.95% (+/-0.08%) | -0.97% (+/-0.06%) | -0.65% (+/-0.07%) | -0.81% (+/-0.05%) |
| iter_ahash_serial | 100% (+/-8.93%) | +0.00% (+/-0.09%) | +2.60% (+/-0.09%) | -0.97% (+/-0.06%) | -0.81% (+/-0.04%) | -0.49% (+/-0.05%) |
| iter_std_highbits | 100% (+/-4.52%) | +0.00% (+/-0.05%) | +2.42% (+/-0.09%) | -0.48% (+/-0.06%) | +0.65% (+/-0.13%) | -0.16% (+/-0.06%) |
| iter_std_random | 100% (+/-5.47%) | +0.00% (+/-0.05%) | -0.16% (+/-0.12%) | -0.80% (+/-0.07%) | +0.64% (+/-0.08%) | +0.32% (+/-0.06%) |
| iter_std_serial | 100% (+/-6.44%) | +0.00% (+/-0.06%) | +1.77% (+/-0.07%) | +0.64% (+/-0.08%) | +1.93% (+/-0.02%) | +0.16% (+/-0.05%) |
| lookup_ahash_highbits | 100% (+/-4.26%) | +0.00% (+/-0.04%) | +4.47% (+/-0.12%) | +1.63% (+/-0.10%) | -1.20% (+/-0.07%) | +1.02% (+/-0.07%) |
| lookup_ahash_random | 100% (+/-5.24%) | +0.00% (+/-0.05%) | +8.50% (+/-0.08%) | +7.26% (+/-0.09%) | -0.50% (+/-0.05%) | +7.41% (+/-0.13%) |
| lookup_ahash_serial | 100% (+/-4.51%) | +0.00% (+/-0.05%) | +8.28% (+/-0.05%) | +6.62% (+/-0.07%) | +0.25% (+/-0.14%) | +8.25% (+/-0.13%) |
| lookup_fail_ahash_highbits | 100% (+/-7.58%) | +0.00% (+/-0.08%) | +10.95% (+/-0.18%) | +7.62% (+/-0.03%) | +1.89% (+/-0.05%) | +9.13% (+/-0.06%) |
| lookup_fail_ahash_random | 100% (+/-7.33%) | +0.00% (+/-0.07%) | +13.83% (+/-0.16%) | +9.87% (+/-0.08%) | -0.34% (+/-0.05%) | +12.93% (+/-0.12%) |
| lookup_fail_ahash_serial | 100% (+/-6.37%) | +0.00% (+/-0.06%) | +7.33% (+/-0.05%) | +11.93% (+/-0.20%) | +1.36% (+/-0.06%) | +10.31% (+/-0.05%) |
| lookup_fail_std_highbits | 100% (+/-7.78%) | +0.00% (+/-0.08%) | +3.68% (+/-0.06%) | +5.35% (+/-0.03%) | +0.60% (+/-0.05%) | +4.09% (+/-0.05%) |
| lookup_fail_std_random | 100% (+/-5.59%) | +0.00% (+/-0.06%) | +5.37% (+/-0.11%) | +6.13% (+/-0.04%) | +1.06% (+/-0.00%) | +5.11% (+/-0.08%) |
| lookup_fail_std_serial | 100% (+/-4.02%) | +0.00% (+/-0.04%) | +1.58% (+/-0.06%) | +4.38% (+/-0.11%) | +0.55% (+/-0.00%) | +3.10% (+/-0.05%) |
| lookup_std_highbits | 100% (+/-3.36%) | +0.00% (+/-0.03%) | +5.24% (+/-0.00%) | +7.26% (+/-0.00%) | +1.65% (+/-0.00%) | +4.80% (+/-0.09%) |
| lookup_std_random | 100% (+/-2.47%) | +0.00% (+/-0.02%) | +3.76% (+/-0.03%) | +3.32% (+/-0.06%) | +3.57% (+/-0.11%) | +3.22% (+/-0.06%) |
| lookup_std_serial | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +8.38% (+/-0.04%) | +7.50% (+/-0.08%) | +7.86% (+/-0.09%) | +8.46% (+/-0.09%) |
| rehash_in_place | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.49% (+/-0.00%) | -1.66% (+/-0.00%) | +1.48% (+/-0.00%) | +5.18% (+/-0.00%) |
| insert | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.25% (+/-0.11%) | -1.51% (+/-0.07%) | +4.53% (+/-0.13%) | +2.96% (+/-0.00%) |
| insert_unique_unchecked | 100% (+/-6.95%) | +0.00% (+/-0.07%) | -5.59% (+/-0.08%) | -10.45% (+/-0.06%) | -0.36% (+/-0.16%) | -4.54% (+/-0.05%) |
Remarks:
- The `none` variant is completely neutral, which means that enforcing group alignment did not affect performance.
- The other variants show some promise, but the results vary quite a bit depending on micro-optimizations. Aggressive (always) inlining of key methods seemed to help, for example, though I am not so sure whether `may_have_overflowed` should be inlined, since overflow is expected to be rare.
- Whether the benchmarks suffer from high probe counts is unknown to me. Overflow tracking only helps cut probe sequences short, and is thus pure overhead if probing rarely goes beyond the first group.

In any case, with the scaffolding in place, it should be possible to experiment further if there is any will to.
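To make the trade-off concrete, here is a toy model (not hashbrown's actual layout or probing) of the shortened lookup: each group holds some hashes, and probing stops at the first group whose tracker shows nothing ever overflowed past it. The function name and signature are illustrative only.

```rust
/// Returns `(group, index_in_group)` of `hash`, probing group by group, and
/// stopping early at any group no element ever overflowed past.
fn find(groups: &[Vec<u64>], trackers: &[u8], hash: u64) -> Option<(usize, usize)> {
    let n = groups.len();
    let mut g = (hash as usize) % n;
    for _ in 0..n {
        if let Some(i) = groups[g].iter().position(|&h| h == hash) {
            return Some((g, i)); // found in this group
        }
        if trackers[g] == 0 {
            return None; // nothing overflowed this group: the element cannot be further
        }
        g = (g + 1) % n; // linear probing for simplicity; hashbrown probes quadratically
    }
    None
}
```

With all trackers at zero, a failed lookup probes exactly one group, which is presumably why the `lookup_fail_*` rows above are the most sensitive to the tracker variant.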