Closed · ketanv3 closed this 9 months ago
1 org.opensearch.index.translog.RemoteFSTranslogTests.testMetadataFileDeletion
Merging #8788 (58d3394) into main (a4024e7) will increase coverage by 0.14%. The diff coverage is 90.14%.
```diff
@@             Coverage Diff              @@
##               main    #8788      +/-   ##
============================================
+ Coverage     71.01%   71.15%   +0.14%
- Complexity    57420    57497      +77
============================================
  Files          4778     4779       +1
  Lines        270922   270995      +73
  Branches      39585    39589       +4
============================================
+ Hits         192403   192840     +437
+ Misses        62338    61938     -400
- Partials      16181    16217      +36
```
Files Changed | Coverage Δ |
---|---|
...g/opensearch/common/util/ReorganizingLongHash.java | 93.65% <75.00%> (-1.67%) :arrow_down: |
.../java/org/opensearch/common/util/BytesRefHash.java | 89.18% <89.70%> (+1.31%) :arrow_up: |
...rc/main/java/org/opensearch/common/hash/T1ha1.java | 90.76% <90.76%> (ø) |
...n/src/main/java/org/opensearch/common/Numbers.java | 89.69% <100.00%> (+1.14%) :arrow_up: |
...ggregations/bucket/terms/BytesKeyedBucketOrds.java | 87.50% <100.00%> (ø) |
.../aggregations/bucket/terms/SignificanceLookup.java | 96.49% <100.00%> (ø) |
...ations/bucket/terms/StringRareTermsAggregator.java | 92.42% <100.00%> (ø) |
1 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithInvokeFlush
@backslasht @dblock Need your inputs!
BytesRefHash instead of keeping two implementations (like before). Regardless of our choice between (A) and (B), both are better across the board over the existing implementation. What do you think?
@ketanv3 - Would the benefit of CPU cache lines be applicable when a large number of keys are used in ReorganizingBytesRefHash, and also assuming the keys arrive out of order? Considering that is the worst case, would it still outperform CompactBytesRefHash?
@ketanv3 - Would the benefit of CPU cache lines be applicable when a large number of keys are used in ReorganizingLongHash, and also assuming the keys arrive out of order? Considering that is the worst case, would it still outperform CompactBytesRefHash?
The recency criterion (i.e. correlated adds) isn't considered for BytesRefHash; it made more sense for LongHash, where similar timestamps across consecutive hits made that optimization possible. Only the PSL criterion is considered, which makes CPU caches more effective. Performance should be similar whether keys arrive in or out of order.
Theoretically, ReorganizingBytesRefHash should perform better than CompactBytesRefHash for lookups, at the slight expense of inserts. But another factor to consider is the difference in byte packing:

1. CompactBytesRefHash: [32 fingerprint, 32 ordinal] bits.
2. ReorganizingBytesRefHash: [1 discard, 15 PSL, 16 fingerprint, 32 ordinal] bits.

Since the number of fingerprint bits is higher in (1) (over 65K times more), it's not a 1:1 comparison between the two. With the current byte-packing scheme, performance is balanced between the two. But we can experiment with [1 discard, 7 PSL, 24 fingerprint, 32 ordinal] bits to see if this helps us speed up lookups even more.
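To make the packing concrete, here's a minimal sketch of how a slot laid out as [1 discard, 15 PSL, 16 fingerprint, 32 ordinal] could be encoded into and decoded from a single long. The class and method names are illustrative only, not the actual ones used in this PR.

```java
// Hypothetical packing of a 64-bit table slot as
// [1 discard | 15 PSL | 16 fingerprint | 32 ordinal], from the most to the least significant bit.
final class SlotPacking {
    private static final long ORDINAL_MASK = 0xFFFFFFFFL;  // lower 32 bits
    private static final long FINGERPRINT_MASK = 0xFFFFL;  // 16 bits
    private static final long PSL_MASK = 0x7FFFL;          // 15 bits

    static long pack(boolean discard, long psl, long fingerprint, long ordinal) {
        return ((discard ? 1L : 0L) << 63)
            | ((psl & PSL_MASK) << 48)
            | ((fingerprint & FINGERPRINT_MASK) << 32)
            | (ordinal & ORDINAL_MASK);
    }

    static long ordinal(long slot)     { return slot & ORDINAL_MASK; }
    static long fingerprint(long slot) { return (slot >>> 32) & FINGERPRINT_MASK; }
    static long psl(long slot)         { return (slot >>> 48) & PSL_MASK; }
    static boolean discard(long slot)  { return (slot >>> 63) != 0; }
}
```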
@backslasht Here's some follow-up:
I wrote an alternative benchmark that highlights the overhead of "new inserts" and the improvement it brings to "subsequent lookups". I ran it for large table sizes (10K to 100K) interleaved across 50 hash tables.
On average, pure inserts were 8.68% slower and pure lookups were 3.47% faster when re-organizing. Inserts were about 30% more expensive than lookups, so a key needs to be repeated at least 4 times for the lookup improvements to compensate for the insertion overhead.
Note: Take these numbers with a grain of salt; I faced a lot of variance while benchmarking these large tables.
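As a rough back-of-envelope check on how the "at least 4 times" figure follows from these numbers (assuming the averages above hold): if a lookup costs 1 unit, an insert costs about 1.3 units, so the re-organization overhead per insert is roughly 0.087 × 1.3 ≈ 0.11 units, while each subsequent lookup saves about 0.035 units; 0.11 / 0.035 ≈ 3.3, which rounds up to about 4 repeated lookups per key to break even.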
In hindsight, this overhead wasn't noticeable with ReorganizingLongHash as the performance was being compared to the baseline implementation. Instead, here our comparison is against CompactBytesRefHash, which itself has several enhancements on top of the baseline implementation (better locality, reduced branching, fingerprinting, etc.). Would be interesting to see how an equivalent CompactLongHash stacks up (and possibly replace LongHash).
Thanks @ketanv3 for the experiments.
I think we can go with CompactBytesRefHash for the reasons below.
Given the number of operations for each key is going to be 2 (1 insert and 1 lookup), the benefit CompactBytesRefHash provides during insertion is marginally better than the benefit ReorganizingBytesRefHash provides during the lookup. Also, implementation-wise, CompactBytesRefHash is relatively simpler when compared to ReorganizingBytesRefHash.
> Given the number of operations for each key is going to be 2 (1 insert and 1 lookup) ...
@backslasht
This isn't fully true; let me explain why. Several hits may produce the same key, so during aggregation, the BytesRefHash::add(key) method will simply return the ordinal of the key when it was first added (i.e. a lookup). In other words, the number of buckets should be much less than the number of hits (i.e. lookups >> inserts) for re-organization to be effective.
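To make the lookups-vs-inserts point concrete, here's a rough sketch of the bucketing loop during aggregation. The `OrdinalHash` interface and the assumption that `add(...)` returns the bucket ordinal for both new and repeated keys are simplifications for this example; the actual OpenSearch API distinguishes new keys from existing ones.

```java
import org.apache.lucene.util.BytesRef;

// Illustrative only: many hits map to few buckets, so after the first
// occurrence of a key every further add() is effectively a lookup.
final class BucketingSketch {
    interface OrdinalHash {          // stand-in for the hash table under discussion
        long add(BytesRef key);      // assumed to return the bucket ordinal in all cases
    }

    static long[] countPerBucket(Iterable<BytesRef> hitKeys, OrdinalHash hash, int maxBuckets) {
        long[] counts = new long[maxBuckets];   // sketch assumes ordinals stay below maxBuckets
        for (BytesRef key : hitKeys) {          // one iteration per hit
            long ordinal = hash.add(key);       // insert on first sight, lookup afterwards (simplified)
            counts[(int) ordinal]++;            // lookups >> inserts whenever keys repeat
        }
        return counts;
    }
}
```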
To summarise:
~This presents me with an idea: what if we re-organize only when the table grows? It would give us a permutation that sits between simple linear probing and Robin Hood hashing.~ On second thought, this won't help us either. Growing the table cuts the load factor in half, thus breaking down the clusters and reducing the PSL. A more complicated approach would be to measure "how many times was a key inserted too far away from its home slot" and then use that heuristic to trigger re-organization. But all this to gain only a little. Let's stick with CompactBytesRefHash! :)
Thanks for the detailed explanation @ketanv3
I agree, it boils down to the size of the table and the number of repeated keys where ReorganizingBytesRefHash performs marginally better. Looking at the additional complexity and the very marginal benefits ReorganizingBytesRefHash brings, it makes sense to stick to CompactBytesRefHash and revisit ReorganizingBytesRefHash in the future when a compelling use case arises.
@dblock Would like to know your thoughts too.
> @dblock Would like to know your thoughts too.
I am a bit out of my depth here - would love to see you and @backslasht agree on what should be a mergeable state of this PR first. If you need someone to break a tie, I can give it a try for sure!
@backslasht @dblock Ready for review!
@reta @backslasht
Here's a detailed performance analysis of portable, fast, non-cryptographic hash functions from well-known libraries: Zero-Allocation-Hashing, xxhash-java, and hash4j.
Winner: xxh3; assuming the internal usage of Unsafe is removed.
Winner: FarmHash and wyhash
Winner: Hash4J
This is a relatively new hash function featuring both high large-block hashing performance and a high hashing throughput for small inputs, mainly designed for hash-tables. It is close to the class of hash functions like wyhash, so identical performance is not surprising.
The raw performance results can be misleading, especially on small inputs. It may seem that the baseline is 2x faster than FarmHash for 1 byte keys, but it would only result in an overhead of 4 ms per 1 million keys hashed. Here's how the latency improvements scale with input size.
Latency improvement to hash 1 million keys wrt input size:
I found FarmHash, wyhash, and komihash to have suitable performance for use in a hash-table. They are simple in construction, instruction-cache friendly, portable, and fast on small inputs, and second best on large inputs. They are well-known hash functions that produce an avalanche effect, so I haven't tested their quality (i.e. collisions in the hash table).
> Here's a detailed performance analysis of portable, fast, non-cryptographic hash functions from well-known libraries: Zero-Allocation-Hashing, xxhash-java, and hash4j.
Thanks @ketanv3, very comprehensive and educational analysis. I am wondering if you still have the measured data to check the GC allocation activity (if you don't, not a big deal). Regarding hash4j/komihash, OpenHFT/FarmHash, and OpenHFT/wyhash, I think we would need to collect a bit more data for these three regarding GC allocations (OpenHFT is notoriously good at not generating too much garbage). What do you think?
Hi @reta, here's a comparison of "gc.alloc.rate.norm" at different input lengths. The baseline/lz4 hash functions have significantly higher allocations, but between OpenHFT and hash4j it's a tie. That said, the absolute values for all of them are low enough that GC is not a concern.
Key length (bytes) | 1 | 10 | 100 | 1000 | 10,000 | 100,000 | 1,000,000 |
---|---|---|---|---|---|---|---|
BASELINE_MURMUR3 | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.001 B/op | 0.008 B/op | 6.987 B/op |
LZ4_XXHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.001 B/op | 0.011 B/op | 9.350 B/op |
OPENHFT_UNSAFE_XXHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.001 B/op | 0.016 B/op |
OPENHFT_VH_XXHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.021 B/op |
OPENHFT_UNSAFE_XXH3 | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁴ B/op | ≈ 10⁻⁴ B/op | 0.004 B/op | 0.030 B/op |
OPENHFT_VH_XXH3 | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.001 B/op | 0.006 B/op | 0.052 B/op |
OPENHFT_UNSAFE_FARMHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.021 B/op |
OPENHFT_VH_FARMHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.003 B/op | 0.025 B/op |
OPENHFT_UNSAFE_WYHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.003 B/op | 0.025 B/op |
OPENHFT_VH_WYHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | ≈ 10⁻³ B/op | 0.004 B/op | 0.036 B/op |
HASH4J_WYHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.019 B/op |
OPENHFT_UNSAFE_METROHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.016 B/op |
OPENHFT_VH_METROHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.021 B/op |
HASH4J_KOMIHASH | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁵ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.003 B/op | 0.028 B/op |
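For context on where numbers like these typically come from: JMH's built-in GC profiler (`-prof gc`) reports `gc.alloc.rate.norm` per benchmark invocation. The benchmark below is a hypothetical sketch of such a setup, not the actual harness used for these measurements; the class, parameter values, and placeholder hash are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Run with the GC profiler to get gc.alloc.rate.norm, e.g.:
//   java -jar benchmarks.jar HashAllocationBenchmark -prof gc
@State(Scope.Thread)
public class HashAllocationBenchmark {
    @Param({"1", "10", "100", "1000"})
    int keyLength;                      // bytes per key, swept by JMH

    byte[] key;

    @Setup
    public void setup() {
        key = new byte[keyLength];
        ThreadLocalRandom.current().nextBytes(key);
    }

    @Benchmark
    public long hash() {
        return someHash64(key);         // placeholder for the hash function under test
    }

    // Placeholder so the sketch compiles; swap in the real hash function.
    private static long someHash64(byte[] input) {
        long h = 0;
        for (byte b : input) {
            h = h * 31 + b;
        }
        return h;
    }
}
```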
Thanks @ketanv3 for the detailed benchmarking exercise, this is very insightful.
Overall, I see FarmHash, Wyhash, and Komihash having similar performance and heap allocation.
I'm leaning towards Komihash for reasons below:
> I'm leaning towards Komihash for reasons below:
Side with @backslasht here, thanks @ketanv3
@backslasht @reta I'm more inclined towards FarmHash for the following reasons:
There's another fast hash function out there: https://github.com/erthink/t1ha. Unfortunately, there's no Java port available. Seems like it too depends on fast unsigned multiplication of two longs (https://github.com/OpenHFT/Zero-Allocation-Hashing/issues/23).
FYI, I'm running benchmarks with the short-listed hash functions when used in a hash table. It should holistically answer for both quality and performance. Early results favour FarmHash, but I'll let it complete fully. :)
Hi @reta @backslasht, here's the performance within the hash table.
I ran the experiments with a subset of arbitrarily chosen key lengths with growing table sizes. As expected, since all hash functions have an avalanche effect, the impact of collisions was negligible and the performance was mainly driven by the performance of the hash function.
For keys less than 30 bytes, FarmHash has the best performance.
For keys between 30 and 100 bytes, FarmHash performs better on small tables but wyhash performs better on large tables.
For keys larger than 100 bytes, wyhash has the best performance followed by komihash. The difference was significant compared to FarmHash.
I'm impressed by the performance of wyhash/komihash on large inputs, and I expect their performance to get better on small inputs once we raise the minimum version to JDK 18+, where unsignedMultiplyHigh can be replaced with Math.unsignedMultiplyHigh, which has x86_64 and aarch64 intrinsics.
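For reference, a common fallback on pre-JDK-18 runtimes is to derive the unsigned high half from the signed Math.multiplyHigh (available since JDK 9). This is a generic sketch of that identity, not necessarily the exact helper used in this PR:

```java
final class UMulHigh {
    // Unsigned 128-bit multiply, keeping only the high 64 bits.
    // On JDK 18+ this is simply Math.unsignedMultiplyHigh(x, y), which is intrinsified.
    static long unsignedMultiplyHigh(long x, long y) {
        // Signed high half, plus corrections for operands whose top (sign) bit is set.
        return Math.multiplyHigh(x, y) + ((x >> 63) & y) + ((y >> 63) & x);
    }
}
```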
Thanks @ketanv3 for the experiment.
Based on the results, it is clear that FarmHash is out of the running. Between komihash and wyhash, I'm still leaning towards komihash, but I'm open to wyhash as well. I'll let you decide between them.
@reta @backslasht
Out of curiosity, I decided to write my own Java port of the t1ha hash function. The results are shockingly good!
It's faster all around—faster than FarmHash on small inputs and wyhash/komihash/xxh3/MetroHash on long inputs. Unlike FarmHash where performance drops in steps, here it's more gradual as the input length increases.
Here's a summary of "gc.alloc.rate.norm" at different input lengths. One of the best again.
Key length (bytes) | 1 | 10 | 100 | 1000 | 10,000 | 100,000 | 1,000,000 |
---|---|---|---|---|---|---|---|
T1HA | ≈ 10⁻⁶ B/op | ≈ 10⁻⁶ B/op | 10⁻⁶ B/op | ≈ 10⁻⁵ B/op | ≈ 10⁻⁴ B/op | 0.002 B/op | 0.015 B/op |
Performance at arbitrarily chosen key lengths with growing table sizes. t1ha performed best in all tests.
The above results seem too good to be true; so are there any trade-offs? Yes. To overcome language and performance limitations, my implementation differs slightly from the reference implementation in C++, so the returned values will vary.
But does it matter? Probably not. For our use-case, all we need is a fast hash function that achieves avalanche. Here's how my implementation compares to other well-known hash functions:
For context, smhasher considers a hash function to pass the avalanche test if the bias is less than 1% (here). It can be seen how poorly the inbuilt Arrays.hashCode fares.

Hash function | Sum of squared errors | Bias | Diffusion
---|---|---|---
XXH3 | 0.64 | 0.0065% | 99.9935%
FarmHash | 0.87 | 0.0052% | 99.9948%
wyhash | 0.64 | 0.0015% | 99.9985%
komihash | 0.65 | 0.0028% | 99.9972%
t1ha | 0.64 | 0.0056% | 99.9944%
Arrays.hashCode | 2852.51 | 16.2308% | 83.7692%
Here's a comparison between Arrays.hashCode and t1ha's avalanche diagram. It shows the probability of an output bit flipping (y-axis) when a single input bit (x-axis) is flipped. Ideally, we want to be close to 0.5 to achieve avalanche.
I repeated the experiment with 24, 32, 40, 48, 56, 64, 72, 80, 96, 112, 128, 160, 512, and 1024-bit keys (as defined in smhasher) and achieved avalanche every time. Let's roll this out?
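For anyone wanting to reproduce such an avalanche measurement, here's a minimal sketch that estimates, for each input bit, the probability of each output bit flipping; it assumes a generic 64-bit hash passed in as a ToLongFunction, so the shape of the actual tests in this PR may differ.

```java
import java.util.SplittableRandom;
import java.util.function.ToLongFunction;

// Estimates P(output bit j flips | input bit i flips) for a 64-bit hash function.
// Ideal avalanche behaviour keeps every probability close to 0.5.
final class AvalancheSketch {
    static double[][] flipProbabilities(ToLongFunction<byte[]> hash64, int keyBytes, int trials) {
        SplittableRandom random = new SplittableRandom(42);
        int inputBits = keyBytes * 8;
        double[][] p = new double[inputBits][64];
        for (int t = 0; t < trials; t++) {
            byte[] key = new byte[keyBytes];
            random.nextBytes(key);
            long base = hash64.applyAsLong(key);
            for (int i = 0; i < inputBits; i++) {
                key[i / 8] ^= (byte) (1 << (i % 8));       // flip one input bit
                long diff = base ^ hash64.applyAsLong(key);
                key[i / 8] ^= (byte) (1 << (i % 8));       // restore it
                for (int j = 0; j < 64; j++) {
                    p[i][j] += (diff >>> j) & 1L;          // count output-bit flips
                }
            }
        }
        for (double[] row : p) {
            for (int j = 0; j < 64; j++) {
                row[j] /= trials;                          // convert counts to probabilities
            }
        }
        return p;
    }
}
```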
@ketanv3 - The results look great! Any considerations on the correctness and maintainability of the Java port?
> Any considerations on the correctness and maintainability of the java port?
I don't expect the returned values to match the reference implementation in C++ due to slight differences in my implementation to overcome the language limitations. We'll be explicit about these caveats in the implementation so it's not mistaken for the original implementation.
We'll have tests to verify the quality of the returned values, using the sum of squared errors and bias metrics. It's worth pointing out that, in a hash table, improvements to a hash function's quality have diminishing returns. A 64-bit hash value will be modded with the table size (which is usually pretty small), and the birthday paradox ensures that collisions happen no matter how good the hash function is, as long as it's not completely flawed.
Update: I have verified the correctness using the tests and expected results defined in the reference implementation.
> I don't expect the returned values to match the reference implementation in C++ due to slight differences in my implementation to overcome the language limitations. We'll be explicit about these caveats in the implementation so it's not mistaken for the original implementation.
I feel uneasy about this statement: I'm definitely not an expert on hash functions, but the implementors spend a lot of time developing those, so I would feel much more comfortable if our implementation matched the reference (even when the charts look good).
> I'm impressed by the performance of wyhash/komihash on large inputs, and I expect their performance to get better on small inputs once we enforce the minimum version to JDK 18+ where unsignedMultiplyHigh can be replaced with Math.unsignedMultiplyHigh which has https://github.com/openjdk/jdk/pull/5933 and https://github.com/openjdk/jdk/pull/8840 intrinsics.
Does your t1ha implementation rely on that as well? What JDK version did we use to run these tests?
@reta I understand your concern; I too would have preferred to match the reference implementation if our use-case was fingerprinting, where we would have used all 64 bits of the hash value. But in a hash table, where the hash value is modded with the table size, we lose a lot of entropy anyway and collisions are unavoidable (for dynamic workloads). Creating a perfect hash function is impractical, and resolving collisions in an open-addressed table is often cheaper.
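To put rough numbers on that (a back-of-envelope estimate assuming an ideal hash and, say, a table of 2^20 ≈ 1M slots): the probability that n keys produce at least one slot collision is about 1 − exp(−n(n−1)/(2·2^20)), which already exceeds 50% at only n ≈ 1200 keys. So cheap collision resolution matters more than squeezing out the last bit of hash quality.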
Here's how it's explained by experts in this field:
The only requirement is that the hash function isn't fundamentally flawed, which I have verified using the avalanche tests. I have also included them as part of the unit tests.
Yes, t1ha also requires unsignedMultiplyHigh and that's where my implementation differs. I'll push my code, that'll make things more clear. The results I've shared on this PR are using OpenJDK 17.
Here's the benchmark when matching the output of the reference implementation. It's slightly slower but still faster than all other hash functions, albeit slower than FarmHash on some small inputs. Collision-wise, it didn't make much of a difference, so this reference-matching variant was simply slower when used in a hash table.
> Yes, t1ha also requires unsignedMultiplyHigh and that's where my implementation differs. I'll push my code, that'll make things more clear. The results I've shared on this PR are using OpenJDK 17.
:+1: Thanks @ketanv3
Compatibility status:
> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]
Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git]
BUILD SUCCESSFUL in 31m 1s
Compatibility status:
> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/reporting.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
BUILD SUCCESSFUL in 32m 52s
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication
Checks if related components are compatible with change 4871e25
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Checks if related components are compatible with change 4871e25
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Gradle Check (Jenkins) Run Completed with:
- RESULT: FAILURE ❌
- URL: https://build.ci.opensearch.org/job/gradle-check/22871/
- CommitID: bc4fdd2

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Unrelated flaky test: https://github.com/opensearch-project/OpenSearch/issues/9115
Checks if related components are compatible with change 4871e25
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Checks if related components are compatible with change 8d153ab
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Previous check failed as #9389 was just merged to the main branch with updated version checks. Such commits break BWC for any other PRs that don't have the latest changes from the main branch. Rebasing and retrying.
1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
Checks if related components are compatible with change 8d153ab
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/asynchronous-search.git]
Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git]
Checks if related components are compatible with change 8255050
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Looks good to me. Thanks @ketanv3 for all the experiments and the PR.
Second that. I think I understood the idea behind BytesRefHash (it is too clever :D), but it looks pretty good; the numbers in particular are convincing. Thanks @ketanv3
Description
Performance improvements for BytesRefHash, which include:
Related Issues
Meta issue: #8710
JMH Benchmarks
These results indicate the time to perform 1 million add(...) operations with a varying number of unique keys. These operations were repeated and interleaved between 20 hash tables in order to make caches less effective. The following implementations are being compared:

Peak improvement
Whether or not to minimize the longest probe sequence length
The following implementations are being compared:
(A) is simpler and marginally faster for insert-heavy workloads. It also packs 32 bits of fingerprint information, which is 65,536 times that of (B), but false positives are already rare with (B). On the other hand, (B) provides marginally faster worst-case lookups due to better use of CPU cache lines.
Benchmarks show no appreciable performance difference between (A) and (B) since the latency is dominated by the time to copy the key.
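As a back-of-envelope check on the fingerprint trade-off (assuming roughly uniform fingerprints): with 16 fingerprint bits, a different key that lands on the same slot still matches the stored fingerprint with probability about 2^-16 ≈ 0.0015%, costing one extra full key comparison; with 32 bits that drops to about 2^-32. Either way, false positives are rare enough that the wider fingerprint buys little in practice.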
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.