khushbr opened this issue 3 weeks ago
| Approach | 50th percentile latency (ms) | 90th percentile latency (ms) | 99th percentile latency (ms) | 99.99th percentile latency (ms) | 100th percentile latency (ms) |
|---|---|---|---|---|---|
| Baseline | 0.00145 | 0.00168 | 0.00234 | 0.04285 | 36.12002 |
| Brute Force | 0.067 | 0.14999 | 0.19022 | 0.3226 | 28.77094 |
| Pre-Generated DocID Store (Background thread) | 0.00019 | 0.00027 | 0.0006 | 0.01702 | 6.5273 |
| Pre-Generated DocID Store (CompletableFuture) | 0.07115 | 0.18344 | 0.24779 | 1.11933 | 105.91981 |
| Pre-Generated DocID Store (Sync) | 0.00539 | 0.01137 | 0.09186 | 0.22667 | 20.83127 |
| Configuration | Min Throughput (docs/s) | Mean Throughput (docs/s) | Median Throughput (docs/s) | Max Throughput (docs/s) | Error rate (%) | Store size (GB) | 50th percentile latency (ms) | 90th percentile latency (ms) | 99th percentile latency (ms) | 99.99th percentile latency (ms) | 100th percentile latency (ms) | Cumulative indexing time of primary shards (min) | Max cumulative indexing time across primary shards (min) | Total Young Gen GC time (s) | Total Young Gen GC count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Default | 53276.7 | 63182.9 | 62268.1 | 76517.7 | 0.11 | 39.7833 | 315.284 | 1457.56 | 5548.24 | 10108.3 | 10802.4 | 105.244 | 21.3867 | 3.251 | 124 |
| With Cache - Synchronized Block | 42141.2 | 48362.4 | 47869 | 53992.4 | 0.03 | 35.5269 | 621.462 | 1018.8 | 6042.76 | 8983.25 | 10197 | 73.0625 | 15.1691 | 2.318 | 100 |
| With Non-blocking Queue | 43178.7 | 47667.3 | 47371.7 | 50017.2 | 0.02 | 36.3795 | 617.615 | 1027.93 | 4956.33 | 9716.62 | 10361.4 | 72.6281 | 16.1898 | 4.471 | 171 |
| With CompletableFuture Refill | 26981.1 | 29596.6 | 29531.3 | 30921.2 | 0.32 | 35.612 | 1132.23 | 1637.44 | 5979.19 | 10556.1 | 10770.7 | 72.0051 | 14.7818 | 43.547 | 1430 |
| Sync flow Refill | 40375.2 | 43389.9 | 43513 | 45694.8 | 0.11 | 35.7327 | 677.158 | 1118.33 | 6275.53 | 10183 | 10869.5 | 73.4668 | 16.064 | 3.356 | 134 |
Comments:
@khushbr please link the POC branch code for reference, to correlate with the numbers.
@shwetathareja Adding the links for Pre-Generated DocID Store Cache:
Is your feature request related to a problem? Please describe
The OpenSearch Bulk API executes multiple indexing/update/delete operations in a single call. Each of these operations requires the name of the index/stream or its alias, and the user can optionally provide a custom doc ID. If no doc ID is provided, OpenSearch auto-generates one: a 128-bit UUID that is unique for practical purposes. In the absence of custom routing, the doc ID value determines the shard routing for the document via a murmur3 hash of the doc ID, taken modulo the number of routing shards. After this, TransportBulkAction on the co-ordinator node generates per-shard TransportShardBulkAction requests and sends them to the corresponding primaries. The co-ordinator node waits for responses from all shards before sending the response back to the client.
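For reference, the routing math is roughly equivalent to the following sketch. This is a simplification, assuming Guava's murmur3 as a stand-in for OpenSearch's internal `Murmur3HashFunction` and ignoring routing-partition/routing-factor details:

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

public class DocRouting {
    // Simplified stand-in for the routing calculation: hash the
    // (auto-generated) doc ID and take it modulo the shard count.
    static int shardId(String docId, int numRoutingShards) {
        int hash = Hashing.murmur3_32_fixed()
                .hashString(docId, StandardCharsets.UTF_8)
                .asInt();
        return Math.floorMod(hash, numRoutingShards);
    }

    public static void main(String[] args) {
        System.out.println(shardId("pfWr34gBwyyuNgPIuK9M", 5)); // prints a value in 0..4
    }
}
```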
In a slow shard/node scenario (say, a shard in INITIALIZING state or a node undergoing garbage collection), that shard becomes a bottleneck in the bulk flow, increasing tail latencies and holding up resources and queue slots on the co-ordinator, potentially causing rejections.
The goal of this project is to tweak the document routing logic for auto-generated doc IDs to better handle slow shards/nodes, provide better latencies by saving network round trips, and reduce the chatter between the co-ordinator and the data nodes.
The above optimizations should work within the following constraints:
Describe the solution you'd like
In this section, we discuss the approach to solving [Part-1].
Pre-generated DocIDs Cache/Store: Maintain a store of doc IDs tagged per shard. During the TransportShardBulkAction.doRun() execution, instead of computing doc IDs on the fly, the pre-computed values are assigned. A background thread periodically refills the store (using async futures for the refill is another option to explore); the per-key (ShardID) refill count is a function of shard throughput. If the store has no IDs for a shard, the algorithm falls back to the brute-force approach: for a bulk request with ‘m’ DocWriteRequests and ‘n’ routing shards, generate up to (n * m) doc IDs and reject those that do not map to the randomly selected target shard. The store is read/refilled/evicted in a thread-safe manner with minimal locking, using ConcurrentLinkedQueue and semaphores; a sketch follows below.
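A minimal sketch of such a store, assuming a hypothetical `PreGeneratedDocIdStore` class with per-shard ConcurrentLinkedQueues, a single background refill thread, and a brute-force fallback when a queue runs dry (names, refill interval, and thresholds are illustrative, not from the POC branch):

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.IntFunction;

// Illustrative sketch only: per-shard queues of pre-generated doc IDs,
// refilled by a background thread off the hot indexing path, with a
// fallback to on-the-fly generation when a shard's queue is empty.
public class PreGeneratedDocIdStore {
    private final Map<Integer, ConcurrentLinkedQueue<String>> idsByShard =
            new ConcurrentHashMap<>();
    // Produces an ID that routes to the given shard, e.g. by generating
    // UUIDs and rejecting those whose murmur3-mod value maps elsewhere.
    private final IntFunction<String> idForShard;
    private final int refillTarget; // per-shard target; could scale with shard throughput
    private final ScheduledExecutorService refiller =
            Executors.newSingleThreadScheduledExecutor();

    public PreGeneratedDocIdStore(IntFunction<String> idForShard, int refillTarget) {
        this.idForShard = idForShard;
        this.refillTarget = refillTarget;
        refiller.scheduleWithFixedDelay(this::refillAll, 0, 100, TimeUnit.MILLISECONDS);
    }

    /** Non-blocking take; falls back to brute-force generation on a miss. */
    public String take(int shardId) {
        String id = queue(shardId).poll();
        return (id != null) ? id : idForShard.apply(shardId);
    }

    private void refillAll() {
        // ConcurrentLinkedQueue.size() is O(n); acceptable for a sketch.
        idsByShard.forEach((shard, q) -> {
            for (int i = q.size(); i < refillTarget; i++) {
                q.add(idForShard.apply(shard));
            }
        });
    }

    private ConcurrentLinkedQueue<String> queue(int shardId) {
        return idsByShard.computeIfAbsent(shardId, s -> new ConcurrentLinkedQueue<>());
    }
}
```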
The above approaches need to be benchmarked for both cases: one shard slow vs. many shards slow. Measure indexing speed, CPU and memory usage (on the co-ordinator node), storage efficiency, and lookup speed. As another dimension, measure cluster throughput and rejections.
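For the per-call latency percentiles in the first table, a simple harness along these lines would suffice (a sketch, not the actual harness used; swap the UUID call for `store.take(shardId)` from the hypothetical store above to compare approaches):

```java
import java.util.Arrays;
import java.util.UUID;

// Minimal microbenchmark: per-call ID-generation latency percentiles.
public class IdLatencyBench {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            UUID.randomUUID().toString(); // or: store.take(shardId)
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        for (double p : new double[] {50, 90, 99, 99.99, 100}) {
            int idx = (int) Math.min(n - 1, Math.ceil(p / 100.0 * n) - 1);
            System.out.printf("p%.2f: %.5f ms%n", p, nanos[idx] / 1e6);
        }
    }
}
```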
Related component
Indexing:Performance
Describe alternatives you've considered
Biased Hash Function: Currently in OpenSearch, the routing shard is tightly coupled to the doc ID. The generated doc ID is a UUID required to be collision-free for practical purposes. This doc ID is run through a murmur3 hash (non-cryptographic) and then a mod function to produce the integer shard ID, distributing documents uniformly across the shards. We want to explore whether both primitives (practical uniqueness and uniform distribution) can be preserved.
One of the simplest approaches is to encode the (randomly selected) shard ID in the doc ID alongside the UUID, and forgo the murmur3-hash-mod calculation for the routing-shard lookup:
`<encode_version>:<base36_shard_id>:<document_id>`
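As an illustration of this format, a minimal encode/decode sketch (the class and method names are hypothetical):

```java
// Illustrative encoder/decoder for the <encode_version>:<base36_shard_id>:<document_id>
// format described above; names are hypothetical.
public class EncodedDocId {
    private static final String VERSION = "1";

    static String encode(int shardId, String rawDocId) {
        return VERSION + ":" + Integer.toString(shardId, 36) + ":" + rawDocId;
    }

    /** Extracts the shard ID directly, skipping the murmur3-hash-mod computation. */
    static int decodeShardId(String encodedId) {
        int first = encodedId.indexOf(':');
        int second = encodedId.indexOf(':', first + 1);
        return Integer.parseInt(encodedId.substring(first + 1, second), 36);
    }

    public static void main(String[] args) {
        String id = encode(17, "pfWr34gBwyyuNgPIuK9M");
        System.out.println(id);                // 1:h:pfWr34gBwyyuNgPIuK9M
        System.out.println(decodeShardId(id)); // 17
    }
}
```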
Update-by-doc-_id and get-by-IDs queries can be handled at the client communication layer by returning the doc ID as the concatenated string value. On the co-ordinator node's transport layer, a decoder extracts the version and shard ID and routes the document to that specific shard. Since there is no hash calculation, document routing is fast, with negligible footprint on CPU cycles and the JVM. However, there are drawbacks to this approach:
The tables at the top of this issue summarize the performance trade-offs of the various approaches mentioned.
Additional context
No response