Open nknize opened 2 years ago
I'll note that I'm very new to these features, but the issue is tagged with the "discuss" label so I'd figure I'd share my thoughts :)
Given that the stated goal of Shard Idle is to "automatically optimize bulk indexing in the default case when no searches are performed" then it seems like the root of the problem here is that the node made the decision to transition its shards to "search idle" based purely off of local information when in fact those shards were still taking search requests (they were just going to other nodes). Is that right? Is there a way for a node to know or have a heuristic to suggest that it is consistently losing requests due to its Adaptive Replica Selection score and therefore defer any Shard Idle transitions?
That being said, it seems like you will inherently trade off some amount of performance consistency by enabling Shard Idle, so the suggestion to recommend disabling it when consistency is important doesn't seem unreasonable.
Is there a way for a node to know or have a heuristic to suggest that it is consistently losing requests due to its Adaptive Replica Selection score and therefore defer any Shard Idle transitions?
Hmm.. this would need to be added to the coordinating node logic.
It would be interesting to add something like lastSearchRequestNodeTime
to the routing table and when computing ARS send a "ping" to all nodes pending idle transition nodes if lastSearchRequestTime - lastSearchRequestNodeTime
is between some time window (e.g., < 30s > 15s). Of course this adds overhead when recommending an explicit refresh_interval
achieves something similar. ¯\(ツ)/¯
I'm looking into this.
I think that Shard Idle feature by itself is causing inconsistent search performance even without the adaptive replica selection (when cluster has low search traffic or when _routing
is used and that happened to me in the past and I had hard time figuring out that explicitly setting refresh_interval
solve high percentile latency ).
With combination of Adaptive Replica Selection, it is causing same effect even when cluster has consistent traffic so it is a another side effect of Shard Idle feature which was created to enhance indexing throughput and trade it off with search latency.
Since Shard Idle is a trade off feature where some users need it and others don't, I think that explicitly documenting this trade off will work for all and let users decide based on their requirements. I think optimizing Adaptive Replica Selection to take into consideration latest search time per node per shard globally (across all coordinator nodes) or locally (each coordinator node has it's view on latest search time which will still keep room for latency randomness depending on which coordinator node serving the request? ) will help reducing the effect of Shard Idle when cluster is not really "idle" but won't help when the cluster has low traffic so I'm not sure if we should invest in this optimization. What do you think ?
Additionally, Shard Idle feature will affect the freshness of data and the randomness of latency in the ongoing effort of Segment Replication, which in a nutshell uses primary shards refreshes to trigger the segment replication to replicas. When segment replication is enabled then only the refresh of the primary shard is what matter and if shard idle is enabled and primary shard goes into idle, we might have these cases:
Enabling Adaptive Replica Selection while Shard Idle feature is on and segment replication is on might cause more interesting effect because Adaptive Replica Selection takes into consideration queueSize, responseDuration and serviceTimeEWMA when computing ranks and these factors might be affected for node which holds the primary shard because it has more overhead ? that need to be confirmed. If this is the case then adaptive replica selection will route search requests away from primary shard which will impact data freshness because refresh might not be triggered until OpenSearch's flush threshold is crossed. This might be still beneficial for indexing throughput and some users might need it.
So here is my suggestion:
refresh_interval
config is confusing because it serves double purposes (1st is the value itself and 2nd the existence of the value as a flag). I think it will be more clear if shard idle feature is enabled only when index.search.idle.after
is explicitly set otherwise shard idle will be turned offWhat do you think @nknize, @andrross and @mch2 ?
Adaptive Replica Selection (ARS) is a default feature where search requests are routed to an optimal node based on a computed "Node Score" (function of queue size, response time, and service time). The intent of the behavior is to avoid routing search requests to busy nodes with the idea of improving search response time.
Shard Idle is a default feature where shards that have not received a search request after
30s
transition to an "idle" state and all future refreshes are skipped unless there are any registered refresh listeners.Ironically, putting the two "optimization" features together (which are both enabled by default) introduces an interesting periodic degradation in search performance.
If a node is experiencing heavy resource usage such that the ARS computed "score" prevents search requests from being routed longer than 30s then indexes will transition to "search idle" and future refreshes will be skipped. Once the node's resources are finally freed and a search request is received the search request may be queued until the refresh completes again. If an index has received a significant amount of updates (inserts, deletes, etc) this search request could be put on hold for quite some time. If the search request
time_out
is set below the actual refresh completion time, then the shard may time out. Either way, this situation introduces a non-negligible performance hit.This issue is open to determine if there is an improvement that can be made in these scenarios or if the best course of action is to update the documentation to simply recommend users explicitly set the
index.refresh_interval
to prevent shards from ever entering the idle state if "real time" search is a QoS requirement.While this has been seen before the user never fully understood the reasoning for seemingly "random" search degradation. What's more concerning are the number of users that may be unaware of this degradation.