opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Increase `index.search.idle.after` setting default from 30s to 10 minutes (or more) #9707

Open msfroh opened 10 months ago

msfroh commented 10 months ago

**Is your feature request related to a problem? Please describe.**
Elasticsearch 7.0 introduced a "search idle" feature (https://github.com/elastic/elasticsearch/pull/27500) to avoid refreshing an index that isn't receiving any search traffic. This helps remove the unnecessary effort of refreshing during large bulk load operations. For example, for a "rebuild the index overnight and serve traffic during the day" use-case, it's apparently a big help.

Unfortunately, I've seen at least a few cases where users end up with shards going idle and then blocking on a refresh on their next query:

  1. I saw a cross-cluster replication case where the follower would receive a burst of updates during an idle period, then the next query would take a long pause.
  2. If explicit routing is used, some shard copies can go idle even though the index is receiving continuous traffic, just because those particular shards haven't received any search requests recently.

**Describe the solution you'd like**
In my opinion, the default 30 second shard idle timeout is far too aggressive.

For the "big overnight re-index job" use-case or the "index logs constantly and only search them when something breaks" use-case, not seeing any query traffic for 10 minutes should still be a fine threshold -- sure, you're doing unnecessary refreshes for an extra 9.5 minutes, but that's not likely to be prohibitively expensive.

**Describe alternatives you've considered**
We could change the default behavior so that a search hitting idle shards triggers a background refresh rather than blocking the first search(es). Searches could then return quickly using the last (pre-idle) IndexReader. Unfortunately, that would be a major change for users, who might be surprised by (potentially very) stale results.

We also have a workaround: users who search and update their index all the time (but sometimes have sparser search traffic) can disable the search idle feature altogether by explicitly setting `index.refresh_interval`. In my opinion, it's still a good idea to do that, but I'd like the default behavior to be less aggressive.
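To illustrate that workaround with a minimal sketch (placeholder index name and host): once `index.refresh_interval` is set explicitly -- even to the 1s default value -- the search-idle behavior no longer applies to that index.

```python
import requests

# Pin the refresh interval explicitly; an explicit value opts the index
# out of the search-idle optimization. "my-index" is a placeholder.
requests.put(
    "http://localhost:9200/my-index/_settings",
    json={"index.refresh_interval": "1s"},
).raise_for_status()
```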

**Additional context**
N/A

anasalkouz commented 10 months ago

Hi @msfroh, +1 to increasing the default setting, but why 10 minutes? Do you have data to prove that 10 minutes is the right/best timeout value?

msfroh commented 10 months ago

Do you have data to prove that 10 minutes is the right/best timeout value?

Absolutely not! It's another totally arbitrary limit that should be debated, ideally based on data.

In some ways, I would almost prefer a situation where there is no default in order to force users to think about what's right for their use-case, but I realize that's probably not helpful. (In particular, the current 30 second idle time mostly seems to be hurting the users who just want to use the out-of-the-box defaults.)

I think of the problem as follows:

  1. Updates to a Lucene directory are visible when you open a new IndexReader. So tell people to open a new IndexReader (i.e. `POST /<index>/_refresh`) when they want to pick up the latest updates. Easy! (See the snippet after this list.)
  2. People don't like the added effort of explicit refreshes, so let's refresh automatically for them -- every second by default. (Again, an arbitrary default interval that users should probably think about, but it's probably safe for most users, most of the time.)
  3. Unnecessary automatic refreshes slow down indexing-heavy workloads. Let's (by default) disable automatic refreshes on shards that don't receive search traffic for... 30 seconds?
  4. Let's introduce a new invasive species to prey on the previous invasive species. (Ref: https://www.youtube.com/watch?v=LuiK7jcC1fY)
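For completeness, the explicit refresh from item 1 is a single REST call; a minimal sketch with a placeholder index name and host:

```python
import requests

# Open a new IndexReader (make recent writes searchable) on demand.
requests.post("http://localhost:9200/my-index/_refresh").raise_for_status()
```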
msfroh commented 10 months ago

I had an absolutely terrible idea for an alternative solution:

We can keep track of how frequently a given index receives search requests while idle. If the index receives more than N requests to idle shards within time period T (where N and T are configurable but have arbitrary default values), then we disable the shard idle behavior for the index.
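A rough sketch of the bookkeeping this would need, purely hypothetical (the class and parameter names below, like `IdleHitTracker`, `n_threshold`, and `window_seconds`, are invented for illustration and don't exist in OpenSearch):

```python
import time
from collections import deque


class IdleHitTracker:
    """Per-index tracker for searches that land on idle shards; trips a flag
    that would disable the search-idle behavior for the whole index."""

    def __init__(self, n_threshold=5, window_seconds=300.0):
        self.n_threshold = n_threshold        # N: idle-shard hits tolerated in the window
        self.window_seconds = window_seconds  # T: length of the sliding window, in seconds
        self.hits = deque()                   # timestamps of recent idle-shard hits
        self.idle_disabled = False

    def record_idle_hit(self, now=None):
        """Call whenever a search request hits a shard that had gone idle."""
        now = time.monotonic() if now is None else now
        self.hits.append(now)
        # Evict hits that have fallen out of the sliding window.
        while self.hits and now - self.hits[0] > self.window_seconds:
            self.hits.popleft()
        if len(self.hits) > self.n_threshold:
            self.idle_disabled = True         # stop letting this index's shards go idle
```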

kkhatua commented 8 months ago

I like the "terrible" idea. Here are a few more ideas to throw into the open if we don't want to increase the idle time...

  1. Track the number of idle shards at a given time, e.g. aggregate it through an API at the cluster or index level for visibility.
  2. Track whenever a request hits an idle shard that's being "woken up" to refresh, and increment a counter to indicate the problem.

This probably warrants its own issue, but I'm curious to hear comments on all these.

reta commented 8 months ago

@msfroh I think an adaptive idle policy is what you (and @kkhatua) are heading towards. I would see that as a better option than setting arbitrary hardcoded values.

msfroh commented 8 months ago

I was just talking with @ruai0511 and we came up with another possible option.

What if shard idle didn't stop refreshes altogether, but rather just made them sparser?

Right now, during the first 30 seconds that a shard (with all default settings) exists, it refreshes every second. If it doesn't see any searches within those 30 seconds, refreshes stop altogether. When a search request comes in, the shard does a blocking refresh, then goes back to refreshing every second for the next 30 seconds (assuming no more search requests come).

What if, instead, being idle for 30 seconds doubled the implicit refresh interval? So, after the first 30 seconds of 1 second refreshes, we back off to 2 second refreshes, then 4 seconds, then 8, ... up to a max of (say) 64 seconds. If a search request comes in, we drop the implicit refresh interval back to 1 second. (Note that this is only the implicit refresh interval -- if the user has explicitly set a refresh interval, we continue to honor that and there is no shard idle behavior, just like now.)
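A minimal sketch of one reading of that schedule, where each additional 30 seconds of idleness doubles the interval (the constants and the helper below are hypothetical, not actual OpenSearch code):

```python
BASE_INTERVAL = 1.0   # seconds; default implicit refresh interval
MAX_INTERVAL = 64.0   # seconds; proposed cap on the backed-off interval
IDLE_AFTER = 30.0     # seconds without a search before backing off starts


def implicit_refresh_interval(seconds_since_last_search):
    """Return the implicit refresh interval: 1s while the shard is 'active',
    doubling for each additional 30s of idleness, capped at MAX_INTERVAL."""
    if seconds_since_last_search < IDLE_AFTER:
        return BASE_INTERVAL  # a recent search resets us to 1s refreshes
    idle_periods = int((seconds_since_last_search - IDLE_AFTER) // IDLE_AFTER) + 1
    return min(MAX_INTERVAL, BASE_INTERVAL * (2 ** idle_periods))
```

Under this reading, a shard that has seen no searches for a few minutes settles at one refresh every 64 seconds, and the first search after that drops it straight back to 1 second.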

The one major risk that I could see is in the scenario where e.g. someone is sending logs in a "write-only" workload, then something bad happens, so they decide to search their logs, but they don't see logs from the past 64 seconds on that first search. (Of course, that first search will drop the refresh interval back to 1 second, so a followup search would probably find the relevant results.) If they don't do a followup search, it could be a nasty surprise. Of course, there are workarounds -- explicit refresh before search, sending another search request, etc.

kkhatua commented 8 months ago

I think we can open this as a separate issue and tackle it there, or tackle it here and abandon the need to bump up the default to 10 minutes.

But circling back to the options for tackling this, we should first figure out what the intended goal is.

If it is just visibility at the cluster level, then counters for requests that hit an idle shard (waking it up to refresh), or a count of currently idle shards, should suffice.

If it is an adaptive idle policy, then the question is whether you want to do "ideal idle" policy threshold discovery.

The former tells us what to look for if we see oddly high but rare latencies. The latter ensures the system adaptively learns, and we (hopefully) never have to worry about finding that balance between ingesting and refreshing.

msfroh commented 8 months ago

The discussion I had with @ruai0511 was mostly about "What is search idle really trying to solve?"

Essentially, if you have a "write-only" logging use-case (or an overnight rebuild), the default 1s refresh will:

  1. Flush segments every second, producing lots of small segments which need to be merged.
  2. Reopen index readers every second, possibly refreshing global ordinals and stuff, even though nobody is going to use that reader.

So -- frequent refreshes while only indexing mean small segments and wasted IndexReader openings. What if the alternative isn't "no refreshing", but rather "less refreshing"?

I would be curious to try benchmarking an indexing workload where instead of the existing shard idle behavior, we do the exponentially-decaying refresh rate, and see if there's any noticeable impact on the indexing speed. My hunch is that refreshing every minute (or minute and 4 seconds) would be infrequent enough to have little impact on indexing performance under load.

reta commented 8 months ago

  1. Flush segments every second, producing lots of small segments which need to be merged.

We integrated Lucene's merge-on-refresh policy a while back; that should help with the "write-only" logging use-case (and the like), right?

I would be curious to try benchmarking an indexing workload where instead of the existing shard idle behavior, we do the exponentially-decaying refresh rate, and see if there's any noticeable impact on the indexing speed.

Seems worth trying.

msfroh commented 8 months ago

Flush segments every second, producing lots of small segments which need to be merged.

We integrated Lucene's merge-on-refresh policy a while back; that should help with the "write-only" logging use-case (and the like), right?

Merge-on-refresh would deal with the small segments, but we still pay the merge cost I believe. I'm not 100% certain, but I think it's still better to write larger segments in the first place, up to a point. (Eventually, you're going to bump into the RAM buffer limit and will end up flushing anyway.)

Back when I was working on Amazon Product Search, @mikemccand tried disabling explicit commits during the index build (since we used a "rebuild offline" model) and it had no real impact on the index build time. So, that's where I get the "up to a point" reasoning: while I would believe that flushing every second hurts indexing throughput, I have one anecdata point to suggest that there's no difference between flushing every minute and disabling explicit flushes altogether.

andrross commented 8 months ago

I would be curious to try benchmarking an indexing workload where instead of the existing shard idle behavior, we do the exponentially-decaying refresh rate, and see if there's any noticeable impact on the indexing speed.

With the exponentially decaying refresh rate, I assume we'd still force a refresh if a search request hits a shard while it is in the increased-interval state, in order to keep the same staleness guarantees, right?

msfroh commented 8 months ago

With the exponentially decaying refresh rate, I assume we'd still force a refresh if a search request hits a shard while it is in the increased-interval state, in order to keep the same staleness guarantees, right?

That would just (more or less) bring back the existing idle shard behavior, though, which is exactly what this issue is trying to address.

I covered the staleness problem above:

The one major risk that I could see is in the scenario where e.g. someone is sending logs in a "write-only" workload, then something bad happens, so they decide to search their logs, but they don't see logs from the past 64 seconds on that first search. (Of course, that first search will drop the refresh interval back to 1 second, so a followup search would probably find the relevant results.) If they don't do a followup search, it could be a nasty surprise. Of course, there are workarounds -- explicit refresh before search, sending another search request, etc.

msfroh commented 8 months ago

IMO, if someone wants a staleness guarantee, they should either explicitly set the refresh interval (disabling the shard idle behavior) or issue an explicit refresh before they search.

andrross commented 8 months ago

I covered the staleness problem above:

Sorry, missed that! I think the idea of an adaptive idle policy with a bounded max staleness is interesting. It does become challenging to make it the default due to the potential nasty surprise you mentioned. I'm on board with benchmarking it and potentially adding it as an option (and maybe making it the default in a future major version).

I also intuitively agree that the default 30 second idle timeout does seem far too aggressive and would be on board with changing that.

msfroh commented 8 months ago

I feel like we still don't have a great solution to the current problem, where real users who have low levels of traffic end up with search requests that spike in latency because of the existing default behavior.

We can increase the default 30 second idle timeout to reduce the number of users who are impacted, but I don't know what the new default should be -- 1 minute? 2 minutes? 5 minutes? 10? I don't have a good suggestion other than picking a different arbitrary value.

andrross commented 8 months ago

I feel like we still don't have a great solution to the current problem, where real users who have low levels of traffic end up with search requests that spike in latency because of the existing default behavior.

Personally, enabling shard idle by default feels like the wrong choice. Disabling it will give you more consistent and predictable behavior. It's only in the case where you have a natural pattern of bulk loads with literally zero search traffic that it really makes a lot of sense. This is a super unsatisfying suggestion, but can we improve documentation and/or highlight this issue in some of the getting started/setup guides (e.g. https://opensearch.org/docs/latest/install-and-configure/configuration/)? Specifically, I'm suggesting adding some content around choosing the right refresh interval and enabling/disabling shard idle as appropriate for the workload.

sgup432 commented 4 months ago

Personally, enabling shard idle by default feels like the wrong choice

I agree with @andrross on this. The search.idle setting is meant to increase indexing performance, and users can set it explicitly rather than having it enabled implicitly. It comes as a surprise to many users when search latency spikes because of the forced refresh in such cases. We can probably highlight it in the documentation as mentioned above.

I think setting it to any arbitrary number may not solve this; it comes down to the same problem. Even with an implicit adaptive refresh-interval setting, it might again cause surprises and confusion if users are unaware of it.