opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0
52 stars, 107 forks

[BUG] Transform job failing if indices are deleted while executing the search phase #1163

Open rayshrey opened 2 months ago

rayshrey commented 2 months ago

What is the bug?

While the search phase of a transform job is executing, if any of the indices that make up the source index are deleted, the transform job fails with the error "Failed to search data in source indices" and is disabled.

How can one reproduce the bug?

Steps to reproduce the behavior:

What is the expected behavior?

The transform job should ignore the search failure caused by the deleted index and continue with the remaining results.
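
As an illustration of the behavior requested above, here is a minimal Java sketch. It is not the plugin's code. It assumes the source pattern has already been resolved to concrete index names and that each index can be searched on its own; `TolerantSearchSketch`, `IndexMissingException`, and `searchOne` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TolerantSearchSketch {
    // Hypothetical stand-in for an "index not found" failure; not an OpenSearch class.
    static final class IndexMissingException extends RuntimeException {
        public IndexMissingException(String index) {
            super("no such index [" + index + "]");
        }
    }

    // Collected hits plus the names of indices that were skipped.
    static final class Result {
        public final List<String> hits = new ArrayList<>();
        public final List<String> skippedIndices = new ArrayList<>();
    }

    // Searches each resolved index in turn. An index that has disappeared is
    // recorded and skipped; any other exception still propagates to the caller.
    static Result searchAll(List<String> resolvedIndices,
                            Function<String, List<String>> searchOne) {
        Result result = new Result();
        for (String index : resolvedIndices) {
            try {
                result.hits.addAll(searchOne.apply(index));
            } catch (IndexMissingException e) {
                result.skippedIndices.add(index);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the cluster: index name -> documents.
        Map<String, List<String>> cluster = new LinkedHashMap<>();
        cluster.put("metrics-2024-04-04", Arrays.asList("doc-a", "doc-b"));
        cluster.put("metrics-2024-04-06", Arrays.asList("doc-c"));

        // "metrics-2024-04-05" was resolved earlier but has since been deleted.
        List<String> resolved = Arrays.asList(
                "metrics-2024-04-04", "metrics-2024-04-05", "metrics-2024-04-06");

        Result r = searchAll(resolved, index -> {
            List<String> docs = cluster.get(index);
            if (docs == null) {
                throw new IndexMissingException(index);
            }
            return docs;
        });
        System.out.println("hits=" + r.hits + " skipped=" + r.skippedIndices);
    }
}
```

Keeping the skipped index names in the result, rather than dropping them silently, would let a caller log which data was missing from the transform.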

What is your host/environment?

Tested in version 1.3

Do you have any screenshots? N/A

Do you have any additional context?

Stack trace of the error

[2024-04-25T10:49:28,130][WARN ][o.o.i.t.TransformSearchService] [62ad7c2d69ee53adcacae4d2dd88ed7d] Operation failed. Retrying in 1s.
Failed to execute phase [query], Partial shards failure (1 shards unavailable)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:667)
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:404)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:702)
        at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:587)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:575)
        at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:558)
        at org.opensearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:366)
        at org.opensearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:130)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:558)
        at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:150)
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:286)
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:57)
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:42)
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:79)
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:49)
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67)
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:584)
        at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:736)
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:306)
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1347)
        at org.opensearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1425)
        at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1405)
        at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:65)
        at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:57)
        at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:40)
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:71)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:86)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:809)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

[2024-04-25T10:50:10,913][DEBUG][o.o.a.s.TransportSearchAction] [62ad7c2d69ee53adcacae4d2dd88ed7d] [q_TaXY7OQBKUZ1s0YJhIYw][metrics-2024-04-05][4]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[metrics-*], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=false, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":0,"query":{"bool":{"must":[{"match_all":{"boost":1.0}}],"should":[{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["southern"],"boost":1.0}},{"terms":{"namespace.keyword":["attorney"],"boost":1.0}},{"terms":{"dim_instance.keyword":["Republican"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["national"],"boost":1.0}},{"terms":{"dim_account.keyword":["apply"],"boost":1.0}},{"terms":{"dim_account_id":[95750.0625],"boost":1.0}},{"terms":{"dim_name.keyword":["Christine 
Fox"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["street"],"boost":1.0}},{"terms":{"namespace.keyword":["think"],"boost":1.0}},{"terms":{"dim_instance.keyword":["foreign"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["thing"],"boost":1.0}},{"terms":{"dim_account.keyword":["give"],"boost":1.0}},{"terms":{"dim_account_id":[73027.78125],"boost":1.0}},{"terms":{"dim_name.keyword":["Andrea Jones"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["stop"],"boost":1.0}},{"terms":{"namespace.keyword":["others"],"boost":1.0}},{"terms":{"dim_instance.keyword":["address"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["toward"],"boost":1.0}},{"terms":{"dim_account.keyword":["wall"],"boost":1.0}},{"terms":{"dim_account_id":[5028.21240234375],"boost":1.0}},{"terms":{"dim_name.keyword":["Reginald 
Steele"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["special"],"boost":1.0}},{"terms":{"namespace.keyword":["service"],"boost":1.0}},{"terms":{"dim_instance.keyword":["research"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["former"],"boost":1.0}},{"terms":{"dim_account.keyword":["out"],"boost":1.0}},{"terms":{"dim_account_id":[7327.4951171875],"boost":1.0}},{"terms":{"dim_name.keyword":["Amy Roach"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["speech"],"boost":1.0}},{"terms":{"namespace.keyword":["response"],"boost":1.0}},{"terms":{"dim_instance.keyword":["government"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["produce"],"boost":1.0}},{"terms":{"dim_account.keyword":["street"],"boost":1.0}},{"terms":{"dim_account_id":[62391.12890625],"boost":1.0}},{"terms":{"dim_name.keyword":["Jerry 
Nunez"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["sport"],"boost":1.0}},{"terms":{"namespace.keyword":["imagine"],"boost":1.0}},{"terms":{"dim_instance.keyword":["country"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["into"],"boost":1.0}},{"terms":{"dim_account.keyword":["air"],"boost":1.0}},{"terms":{"dim_account_id":[58682.67578125],"boost":1.0}},{"terms":{"dim_name.keyword":["Paul Martin"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["standard"],"boost":1.0}},{"terms":{"namespace.keyword":["realize"],"boost":1.0}},{"terms":{"dim_instance.keyword":["party"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["attack"],"boost":1.0}},{"terms":{"dim_account.keyword":["international"],"boost":1.0}},{"terms":{"dim_account_id":[11613.3173828125],"boost":1.0}},{"terms":{"dim_name.keyword":["Andrea 
Thomas"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["speech"],"boost":1.0}},{"terms":{"namespace.keyword":["energy"],"boost":1.0}},{"terms":{"dim_instance.keyword":["well"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["officer"],"boost":1.0}},{"terms":{"dim_account.keyword":["south"],"boost":1.0}},{"terms":{"dim_account_id":[53121.55078125],"boost":1.0}},{"terms":{"dim_name.keyword":["Daniel Phillips"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["speak"],"boost":1.0}},{"terms":{"namespace.keyword":["expect"],"boost":1.0}},{"terms":{"dim_instance.keyword":["great"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["loss"],"boost":1.0}},{"terms":{"dim_account.keyword":["professional"],"boost":1.0}},{"terms":{"dim_account_id":[84024.0078125],"boost":1.0}},{"terms":{"dim_name.keyword":["Raymond 
Spencer"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"filter":[{"range":{"timestamp":{"from":1714039200000,"to":1714042800000,"include_lower":true,"include_upper":false,"time_zone":"UTC","format":"epoch_millis","boost":1.0}}},{"terms":{"metricname.keyword":["Speed_Out"],"boost":1.0}},{"terms":{"unit.keyword":["speak"],"boost":1.0}},{"terms":{"namespace.keyword":["item"],"boost":1.0}},{"terms":{"dim_instance.keyword":["light"],"boost":1.0}},{"terms":{"dim_instance_name.keyword":["account"],"boost":1.0}},{"terms":{"dim_account.keyword":["popular"],"boost":1.0}},{"terms":{"dim_account_id":[48242.03515625],"boost":1.0}},{"terms":{"dim_name.keyword":["Leslie Moore"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"minimum_should_match":"1","boost":1.0}},"track_total_hits":-1,"aggregations":{"sample_transform":{"composite":{"size":10,"sources":[{"timestamp":{"date_histogram":{"field":"timestamp","missing_bucket":true,"order":"asc","calendar_interval":"1h","time_zone":"UTC"}}},{"metricname":{"terms":{"field":"metricname.keyword","missing_bucket":true,"order":"asc"}}},{"unit":{"terms":{"field":"unit.keyword","missing_bucket":true,"order":"asc"}}},{"namespace":{"terms":{"field":"namespace.keyword","missing_bucket":true,"order":"asc"}}},{"dim_instance":{"terms":{"field":"dim_instance.keyword","missing_bucket":true,"order":"asc"}}},{"dim_instance_name":{"terms":{"field":"dim_instance_name.keyword","missing_bucket":true,"order":"asc"}}},{"dim_account":{"terms":{"field":"dim_account.keyword","missing_bucket":true,"order":"asc"}}},{"dim_account_id":{"terms":{"field":"dim_account_id","missing_bucket":true,"order":"asc"}}},{"dim_name":{"terms":{"field":"dim_name.keyword","missing_bucket":true,"order":"asc"}}}]},"aggregations":{"sum":{"sum":{"field":"value"}},"avg":{"avg":{"field":"value"}},"max":{"max":{"field":"value"}},"min":{"min":{"field":"value"}},"count":{"value_count":{"field":"value"}}}}}}, 
cancelAfterTimeInterval=null}] lastShard [true]

[2024-04-25T10:50:10,976][DEBUG][o.o.a.s.TransportSearchAction] [62ad7c2d69ee53adcacae4d2dd88ed7d] #[null,java.lang.NullPointerException]#4 shards failed for phase: [query]
[clusterState.metadata.index(index) must not be null]; nested: NullPointerException[clusterState.metadata.index(index) must not be null];
        at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:695)
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:383)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:702)
        at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:587)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:575)
        at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:558)
        at org.opensearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:366)
        at org.opensearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:130)
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:558)
        at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:150)
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:286)
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:57)
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:42)
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:79)
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:49)
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67)
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:584)
        at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:736)
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:306)
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1347)
        at org.opensearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1425)
        at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1405)
        at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:65)
        at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:57)
        at org.opensearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:40)
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:71)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:86)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:57)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:809)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException: clusterState.metadata.index(index) must not be null
        at org.opensearch.indexmanagement.rollup.util.RollupUtilsKt.isRollupIndex(RollupUtils.kt:69)
        at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:81)
        at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:64)
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:94)
        at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:938)
        at org.opensearch.transport.TransportService.access$100(TransportService.java:92)
        at org.opensearch.transport.TransportService$3.sendRequest(TransportService.java:151)
        at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:876)
        at org.opensearch.security.transport.SecurityInterceptor.sendRequestDecorate(SecurityInterceptor.java:209)
        at org.opensearch.security.OpenSearchSecurityPlugin$7$2.sendRequest(OpenSearchSecurityPlugin.java:665)
        at com.amazonaws.elasticsearch.iam.IamTransportRequestSender.sendRequest(IamTransportRequestSender.java:94)
        at com.amazonaws.elasticsearch.ccs.CrossClusterRequestInterceptor$AddHeaderSender.sendRequest(CrossClusterRequestInterceptor.java:132)
        at org.opensearch.transport.TransportService.sendRequest(TransportService.java:763)
        at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:838)
        at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:826)
        at org.opensearch.action.search.SearchTransportService.sendExecuteQuery(SearchTransportService.java:196)
        at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.executePhaseOnShard(SearchQueryThenFetchAsyncAction.java:124)
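
The `Caused by` frames above point at a metadata lookup that returned null for the deleted index. One possible direction, sketched below in Java with stand-in types, is to treat a missing index as "not a rollup index" instead of dereferencing the null result. The plugin itself is Kotlin, and `ClusterMetadataSketch`, `NullSafeRollupCheck`, and the `rollup_index` setting key are hypothetical names used only for this illustration. This is a sketch under those assumptions, not the actual fix.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for cluster metadata: index name -> index settings.
// Hypothetical type for this sketch; NOT the OpenSearch class.
final class ClusterMetadataSketch {
    private final Map<String, Map<String, String>> indices = new HashMap<>();

    void putIndex(String name, Map<String, String> settings) {
        indices.put(name, settings);
    }

    void deleteIndex(String name) {
        indices.remove(name);
    }

    // Returns null when the index is absent, like the lookup named in the trace.
    Map<String, String> index(String name) {
        return indices.get(name);
    }
}

final class NullSafeRollupCheck {
    // Hypothetical setting key, chosen only for this example.
    static final String ROLLUP_SETTING = "rollup_index";

    // A missing index is reported as "not a rollup index" rather than
    // triggering a NullPointerException on the absent metadata.
    static boolean isRollupIndex(String index, ClusterMetadataSketch metadata) {
        Map<String, String> settings = metadata.index(index);
        if (settings == null) {
            return false; // index was deleted after the pattern was resolved
        }
        return Boolean.parseBoolean(settings.get(ROLLUP_SETTING));
    }

    public static void main(String[] args) {
        ClusterMetadataSketch metadata = new ClusterMetadataSketch();
        Map<String, String> settings = new HashMap<>();
        settings.put(ROLLUP_SETTING, "true");
        metadata.putIndex("metrics-2024-04-05", settings);

        System.out.println(isRollupIndex("metrics-2024-04-05", metadata));
        metadata.deleteIndex("metrics-2024-04-05");
        // No exception is thrown for the deleted index.
        System.out.println(isRollupIndex("metrics-2024-04-05", metadata));
    }
}
```

With a guard like this, the request for the deleted index would no longer fail inside the interceptor; how the search itself should then report the missing index is a separate question.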

sarthakaggarwal97 commented 2 months ago

Thanks @rayshrey for opening this.

> While the search phase for the transform job is executing, if any of the indices that are part of the source index are deleted

I think you mean when the source index is provided as a pattern, and then any of the resolved indices are deleted. Although, I suspect it would fail even if the only provided source index is deleted as well.

rayshrey commented 2 months ago

> I think you mean when the source index is provided as a pattern, and then any of the resolved indices are deleted.

Yes, exactly

> Although, I suspect it would fail even if the only provided source index is deleted as well.

Yes, I will need to check this as well.

karthikeyan21 commented 2 months ago

Any idea which releases this affects? @rayshrey, can you please add the version you are testing?

rayshrey commented 2 months ago

@karthikeyan21 I tested in version 1.3 (and updated the issue details as well).

dblock commented 1 week ago

Catch All Triage - 1 2 3 4 5 6