opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Master Eligible Node didn't rejoin after ES cluster rolling restart #1597

Open GaneshJayaram97 opened 2 years ago

GaneshJayaram97 commented 2 years ago

Elasticsearch version

7.10.2

JVM version (java -version)

openjdk 11.0.12 2021-07-20 LTS OpenJDK Runtime Environment 18.9 (build 11.0.12+7-LTS) OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7-LTS, mixed mode, sharing)

ES Cluster Topology

5 node setup

3 master-eligible nodes (these also act as data nodes) and 2 data-only nodes

Describe the bug

We formed a cluster with 5 nodes. While performing a rolling restart, we observed that the node that was master before the restart did not rejoin the cluster, while the other nodes rejoined successfully.

 The following was observed when the issue occurred:

 1.  The cluster task with source "**elected-as-master ([2] nodes joined)**" runs forever, so the node-left and node-join tasks queued behind it wait forever.

        As a side note: the old leader was disconnected, and when it came back online its ephemeral ID had changed, so the new leader cannot connect to it and fails with the exception "handshake failed. unexpected remote node".

        From the new ES leader's logs:

        org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (auto create [myindex-2021-11-03]) within 1m
            at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:143) ~[elasticsearch-7.10.2.jar:7.10.2]
            at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
            at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:142) ~[elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
            at java.lang.Thread.run(Thread.java:829) [?:?]
        [2021-11-02T13:19:35,111][WARN ][o.e.c.NodeConnectionsService] [y.y.y.y] failed to connect to {x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{GozQ-qT_SNuDZ1PThGs0Dw}{x.x.x.x}{x.x.x.x:9300}{dimr} (tried [26725] times)
        org.elasticsearch.transport.ConnectTransportException: [x.x.x.x][x.x.x.x:9300] handshake failed. unexpected remote node {x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{Ai6cyeSIQdKFbWTUThPyYw}{x.x.x.x}{x.x.x.x:9300}{dimr}
            at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:389) ~[elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:476) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:247) [elasticsearch-7.10.2.jar:7.10.2]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
            at java.lang.Thread.run(Thread.java:829) [?:?]

Note: The log entries above repeat continuously and have filled the ES logs.
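The handshake failure above comes down to a single field. Here is a minimal Python sketch, assuming the braces in a logged node descriptor hold name, node ID, ephemeral ID, host, transport address and roles, in that order. The field names are labels chosen here, and the order is inferred from the cluster state output in item 4. It compares the descriptor the new leader tries to connect to with the one the remote node actually reports:

```python
import re

# Assumed field order of a logged node descriptor (labels are illustrative).
FIELDS = ("name", "node_id", "ephemeral_id", "host", "address", "roles")


def parse_node(descriptor: str) -> dict:
    """Split a `{...}{...}` node descriptor into named fields."""
    values = re.findall(r"\{([^{}]*)\}", descriptor)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def restarted_since_cluster_state(expected: str, remote: str) -> bool:
    """True when both descriptors name the same node but a different process."""
    a, b = parse_node(expected), parse_node(remote)
    return a["node_id"] == b["node_id"] and a["ephemeral_id"] != b["ephemeral_id"]


# Descriptors copied from the WARN line and the exception above.
expected = "{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{GozQ-qT_SNuDZ1PThGs0Dw}{x.x.x.x}{x.x.x.x:9300}{dimr}"
remote = "{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{Ai6cyeSIQdKFbWTUThPyYw}{x.x.x.x}{x.x.x.x:9300}{dimr}"

print(restarted_since_cluster_state(expected, remote))  # True
```

The node ID is unchanged while the ephemeral ID differs, which is what one would expect if the new leader is still working from a cluster state that predates the restart.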
 2. After coming back online, the old leader could not rejoin the cluster and keeps retrying.

        Old leader logs:

            [2021-11-02T07:41:47,535][DEBUG][o.e.c.c.ElectionSchedulerFactory] [x.x.x.x] scheduleNextElection{gracePeriod=500ms, thisAttempt=23, maxDelayMillis=2400, delayMillis=2571, ElectionScheduler{attempt=24, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}} starting election
            [2021-11-02T07:41:47,535][DEBUG][o.e.c.c.ElectionSchedulerFactory] [x.x.x.x] scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=24, maxDelayMillis=2500, delayMillis=2780, ElectionScheduler{attempt=25, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
            [2021-11-02T07:41:47,535][DEBUG][o.e.c.c.PreVoteCollector ] [x.x.x.x] PreVotingRound{preVotesReceived={}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, currentTerm=23}, isClosed=false} requesting pre-votes from [{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, {a.a.a.a}{tb0FsGKdS8GVGanjyLx5fg}{32KWdVyQSw-g0DvJQ5mnrg}{a.a.a.a}{a.a.a.a:9300}{dimr}, {y.y.y.y}{oI2zxdVtSM-DRhQZL0jqRw}{S7_kyKpoRwWVW6OEf4wChg}{y.y.y.y}{y.y.y.y:9300}{dimr}]
            [2021-11-02T07:41:47,539][DEBUG][o.e.c.c.PreVoteCollector ] [x.x.x.x] TransportResponseHandler{PreVoteCollector{state=Tuple [v1=null, v2=PreVoteResponse{currentTerm=23, lastAcceptedTerm=22, lastAcceptedVersion=3662}]}, node={a.a.a.a}{tb0FsGKdS8GVGanjyLx5fg}{32KWdVyQSw-g0DvJQ5mnrg}{a.a.a.a}{a.a.a.a:9300}{dimr}} failed
            org.elasticsearch.transport.RemoteTransportException: [a.a.a.a][a.a.a.a:9300][internal:cluster/request_pre_vote]
            Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: rejecting PreVoteRequest{sourceNode={x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, currentTerm=23} as there is already a leader
                at org.elasticsearch.cluster.coordination.PreVoteCollector.handlePreVoteRequest(PreVoteCollector.java:135) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.cluster.coordination.PreVoteCollector.lambda$new$0(PreVoteCollector.java:74) ~[elasticsearch-7.10.2.jar:7.10.2]
                at com.amazon.opendistro.elasticsearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:49) ~[?:?]
                at com.amazon.opendistroforelasticsearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:124) ~[?:?]
                at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:305) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.2.jar:7.10.2]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
                at java.lang.Thread.run(Thread.java:829) [?:?]
            [2021-11-02T07:41:47,542][DEBUG][o.e.c.c.PreVoteCollector ] [x.x.x.x] PreVotingRound{preVotesReceived={{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}=PreVoteResponse{currentTerm=23, lastAcceptedTerm=22, lastAcceptedVersion=3662}}, electionStarted=false, preVoteRequest=PreVoteRequest{sourceNode={x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, currentTerm=23}, isClosed=false} added PreVoteResponse{currentTerm=23, lastAcceptedTerm=22, lastAcceptedVersion=3662} from {x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, no quorum yet
            [2021-11-02T07:41:47,541][DEBUG][o.e.c.c.PreVoteCollector ] [x.x.x.x] TransportResponseHandler{PreVoteCollector{state=Tuple [v1=null, v2=PreVoteResponse{currentTerm=23, lastAcceptedTerm=22, lastAcceptedVersion=3662}]}, node={y.y.y.y}{oI2zxdVtSM-DRhQZL0jqRw}{S7_kyKpoRwWVW6OEf4wChg}{y.y.y.y}{y.y.y.y:9300}{dimr}} failed
            org.elasticsearch.transport.RemoteTransportException: [y.y.y.y][y.y.y.y:9300][internal:cluster/request_pre_vote]
            Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: rejecting PreVoteRequest{sourceNode={x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, currentTerm=23} as there is already a leader
                at org.elasticsearch.cluster.coordination.PreVoteCollector.handlePreVoteRequest(PreVoteCollector.java:135) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.cluster.coordination.PreVoteCollector.lambda$new$0(PreVoteCollector.java:74) ~[elasticsearch-7.10.2.jar:7.10.2]
                at com.amazon.opendistro.elasticsearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:49) ~[?:?]
                at com.amazon.opendistroforelasticsearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:124) ~[?:?]
                at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:305) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:743) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.2.jar:7.10.2]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
                at java.lang.Thread.run(Thread.java:829) [?:?]

            [2021-11-02T07:41:48,126][DEBUG][o.e.c.c.JoinHelper       ] [x.x.x.x] already attempting to join {y.y.y.y}{oI2zxdVtSM-DRhQZL0jqRw}{S7_kyKpoRwWVW6OEf4wChg}{y.y.y.y}{y.y.y.y:9300}{dimr} with request JoinRequest{sourceNode={x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, minimumTerm=23, optionalJoin=Optional.empty}, not sending request
            [2021-11-02T07:41:48,126][DEBUG][o.e.d.PeerFinder         ] [x.x.x.x] Peer{transportAddress=c.c.c.c:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
            org.elasticsearch.transport.ConnectTransportException: [][c.c.c.c:9300] connect_timeout[3s]
                at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onTimeout(TcpTransport.java:984) ~[elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) ~[elasticsearch-7.10.2.jar:7.10.2]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
                at java.lang.Thread.run(Thread.java:829) [?:?]
            [2021-11-02T07:41:48,130][DEBUG][o.e.d.PeerFinder         ] [x.x.x.x] Peer{transportAddress=b.b.b.b:9300, discoveryNode=null, peersRequestInFlight=false} connection failed
            org.elasticsearch.transport.ConnectTransportException: [b.b.b.b][b.b.b.b:9300] non-master-eligible node found
                at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:107) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.discovery.HandshakingTransportAddressConnector$1$1$1.innerOnResponse(HandshakingTransportAddressConnector.java:95) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:163) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:476) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:247) [elasticsearch-7.10.2.jar:7.10.2]
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
                at java.lang.Thread.run(Thread.java:829) [?:?]

            [2021-11-02T07:41:49,087][WARN ][o.e.c.c.ClusterFormationFailureHelper] [x.x.x.x] master not discovered or elected yet, an election requires at least 2 nodes with ids from [tb0FsGKdS8GVGanjyLx5fg, 37L59gGkQDSy_Mm4nVKq0A, oI2zxdVtSM-DRhQZL0jqRw], have discovered [{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}, {a.a.a.a}{tb0FsGKdS8GVGanjyLx5fg}{32KWdVyQSw-g0DvJQ5mnrg}{a.a.a.a}{a.a.a.a:9300}{dimr}, {y.y.y.y}{oI2zxdVtSM-DRhQZL0jqRw}{S7_kyKpoRwWVW6OEf4wChg}{y.y.y.y}{y.y.y.y:9300}{dimr}] which is a quorum; discovery will continue using [b.b.b.b:9300, a.a.a.a:9300, y.y.y.y:9300, c.c.c.c:9300] from hosts providers and [{x.x.x.x}{37L59gGkQDSy_Mm4nVKq0A}{jSDAxnsITo6vEuJ39hI5tA}{x.x.x.x}{x.x.x.x:9300}{dimr}] from last-known cluster state; node term 23, last-accepted version 3662 in term 22
            [2021-11-02T07:41:49,101][WARN ][o.e.n.Node               ] [x.x.x.x] timed out while waiting for initial discovery state - timeout: 30s
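The two `scheduleNextElection` lines above are consistent with a linear backoff capped at `maxTimeout`, plus the grace period. The sketch below is inferred only from the logged values (`initialTimeout=100ms`, `backoffTime=100ms`, `maxTimeout=10s`, `gracePeriod=500ms`), not taken from the Elasticsearch source, and the function names are hypothetical:

```python
def max_delay_millis(attempt: int, initial_ms: int = 100,
                     backoff_ms: int = 100, max_ms: int = 10_000) -> int:
    """Upper bound of the random delay for a given election attempt (assumed formula)."""
    return min(max_ms, initial_ms + attempt * backoff_ms)


def delay_is_plausible(attempt: int, delay_ms: int, grace_ms: int = 500) -> bool:
    """Check that a logged delayMillis fits within grace period + bounded random delay."""
    return grace_ms <= delay_ms <= max_delay_millis(attempt) + grace_ms


print(max_delay_millis(23))          # 2400, matches maxDelayMillis for thisAttempt=23
print(max_delay_millis(24))          # 2500, matches maxDelayMillis for thisAttempt=24
print(delay_is_plausible(23, 2571))  # True
print(delay_is_plausible(24, 2780))  # True
```

If this reading is right, the retries only slow down to at most about one attempt every 10 seconds. They never stop on their own, which matches the old leader retrying indefinitely.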

 3. Cluster health reports green. The output is below:

         curl -XGET "localhost:9200/_cluster/health?pretty"
                {
                "cluster_name" : "elasticsearch",
                "status" : "green",
                "timed_out" : false,
                "number_of_nodes" : 4,
                "number_of_data_nodes" : 4,
                "active_primary_shards" : 43,
                "active_shards" : 122,
                "relocating_shards" : 0,
                "initializing_shards" : 0,
                "unassigned_shards" : 0,
                "delayed_unassigned_shards" : 0,
                "number_of_pending_tasks" : 149,
                "number_of_in_flight_fetch" : 0,
                "task_max_waiting_in_queue_millis" : 271277743,
                "active_shards_percent_as_number" : 100.0
                }
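A green status can hide the stuck queue, so it may help to check the pending-task fields as well. The following is a small sketch, not an official health check. It reads the two relevant fields from the response above, and the 60-second threshold is an assumption chosen to match the 1m timeout in the `ProcessClusterEventTimeoutException`:

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def millis_to_days(ms: int) -> float:
    """Convert a millisecond duration to days."""
    return ms / MILLIS_PER_DAY


def master_queue_looks_stuck(health: dict, max_wait_ms: int = 60_000) -> bool:
    """Flag a cluster whose oldest pending task has waited longer than max_wait_ms."""
    return (health["number_of_pending_tasks"] > 0
            and health["task_max_waiting_in_queue_millis"] > max_wait_ms)


# Values copied from the _cluster/health output above.
health = {
    "status": "green",
    "number_of_pending_tasks": 149,
    "task_max_waiting_in_queue_millis": 271277743,
}

print(master_queue_looks_stuck(health))                                      # True
print(f"{millis_to_days(health['task_max_waiting_in_queue_millis']):.2f}")   # 3.14
```

The oldest task has been waiting a little over three days, which agrees with the "3d" entries in the pending tasks output.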

 4. Cluster state from the new leader node, which still lists the stale (pre-restart) ephemeral ID of the old leader node:

         curl -XGET "localhost:9200/_cluster/state?filter_path=version,nodes,metadata.cluster_coordination&pretty"
                {
                    "version": 3663,
                    "nodes": {
                    "tb0FsGKdS8GVGanjyLx5fg": {
                        "name": "a.a.a.a",
                        "ephemeral_id": "32KWdVyQSw-g0DvJQ5mnrg",
                        "transport_address": "a.a.a.a:9300",
                        "attributes": {}
                    },
                    "37L59gGkQDSy_Mm4nVKq0A": {
                        "name": "x.x.x.x",
                        "ephemeral_id": "GozQ-qT_SNuDZ1PThGs0Dw",
                        "transport_address": "x.x.x.x:9300",
                        "attributes": {}
                    },
                    "3vhpqpzqQ4WooDM07CWmaQ": {
                        "name": "b.b.b.b",
                        "ephemeral_id": "mU6REJeJTCSNuXcigNQHmg",
                        "transport_address": "b.b.b.b:9300",
                        "attributes": {}
                    },
                    "oI2zxdVtSM-DRhQZL0jqRw": {
                        "name": "y.y.y.y",
                        "ephemeral_id": "S7_kyKpoRwWVW6OEf4wChg",
                        "transport_address": "y.y.y.y.:9300",
                        "attributes": {}
                    }
                    },
                    "metadata": {
                    "cluster_coordination": {
                        "term": 23,
                        "last_committed_config": [
                        "tb0FsGKdS8GVGanjyLx5fg",
                        "oI2zxdVtSM-DRhQZL0jqRw",
                        "37L59gGkQDSy_Mm4nVKq0A"
                        ],
                        "last_accepted_config": [
                        "tb0FsGKdS8GVGanjyLx5fg",
                        "oI2zxdVtSM-DRhQZL0jqRw",
                        "37L59gGkQDSy_Mm4nVKq0A"
                        ],
                        "voting_config_exclusions": []
                    }
                    }
                } 
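The voting configuration above holds three node IDs, and the old leader's log says an election requires at least 2 of them. That is consistent with a simple majority. The sketch below assumes majority voting, and the `reachable` list is an assumption based on the two nodes (a.a.a.a and y.y.y.y) that answer the old leader's pre-vote requests:

```python
def quorum_size(voting_config: list[str]) -> int:
    """Simple majority of the distinct voting node IDs."""
    return len(set(voting_config)) // 2 + 1


def has_quorum(voting_config: list[str], reachable_ids: list[str]) -> bool:
    """True when enough voting nodes are reachable to form a majority."""
    present = set(voting_config) & set(reachable_ids)
    return len(present) >= quorum_size(voting_config)


# last_committed_config from the cluster state above.
voting_config = [
    "tb0FsGKdS8GVGanjyLx5fg",
    "oI2zxdVtSM-DRhQZL0jqRw",
    "37L59gGkQDSy_Mm4nVKq0A",
]
# Assumption: the two voting nodes that remained in the cluster.
reachable = ["tb0FsGKdS8GVGanjyLx5fg", "oI2zxdVtSM-DRhQZL0jqRw"]

print(quorum_size(voting_config))            # 2
print(has_quorum(voting_config, reachable))  # True
```

Under that assumption the remaining two voting nodes form a valid quorum without the old leader, so the election itself is not what blocks the rejoin. The blocked `elected-as-master` task is.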

 5. The number of pending cluster tasks keeps growing over time.

        Output of GET /_cluster/pending_tasks from the newly elected leader node:

            {
                "tasks": [
                    {
                    "insert_order": 1,
                    "priority": "URGENT",
                    "source": "elected-as-master ([2] nodes joined)",
                    "executing": true,
                    "time_in_queue_millis": 264617841,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 3,
                    "priority": "IMMEDIATE",
                    "source": "node-left",
                    "executing": false,
                    "time_in_queue_millis": 264617803,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 6,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 264529931,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 1871,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 178425300,
                    "time_in_queue": "2d"
                    },
                    {
                    "insert_order": 9324,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 17498238,
                    "time_in_queue": "4.8h"
                    },
                    {
                    "insert_order": 970,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 197779438,
                    "time_in_queue": "2.2d"
                    },
                    {
                    "insert_order": 7,
                    "priority": "HIGH",
                    "source": "shard-failed",
                    "executing": false,
                    "time_in_queue_millis": 264523252,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 8,
                    "priority": "HIGH",
                    "source": "shard-failed",
                    "executing": false,
                    "time_in_queue_millis": 264522485,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 9328,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 17431702,
                    "time_in_queue": "4.8h"
                    },
                    {
                    "insert_order": 9426,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 15536579,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 5062,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 109487418,
                    "time_in_queue": "1.2d"
                    },
                    {
                    "insert_order": 5364,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 102934485,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 4,
                    "priority": "NORMAL",
                    "source": "update snapshot after shards started [false] or node configuration changed [true]",
                    "executing": false,
                    "time_in_queue_millis": 264617719,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 5,
                    "priority": "NORMAL",
                    "source": "rollover_index source [.opendistro-ism-managed-index-history-2021.10.29-000008] to target [.opendistro-ism-managed-index-history-2021.10.30-000009]",
                    "executing": false,
                    "time_in_queue_millis": 264617680,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 18,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246056219,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 16,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246769596,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 2,
                    "priority": "HIGH",
                    "source": "cluster_reroute(post-join reroute)",
                    "executing": false,
                    "time_in_queue_millis": 264617825,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 9409,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 15748370,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 9604,
                    "priority": "URGENT",
                    "source": "node-join",
                    "executing": false,
                    "time_in_queue_millis": 11640387,
                    "time_in_queue": "3.2h"
                    },
                    {
                    "insert_order": 10,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 257306891,
                    "time_in_queue": "2.9d"
                    },
                    {
                    "insert_order": 12,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 253892465,
                    "time_in_queue": "2.9d"
                    },
                    {
                    "insert_order": 13,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 252715177,
                    "time_in_queue": "2.9d"
                    },
                    {
                    "insert_order": 24,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 227759462,
                    "time_in_queue": "2.6d"
                    },
                    {
                    "insert_order": 1369,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 189169597,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 1409,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 188453228,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 26,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 223915180,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 15,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246771340,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 14,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246771487,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 6747,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73253466,
                    "time_in_queue": "20.3h"
                    },
                    {
                    "insert_order": 32,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217969598,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 31,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217969599,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 67,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217256215,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 17,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246769595,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 71,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217253224,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 22,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246053231,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 9,
                    "priority": "HIGH",
                    "source": "shard-failed",
                    "executing": false,
                    "time_in_queue_millis": 264510886,
                    "time_in_queue": "3d"
                    },
                    {
                    "insert_order": 10136,
                    "priority": "URGENT",
                    "source": "auto create [myindex-2021-11-02]",
                    "executing": false,
                    "time_in_queue_millis": 47453,
                    "time_in_queue": "47.4s"
                    },
                    {
                    "insert_order": 19,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246055201,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 70,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217253468,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 21,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246053463,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 4076,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 130854208,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 1039,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 196292464,
                    "time_in_queue": "2.2d"
                    },
                    {
                    "insert_order": 25,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 225092469,
                    "time_in_queue": "2.6d"
                    },
                    {
                    "insert_order": 23,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 228507020,
                    "time_in_queue": "2.6d"
                    },
                    {
                    "insert_order": 1367,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 189171488,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 1370,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 189169597,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 1408,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 188453468,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 1405,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 188456218,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 5412,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102054198,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5413,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102053468,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 30,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217971353,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 27,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 223414995,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 2216,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 170907027,
                    "time_in_queue": "1.9d"
                    },
                    {
                    "insert_order": 2452,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 165814993,
                    "time_in_queue": "1.9d"
                    },
                    {
                    "insert_order": 29,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217971486,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 2250,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 170159458,
                    "time_in_queue": "1.9d"
                    },
                    {
                    "insert_order": 7554,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 55707030,
                    "time_in_queue": "15.4h"
                    },
                    {
                    "insert_order": 7588,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 54959456,
                    "time_in_queue": "15.2h"
                    },
                    {
                    "insert_order": 2430,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 166315178,
                    "time_in_queue": "1.9d"
                    },
                    {
                    "insert_order": 2705,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 160369596,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 8042,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 45169601,
                    "time_in_queue": "12.5h"
                    },
                    {
                    "insert_order": 2703,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 160371348,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 69,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217254209,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 2740,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 159656216,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 2741,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 159655195,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 20,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 246054208,
                    "time_in_queue": "2.8d"
                    },
                    {
                    "insert_order": 2743,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 159653467,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 3584,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 141359459,
                    "time_in_queue": "1.6d"
                    },
                    {
                    "insert_order": 68,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 217255198,
                    "time_in_queue": "2.5d"
                    },
                    {
                    "insert_order": 880,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 199707029,
                    "time_in_queue": "2.3d"
                    },
                    {
                    "insert_order": 3764,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 137515176,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 11,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 256559442,
                    "time_in_queue": "2.9d"
                    },
                    {
                    "insert_order": 10137,
                    "priority": "URGENT",
                    "source": "auto create [myindex-2021-11-02]",
                    "executing": false,
                    "time_in_queue_millis": 43422,
                    "time_in_queue": "43.4s"
                    },
                    {
                    "insert_order": 4039,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 131569595,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4036,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 131571484,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4037,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 131571349,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 914,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 198959459,
                    "time_in_queue": "2.3d"
                    },
                    {
                    "insert_order": 4078,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 130853233,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4074,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 130856211,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4075,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 130855196,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4884,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 113307027,
                    "time_in_queue": "1.3d"
                    },
                    {
                    "insert_order": 4077,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 130853471,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 1095,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 195115175,
                    "time_in_queue": "2.2d"
                    },
                    {
                    "insert_order": 4918,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 112559456,
                    "time_in_queue": "1.3d"
                    },
                    {
                    "insert_order": 5042,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 109892468,
                    "time_in_queue": "1.2d"
                    },
                    {
                    "insert_order": 1117,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 194614992,
                    "time_in_queue": "2.2d"
                    },
                    {
                    "insert_order": 5099,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 108715181,
                    "time_in_queue": "1.2d"
                    },
                    {
                    "insert_order": 5121,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 108214985,
                    "time_in_queue": "1.2d"
                    },
                    {
                    "insert_order": 1368,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 189171352,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 5375,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102769595,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5372,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102771482,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5373,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102771353,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5374,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102769596,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5414,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102053231,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5410,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102056216,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 5411,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 102055193,
                    "time_in_queue": "1.1d"
                    },
                    {
                    "insert_order": 6220,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 84507028,
                    "time_in_queue": "23.4h"
                    },
                    {
                    "insert_order": 6254,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 83759459,
                    "time_in_queue": "23.2h"
                    },
                    {
                    "insert_order": 6434,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 79915180,
                    "time_in_queue": "22.1h"
                    },
                    {
                    "insert_order": 6378,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 81092470,
                    "time_in_queue": "22.5h"
                    },
                    {
                    "insert_order": 6456,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 79414994,
                    "time_in_queue": "22h"
                    },
                    {
                    "insert_order": 1407,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 188454208,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 1406,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 188455197,
                    "time_in_queue": "2.1d"
                    },
                    {
                    "insert_order": 6709,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73969597,
                    "time_in_queue": "20.5h"
                    },
                    {
                    "insert_order": 6706,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73971484,
                    "time_in_queue": "20.5h"
                    },
                    {
                    "insert_order": 6707,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73971352,
                    "time_in_queue": "20.5h"
                    },
                    {
                    "insert_order": 6708,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73969600,
                    "time_in_queue": "20.5h"
                    },
                    {
                    "insert_order": 6748,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73253233,
                    "time_in_queue": "20.3h"
                    },
                    {
                    "insert_order": 6744,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73256216,
                    "time_in_queue": "20.3h"
                    },
                    {
                    "insert_order": 6745,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73255201,
                    "time_in_queue": "20.3h"
                    },
                    {
                    "insert_order": 6746,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 73254209,
                    "time_in_queue": "20.3h"
                    },
                    {
                    "insert_order": 2374,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 167492468,
                    "time_in_queue": "1.9d"
                    },
                    {
                    "insert_order": 7768,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 51115174,
                    "time_in_queue": "14.1h"
                    },
                    {
                    "insert_order": 7712,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 52292472,
                    "time_in_queue": "14.5h"
                    },
                    {
                    "insert_order": 7790,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 50614991,
                    "time_in_queue": "14h"
                    },
                    {
                    "insert_order": 8080,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 44454209,
                    "time_in_queue": "12.3h"
                    },
                    {
                    "insert_order": 2702,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 160371481,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 8043,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 45169599,
                    "time_in_queue": "12.5h"
                    },
                    {
                    "insert_order": 8040,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 45171483,
                    "time_in_queue": "12.5h"
                    },
                    {
                    "insert_order": 8041,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 45171351,
                    "time_in_queue": "12.5h"
                    },
                    {
                    "insert_order": 8081,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 44453470,
                    "time_in_queue": "12.3h"
                    },
                    {
                    "insert_order": 8082,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 44453233,
                    "time_in_queue": "12.3h"
                    },
                    {
                    "insert_order": 8078,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 44456213,
                    "time_in_queue": "12.3h"
                    },
                    {
                    "insert_order": 8079,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 44455201,
                    "time_in_queue": "12.3h"
                    },
                    {
                    "insert_order": 8888,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 26907030,
                    "time_in_queue": "7.4h"
                    },
                    {
                    "insert_order": 2704,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 160369596,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 2744,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 159653231,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 8922,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 26159456,
                    "time_in_queue": "7.2h"
                    },
                    {
                    "insert_order": 9124,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 21814992,
                    "time_in_queue": "6h"
                    },
                    {
                    "insert_order": 9046,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 23492466,
                    "time_in_queue": "6.5h"
                    },
                    {
                    "insert_order": 9102,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 22315179,
                    "time_in_queue": "6.1h"
                    },
                    {
                    "insert_order": 2742,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 159654209,
                    "time_in_queue": "1.8d"
                    },
                    {
                    "insert_order": 3550,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 142107026,
                    "time_in_queue": "1.6d"
                    },
                    {
                    "insert_order": 9379,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 16369599,
                    "time_in_queue": "4.5h"
                    },
                    {
                    "insert_order": 9376,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 16371477,
                    "time_in_queue": "4.5h"
                    },
                    {
                    "insert_order": 9377,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 16371352,
                    "time_in_queue": "4.5h"
                    },
                    {
                    "insert_order": 9378,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 16369599,
                    "time_in_queue": "4.5h"
                    },
                    {
                    "insert_order": 3708,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 138692467,
                    "time_in_queue": "1.6d"
                    },
                    {
                    "insert_order": 9419,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 15653228,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 9415,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 15656216,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 9416,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 15655196,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 9417,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 15654209,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 9418,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 15653465,
                    "time_in_queue": "4.3h"
                    },
                    {
                    "insert_order": 3786,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 137014990,
                    "time_in_queue": "1.5d"
                    },
                    {
                    "insert_order": 4038,
                    "priority": "NORMAL",
                    "source": "opendistro-im",
                    "executing": false,
                    "time_in_queue_millis": 131569598,
                    "time_in_queue": "1.5d"
                    }
                ]
            }

 6. Thread Pool Info 

        curl -X GET "localhost:9200/_cat/thread_pool?pretty"
        a.a.a.a  ad-batch-task-threadpool               0 0 0
        a.a.a.a  ad-threadpool                          0 0 0
        a.a.a.a  analyze                                0 0 0
        a.a.a.a  fetch_shard_started                    0 0 0
        a.a.a.a  fetch_shard_store                      0 0 0
        a.a.a.a  flush                                  0 0 0
        a.a.a.a  force_merge                            0 0 0
        a.a.a.a  generic                                0 0 0
        a.a.a.a  get                                    0 0 0
        a.a.a.a  listener                               0 0 0
        a.a.a.a  management                             1 0 0
        a.a.a.a  open_distro_job_scheduler              0 0 0
        a.a.a.a  opendistro_asynchronous_search_generic 0 0 0
        a.a.a.a  refresh                                0 0 0
        a.a.a.a  search                                 0 0 0
        a.a.a.a  search_throttled                       0 0 0
        a.a.a.a  snapshot                               0 0 0
        a.a.a.a  sql-worker                             0 0 0
        a.a.a.a  system_read                            0 0 0
        a.a.a.a  system_write                           0 0 0
        a.a.a.a  warmer                                 0 0 0
        a.a.a.a  write                                  0 0 0
        b.b.b.b ad-batch-task-threadpool               0 0 0
        b.b.b.b ad-threadpool                          0 0 0
        b.b.b.b analyze                                0 0 0
        b.b.b.b fetch_shard_started                    0 0 0
        b.b.b.b fetch_shard_store                      0 0 0
        b.b.b.b flush                                  0 0 0
        b.b.b.b force_merge                            0 0 0
        b.b.b.b generic                                0 0 0
        b.b.b.b get                                    0 0 0
        b.b.b.b listener                               0 0 0
        b.b.b.b management                             1 0 0
        b.b.b.b open_distro_job_scheduler              0 0 0
        b.b.b.b opendistro_asynchronous_search_generic 0 0 0
        b.b.b.b refresh                                0 0 0
        b.b.b.b search                                 0 0 0
        b.b.b.b search_throttled                       0 0 0
        b.b.b.b snapshot                               0 0 0
        b.b.b.b sql-worker                             0 0 0
        b.b.b.b system_read                            0 0 0
        b.b.b.b system_write                           0 0 0
        b.b.b.b warmer                                 0 0 0
        b.b.b.b write                                  0 0 0
        y.y.y.y  ad-batch-task-threadpool               0 0 0
        y.y.y.y  ad-threadpool                          0 0 0
        y.y.y.y  analyze                                0 0 0
        y.y.y.y  fetch_shard_started                    0 0 0
        y.y.y.y  fetch_shard_store                      0 0 0
        y.y.y.y  flush                                  0 0 0
        y.y.y.y  force_merge                            0 0 0
        y.y.y.y  generic                                0 0 0
        y.y.y.y  get                                    0 0 0
        y.y.y.y  listener                               0 0 0
        y.y.y.y  management                             1 0 0
        y.y.y.y  open_distro_job_scheduler              0 0 0
        y.y.y.y  opendistro_asynchronous_search_generic 0 0 0
        y.y.y.y  refresh                                0 0 0
        y.y.y.y  search                                 0 0 0
        y.y.y.y  search_throttled                       0 0 0
        y.y.y.y  snapshot                               0 0 0
        y.y.y.y  sql-worker                             0 0 0
        y.y.y.y  system_read                            0 0 0
        y.y.y.y  system_write                           0 0 0
        y.y.y.y  warmer                                 0 0 0
        y.y.y.y  write                                  0 0 0
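
The pending-task and thread-pool dumps above are long, so condensing them makes it easier to see which task sources are stuck and which pools are doing work. The following is a minimal Python sketch and not part of the original report. The function names `summarize_pending_tasks` and `busy_thread_pools` are hypothetical. It assumes the `{"tasks": [...]}` response shape of the pending-tasks API and the five-column `_cat/thread_pool` layout (node, pool, active, queue, rejected) seen above.

```python
from collections import defaultdict


def summarize_pending_tasks(response):
    """Group pending cluster tasks by source.

    For each source, report how many tasks are queued, how many are
    marked as executing, and the longest time spent in the queue (ms).
    """
    summary = defaultdict(lambda: {"count": 0, "executing": 0, "max_wait_ms": 0})
    for task in response.get("tasks", []):
        entry = summary[task["source"]]
        entry["count"] += 1
        if task["executing"]:
            entry["executing"] += 1
        entry["max_wait_ms"] = max(entry["max_wait_ms"], task["time_in_queue_millis"])
    return dict(summary)


def busy_thread_pools(cat_output):
    """Return (node, pool, active, queue, rejected) for every row of
    `_cat/thread_pool` text output where any counter is non-zero."""
    busy = []
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or unexpected lines
        node, pool = parts[0], parts[1]
        try:
            active, queue, rejected = (int(p) for p in parts[2:])
        except ValueError:
            continue  # skip a header row, if one is present
        if active or queue or rejected:
            busy.append((node, pool, active, queue, rejected))
    return busy


# Illustrative input copied from the output above (two tasks, three pool rows).
sample_tasks = {
    "tasks": [
        {"insert_order": 3708, "priority": "NORMAL", "source": "opendistro-im",
         "executing": False, "time_in_queue_millis": 138692467, "time_in_queue": "1.6d"},
        {"insert_order": 9419, "priority": "NORMAL", "source": "opendistro-im",
         "executing": False, "time_in_queue_millis": 15653228, "time_in_queue": "4.3h"},
    ]
}
sample_pools = "a.a.a.a  generic 0 0 0\na.a.a.a  management 1 0 0\ny.y.y.y  write 0 0 0"

print(summarize_pending_tasks(sample_tasks))
print(busy_thread_pools(sample_pools))
```

A source with a large `count`, zero `executing`, and a `max_wait_ms` measured in hours or days is queued behind something else, which is consistent with a single long-running cluster-state task blocking the rest.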

Workaround suggested:

Full restart of the ES cluster

Links referred to before raising this issue:

  https://discuss.opendistrocommunity.dev/t/killed-active-master-not-being-removed-from-the-cluster-state/5011
  https://github.com/elastic/elasticsearch/issues/56979
  https://github.com/elastic/elasticsearch/issues/80525

To Reproduce

There are currently no specific steps that reliably reproduce the node rejoin failure, but it occurs intermittently when rolling restarts are performed.
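
Because the failure is intermittent, one way to catch it is to check cluster membership after each node restart and stop the rolling restart as soon as a node fails to come back. The following is a minimal Python sketch under stated assumptions and not something from the original report. The helper names (`fetch_cat_nodes`, `missing_nodes`, `elected_master`, `wait_for_rejoin`) are hypothetical, the cluster is assumed to be reachable without authentication at `http://localhost:9200`, and nodes are identified by the `ip` column of `_cat/nodes`.

```python
import json
import time
import urllib.request


def fetch_cat_nodes(base_url="http://localhost:9200"):
    """Fetch the current node list as JSON rows with ip, name and master columns."""
    url = base_url + "/_cat/nodes?format=json&h=ip,name,master"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def missing_nodes(expected_ips, rows):
    """Return the expected IPs that are absent from the node list."""
    present = {row["ip"] for row in rows}
    return sorted(set(expected_ips) - present)


def elected_master(rows):
    """Return the IP of the elected master ('*' in the master column), or None."""
    for row in rows:
        if row.get("master") == "*":
            return row["ip"]
    return None


def wait_for_rejoin(fetch_rows, expected_ips, timeout_s=300.0, interval_s=5.0,
                    sleep=time.sleep):
    """Poll until every expected node is present or the timeout expires.

    Returns the list of nodes still missing (empty means all rejoined).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        missing = missing_nodes(expected_ips, fetch_rows())
        if not missing or time.monotonic() >= deadline:
            return missing
        sleep(interval_s)
```

After restarting one node, a call such as `wait_for_rejoin(fetch_cat_nodes, ["x.x.x.x", "y.y.y.y", "a.a.a.a", "b.b.b.b", "c.c.c.c"])` returning a non-empty list would indicate the state described in this issue. At that point it is worth capturing pending tasks and thread dumps before restarting anything else.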

Expected behavior

All nodes should rejoin the cluster after a rolling restart.

Plugins

[]

Additional context

Note:

  1. Terminology:
      x.x.x.x refers to the old leader IP
      y.y.y.y refers to the new leader IP
      a.a.a.a, b.b.b.b, c.c.c.c refer to the other members of the ES cluster

      Here, b.b.b.b and c.c.c.c are data nodes, and c.c.c.c is unreachable

  2. Of the 5 nodes, one (a data node) was unreachable at the time the logs were collected
  3. The issue occurs only in the 5-node setup and not in a 3-node setup

anasalkouz commented 2 years ago

Is this reproducible with OpenSearch? Are you trying to upgrade to OpenSearch, and if so, to which version?

GaneshJayaram97 commented 2 years ago

Hi @anasalkouz,

We are using Open Distro for Elasticsearch and will be moving to OpenSearch soon, as Open Distro has been deprecated in favour of OpenSearch.

Open Distro version: 1.13.0
ES version: 7.10.2

GaneshJayaram97 commented 2 years ago

Hi @anasalkouz,

The issue has been reproduced again in another setup.

Attaching the details, along with thread and heap dumps.

Details

Note :

  1. 5 thread dump samples taken from the new leader instance (collected a few seconds apart)
  2. 5 thread dump samples taken from the instance that failed to rejoin (collected a few seconds apart)

GaneshJayaram97 commented 2 years ago

Hi @anasalkouz,

Any updates on this issue?

GaneshJayaram97 commented 2 years ago

Hi @anasalkouz,

Kindly provide any updates

Thanks, Ganesh

McAndersDK commented 11 months ago

@GaneshJayaram97 did you ever figure out what the problem was?

GaneshJayaram97 commented 11 months ago

@McAndersDK, Nope. I have just shared my observations and attached a few logs, API outputs, and thread dump samples that could be handy for figuring out the root cause.

sandeshkr419 commented 1 month ago

@GaneshJayaram97 With the multiple cluster manager election improvements that have gone into the past few releases, I am curious: is this still an issue with the latest OpenSearch release, 2.16?