pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.
Apache License 2.0

Re-election takes over 30 seconds when deleting master pod (but fast when killing the process directly) #231

Open nabadger opened 5 years ago

nabadger commented 5 years ago

Hi,

I've been struggling to understand what's causing this, so I wonder if you can offer any help. This is something that I can reproduce across various kubernetes-elasticsearch repos (including the operators as well). It's also something I can reproduce on various clusters.

I'd really like to know if this is expected behaviour or not...

My Configuration:

I've set up ES using the example in the README (this is a 3-node Kubernetes cluster running v1.11.3):

kubectl create -f es-discovery-svc.yaml
kubectl create -f es-svc.yaml
kubectl create -f es-master.yaml
kubectl rollout status -f es-master.yaml

kubectl create -f es-ingest-svc.yaml
kubectl create -f es-ingest.yaml
kubectl rollout status -f es-ingest.yaml

kubectl create -f es-data.yaml
kubectl rollout status -f es-data.yaml

This all works fine and brings up the ES cluster as expected.

I monitor the state of the ES master by execing into an ingestion pod (kubectl exec ...) and running:

watch curl localhost:9200/_cat/nodes

I then kubectl exec into the pod running the ES master and run kill 1 (the java process).

This starts the master re-election process straight away, and typically a new master is elected in 2-3 seconds (expected, right?).

If, on the other hand, I delete the pod which is running the master (kubectl delete pod <master pod>), re-election always takes over 30 seconds.

At this point the curl command also hangs until the new master is available. I don't think this is expected, as it essentially means the cluster is unavailable for that whole period.
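To put a number on how long the cluster is unavailable, rather than eyeballing `watch curl`, something like the following could be used. It is only a rough sketch: the function names are made up for illustration, `/_cat/master` is polled instead of the `/_cat/nodes` call used above, and the 1-second client timeout is an assumption chosen so that a hanging request is counted as "no master".

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Assumed endpoint: same host/port as the curl command in this issue.
DEFAULT_URL = "http://localhost:9200/_cat/master"


def master_visible(url: str = DEFAULT_URL, timeout: float = 1.0) -> bool:
    """Return True if the node answers with a non-empty body within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and bool(resp.read().strip())
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error responses.
        return False


def poll(probe: Callable[[], bool], duration: float = 120.0,
         interval: float = 0.5) -> List[Tuple[float, bool]]:
    """Call `probe` repeatedly for `duration` seconds, recording (time, ok) pairs."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        started = time.monotonic()
        samples.append((started, probe()))
        time.sleep(interval)
    return samples


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Longest span in seconds from a first failed probe to the next successful one."""
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    return longest
```

Running `longest_outage(poll(master_visible))` from somewhere that can reach port 9200 while deleting the master pod, and again while doing `kill 1`, would give two directly comparable figures. An outage that is still ongoing when polling stops is not counted, so `duration` needs to be longer than the expected gap.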

I've also tried playing with various kubernetes pod-termination timeouts, along with the ES fault-detection timeouts, but can't seem to work around the problem.

Do you know if this is expected behaviour? If so, how do people actually upgrade the masters with only a short period of downtime? We also run ES outside of Kubernetes, where master re-election happens in under 3s (because we're essentially just sending SIGTERM to the parent process, like kill 1), hence I feel this is a Kubernetes thing.

I've added 2 sets of logs

1 - Logs with kill 1 on the ES java process

# kill process in container, failover is quick

es-master-d4d46765-v9sbw es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-v9sbw es-master
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,501][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left [{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,503][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}, local
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:56,529][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-v9sbw] stopping watch service, reason [no master node]
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}, master
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:56,566][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-krkhr] stopping watch service, reason [no master node]
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,006][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] stopped
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,006][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] closing ...
es-master-d4d46765-n5646 es-master [2018-09-19T17:54:57,054][INFO ][o.e.n.Node               ] [es-master-d4d46765-n5646] closed
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,612][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-v9sbw] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,667][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] detected_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [20]])
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,687][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-krkhr] failed to connect to node {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-krkhr es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-n5646][10.244.1.2:9300] connect_exception
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.1.2/10.244.1.2:9300
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master  ... 1 more
es-master-d4d46765-krkhr es-master Caused by: java.net.ConnectException: Connection refused
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,748][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-v9sbw] publishing cluster state with version [20] failed for the following nodes: [[{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}]]
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,750][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-v9sbw] new_master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [20] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
es-master-d4d46765-krkhr es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master  ... 1 more
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,764][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-v9sbw] failed to connect to node {es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-v9sbw es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-n5646][10.244.1.2:9300] connect_exception
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-v9sbw es-master  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-v9sbw es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.1.2/10.244.1.2:9300
es-master-d4d46765-v9sbw es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-v9sbw es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-v9sbw es-master  ... 1 more
es-master-d4d46765-v9sbw es-master Caused by: java.net.ConnectException: Connection refused
es-master-d4d46765-v9sbw es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-v9sbw es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-v9sbw es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-v9sbw es-master  ... 1 more
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,834][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-v9sbw] zen-disco-node-failed({es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}), reason(transport disconnected), reason: removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}
es-master-d4d46765-krkhr es-master [2018-09-19T17:54:59,852][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [21]])
es-master-d4d46765-v9sbw es-master [2018-09-19T17:54:59,932][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-v9sbw] removed {{es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-v9sbw}{imqhDhPEQJqUIkrVaz8I_g}{VvaRmrjnTCGMf0nOAzJa7A}{10.244.2.3}{10.244.2.3:9300}{xpack.installed=true} committed version [21] source [zen-disco-node-failed({es-master-d4d46765-n5646}{UWyJX9sYRH6xR2m_4vPcvw}{GLmdZUi3TEOiRGneXGUxSw}{10.244.1.2}{10.244.1.2:9300}{xpack.installed=true}), reason(transport disconnected)]])

2 - Logs with kubectl delete pod on the pod hosting the master ES instance

es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,271][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] stopping ...
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,261][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-vh5tp] master_left [{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,255][INFO ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master_left [{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}], reason [shut_down]
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,276][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-6x4bp] stopping watch service, reason [shutdown initiated]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,263][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-vh5tp] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-vh5tp es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}, master
es-master-d4d46765-vh5tp es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}
es-master-d4d46765-vh5tp es-master    {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, local
es-master-d4d46765-vh5tp es-master
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:01,276][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-vh5tp] stopping watch service, reason [no master node]
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,256][WARN ][o.e.d.z.ZenDiscovery     ] [es-master-d4d46765-krkhr] master left (reason = shut_down), current nodes: nodes:
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-wx6pg}{baYZM1pUT4Os1sDESMpdyQ}{nHFhi16LTDi2O-QS0SPVEg}{10.244.3.4}{10.244.3.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}, master
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-d5xcs}{wmFRpodsReaBKtmOenav0A}{nExqhmupRA6zOgPDkInFVA}{10.244.3.3}{10.244.3.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-ingest-84fd6b464-5dbtn}{ki0kyZrQTWGwiiwMU3FmqQ}{M6tOaextQYiNRI8vwJ4DjQ}{10.244.1.3}{10.244.1.3:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master    {es-master-d4d46765-krkhr}{tdk80Ro6QH-Nz9pGd3xkvg}{cZbioHXWTFCszKiqgHxRyg}{10.244.3.2}{10.244.3.2:9300}{xpack.installed=true}, local
es-master-d4d46765-krkhr es-master
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:01,258][INFO ][o.e.x.w.WatcherService   ] [es-master-d4d46765-krkhr] stopping watch service, reason [no master node]
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,847][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] stopped
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,848][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] closing ...
es-master-d4d46765-6x4bp es-master [2018-09-19T18:07:01,886][INFO ][o.e.n.Node               ] [es-master-d4d46765-6x4bp] closed
+ es-master-d4d46765-fpcmg › es-master
- es-master-d4d46765-6x4bp
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:04,384][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:04,408][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] detected_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [35]])
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,184][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] initializing ...
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,398][INFO ][o.e.e.NodeEnvironment    ] [es-master-d4d46765-fpcmg] using [1] data paths, mounts [[/data (/dev/vda1)]], net usable_space [75gb], net total_space [77.3gb], types [ext4]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,400][INFO ][o.e.e.NodeEnvironment    ] [es-master-d4d46765-fpcmg] heap size [247.5mb], compressed ordinary object pointers [true]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,401][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] node name [es-master-d4d46765-fpcmg], node ID [L0BKnWY2RRmU8A3EtmY3VQ]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,402][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] version[6.3.2], pid[1], build[default/tar/053779d/2018-07-20T05:20:23.451332Z], OS[Linux/4.15.0-30-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b11]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:07,402][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms256m, -Xmx256m, -Des.path.home=/elasticsearch, -Des.path.conf=/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:08,808][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-vh5tp] failed to connect to node {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-vh5tp es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-6x4bp][10.244.2.5:9300] connect_exception
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:180) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-vh5tp es-master  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-vh5tp es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.244.2.5/10.244.2.5:9300
es-master-d4d46765-vh5tp es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-vh5tp es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-vh5tp es-master  ... 1 more
es-master-d4d46765-vh5tp es-master Caused by: java.net.NoRouteToHostException: Host is unreachable
es-master-d4d46765-vh5tp es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-vh5tp es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-vh5tp es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-vh5tp es-master  ... 1 more
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:10,480][WARN ][o.e.d.c.s.Settings       ] [http.enabled] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,189][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [aggs-matrix-stats]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,193][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [analysis-common]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,194][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [ingest-common]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,194][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-expression]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,195][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-mustache]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [lang-painless]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [mapper-extras]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [parent-join]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [percolator]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,197][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [rank-eval]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [reindex]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [repository-url]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [transport-netty4]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [tribe]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-core]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,198][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-deprecation]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,199][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-graph]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-logstash]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-monitoring]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-rollup]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-security]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-sql]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,201][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-upgrade]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,202][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] loaded module [x-pack-watcher]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:13,202][INFO ][o.e.p.PluginsService     ] [es-master-d4d46765-fpcmg] no plugins loaded
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:21,888][WARN ][o.e.c.NodeConnectionsService] [es-master-d4d46765-krkhr] failed to connect to node {es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true} (tried [1] times)
es-master-d4d46765-krkhr es-master org.elasticsearch.transport.ConnectTransportException: [es-master-d4d46765-6x4bp][10.244.2.5:9300] connect_exception
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpChannel.awaitConnected(TcpChannel.java:165) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:631) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:530) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.2.jar:6.3.2]
es-master-d4d46765-krkhr es-master  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
es-master-d4d46765-krkhr es-master Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: 10.244.2.5/10.244.2.5:9300
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master  ... 1 more
es-master-d4d46765-krkhr es-master Caused by: java.net.NoRouteToHostException: Host is unreachable
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
es-master-d4d46765-krkhr es-master  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) ~[?:?]
es-master-d4d46765-krkhr es-master  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
es-master-d4d46765-krkhr es-master  ... 1 more
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:22,474][INFO ][o.e.x.s.a.s.FileRolesStore] [es-master-d4d46765-fpcmg] parsed [0] roles from file [/elasticsearch/config/roles.yml]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:24,768][INFO ][o.e.d.DiscoveryModule    ] [es-master-d4d46765-fpcmg] using discovery type [zen]
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,175][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] initialized
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,176][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] starting ...
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,563][INFO ][o.e.t.TransportService   ] [es-master-d4d46765-fpcmg] publish_address {10.244.2.6:9300}, bound_addresses {10.244.2.6:9300}
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:26,594][INFO ][o.e.b.BootstrapChecks    ] [es-master-d4d46765-fpcmg] bound or publishing to a non-loopback address, enforcing bootstrap checks
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,398][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] timed out waiting for all nodes to process published state [35] (timeout [30s], pending nodes: [{es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,399][WARN ][o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] publishing cluster state with version [35] failed for the following nodes: [[{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}]]
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:34,402][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-vh5tp] new_master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [35] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,649][WARN ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [40.2s] above the warn threshold of 30s
es-master-d4d46765-krkhr es-master [2018-09-19T18:07:44,656][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-krkhr] removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [36]])
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,654][INFO ][o.e.c.s.MasterService    ] [es-master-d4d46765-vh5tp] zen-disco-node-failed({es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}), reason(transport disconnected), reason: removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}
es-master-d4d46765-vh5tp es-master [2018-09-19T18:07:44,720][INFO ][o.e.c.s.ClusterApplierService] [es-master-d4d46765-vh5tp] removed {{es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true},}, reason: apply cluster state (from master [master {es-master-d4d46765-vh5tp}{oXAiZsyWTaCmLlyIbnNTOQ}{4-VgFjRoSUO5SfSwQVRxQA}{10.244.1.4}{10.244.1.4:9300}{xpack.installed=true} committed version [36] source [zen-disco-node-failed({es-master-d4d46765-6x4bp}{7-1vmi94RFWzEke2kEiZDw}{yzuYqVVnRJGKOZnDcQQS7w}{10.244.2.5}{10.244.2.5:9300}{xpack.installed=true}), reason(transport disconnected)]])
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:56,679][WARN ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] timed out while waiting for initial discovery state - timeout: 30s
es-master-d4d46765-fpcmg es-master [2018-09-19T18:07:56,680][INFO ][o.e.n.Node               ] [es-master-d4d46765-fpcmg] started

In the set of logs where we run kubectl delete pod, it looks like master re-election happens twice.
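To check this without reading the whole dump, the election-related lines can be filtered out. A minimal sketch: the three patterns are copied from messages that appear in the logs above, while the function name election_events and the file name in the usage line are made up for illustration.

```shell
#!/bin/sh
# Sketch only: election_events is a hypothetical helper name.
# It keeps the log lines that mark election activity, using message
# fragments copied from the Elasticsearch 6.3.2 logs in this issue.
# With no file arguments it reads from stdin.
election_events() {
  grep -E 'zen-disco-elected-as-master|new_master|timed out waiting for all nodes to process published state' "$@"
}
```

Usage would be something like election_events es-master.log, where es-master.log is a saved copy of the master pod logs (a hypothetical file name).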

nabadger commented 5 years ago

There's actually some obvious information in our logs that helps:

[o.e.d.z.PublishClusterStateAction] [es-master-d4d46765-vh5tp] timed out waiting for all nodes to process published state [35] (timeout [30s], pending nodes: [{es-data-b479bcbd-brt64}{d2QLc-r1Qjy-XDuVSkXg1Q}{a0OoXCtsRryhwlq0wmyJDg}{10.244.2.4}{10.244.2.4:9300}{xpack.installed=true}])

Note the 30s timeout, which matches the 30+ second delay we see on re-election.

We think it's related to this: https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590

nabadger commented 5 years ago

Adding an extra sleep after trapping SIGTERM seems to resolve the issue for us (see the merge referenced above if you're interested).
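For anyone landing here later, this is a rough sketch of what that kind of workaround can look like. It is not the actual change from the merge. The function name run_with_linger and the LINGER_SECONDS variable are hypothetical. The assumption behind it is that keeping the container (and so the pod IP) alive for a few seconds after Elasticsearch exits lets the other nodes see a fast connection failure instead of the "Host is unreachable" error in the logs above.

```shell
#!/bin/sh
# Sketch only, not the change from the merge mentioned above.
# run_with_linger and LINGER_SECONDS are hypothetical names.

run_with_linger() {
  term_requested=0

  # Start the real process (for example the Elasticsearch start script)
  # in the background so this shell stays free to handle signals.
  "$@" &
  child=$!

  # On SIGTERM/SIGINT, remember the request and pass SIGTERM to the child.
  trap 'term_requested=1; kill -TERM "$child" 2>/dev/null' TERM INT

  # wait returns early (status > 128) when a trapped signal arrives,
  # so keep waiting until the child has really gone away.
  wait "$child"
  status=$?
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"
    status=$?
  done

  # Only linger when we were asked to terminate: the container stays up
  # for a few more seconds after the child has exited.
  if [ "$term_requested" -eq 1 ]; then
    sleep "${LINGER_SECONDS:-10}"
  fi

  trap - TERM INT
  return "$status"
}
```

The container entrypoint would then call something like run_with_linger followed by the image's normal start command. kubectl delete pod sends SIGTERM to PID 1 in the container and waits for terminationGracePeriodSeconds (30 seconds by default) before sending SIGKILL, so the linger time plus the Elasticsearch shutdown time needs to fit inside that window.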