es-master fails: Liveness probe failed: dial tcp 172.17.0.4:9300: getsockopt: connection refused

pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.

Apache License 2.0


es-master fails: Liveness probe failed: dial tcp 172.17.0.4:9300: getsockopt: connection refused #175

Closed viglia closed 6 years ago

viglia commented 6 years ago

Hi,

I'm having a problem launching the es-master deployment.

It keeps restarting, and the pod description shows this failure: Liveness probe failed: dial tcp 172.17.0.4:9300: getsockopt: connection refused

After checking all the issues, open and closed (plus the documentation), I've tried the following:

1) giving minikube more memory

2) add

    name: "NETWORK_HOST"
    value: "eth0:ipv4"

3) add

    name: "NETWORK_HOST"
    value: "eth0"

None of them worked, and the log doesn't help me much.

Additional info about the environment: minikube cluster running on RHEL7

mbert commented 6 years ago

I had the same problem. It seems that on a rather slow cluster the liveness probe fires too early. Adding an initialDelaySeconds setting under the liveness probe in es-master.yaml and es-client.yaml helped me:

    livenessProbe:
      tcpSocket:
        port: transport
      initialDelaySeconds: 30

viglia commented 6 years ago

@mbert thank you!

tarr11 commented 6 years ago

Interesting, I had the same problem (trying this on GKE with a cluster of 3 n1-standard-2 nodes).

Using a 30-second initialDelaySeconds got 2 of the 3 masters working, but the third would constantly restart and inevitably end up in CrashLoopBackOff. (I also set NETWORK_HOST.)

Setting it to a 60-second delay seemed to solve the problem. I think there's some sort of master discovery algorithm that needs a bit of time to bootstrap.
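For anyone landing here later, the longer delay would look roughly like the fragment below. It is only a sketch: it assumes the same tcpSocket probe on the named port `transport` that mbert posted earlier in the thread, and changes nothing but the delay. The right value probably depends on how quickly the nodes start, so 60 is a starting point rather than a rule.

```yaml
# Sketch only: mbert's probe with the 60-second delay that worked on GKE.
livenessProbe:
  tcpSocket:
    port: transport          # named container port; the probe error shows it resolving to 9300
  initialDelaySeconds: 60    # seconds to wait after container start before the first probe
```

As in mbert's comment, the same change would apply to both es-master.yaml and es-client.yaml.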

xxf09th commented 6 years ago

@tarr11 thx! It works