pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.
Apache License 2.0

Help needed - Liveness probe failed: dial tcp 10.8.0.6:9300: getsockopt: connection refused #161

Closed: BezVezeE closed this issue 6 years ago

BezVezeE commented 6 years ago

Hi there,

I'm having an issue setting up my Kubernetes ES cluster. The pods and services deploy without problems, but when I try to curl the client pod I get a connection error.

I'm running this on Google Container Engine (Kubernetes Engine) with Kubernetes version 1.8.4. I also tried version 1.7.4 and hit the same problem.

I tried different container versions, both 6.1.1 and 5.6.4, but the problem persists.

NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
svc/elasticsearch             LoadBalancer   10.11.246.242   x.x.x.x          9200:30381/TCP   19m
svc/elasticsearch-discovery   ClusterIP      10.11.241.242   <none>           9300/TCP         19m

curl http://10.11.246.242:9200
curl: (7) Failed to connect to 10.11.246.242 port 9200: Connection timed out

kubectl describe pods es-client-7dcb955598-cdbfv

Name:           es-client-7dcb955598-cdbfv
Namespace:      default
Node:           gke-cluster-1-test-default-pool-626ce927-1jjt/10.128.0.5
Start Time:     Thu, 21 Dec 2017 15:52:45 +0100
Labels:         component=elasticsearch
                pod-template-hash=3876511154
                role=client
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"es-client-7dcb955598","uid":"9acc9f68-e65e-11e7-b697-42010a8002e...
                kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container es-client; cpu request for init container init-sysctl
Status:         Running
IP:             10.8.0.6
Created By:     ReplicaSet/es-client-7dcb955598
Controlled By:  ReplicaSet/es-client-7dcb955598
Init Containers:
  init-sysctl:
    Container ID:  docker://21945145307bca6a9ce5902a8fd14c459ba99b2055c66205811460d23e6b9c5d
    Image:         busybox:1.27.2
    Image ID:      docker-pullable://busybox@sha256:91ef6c1c52b166be02645b8efee30d1ee65362024f7da41c404681561734c465
    Port:          <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 21 Dec 2017 15:52:46 +0100
      Finished:     Thu, 21 Dec 2017 15:52:46 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7rc2c (ro)
Containers:
  es-client:
    Container ID:   docker://9cdd5fea52066f095f85262dca8477434f33e9f714bd000b07ff7318f8b6cf6d
    Image:          quay.io/pires/docker-elasticsearch-kubernetes:5.6.4
    Image ID:       docker-pullable://quay.io/pires/docker-elasticsearch-kubernetes@sha256:efbc82bc4f94d0ead4574948cfb99bbffa5e4bd5709361ee8f8abd9c5406ddb9
    Ports:          9200/TCP, 9300/TCP
    State:          Running
      Started:      Thu, 21 Dec 2017 15:52:47 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
    Liveness:   tcp-socket :transport delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/_cluster/health delay=20s timeout=5s period=10s #success=1 #failure=3
    Environment:
      NAMESPACE:     default (v1:metadata.namespace)
      NODE_NAME:     es-client-7dcb955598-cdbfv (v1:metadata.name)
      CLUSTER_NAME:  myesdb
      NODE_MASTER:   false
      NODE_DATA:     false
      HTTP_ENABLE:   true
      ES_JAVA_OPTS:  -Xms256m -Xmx256m
      NETWORK_HOST:  _site_,_lo_
    Mounts:
      /data from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7rc2c (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  storage:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-7rc2c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7rc2c
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From                                                    Message
  ----     ------                 ----               ----                                                    -------
  Normal   Scheduled              10m                default-scheduler                                       Successfully assigned es-client-7dcb955598-cdbfv to gke-cluster-1-test-default-pool-626ce927-1jjt
  Normal   SuccessfulMountVolume  10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  MountVolume.SetUp succeeded for volume "storage"
  Normal   SuccessfulMountVolume  10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  MountVolume.SetUp succeeded for volume "default-token-7rc2c"
  Normal   Pulled                 10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Container image "busybox:1.27.2" already present on machine
  Normal   Created                10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Created container
  Normal   Started                10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Started container
  Normal   Pulled                 10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Container image "quay.io/pires/docker-elasticsearch-kubernetes:5.6.4" already present on machine
  Normal   Created                10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Created container
  Normal   Started                10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Started container
  Warning  Unhealthy              10m (x2 over 10m)  kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Liveness probe failed: dial tcp 10.8.0.6:9300: getsockopt: connection refused
  Warning  Unhealthy              10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Readiness probe failed: Get http://10.8.0.6:9200/_cluster/health: dial tcp 10.8.0.6:9200: getsockopt: connection refused

NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/es-client   2         2         2            2           5m
deploy/es-data     2         2         2            2           4m
deploy/es-master   3         3         3            3           5m

NAME                            READY     STATUS    RESTARTS   AGE
po/es-client-7dcb955598-cdbfv   1/1       Running   0          5m
po/es-client-7dcb955598-pkn8f   1/1       Running   0          5m
po/es-data-5bc6fbf549-8h8rz     1/1       Running   0          4m
po/es-data-5bc6fbf549-ksvxv     1/1       Running   0          4m
po/es-master-6bc55fb649-56hp9   1/1       Running   0          5m
po/es-master-6bc55fb649-g5f7b   1/1       Running   0          5m
po/es-master-6bc55fb649-l4v8b   1/1       Running   0          5m

[2017-12-21T14:52:51,103][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] initializing ...
[2017-12-21T14:52:51,540][INFO ][o.e.e.NodeEnvironment    ] [es-master-6bc55fb649-56hp9] using [1] data paths, mounts [[/data (/dev/sda1)]], net usable_space [1.7gb], net total_space [5.6gb], spins? [possibly], types [ext4]
[2017-12-21T14:52:51,547][INFO ][o.e.e.NodeEnvironment    ] [es-master-6bc55fb649-56hp9] heap size [247.6mb], compressed ordinary object pointers [true]
[2017-12-21T14:52:51,548][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] node name [es-master-6bc55fb649-56hp9], node ID [jsDsh2o7SrySwFQGBnHmgA]
[2017-12-21T14:52:51,552][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] version[5.6.4], pid[1], build[8bbedf5/2017-10-31T18:55:38.105Z], OS[Linux/4.4.86+/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-12-21T14:52:51,554][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms256m, -Xmx256m, -Des.path.home=/elasticsearch]
[2017-12-21T14:52:56,073][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [aggs-matrix-stats]
[2017-12-21T14:52:56,079][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [ingest-common]
[2017-12-21T14:52:56,082][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [lang-expression]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [lang-groovy]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [lang-mustache]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [lang-painless]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [parent-join]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [percolator]
[2017-12-21T14:52:56,083][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [reindex]
[2017-12-21T14:52:56,084][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [transport-netty3]
[2017-12-21T14:52:56,085][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] loaded module [transport-netty4]
[2017-12-21T14:52:56,093][INFO ][o.e.p.PluginsService     ] [es-master-6bc55fb649-56hp9] no plugins loaded
[2017-12-21T14:53:04,906][INFO ][o.e.d.DiscoveryModule    ] [es-master-6bc55fb649-56hp9] using discovery type [zen]
[2017-12-21T14:53:07,689][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] initialized
[2017-12-21T14:53:07,697][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] starting ...
[2017-12-21T14:53:08,444][INFO ][o.e.t.TransportService   ] [es-master-6bc55fb649-56hp9] publish_address {10.8.0.5:9300}, bound_addresses {10.8.0.5:9300}
[2017-12-21T14:53:08,508][INFO ][o.e.b.BootstrapChecks    ] [es-master-6bc55fb649-56hp9] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-12-21T14:53:11,764][INFO ][o.e.c.s.ClusterService   ] [es-master-6bc55fb649-56hp9] detected_master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300}, added {{es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300},{es-master-6bc55fb649-g5f7b}{nZN4k5ciSSOhKAJt76RtHw}{B11NzSdZQHOcC36zSBnc_A}{10.8.1.8}{10.8.1.8:9300},}, reason: zen-disco-receive(from master [master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300} committed version [3]])
[2017-12-21T14:53:11,863][INFO ][o.e.n.Node               ] [es-master-6bc55fb649-56hp9] started
[2017-12-21T14:53:13,263][INFO ][o.e.c.s.ClusterService   ] [es-master-6bc55fb649-56hp9] added {{es-client-7dcb955598-pkn8f}{JHaUHQaoQjSjWia1OJz7ew}{qA_LtpvTRKu0OcUSvV4X6Q}{10.8.1.9}{10.8.1.9:9300},}, reason: zen-disco-receive(from master [master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300} committed version [4]])
[2017-12-21T14:53:13,369][INFO ][o.e.c.s.ClusterService   ] [es-master-6bc55fb649-56hp9] added {{es-client-7dcb955598-cdbfv}{s5eLWN-GRkeMtnrhaj-oMg}{vUspFi_NSiGlGW5fLFfw7w}{10.8.0.6}{10.8.0.6:9300},}, reason: zen-disco-receive(from master [master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300} committed version [5]])
[2017-12-21T14:53:38,742][INFO ][o.e.c.s.ClusterService   ] [es-master-6bc55fb649-56hp9] added {{es-data-5bc6fbf549-ksvxv}{MQV4CX5BQs-1xxfcVZtu2w}{8YJ4Li_ITP-E7wSMEgYaww}{10.8.0.7}{10.8.0.7:9300},}, reason: zen-disco-receive(from master [master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300} committed version [6]])
[2017-12-21T14:53:39,693][INFO ][o.e.c.s.ClusterService   ] [es-master-6bc55fb649-56hp9] added {{es-data-5bc6fbf549-8h8rz}{tbwc5WMpQr2S416UY0bmKw}{D4omrpkESTeoksdAHjuEqw}{10.8.1.10}{10.8.1.10:9300},}, reason: zen-disco-receive(from master [master {es-master-6bc55fb649-l4v8b}{Xrms5F3bTmyxvnqnYexL_g}{K-tPNgBbRG6LmgZPCxlCRw}{10.8.2.7}{10.8.2.7:9300} committed version [7]])
BezVezeE commented 6 years ago

I think I got it; it works fine now.

But these errors still bother me:

Warning  Unhealthy              10m (x2 over 10m)  kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Liveness probe failed: dial tcp 10.8.0.6:9300: getsockopt: connection refused
  Warning  Unhealthy              10m                kubelet, gke-cluster-1-test-default-pool-626ce927-1jjt  Readiness probe failed: Get http://10.8.0.6:9200/_cluster/health: dial tcp 10.8.0.6:9200: getsockopt: connection refused

Any idea why this happens and what the problem is?

gilblau commented 6 years ago

Can you please explain how you resolved this? I am facing the same issue!

pires commented 6 years ago

This is a networking issue! The kubelet cannot access 10.8.0.6 ports 9200 or 9300. Maybe something to do with firewall rules or some NetworkPolicy you may have in place. Or maybe you changed the ports Elasticsearch binds to.

mfamador commented 6 years ago

Hi, same issue here. Es-clients keep restarting with "Liveness probe failed: dial tcp 10.1.0.21:9300: getsockopt: connection refused". Tried with Minikube and Kubernetes in Docker for Mac Edge. Any help would be appreciated. Thank you

pires commented 6 years ago

This may be related to how those solutions do networking; you may need to change the network device Elasticsearch binds to. See the troubleshooting section for pointers.

mfamador commented 6 years ago

That was it, sorry I missed it! I just changed NETWORK_HOST to "_eth0:ipv4_" and it's working perfectly! Thanks a lot!
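
For anyone landing here with the same symptom, the change described above might look roughly like the fragment below. This is only a sketch: it assumes the container takes its bind address from the `NETWORK_HOST` environment variable, as the `kubectl describe` output earlier in this thread shows, and that the pod's interface is named `eth0`, which can differ between environments. Elasticsearch writes interface-based bind values with surrounding underscores, hence `_eth0:ipv4_`.

```yaml
# Fragment of a container spec, not a complete manifest.
# Assumption: the image reads NETWORK_HOST, as in the pod description above.
env:
- name: NETWORK_HOST
  # Bind to the IPv4 address of eth0 only.
  # The interface name is an assumption; check it inside your own pod.
  value: "_eth0:ipv4_"
```

The value is quoted so that YAML treats it as a plain string.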

pires commented 6 years ago

Great @mfamador! 🎉

syafiqFiqq commented 6 years ago

Hi @mfamador, I have the exact same problem, but mine occurs when deploying es-master. Do you know how I can proceed? @pires, do you have any idea? Is this also related to the NETWORK_HOST issue?

My environment is as follows:
OS: Ubuntu
Kubernetes: version 1.7.12

root@node1:~/ kubectl --version
Kubernetes v1.7.12+coreos.0
root@node1:~/ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
es-master-2519959699-0l7v4          1/1       Running   0          32m
es-master-2519959699-j95l7          1/1       Running   1          32m
es-master-2519959699-zxb0h          1/1       Running   0          32m
heketi-deployment-309687121-7qn9l   1/1       Running   0          5h
root@node1:~/ kubectl describe pods es-master-2519959699-0l7v4
Name:       es-master-2519959699-0l7v4
Namespace:  default
Node:       node5/192.168.0.115
Start Time: Fri, 06 Apr 2018 16:17:43 +0800
Labels:     component=elasticsearch
        pod-template-hash=2519959699
        role=master
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"es-master-2519959699","uid":"fad9d18a-3972-11e8-b36c-080027cd863...
Status:     Running
IP:     10.233.108.2
Created By: ReplicaSet/es-master-2519959699
Controlled By:  ReplicaSet/es-master-2519959699
Init Containers:
  init-sysctl:
    Container ID:   docker://6a7a62487575327f313c85442d4c3a49b2eeb29e6da8140cce127e3245b9c74a
    Image:      busybox:1.27.2
    Image ID:       docker-pullable://busybox@sha256:bbc3a03235220b170ba48a157dd097dd1379299370e1ed99ce976df0355d24f0
    Port:       <none>
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:      Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 06 Apr 2018 16:17:44 +0800
      Finished:     Fri, 06 Apr 2018 16:17:44 +0800
    Ready:      True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-q0hd5 (ro)
Containers:
  es-master:
    Container ID:   docker://0eca6f1f7d9e5f2099cda571dca850326268b9a0ef9e5a433cdb0403538f1304
    Image:      quay.io/pires/docker-elasticsearch-kubernetes:6.2.2_1
    Image ID:       docker-pullable://quay.io/pires/docker-elasticsearch-kubernetes@sha256:180f9d8779ed7d3724f52831b6071e338b0f276e8fe8f146dd2e8c7f5c8975dd
    Port:       9300/TCP
    State:      Running
      Started:      Fri, 06 Apr 2018 16:17:45 +0800
    Ready:      True
    Restart Count:  0
    Limits:
      cpu:  1
    Requests:
      cpu:  1
    Liveness:   tcp-socket :transport delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      NAMESPACE:        default (v1:metadata.namespace)
      NODE_NAME:        es-master-2519959699-0l7v4 (v1:metadata.name)
      CLUSTER_NAME:     myesdb
      NUMBER_OF_MASTERS:    2
      NODE_MASTER:      true
      NODE_INGEST:      false
      NODE_DATA:        false
      HTTP_ENABLE:      false
      ES_JAVA_OPTS:     -Xms256m -Xmx256m
      PROCESSORS:       1 (limits.cpu)
    Mounts:
      /data from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-q0hd5 (ro)
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  storage:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  default-token-q0hd5:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-q0hd5
    Optional:   false
QoS Class:  Burstable
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath               Type        Reason          Message
  --------- --------    -----   ----            -------------               --------    ------          -------
  32m       32m     1   default-scheduler                       Normal      Scheduled       Successfully assigned es-master-2519959699-0l7v4 to node5
  32m       32m     1   kubelet, node5                          Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "storage" 
  32m       32m     1   kubelet, node5                          Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-q0hd5" 
  32m       32m     1   kubelet, node5      spec.initContainers{init-sysctl}    Normal      Pulled          Container image "busybox:1.27.2" already present on machine
  32m       32m     1   kubelet, node5      spec.initContainers{init-sysctl}    Normal      Created         Created container
  32m       32m     1   kubelet, node5      spec.initContainers{init-sysctl}    Normal      Started         Started container
  32m       32m     1   kubelet, node5      spec.containers{es-master}      Normal      Pulled          Container image "quay.io/pires/docker-elasticsearch-kubernetes:6.2.2_1" already present on machine
  32m       32m     1   kubelet, node5      spec.containers{es-master}      Normal      Created         Created container
  32m       32m     1   kubelet, node5      spec.containers{es-master}      Normal      Started         Started container
  32m       32m     2   kubelet, node5      spec.containers{es-master}      Warning     Unhealthy       Liveness probe failed: dial tcp 10.233.108.2:9300: getsockopt: connection refused

Thanks a lot!

ddy86 commented 6 years ago

@syafiqFiqq @BezVezeE I got the same error, and setting the environment parameter had no effect. Did you resolve it?

I found the solution in this issue: https://github.com/pires/kubernetes-elasticsearch-cluster/issues/175. The key is "initialDelaySeconds: 30".
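
As a rough illustration of that setting, the liveness probe could be given an initial delay along these lines. This is a sketch rather than the repository's actual manifest: the `tcpSocket` probe on the named port `transport` and the 10 second period are taken from the `tcp-socket :transport delay=0s ... period=10s` line in the pod descriptions above, and 30 is simply the value quoted in this comment.

```yaml
# Fragment of a container spec, not a complete manifest.
livenessProbe:
  tcpSocket:
    port: transport          # named container port (9300), as in the describe output
  initialDelaySeconds: 30    # wait before the first probe so Elasticsearch has time to bind
  periodSeconds: 10          # same period the describe output reports
```

With `delay=0s`, the kubelet starts probing as soon as the container starts, while the master log earlier in this thread shows the transport address being published about 17 seconds after the node began initializing. That gap would explain a couple of early "connection refused" probe failures on an otherwise healthy pod.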