pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.
Apache License 2.0

Pods elasticsearch failed with "Back-off restarting failed container" #205

Open maryxu opened 6 years ago

maryxu commented 6 years ago

Can you help us understand why the elasticsearch pods fail with "Back-off restarting failed container"? Thanks a lot!

[root@####### ~]# kubectl describe pod efk3-elasticsearch-2 --namespace=efk --insecure-skip-tls-verify=true
Name:           efk3-elasticsearch-2
Namespace:      efk
Node:           lvdevk8sw23/10.219.161.3
Start Time:     Tue, 03 Jul 2018 14:58:28 +0800
Labels:         app=elasticsearch
                component=master
                controller-revision-hash=efk3-elasticsearch-569cf776f
                release=efk3
                statefulset.kubernetes.io/pod-name=efk3-elasticsearch-2
Annotations:
Status:         Running
IP:             10.42.6.19
Controlled By:  StatefulSet/efk3-elasticsearch
Init Containers:
  sysctl:
    Container ID:  docker://bed338fd0e395678abbbd1c49be7d14e1636faacdadba334c0e5607c4eb07251
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47
    Port:
    Command:
      sysctl
      -w
      vm.max_map_count=262144
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 03 Jul 2018 14:58:32 +0800
      Finished:     Tue, 03 Jul 2018 14:58:32 +0800
    Ready:          True
    Restart Count:  0
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from efk3-elasticsearch-token-5vqzr (ro)
Containers:
  elasticsearch:
    Container ID:   docker://336bfce82124136d78de952552de2f2688c5f4249c9c440b1259fd7b3e230046
    Image:          docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
    Image ID:       docker-pullable://docker.elastic.co/elasticsearch/elasticsearch-oss@sha256:2d9c774c536bd1f64abc4993ebc96a2344404d780cbeb81a8b3b4c3807550e57
    Ports:          9300/TCP, 9200/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 10 Jul 2018 13:20:21 +0800
      Finished:     Tue, 10 Jul 2018 13:20:26 +0800
    Ready:          False
    Restart Count:  1929
    Limits:
      cpu:  1
    Requests:
      cpu:      25m
      memory:   512Mi
    Readiness:  http-get http://:9200/_cluster/health%3Flocal=true delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      cluster.name:                        efk3-cluster
      discovery.zen.ping.unicast.hosts:    efk3-elasticsearch
      discovery.zen.minimum_master_nodes:  2
      KUBERNETES_NAMESPACE:                efk (v1:metadata.namespace)
      discovery.zen.ping.unicast.hosts:    efk3-elasticsearch
      PROCESSORS:                          1 (limits.cpu)
      ES_JAVA_OPTS:                        -Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m
    Mounts:
      /usr/share/elasticsearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from efk3-elasticsearch-token-5vqzr (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-efk3-elasticsearch-2
    ReadOnly:   false
  efk3-elasticsearch-token-5vqzr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  efk3-elasticsearch-token-5vqzr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                  From                  Message
  Normal   Pulled   48m (x1921 over 6d)  kubelet, lvdevk8sw23  Container image "docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4" already present on machine
  Warning  BackOff  3m (x44220 over 6d)  kubelet, lvdevk8sw23  Back-off restarting failed container

mat1010 commented 6 years ago

@maryxu Could you run kubectl logs po/efk3-elasticsearch-2 --namespace=efk --insecure-skip-tls-verify=true and check the pod's logs to see whether there are issues with the process inside the container itself?

It would also be helpful to always paste output in code blocks for easier reading.

maryxu commented 6 years ago

Thanks for your response! Is this what you mean? I have pasted the output into code blocks already.

[root@####### ~]# kubectl logs po/efk3-elasticsearch-2 --namespace=efk --insecure-skip-tls-verify=true
[2018-07-10T07:04:24,598][INFO ][o.e.n.Node ] [] initializing ...
[2018-07-10T07:04:24,687][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data/efk3-cluster]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.2.4.jar:6.2.4]
        at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.2.4.jar:6.2.4]
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data/efk3-cluster]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:244) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:264) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.4.jar:6.2.4]
        ... 6 more
Caused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data/nodes/0
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:223) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:264) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.4.jar:6.2.4]
        ... 6 more
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes/0/node.lock
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[?:?]
        at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_161]
        at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_161]
        at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:125) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:209) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:264) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.4.jar:6.2.4]
        ... 6 more


mat1010 commented 6 years ago

It looks like a permission issue:

Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes/0/node.lock
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[?:?]
        at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_161]
        at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_161]
        at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:125) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:209) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:264) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.4.jar:6.2.4]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.4.jar:6.2.4]
        ... 6 more

Elasticsearch does not seem to be able to write to /usr/share/elasticsearch/data/.
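In a chained Java stack trace like the one above, the last "Caused by:" entry is usually the most specific cause. The following is a minimal sketch of pulling that line out of a log stream. The function name root_cause is hypothetical, and the sketch assumes each trace entry sits on its own line.

```shell
#!/bin/sh
# Hypothetical helper: print the innermost "Caused by:" line of a Java
# stack trace read from stdin, with leading whitespace removed.
# Assumes one trace entry per line; prints nothing if no cause is found.
root_cause() {
  grep 'Caused by:' | tail -n 1 | sed 's/^[[:space:]]*//'
}

# Example with a shortened excerpt of the trace from this thread.
printf '%s\n' \
  'Caused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data/nodes/0' \
  '        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:223)' \
  'Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes/0/node.lock' \
  '        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)' |
  root_cause
```

Against a live cluster the same function could be fed from kubectl logs. For a container that keeps restarting, adding --previous to kubectl logs prints the log of the previous container instance, which is often where the failing startup is recorded.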

maryxu commented 6 years ago

I don't quite understand. I'm using NFS storage for the PV and PVC. For other applications, the containers can write to the NFS storage without any problem.

[root@k8s-test11 data]# kubectl get pv,pvc --namespace=efk --insecure-skip-tls-verify=true
NAME                          CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS  CLAIM                          STORAGECLASS  REASON  AGE
pv/data-efk1-elasticsearch-0  5Gi       RWO           Retain          Bound   efk/data-efk3-elasticsearch-0                        14d
pv/data-efk1-elasticsearch-1  5Gi       RWO           Retain          Bound   efk/data-efk3-elasticsearch-1                        14d
pv/data-efk1-elasticsearch-2  5Gi       RWO           Retain          Bound   efk/data-efk3-elasticsearch-2                        14d

NAME                           STATUS  VOLUME                     CAPACITY  ACCESS MODES  STORAGECLASS  AGE
pvc/data-efk3-elasticsearch-0  Bound   data-efk1-elasticsearch-0  5Gi       RWO                         14d
pvc/data-efk3-elasticsearch-1  Bound   data-efk1-elasticsearch-1  5Gi       RWO                         14d
pvc/data-efk3-elasticsearch-2  Bound   data-efk1-elasticsearch-2  5Gi       RWO                         14d


mat1010 commented 6 years ago

> For other applications, the containers can write to the NFS storage without any problem.

What are the other applications? Do you have multiple apps running in the same container? Please post your Kubernetes StatefulSet, including the NFS volumes that are attached and mounted to the pods.

The Elasticsearch container is most likely not started as root, so the elasticsearch user inside the container needs permission to write to /usr/share/elasticsearch/data.
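One way to test this theory is to look at the owner, group, and mode of the exported directory on the NFS server, since plain NFS permission checks typically go by numeric uid and gid. The sketch below is a rough check under several assumptions: the helper name can_write is hypothetical, the uid:gid 1000:1000 is what Elastic's Docker documentation gives for the official images and should be confirmed for the image in use, and the export path is a placeholder. It needs GNU stat and looks only at the write bit, ignoring ACLs, supplementary groups, and NFS export options such as root_squash.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or Elasticsearch): succeed if the
# given numeric uid:gid has the write bit on a directory, judging only by
# the directory's owner, group, and octal mode. Requires GNU stat.
can_write() {
  dir=$1; want_uid=$2; want_gid=$3
  info=$(stat -c '%u %g %a' "$dir") || return 2
  set -- $info                 # $1 = owner uid, $2 = owner gid, $3 = octal mode
  owner=$1; group=$2; mode=$3
  if [ "$want_uid" = "$owner" ]; then
    bit=0200                   # owner write bit
  elif [ "$want_gid" = "$group" ]; then
    bit=0020                   # group write bit
  else
    bit=0002                   # "other" write bit
  fi
  # The leading 0 makes the shell read the mode as an octal constant.
  [ $(( 0$mode & $bit )) -ne 0 ]
}

# Example (placeholder path, assumed uid:gid), to be run on the NFS server:
# can_write /exports/data-efk1-elasticsearch-2 1000 1000 && echo writable || echo "not writable"
```

If the check fails, the usual options are to change the ownership or mode of the export, or to run the container with a uid or gid that already has write access. Which of these fits depends on how the NFS share is managed.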