pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.
Apache License 2.0

Elasticsearch startup fails with 'failed to obtain data locks' #145

Open: artushin opened this issue 6 years ago

artushin commented 6 years ago

Pod startup fails with the following:

[2017-10-26T22:24:41,544][INFO ][o.e.n.Node               ] [elasticsearch-master-4227219301-s0dr9] initializing ...
[2017-10-26T22:24:41,591][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [elasticsearch-master-4227219301-s0dr9] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/monitoring]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.3.2.jar:5.3.2]
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/data/data/monitoring]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:260) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:262) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.2.jar:5.3.2]
    ... 6 more
Caused by: java.io.IOException: failed to obtain lock on /data/data/nodes/0
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:239) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:262) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.2.jar:5.3.2]
    ... 6 more
Caused by: java.io.IOException: Mount point not found
    at sun.nio.fs.LinuxFileStore.findMountEntry(LinuxFileStore.java:91) ~[?:?]
    at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:65) ~[?:?]
    at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:44) ~[?:?]
    at sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:51) ~[?:?]
    at sun.nio.fs.LinuxFileSystemProvider.getFileStore(LinuxFileSystemProvider.java:39) ~[?:?]
    at sun.nio.fs.UnixFileSystemProvider.getFileStore(UnixFileSystemProvider.java:368) ~[?:?]
    at java.nio.file.Files.getFileStore(Files.java:1461) ~[?:1.8.0_121]
    at org.elasticsearch.env.ESFileStore.getMatchingFileStore(ESFileStore.java:107) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.env.Environment.getFileStore(Environment.java:351) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.env.NodeEnvironment$NodePath.<init>(NodeEnvironment.java:108) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:227) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:262) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.node.Node.<init>(Node.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.2.jar:5.3.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.2.jar:5.3.2]
    ... 6 more

Same issue as https://github.com/elastic/elasticsearch-docker/issues/44

Workaround for those of us on the pires images: use https://github.com/goldmann/docker-squash to shrink the overlay entry in /proc/mounts (fewer image layers means fewer lowerdir paths in that entry), then push your own image and use that.

😢

pires commented 6 years ago

If quay.io supports multi-stage builds, I can try and work on reducing the layers needed. But right now I'm really busy with work stuff :/

artushin commented 6 years ago

Squashing worked alright (grep overlay /proc/mounts | wc -c went down from 1870 to 358) and elasticsearch started. I don't actually know if the latest images have the same issue. I think it would depend on when you last built docker-jre.
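The measurement above can be reproduced with a small Python sketch. It assumes, as the thread suggests, that the length of the overlay entry in /proc/mounts is what triggers the "Mount point not found" error. `overlay_mount_bytes` mimics `grep overlay /proc/mounts | wc -c`, and `longest_overlay_line` reports the longest single entry; both names are hypothetical helpers, not part of any tool.

```python
from pathlib import Path


def overlay_mount_bytes(mounts_text: str) -> int:
    """Byte count of lines containing "overlay", like `grep overlay | wc -c`.

    grep prints each matching line followed by a newline and wc -c counts
    bytes, so every matching line contributes its UTF-8 length plus one.
    """
    total = 0
    for line in mounts_text.splitlines():
        if "overlay" in line:
            total += len(line.encode("utf-8")) + 1
    return total


def longest_overlay_line(mounts_text: str) -> int:
    """Length in bytes of the longest single overlay entry (newline excluded)."""
    lengths = [
        len(line.encode("utf-8"))
        for line in mounts_text.splitlines()
        if "overlay" in line
    ]
    return max(lengths, default=0)


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():  # only meaningful inside a Linux container
        text = mounts.read_text()
        print("overlay bytes:", overlay_mount_bytes(text))
        print("longest overlay entry:", longest_overlay_line(text))
```

Running it inside the container before and after squashing should show the same kind of drop artushin reports.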

pires commented 6 years ago

I remember there was some issue with Alpine Linux containers and layers. Anyway, I can't find a way right now to squash this without overwriting automatically tagged images. Using your own is fine by me.

ykyuen commented 6 years ago

:heart_eyes: Thanks for providing the workaround; using docker-squash solves the problem. Another approach is to update the base image to a newer Alpine release (Alpine >= 3.6.2) or to switch to another Linux distro, as suggested by @AtzeDeVries in this issue.
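One way to check whether a base image already meets the Alpine >= 3.6.2 threshold is to read /etc/alpine-release inside the container. The sketch below is hypothetical: the helper names are made up, and 3.6.2 is simply the version reported in this thread, not an official cut-off. It compares numeric tuples rather than strings, so 3.10.0 correctly sorts above 3.6.2.

```python
from pathlib import Path

# Threshold quoted in this thread; treat it as a reported fix point only.
MIN_ALPINE = (3, 6, 2)


def parse_version(text: str) -> tuple:
    """Turn "3.6.2" into (3, 6, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def alpine_is_new_enough(release: str, minimum=MIN_ALPINE) -> bool:
    """True if `release` is at least `minimum`; missing parts count as 0."""
    version = parse_version(release)
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


if __name__ == "__main__":
    release_file = Path("/etc/alpine-release")  # present in Alpine images
    if release_file.exists():
        release = release_file.read_text()
        verdict = "ok" if alpine_is_new_enough(release) else "older than 3.6.2"
        print(release.strip(), verdict)
```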

Just a friendly reminder: if you want to keep using the same image version tag when deploying to k8s, make sure you have imagePullPolicy: Always in the deployment .yaml file. Otherwise, nodes that already have the old image cached will keep using it. (I got stuck for two days! :persevere:)
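To make the pull-policy pitfall concrete, here is a Python sketch of Kubernetes' documented defaulting (an omitted imagePullPolicy becomes Always for :latest or untagged images and IfNotPresent for any other tag), plus a helper that forces Always on every container of a manifest already loaded as a dict. The function names and the container name "es-master" are hypothetical, and digest-pinned references are not handled.

```python
import copy


def image_tag(image: str) -> str:
    """Tag of an image reference, or "" when none is given.

    A ":" only marks a tag when it comes after the last "/", so a registry
    port such as "localhost:5000/es" is not mistaken for a tag.
    Digest references ("name@sha256:...") are not handled in this sketch.
    """
    last_part = image.rsplit("/", 1)[-1]
    return last_part.split(":", 1)[1] if ":" in last_part else ""


def effective_pull_policy(container: dict) -> str:
    """Pull policy Kubernetes applies to one container spec.

    An explicit imagePullPolicy wins; otherwise the default is Always for
    ":latest" or untagged images and IfNotPresent for any other tag.
    """
    if "imagePullPolicy" in container:
        return container["imagePullPolicy"]
    tag = image_tag(container["image"])
    return "Always" if tag in ("", "latest") else "IfNotPresent"


def force_always_pull(manifest: dict) -> dict:
    """Copy of a Deployment/StatefulSet manifest with Always on every container."""
    patched = copy.deepcopy(manifest)
    pod_spec = patched["spec"]["template"]["spec"]
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            container["imagePullPolicy"] = "Always"
    return patched


if __name__ == "__main__":
    # Hypothetical manifest fragment, as if parsed from es-master.yml.
    deployment = {"spec": {"template": {"spec": {"containers": [
        {"name": "es-master",
         "image": "myrepo/pires-docker-elasticsearch-kubernetes:2.4.0"},
    ]}}}}
    before = deployment["spec"]["template"]["spec"]["containers"][0]
    after = force_always_pull(deployment)["spec"]["template"]["spec"]["containers"][0]
    print(effective_pull_policy(before), "->", effective_pull_policy(after))
    # prints: IfNotPresent -> Always
```

Note that the policy only matters when a container starts: pods already running the old image still have to be recreated (for example by deleting them) before the re-pushed tag is pulled.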

dearjadu commented 6 years ago

I was having the same problem with the java.io.IOException: Mount point not found using the quay.io/pires/docker-elasticsearch-kubernetes:2.4.0 image (as of June 1, 2018) on Azure Kubernetes Service with https://github.com/pires/kubernetes-elasticsearch-cluster (for both es-master.yml and es-data.yml). Surprisingly, the exact same Kube specs worked on the first attempt on GKE and AWS (with KOPS). Notably, when I moved to quay.io/pires/docker-elasticsearch-kubernetes:5.6.0, the same configuration worked on all three providers.

To work around the issue, as explained by @artushin in the original post, I followed these steps:

  1. Pulled quay.io/pires/docker-elasticsearch-kubernetes:2.4.0 on my Linux VM
  2. Installed docker-squash as described in https://github.com/goldmann/docker-squash
  3. Ran: docker-squash 852fb8f4a5e1, where 852fb8f4a5e1 is the Docker image ID of quay.io/pires/docker-elasticsearch-kubernetes:2.4.0
  4. Tagged the newly produced image (9c4ca802e8b9) as myrepo/pires-docker-elasticsearch-kubernetes:2.4.0
  5. Changed my es-master.yml and es-data.yml to refer to myrepo/pires-docker-elasticsearch-kubernetes:2.4.0 (everything else remains identical)
  6. Deployed es-master.yml and es-data.yml using kubectl apply -f
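The steps above can be scripted. This Python sketch only builds the command lines and rewrites the image reference in an already-parsed manifest; it executes nothing, and step 2 (installing docker-squash) is left out. Two assumptions go beyond the thread: it uses docker-squash's `-t` option to tag the result in one go (the thread instead squashed by image ID and ran a separate tag step; check `docker-squash --help` for your version), and it adds a `docker push` so the cluster can pull `myrepo/...`. The helper names are hypothetical.

```python
import copy
import shlex

SOURCE = "quay.io/pires/docker-elasticsearch-kubernetes:2.4.0"
TARGET = "myrepo/pires-docker-elasticsearch-kubernetes:2.4.0"


def squash_commands(source: str, target: str, manifests: list) -> list:
    """Argument lists mirroring the steps; nothing is executed here."""
    commands = [
        ["docker", "pull", source],               # step 1
        ["docker-squash", "-t", target, source],  # steps 3-4 (assumed -t flag)
        ["docker", "push", target],               # assumption: push to registry
    ]
    # Step 6: apply each manifest once its image points at the target.
    commands += [["kubectl", "apply", "-f", path] for path in manifests]
    return commands


def retarget_image(manifest: dict, old: str, new: str) -> dict:
    """Step 5: copy of a manifest with every `old` image reference set to `new`."""
    patched = copy.deepcopy(manifest)
    pod_spec = patched["spec"]["template"]["spec"]
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            if container.get("image") == old:
                container["image"] = new
    return patched


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in squash_commands(SOURCE, TARGET, ["es-master.yml", "es-data.yml"]):
        print(shlex.join(cmd))
```

To actually run the plan, each list can be passed to `subprocess.run(cmd, check=True)`. `retarget_image` expects the manifest already parsed into a dict (for example with a YAML library), which this sketch does not cover.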

That worked on AKS. I'm still testing the same with GKE and AWS/KOPS, but I'm hopeful that a single kubespec will work across the three engines.

Super thanks to @artushin

alexsandro-xpt commented 6 years ago

@dearjadu Is it still working fine on Azure AKS? What VM size are you using?

dearjadu commented 6 years ago

@alexsandro-xpt Yes, it does. I was able to use the same "squashed" image in both GKE and AKS. I'm using Standard_DS11_v2 on AKS (and n1-highmem-2 on GKE).

pires commented 6 years ago

FYI, that image (2.4.0) is super old. I highly recommend the new 6.x versions.