lando / platformsh

The Official Platform.sh Lando Plugin
https://docs.lando.dev/platformsh
GNU General Public License v3.0

p.sh Elasticsearch fails to start correctly. #97

Closed: pirog closed this issue 2 years ago

pirog commented 4 years ago

Fairly easy to replicate this:

  1. lando init the lando-d8 site, or lando destroy the site if you already have it
  2. Add a trivial ES instance to services.yaml:
db:
    type: mariadb:10.2
    disk: 2048

cache:
    type: redis:5.0

searchelastic:
    type: elasticsearch:7.2
    disk: 256
  3. lando start; this should error (you may need to lando restart)
  4. docker logs landod8_searchelastic_1

mikemilano commented 4 years ago

The Elasticsearch processes are running, but there is no response on port 9200, even from within the container:

root@c7e8c6d4798c:/app# ps aux|grep elastic
root       249  0.0  0.0   4052  1032 ?        Ss   15:29   0:00 runsv elasticsearch
elastic+  7738  0.0  0.0  17972  2912 ?        S    15:38   0:00 /bin/bash /usr/share/elasticsearch/bin/elasticsearch -Epath.lo
elastic+  7751  0.0  0.4 4750128 34816 ?       Sl   15:38   0:00 /usr/share/elasticsearch/jdk/bin/java -cp /usr/share/elasticse
root      7769  0.0  0.0  14504   968 pts/0    S+   15:38   0:00 grep --color=auto elastic
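The port check described above can be sketched as a small shell helper run from inside the container. This is only an illustration: the function name `es_responds` is hypothetical, and it assumes `curl` is installed in the container, which the thread does not confirm.

```shell
# Hypothetical helper (not part of Lando or Platform.sh): succeeds only if
# something answers HTTP on the given local port. Assumes curl is available.
es_responds() {
  port="${1:-9200}"
  # --fail turns HTTP error statuses into a non-zero exit code;
  # --max-time keeps the check from hanging when nothing is listening.
  curl --silent --show-error --fail --max-time 5 "http://localhost:${port}/" > /dev/null
}

# Example use inside the searchelastic container:
#   es_responds 9200 && echo "ES is answering" || echo "no response on 9200"
```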

The errors that follow the "Lando handing off" line below repeat hundreds of times in the Lando logs for this service:

searchelastic_1  | 2020-06-11 15:29:54,681 platformsh.agent DEBUG Finished: /etc/platform/boot
searchelastic_1  | lando 15:29:54.71 INFO  ==> Lando handing off to: exec init
searchelastic_1  | runsv idmapd: fatal: unable to lock supervise/lock: temporary failure
searchelastic_1  | runsv elasticsearch: fatal: unable to lock supervise/lock: temporary failure

mikemilano commented 4 years ago

I found Java exceptions being thrown in the ES logs.

The first one was related to memory. Changing -Xms1536m to -Xms1536m in /etc/elasticsearch/jvm.options resolves this issue.

Once that is gone, however, there is a network error. Changing network.host: [ _local_ ] to network.host: [ _eth0_ ], then commenting out the http section immediately below it, resolves this issue. Replacing _local_ with _eth0_ in the http section may resolve this as well.

These files are generated from templates, and I'm unsure whether Lando can modify them with env vars or some similar mechanism. If it can, the solution is relatively easy.

Otherwise, the plan of attack at this point is to rewrite these values with a script (perhaps using sed) during the lando build process.
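A minimal sketch of what such a sed-based rewrite might look like, covering only the network change. The function name `patch_es_network_host` is hypothetical, and the location of the Elasticsearch config file inside the container is an assumption (the thread only names /etc/elasticsearch/jvm.options), so the path is passed in by the caller. It replaces every _local_ with _eth0_, which corresponds to the untested "may resolve this as well" variant rather than commenting out the http section.

```shell
# Hypothetical helper (not part of Lando): replace the _local_ interface
# with _eth0_ everywhere in the given Elasticsearch config file.
patch_es_network_host() {
  config_file="$1"
  # Use a temp file instead of `sed -i`, whose syntax differs between
  # GNU and BSD sed.
  tmp_file="$(mktemp)" || return 1
  if ! sed 's/_local_/_eth0_/g' "$config_file" > "$tmp_file"; then
    rm -f "$tmp_file"
    return 1
  fi
  # Overwrite in place with cat so the original file keeps its
  # ownership and permissions.
  cat "$tmp_file" > "$config_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Example use during a build step (the path below is an assumption):
#   patch_es_network_host /etc/elasticsearch/elasticsearch.yml
```

Whether a build-time edit survives depends on when the templates regenerate these files, which the thread leaves open; if they are rewritten at boot, the script would need to run after that step.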