[o.e.d.z.UnicastZenPing ] [es-coordination] failed to resolve host [master3]

ravibhooshan commented 2 years ago

Hi, thanks for this repo. I am trying to build an ES 6.8.23 cluster on a 3-node Docker Swarm. I am following your code but cannot get it to run. I always get this error:

[2022-02-01T17:48:19,911][WARN ][o.e.d.z.ZenDiscovery ] [es-coordination] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
[2022-02-01T17:48:19,919][WARN ][o.e.d.z.UnicastZenPing ] [es-coordination] failed to resolve host [master1]

Below is the stack file. Can you please help me figure out this error?

version: "3.7"

services:
  coordination:
    image: XXXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-coordination
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-coordination
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: host
    deploy:
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 2
      resources:
        limits:
          memory: 4G

  master1:
    image: XXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-master1
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-master
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esmaster1:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXXsahdb01 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

  master2:
    image: XXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-master2
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-master
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esmaster2:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXXsahdb02 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

  master3:
    image: XXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-master3
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-master
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esmaster3:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXXlsahdb03 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

  data1:
    image: XXXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-data1
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-data
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXsahdb01 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

  data2:
    image: XXXXXXXX.com:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-data2
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-data
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXsahdb02 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

  data3:
    image: XXXXXX:8444/elastic:main
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: curl -fs http://localhost:9200/_cat/health || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
    configs:
      - source: es-data3
        target: /usr/share/elasticsearch/config/elasticsearch.yml
      - source: jvm-options-data
        target: /usr/share/elasticsearch/config/jvm.options
    networks:
      - esnet
    volumes:
      - esdata3:/usr/share/elasticsearch/data
    deploy:
      placement:
        constraints: [ node.hostname == XXXXsahdb03 ]
      endpoint_mode: dnsrr
      mode: "replicated"
      replicas: 1
      resources:
        limits:
          memory: 4G

networks:
  esnet:
    driver: overlay
    attachable: true
    name: esnet
  proxy:
    driver: overlay
    name: proxy

volumes:
  esmaster1:
  esmaster2:
  esmaster3:

  esdata1:
  esdata2:
  esdata3:

configs:
  es-coordination:
    name: es-coordination
    file: es-config/es-coordination.yml
  es-master1:
    name: es-master1
    file: es-config/es-master1.yml
  es-master2:
    name: es-master2
    file: es-config/es-master2.yml
  es-master3:
    name: es-master3
    file: es-config/es-master3.yml

  es-data1:
    name: es-data1
    file: es-config/es-data1.yml
  es-data2:
    name: es-data2
    file: es-config/es-data2.yml
  es-data3:
    name: es-data3
    file: es-config/es-data3.yml
  es-data4:
    name: es-data4
    file: es-config/es-data4.yml

Sandeepbharmoria commented 2 years ago

Yes, I can confirm the same issue on my side. The Docker file had issues, which I fixed, but the master nodes on the esnet network are still unable to find each other as configured in the yml files.

gunman808 commented 2 months ago

Isn't it a bad idea to monitor the health of the cluster service rather than of the container itself? In this setup the cluster will never be able to start, because in Docker Swarm the name of a container only becomes resolvable once the container is in the "healthy" state. The containers will never become healthy, because the cluster will never be green, because the required Elasticsearch nodes cannot be resolved. It's a deadlock. The cluster works fine if you remove the health check. It is not easy to come up with a good health check for this case.
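
As a rough sketch of the container-level alternative (an illustration, not something tested against this repo), the check could probe only whether the local HTTP port answers, instead of requiring a successful cluster health response. The fragment below reuses the master1 service name and the timing values from the stack file above. It assumes curl is available in the image, as the original check already does. Dropping the `-f` flag is the suggested change: without it, curl exits 0 on any HTTP response, even an error status.

```yaml
# Sketch only: container-level health check for a single service.
# Assumption: curl exists in the image (the original check relies on it too).
services:
  master1:
    healthcheck:
      # No -f flag: curl succeeds as soon as the node answers over HTTP,
      # even if the response is an error because no master is elected yet.
      test: curl -s -o /dev/null http://localhost:9200/ || exit 1
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 45s
```

The trade-off is that a check like this says nothing about cluster state. It only confirms the process is up and listening, which should be enough for Swarm to publish the service name so the nodes can resolve each other.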

jakubhajek / elasticsearch-docker-swarm

[o.e.d.z.UnicastZenPing ] [es-coordination] failed to resolve host [master3] #12