khezen / docker-elasticsearch

Elasticsearch Docker image including search-guard
MIT License

Multi-Node Example? #31

Open someone1 opened 6 years ago

someone1 commented 6 years ago

Hello,

I'm trying to deploy this across a 4-host cluster, set up with 3 master nodes, 1 ingest node, 1 tribe node, and 2 data nodes. The problem is that it's not clear how this should all work in a setup like this. I keep the data volume local to each host for each node type, and I share an NFS volume with each host for /elasticsearch/config/searchguard/ssl so that each node can sign its certificate with the same CA root.
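For context, the shared-CA signing described above can be sketched with plain openssl. Filenames and the node name below are illustrative; the image's own gen_*.sh scripts wrap something equivalent:

```shell
# Sketch: one root CA shared via NFS, each node signs its own cert against it.
set -e
SSL_DIR=$(mktemp -d)   # stand-in for /elasticsearch/config/searchguard/ssl
cd "$SSL_DIR"

# Root CA, created once and shared across all hosts
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/CN=demo-root-ca" -days 365

# Per-node key + CSR, signed by the shared CA (node name is illustrative)
NODE_NAME=es-data-1
openssl req -newkey rsa:2048 -nodes -keyout "$NODE_NAME.key" \
  -out "$NODE_NAME.csr" -subj "/CN=$NODE_NAME"
openssl x509 -req -in "$NODE_NAME.csr" -CA ca.crt -CAkey ca.key \
  -CAcreateserial -out "$NODE_NAME.crt" -days 365

# Any node cert must chain back to the shared root
openssl verify -CAfile ca.crt "$NODE_NAME.crt"
```

As long as every host sees the same ca.crt/ca.key on the NFS mount, certs generated on different hosts verify against one another.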

The problem is several layers deep:

  1. The startup scripts will clear out and regenerate ALL certificates if any "change" is detected, so the password env variables must be repeated across all node types even when they do not apply to that node.
  2. The startup scripts will NOT generate a certificate for a node if they think certificates have already been generated. So with 7 nodes, if one of them runs 'gen_all.sh' before any other container starts up, none of the other containers will generate node certificates, and they fail to launch. I have to go into each node container manually and run the generate script for that node (e.g. cd /elasticsearch/config/searchguard/ssl && NODE_NAME=$HOSTNAME /run/auth/certificates/gen_node_cert.sh).
  3. Even when all of this is done, the cluster does not start up correctly; I get "cannot communicate" errors.
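The race in point 2 comes down to keying the "already generated" check on the wrong thing. A minimal sketch of the difference (paths and the helper function are stand-ins, not the image's actual scripts):

```shell
# Sketch of a per-node guard: generate this node's cert only when ITS cert
# is missing, instead of skipping generation whenever any cert exists
# (the behavior described in point 2). Paths and names are illustrative.
SSL_DIR=$(mktemp -d)            # stand-in for /elasticsearch/config/searchguard/ssl
NODE_NAME=${HOSTNAME:-es-node-demo}

gen_node_cert() {               # stand-in for gen_node_cert.sh
  touch "$SSL_DIR/$NODE_NAME.crt"
}

# Racy: "some cert exists, so skip" -- the first container to run gen_all.sh
# wins, and every other node is left without a certificate:
#   ls "$SSL_DIR"/*.crt >/dev/null 2>&1 || gen_node_cert

# Safer: key the check on this node's own certificate
[ -f "$SSL_DIR/$NODE_NAME.crt" ] || gen_node_cert
```

With the per-node check, each container generates only its own cert on first boot, regardless of what the other nodes have already written to the shared volume.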

Here is my compose file. (I use Rancher, so some of it may be non-standard. I also use my own images, which just pull from yours and add two ingest plugins; please ignore any mention of x-pack, that plugin is NOT installed.)

version: '2'
volumes:
  elasticsearch-config:
    external: true
    driver: rancher-nfs
  es-storage-volume:
    driver: local
    per_container: true
services:
  es-storage:
    image: rawmind/alpine-volume:0.0.2-2
    environment:
      SERVICE_GID: '1000'
      SERVICE_UID: '1000'
      SERVICE_VOLUME: /elasticsearch/data
    network_mode: none
    volumes:
    - es-storage-volume:/elasticsearch/data
    labels:
      io.rancher.container.start_once: 'true'
  es-data:
    mem_limit: 2147483648
    cap_add:
    - IPC_LOCK
    image: someone1/elasticsearch-searchguard-xpack
    environment:
      HEAP_SIZE: 1g
      CLUSTER_NAME: condor-es
      HOSTS: es-master
      NODE_DATA: 'true'
      NODE_INGEST: 'false'
      NODE_MASTER: 'false'
      NODE_NAME: ''
      ELASTIC_PWD: <removed>
      KIBANA_PWD: <removed>
      LOGSTASH_PWD: <removed>
      BEATS_PWD: <removed>
      CA_PWD: <removed>
      TS_PWD: <removed>
      KS_PWD: <removed>
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        hard: 65536
        soft: 65536
    volumes:
    - elasticsearch-config:/elasticsearch/config/searchguard/ssl
    volumes_from:
    - es-storage
    command:
    - -Ebootstrap.memory_lock=true
    - -Esearch.remote.connect=false
    labels:
      io.rancher.scheduler.affinity:host_label: esready=true
      io.rancher.sidekicks: es-storage,es-sysctl
      io.rancher.container.hostname_override: container_name
      io.rancher.container.pull_image: always
      io.rancher.scheduler.global: 'true'
  es-sysctl:
    privileged: true
    image: rawmind/alpine-sysctl:0.1
    environment:
      SYSCTL_KEY: vm.max_map_count
      SYSCTL_VALUE: '262144'
    network_mode: none
    labels:
      io.rancher.container.start_once: 'true'
  es-ingest:
    mem_limit: 1073741824
    cap_add:
    - IPC_LOCK
    image: someone1/elasticsearch-searchguard-xpack
    environment:
      HEAP_SIZE: 512m
      CLUSTER_NAME: condor-es
      HOSTS: es-master
      NODE_DATA: 'false'
      NODE_INGEST: 'true'
      NODE_MASTER: 'false'
      NODE_NAME: ''
      ELASTIC_PWD: <removed>
      KIBANA_PWD: <removed>
      LOGSTASH_PWD: <removed>
      BEATS_PWD: <removed>
      CA_PWD: <removed>
      TS_PWD: <removed>
      KS_PWD: <removed>
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        hard: 65536
        soft: 65536
    volumes:
    - elasticsearch-config:/elasticsearch/config/searchguard/ssl
    volumes_from:
    - es-storage
    command:
    - -Ebootstrap.memory_lock=true
    - -Esearch.remote.connect=false
    labels:
      io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}
      io.rancher.sidekicks: es-storage,es-sysctl
      io.rancher.container.hostname_override: container_name
      io.rancher.container.pull_image: always
  es-tribe:
    mem_limit: 1073741824
    cap_add:
    - IPC_LOCK
    image: someone1/elasticsearch-searchguard-xpack
    environment:
      CLUSTER_NAME: condor-es
      HOSTS: es-master
      NODE_DATA: 'false'
      NODE_INGEST: 'false'
      NODE_MASTER: 'false'
      NODE_NAME: ''
      HEAP_SIZE: 512m
      ELASTIC_PWD: <removed>
      KIBANA_PWD: <removed>
      LOGSTASH_PWD: <removed>
      BEATS_PWD: <removed>
      CA_PWD: <removed>
      TS_PWD: <removed>
      KS_PWD: <removed>
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        hard: 65536
        soft: 65536
    volumes:
    - elasticsearch-config:/elasticsearch/config/searchguard/ssl
    volumes_from:
    - es-storage
    ports:
    - 9200:9200/tcp
    command:
    - -Ebootstrap.memory_lock=true
    - -Esearch.remote.connect=false
    labels:
      io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}
      io.rancher.sidekicks: es-storage,es-sysctl
      io.rancher.container.hostname_override: container_name
      io.rancher.container.pull_image: always
  es-master:
    mem_limit: 1073741824
    cap_add:
    - IPC_LOCK
    image: someone1/elasticsearch-searchguard-xpack
    environment:
      HEAP_SIZE: 512m
      CLUSTER_NAME: condor-es
      MINIMUM_MASTER_NODES: '2'
      HOSTS: es-master
      NODE_DATA: 'false'
      NODE_INGEST: 'false'
      NODE_MASTER: 'true'
      NODE_NAME: ''
      ELASTIC_PWD: <removed>
      KIBANA_PWD: <removed>
      LOGSTASH_PWD: <removed>
      BEATS_PWD: <removed>
      CA_PWD: <removed>
      TS_PWD: <removed>
      KS_PWD: <removed>
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        hard: 65536
        soft: 65536
    volumes:
    - elasticsearch-config:/elasticsearch/config/searchguard/ssl
    volumes_from:
    - es-storage
    command:
    - -Ebootstrap.memory_lock=true
    - -Esearch.remote.connect=false
    labels:
      io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}
      io.rancher.sidekicks: es-storage,es-sysctl
      io.rancher.container.hostname_override: container_name
      io.rancher.container.pull_image: always
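One sanity check on the es-master service above: with pre-7.x zen discovery, MINIMUM_MASTER_NODES should be a majority of the master-eligible nodes, and '2' is indeed correct for the 3 masters in this stack:

```shell
# Quorum rule behind discovery.zen.minimum_master_nodes in pre-7.x
# Elasticsearch: floor(master_eligible / 2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # -> 2, matching MINIMUM_MASTER_NODES: '2' for 3 master nodes
```

A wrong value here produces exactly the kind of "cannot communicate" / cluster-won't-form symptoms described above, so it is worth ruling out alongside the certificate issues.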

Any help/guidance getting this to work would be much appreciated!

khezen commented 6 years ago

I will have a look when I have some time. Meanwhile, if you can also provide logs, that would help.

someone1 commented 6 years ago

I've moved to the standard ES cluster images since I needed to continue with my project, but I can try to spin this up sometime after (next week, hopefully!) to get you logs. I think you should be able to reproduce the problem by trying to spin up a 2-node cluster.

ps23 commented 5 years ago

  1. What is the current state of a working Docker cluster example?
  2. Can we have a procedure in the documentation on best practices for how to set up the cluster with search-guard?

khezen commented 5 years ago

Hi,

"What is the current state of a working Docker cluster example?" So far, no progress has been made here. If I had time to allocate, I'd implement an example inside a Kubernetes cluster.

"Can we have a procedure in the documentation on best practices for how to set up the cluster with search-guard?" Yes. Since my attention is taken up by another project, any contributor is welcome to take ownership.