pires / kubernetes-elasticsearch-cluster

Elasticsearch cluster on top of Kubernetes made easy.
Apache License 2.0

Avoid cluster down on a configuration mismatch #146

Open · deimosfr opened this issue 7 years ago

deimosfr commented 7 years ago

Hi,

As explained in this PR (https://github.com/pires/docker-elasticsearch/pull/44), it would be great to have a safeguard that ensures each node has properly booted before restarting/starting another one. In a rolling-restart scenario, it could be dramatic if nodes are rolled out one after another and none of them can boot.

To avoid this I proposed a script inside the Docker container, but that was refused. So here is another proposal, using a ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: elasticsearch-configmap
data:
  postStart.sh: |-
    #!/bin/sh
    # Block until the transport port (9300) is bound; fail the hook if the JVM died.
    while [ $(netstat -auntpl | grep LISTEN | grep -c 9300) -eq 0 ] ; do
      echo "waiting for port 9300 to be opened"
      sleep 5
      pidof java || exit 1
    done
    exit 0

Then, reference it from the StatefulSet:

...
      containers:
...
        lifecycle:
          postStart:
            exec:
              command: ["/elasticsearch/scripts/postStart.sh"]
...
        volumeMounts:
...
        - name: elasticsearch-configmap
          mountPath: /elasticsearch/scripts
      volumes:
...
        - name: elasticsearch-configmap
          configMap:
            name: elasticsearch-configmap
            defaultMode: 0775
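
For reference, a readinessProbe on the transport port gives a related guarantee on the Kubernetes side, since a StatefulSet's ordered rolling update only moves on to the next pod once the current one reports Ready. A minimal sketch (the timings are illustrative, not tuned values):

    readinessProbe:
      tcpSocket:
        port: 9300
      initialDelaySeconds: 10
      periodSeconds: 5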

What do you think? I can open a PR if you agree with the idea.

pires commented 7 years ago

I don't understand the script. If TCP 9300 is not bound to, sleep 5 seconds and kill the java process? 🤔

deimosfr commented 7 years ago

The script waits 5 seconds between checks so it doesn't loop too fast, and it verifies that the Java process is still running; if the process is gone, the hook exits with an error instead of waiting forever. The misunderstanding is because I hadn't yet given the full explanation. In fact, the container doesn't handle SIGTERM, and that's a problem for performing clean stops without Kubernetes force-killing the pod. I've updated the run.sh script as well, if you want to take a look:

    #!/bin/sh

    # SIGTERM handler: forward the signal to Elasticsearch and wait for it to exit
    term_handler() {
      if [ $PID -ne 0 ]; then
        kill -TERM "$PID"
        wait "$PID"
      fi
      exit 143 # 128 + 15 = SIGTERM
    }

    PID=0
    BASE=/elasticsearch

    # allow for memlock
    ulimit -l unlimited

    # Set a random node name if not set.
    if [ -z "${NODE_NAME}" ]; then
        NODE_NAME=$(uuidgen)
    fi
    export NODE_NAME=${NODE_NAME}

    # Prevent "Text file busy" errors
    sync

    if [ ! -z "${ES_PLUGINS_INSTALL}" ]; then
       OLDIFS=$IFS
       IFS=','
       for plugin in ${ES_PLUGINS_INSTALL}; do
          if ! $BASE/bin/elasticsearch-plugin list | grep -qs ${plugin}; then
             yes | $BASE/bin/elasticsearch-plugin install --batch ${plugin}
          fi
       done
       IFS=$OLDIFS
    fi

    if [ ! -z "${SHARD_ALLOCATION_AWARENESS_ATTR}" ]; then
        # this will map to a file like  /etc/hostname => /dockerhostname so reading that file will get the
        #  container hostname
        if [ "$NODE_DATA" == "true" ]; then
            ES_SHARD_ATTR=`cat ${SHARD_ALLOCATION_AWARENESS_ATTR}`
            NODE_NAME="${ES_SHARD_ATTR}-${NODE_NAME}"
            echo "node.attr.${SHARD_ALLOCATION_AWARENESS}: ${ES_SHARD_ATTR}" >> $BASE/config/elasticsearch.yml
        fi
        if [ "$NODE_MASTER" == "true" ]; then
            echo "cluster.routing.allocation.awareness.attributes: ${SHARD_ALLOCATION_AWARENESS}" >> $BASE/config/elasticsearch.yml
        fi
    fi

    # on SIGTERM: kill the 'tail' that keeps the shell alive, then run the handler
    trap 'kill ${!}; term_handler' TERM

    # run
    chown -R elasticsearch:elasticsearch $BASE
    chown -R elasticsearch:elasticsearch /data
    su-exec elasticsearch $BASE/bin/elasticsearch &
    PID="$!"

    # keep the container alive; 'wait' is interruptible, so the trap can fire
    while true ; do
      tail -f /dev/null & wait ${!}
    done
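
A quick way to check the handler, assuming a locally built image (the image and container names here are hypothetical):

    # 'docker stop' sends SIGTERM first, then SIGKILL after the grace period
    docker run -d --name es-test my-es-image
    docker stop -t 60 es-test    # give Elasticsearch time to shut down cleanly
    docker wait es-test          # prints the exit code; expect 143 (128 + 15)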

If you need more information on the topic, I suggest this blog post (https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86).

I would really like to help enhance your Elasticsearch manifests, but it looks like that will require some significant modifications. That's why I'm trying to go step by step; this particular topic is just a bit larger than the others.

psalaberria002 commented 6 years ago

Something like this would be great for zero-downtime upgrades. The current setup works well for small shards, but when handling large amounts of data, or node-pool upgrades in GKE, something more robust is needed.
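
For instance, the standard Elasticsearch rolling-restart procedure pauses shard reallocation while a node is down and re-enables it once the node has rejoined; a sketch of the two calls (the service URL is illustrative):

    # before taking a node down: stop shard reallocation
    curl -XPUT "http://elasticsearch:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

    # once the node has rejoined the cluster: re-enable allocation
    curl -XPUT "http://elasticsearch:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'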

@pires are you willing to merge something similar to this solution?

deimosfr commented 6 years ago

It's a shame that there isn't much interest in this in this repository. If you want to help build something better, I'll be happy. Here is the Helm chart: https://github.com/MySocialApp/kubernetes-helm-chart-elasticsearch

pires commented 6 years ago

This repo is not meant to be a production solution, but rather to serve as inspiration for what you want to build.

Now, I don't think the proposed solution is complete. But I am no longer using Elasticsearch, so I can't come up with a better one myself. If I were, I'd implement this logic as part of an operator and not as a containerized script.

deimosfr commented 6 years ago

The script-based solution is a good transition to operators. Unfortunately, the maturity level of most operators is not good enough today (due to the complexity of distributed systems), so this solution is a good stopgap while waiting for operators to be fully production-ready.