Fixing issue #70

nick-pap commented 6 years ago

Changed chown -R to a faster method detailed here: https://www.unixlore.net/articles/speeding-up-bulk-file-operations.html

Also added some text output to show work is being done until Java starts up.

pires commented 6 years ago

Can you please measure the difference and share it here?

nick-pap commented 6 years ago

It depends on the contents of the folder and the quality of the connection to the storage. I had been waiting 10-15 minutes for a data node to start because chown was still running. The DB held about 500M docs, taking up 170GB.

After changing this in a local image, I got it down to 3-4 minutes.

Again, take these measurements with a big grain of salt.

However, the proposed change is solid and you can replicate the results of the referenced article easily.

nick-pap commented 6 years ago

After working with the changed script for more than a month now, I can vouch for its safety and efficiency. It really makes a difference when bringing up multiple master/data containers and drastically reduces the time needed for them to become live.

Since chown is only run when a file does not already have the required user:group assignment, once a file is processed, all subsequent containers will not need to do it again.
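As a minimal sketch of that idea (an illustration, not the exact script from this PR), the filtering can be left to `find` so that only mismatched entries ever reach `chown`. The helper names `list_mismatched` and `chown_missing` are hypothetical, and `xargs -r` is assumed to be available (it is in GNU and BusyBox xargs):

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not part of run.sh).
# Arguments for both: OWNER GROUP DIR (names or numeric IDs).

# Print every entry under DIR whose owner or group differs from OWNER / GROUP.
list_mismatched() {
    find "$3" \( ! -user "$1" -o ! -group "$2" \) -print
}

# Run chown only on the mismatched entries; entries that already match are skipped.
# -print0 / -0 keep unusual file names intact, -r skips chown when nothing matches.
chown_missing() {
    find "$3" \( ! -user "$1" -o ! -group "$2" \) -print0 \
        | xargs -0 -r chown "$1:$2"
}
```

Inside the container this would be called as `chown_missing elasticsearch elasticsearch /data`. On a volume that an earlier container has already processed, `find` matches nothing and `chown` is never invoked, which is where the time saving comes from.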

You can even bind mount it via a ConfigMap so you don't have to change your images:

kind: ConfigMap
apiVersion: v1
metadata:
  name: es-run
  namespace: default
data:
  run.sh: |-
    #!/bin/sh
    echo "Starting ElasticSearch $ES_VERSION"

    BASE=/elasticsearch

    # allow for memlock if enabled
    if [ "$MEMORY_LOCK" == "true" ]; then
        ulimit -l unlimited
    fi

    # Set a random node name if not set.
    if [ -z "${NODE_NAME}" ]; then
        NODE_NAME=$(uuidgen)
    fi
    export NODE_NAME=${NODE_NAME}

    # Create a temporary folder for Elastic Search ourselves.
    # Ref: https://github.com/elastic/elasticsearch/pull/27659
    export ES_TMPDIR=`mktemp -d -t elasticsearch.XXXXXXXX`

    # Prevent "Text file busy" errors
    sync

    if [ ! -z "${ES_PLUGINS_INSTALL}" ]; then
       OLDIFS=$IFS
       IFS=','
       for plugin in ${ES_PLUGINS_INSTALL}; do
          if ! $BASE/bin/elasticsearch-plugin list | grep -qs ${plugin}; then
             until $BASE/bin/elasticsearch-plugin install --batch ${plugin}; do
               echo "failed to install ${plugin}, retrying in 3s"
               sleep 3
             done
          fi
       done
       IFS=$OLDIFS
    fi

    if [ ! -z "${SHARD_ALLOCATION_AWARENESS_ATTR}" ]; then
        # this will map to a file like  /etc/hostname => /dockerhostname so reading that file will get the
        #  container hostname
        if [ "$NODE_DATA" == "true" ]; then
            ES_SHARD_ATTR=`cat ${SHARD_ALLOCATION_AWARENESS_ATTR}`
            NODE_NAME="${ES_SHARD_ATTR}-${NODE_NAME}"
            echo "node.attr.${SHARD_ALLOCATION_AWARENESS}: ${ES_SHARD_ATTR}" >> $BASE/config/elasticsearch.yml
        fi
        if [ "$NODE_MASTER" == "true" ]; then
            echo "cluster.routing.allocation.awareness.attributes: ${SHARD_ALLOCATION_AWARENESS}" >> $BASE/config/elasticsearch.yml
        fi
    fi

    # remove x-pack-ml module
    rm -rf /elasticsearch/modules/x-pack/x-pack-ml

    # run
    if [[ $(whoami) == "root" ]]; then
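        # Only entries not already owned by elasticsearch:elasticsearch are passed to chown.
        echo "Changing ownership of $BASE folder"
        find "$BASE" \( ! -user elasticsearch -o ! -group elasticsearch \) -print0 | xargs -0 -r chown elasticsearch:elasticsearch

        echo "Changing ownership of /data folder"
        find /data \( ! -user elasticsearch -o ! -group elasticsearch \) -print0 | xargs -0 -r chown elasticsearch:elasticsearch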

        exec su-exec elasticsearch $BASE/bin/elasticsearch $ES_EXTRA_ARGS
    else
        # the container's first process is not running as 'root',
        # it does not have the rights to chown. however, we may
        # assume that it is being run as 'elasticsearch', and that
        # the volumes already have the right permissions. this is
        # the case for kubernetes for example, when 'runAsUser: 1000'
        # and 'fsGroup:100' are defined in the pod's security context.
        $BASE/bin/elasticsearch $ES_EXTRA_ARGS
    fi

Then change the es-data and es-master deployments to add:

spec:
  template:
    spec:
      volumes:
        - name: es-run
          configMap:
            name: es-run
            defaultMode: 484

and then mount it over /run.sh in the container:

spec:
  template:
    spec:
      containers:
        # inside the existing Elasticsearch container entry
        - volumeMounts:
            - name: es-run
              mountPath: /run.sh
              subPath: run.sh
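A note on `defaultMode: 484` in the volume definition: the value is written in decimal there, and 484 decimal is 0744 octal (rwxr--r--), which keeps the mounted run.sh executable for its owner. A quick way to check the conversion in either direction, using only the standard `printf`:

```shell
#!/bin/sh
# Convert between the decimal form used in the manifest and the usual octal form.
printf '%o\n' 484     # decimal -> octal, prints 744
printf '%d\n' 0744    # octal -> decimal, prints 484
```

If a different permission set is needed, the same conversion gives the decimal value to put in the manifest.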

pires commented 6 years ago

Thanks a lot @nick-pap

pires / docker-elasticsearch

Fixing issue #70 #71