Open deimosfr opened 7 years ago
I don't understand the script. If TCP 9300 is not bound to, sleep 5 seconds and kill the java process? 🤔
The script waits 5 seconds so that it doesn't go too fast, then checks that the java process is running. The misunderstanding comes from the fact that I hadn't yet given the full explanation. The container doesn't handle SIGTERM, and that makes it a problem to stop cleanly before Kubernetes forces a shutdown. I've updated the run.sh script as well, if you want to take a look:
#!/bin/sh

# SIGTERM handler: forward the signal to Elasticsearch and wait for it to exit
term_handler() {
  if [ "$PID" -ne 0 ]; then
    kill -TERM "$PID"
    wait "$PID"
  fi
  exit 143 # 128 + 15 (SIGTERM)
}

PID=0
BASE=/elasticsearch

# allow for memlock
ulimit -l unlimited

# Set a random node name if not set.
if [ -z "${NODE_NAME}" ]; then
  NODE_NAME=$(uuidgen)
fi
export NODE_NAME

# Prevent "Text file busy" errors
sync

if [ -n "${ES_PLUGINS_INSTALL}" ]; then
  OLDIFS=$IFS
  IFS=','
  for plugin in ${ES_PLUGINS_INSTALL}; do
    if ! $BASE/bin/elasticsearch-plugin list | grep -qs "${plugin}"; then
      yes | $BASE/bin/elasticsearch-plugin install --batch "${plugin}"
    fi
  done
  IFS=$OLDIFS
fi

if [ -n "${SHARD_ALLOCATION_AWARENESS_ATTR}" ]; then
  # this will map to a file like /etc/hostname => /dockerhostname so reading that file will get the
  # container hostname
  if [ "$NODE_DATA" = "true" ]; then
    ES_SHARD_ATTR=$(cat "${SHARD_ALLOCATION_AWARENESS_ATTR}")
    NODE_NAME="${ES_SHARD_ATTR}-${NODE_NAME}"
    echo "node.attr.${SHARD_ALLOCATION_AWARENESS}: ${ES_SHARD_ATTR}" >> $BASE/config/elasticsearch.yml
  fi
  if [ "$NODE_MASTER" = "true" ]; then
    echo "cluster.routing.allocation.awareness.attributes: ${SHARD_ALLOCATION_AWARENESS}" >> $BASE/config/elasticsearch.yml
  fi
fi

# On SIGTERM, stop the placeholder tail and run the handler
trap 'kill ${!}; term_handler' TERM

# run
chown -R elasticsearch:elasticsearch $BASE
chown -R elasticsearch:elasticsearch /data
su-exec elasticsearch $BASE/bin/elasticsearch &
PID="$!"

# Keep the shell alive so it can receive signals
while true ; do
  tail -f /dev/null & wait ${!}
done
If you need more information on the topic, I suggest this blog post (https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86).
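For readers who want to see the trap-and-forward pattern from run.sh in isolation, here is a minimal sketch. It is not the actual run.sh: `sleep` stands in for the Elasticsearch process, and the function names (`start_child`, `forward_term`, `run_supervised`) are made up for illustration.

```shell
#!/bin/sh
# Minimal sketch of the trap-and-forward pattern.
# Assumption: "sleep 300" stands in for the real Elasticsearch process.

# Start a long-running child in the background and remember its PID.
start_child() {
  sleep 300 &
  CHILD_PID=$!
}

# Forward SIGTERM to the child, wait for it to exit,
# then exit with 143 (128 + 15, the conventional status for SIGTERM).
forward_term() {
  if [ "${CHILD_PID:-0}" -ne 0 ]; then
    kill -TERM "$CHILD_PID" 2>/dev/null
    wait "$CHILD_PID"
  fi
  exit 143
}

# Install the handler, start the child, and block until it exits.
# A trapped signal interrupts "wait", after which the handler runs.
run_supervised() {
  trap forward_term TERM
  start_child
  wait "$CHILD_PID"
}
```

Calling `run_supervised` as the last line of an entrypoint script keeps the shell in the foreground so that it can receive the SIGTERM sent on pod termination and pass it on to the child.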
I would really like to help improve your Elasticsearch manifests, but it looks like that will require some significant modifications. That's why I'm trying to go step by step, but for this one the topic is a bit larger.
Something like this would be great for zero-downtime upgrades. For small shards the current setup works well, but when handling large amounts of data, or node pool upgrades in GKE, something more robust is needed.
@pires are you willing to merge something similar to this solution?
It's a shame that there isn't much interest in this in this repository. If you want to help build something better, I'd be glad. Here is the Helm chart: https://github.com/MySocialApp/kubernetes-helm-chart-elasticsearch
This repo is not meant to be a production solution, but rather to serve as inspiration for what you want to build.
Now, I don't think the proposed solution is complete. But I am no longer using Elasticsearch so I can't come up with a better one myself. If I were, I'd implement this logic as part of an operator and not a containerized script.
The script-based solution is a very good transition towards operators. Unfortunately, most operators are not mature enough today (because of the complexity of distributed systems), so this solution is a good one while waiting for operators to be fully production-ready.
Hi,
As explained in this PR (https://github.com/pires/docker-elasticsearch/pull/44), it would be great to have a safeguard that ensures each node has properly booted before another one is started or restarted. During a rolling restart, things could go badly wrong if nodes are rolled out one after another and none of them is able to boot.
To avoid this I proposed a script inside the Docker container, but that was rejected. So here is another proposal with a ConfigMap:
Then in the statefulset, getting:
What do you think? I can make a PR if you agree with the idea.
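The ConfigMap and StatefulSet snippets from this post are not reproduced here, so the following is only a rough sketch of the kind of safeguard described above, not the author's script. It assumes that "properly booted" can be approximated by the transport port (9300) accepting connections, and that the image ships an `nc` build supporting `-z`. The names `wait_until`, `transport_port_open`, and `TRANSPORT_PORT` are hypothetical.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds (returns 0),
# or give up once roughly TIMEOUT seconds have elapsed (returns 1).
# TIMEOUT and INTERVAL must be whole numbers of seconds.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Hypothetical probe: succeeds once something listens on the transport port.
# Assumption: nc supports -z; replace with whatever tool the image provides.
transport_port_open() {
  nc -z localhost "${TRANSPORT_PORT:-9300}"
}
```

A caller could then run something like `wait_until 60 5 transport_port_open || exit 1`. How this gets wired into the StatefulSet (a probe, a lifecycle hook, or something else) is left open, since that part of the proposal is not shown.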