depauna opened this issue 6 years ago
A failed livenessProbe kills a pod. Are you thinking of readinessProbe?
Yup, totally messed that up. Something like this:
readinessProbe:
  exec:
    command:
      - curl
      - -i
      - -H
      - "Accept: application/json"
      - -H
      - "Content-Type: application/json"
      - -X
      - GET
      - http://{{ .Values.elasticsearch.name }}:{{ .Values.elasticsearch.client.restPort }}/_cluster/health?wait_for_status=green&timeout=31557600s
  initialDelaySeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3
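One gotcha with that: curl exits 0 even when the server answers with an HTTP error, and Elasticsearch returns 408 when wait_for_status times out, so the probe above only fails once the kubelet kills the command after timeoutSeconds. A variant like this (rough sketch, same Helm values as above, not tested) makes the exit code reflect the health check itself:

readinessProbe:
  exec:
    command:
      - curl
      - -s
      # --fail makes curl return a non-zero exit code on HTTP >= 400,
      # e.g. the 408 Elasticsearch sends when wait_for_status times out
      - --fail
      - "http://{{ .Values.elasticsearch.name }}:{{ .Values.elasticsearch.client.restPort }}/_cluster/health?wait_for_status=green&timeout=5s"
  initialDelaySeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3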
Only downside: when a node goes down or reboots, the cluster status becomes yellow, and the restarting node never becomes ready (it stays at 0/1 Running) because the status is not green.
Still wondering if we need it, though. With rolling updates, data could get lost: as soon as one data node comes back up, the next one goes down before the new data has been synced over to it.
Or do the containers have something built in for that?
Any thoughts?
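For what it's worth, a StatefulSet with the RollingUpdate strategy only moves on to the next pod once the updated one is Running and Ready, so a readiness probe like the one above is what gates that. To avoid the yellow-status deadlock when a single node restarts, something along these lines might work (rough sketch, not tested; it assumes the wait_for_no_relocating_shards / wait_for_no_initializing_shards cluster health parameters, the latter needing Elasticsearch 6.2+):

readinessProbe:
  exec:
    command:
      - curl
      - -s
      - --fail
      # accept yellow, but only once shard relocation/recovery has settled
      - "http://{{ .Values.elasticsearch.name }}:{{ .Values.elasticsearch.client.restPort }}/_cluster/health?wait_for_status=yellow&wait_for_no_initializing_shards=true&wait_for_no_relocating_shards=true&timeout=5s"
  initialDelaySeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3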
Shouldn't there be an HTTP livenessProbe on the data nodes that prevents terminating older data nodes while they are still replicating new data to newly started data nodes during a rolling update?
Meaning newly created data nodes are not seen as fully running until the cluster status is green.
I'm looking at something like this: