zalando-incubator / kubernetes-on-aws

Deploying Kubernetes on AWS with CloudFormation and Ubuntu
https://kubernetes-on-aws.readthedocs.io/
MIT License
626 stars 163 forks

zmon-redis downtime #842

Closed mohabusama closed 6 years ago

mohabusama commented 6 years ago

Since zmon-redis runs as a single pod, there is a chance that the pod gets rescheduled (autoscaling), which in turn leads to some unexpected behavior:

We have various alternatives afaik:

Jan-M commented 6 years ago

This is just another case of the auto-scaler not honoring and prioritizing nodes for termination.

If the cluster autoscaler, when scaling down, preferred nodes without StatefulSets or nodes whose removal would not impact pod disruption budgets, this could easily be prevented.
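
To make the suggested preference concrete, here is a minimal Python sketch of ranking nodes for termination. It is not the cluster autoscaler's actual logic or API. The `Pod` and `Node` classes, the `pdb_disruptions_allowed` field, and all pod and node names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Pod:
    name: str
    owner_kind: str  # e.g. "Deployment" or "StatefulSet"
    # Hypothetical field: disruptions the pod's PDB still allows.
    # None means no pod disruption budget covers this pod.
    pdb_disruptions_allowed: Optional[int] = None


@dataclass
class Node:
    name: str
    pods: List[Pod] = field(default_factory=list)


def termination_penalty(node: Node) -> Tuple[int, int]:
    """Return a sort key; lower tuples are preferred for termination."""
    # Pods whose PDB currently allows no further disruption.
    blocked_by_pdb = sum(
        1
        for p in node.pods
        if p.pdb_disruptions_allowed is not None and p.pdb_disruptions_allowed < 1
    )
    # Pods owned by a StatefulSet.
    stateful = sum(1 for p in node.pods if p.owner_kind == "StatefulSet")
    return (blocked_by_pdb, stateful)


def rank_for_scale_down(nodes: List[Node]) -> List[Node]:
    """Order nodes so the least disruptive candidates come first."""
    return sorted(nodes, key=termination_penalty)


if __name__ == "__main__":
    nodes = [
        Node("node-a", [Pod("redis-0", "StatefulSet", 0)]),
        Node("node-b", [Pod("web-1", "Deployment", 1)]),
        Node("node-c", [Pod("db-0", "StatefulSet", None)]),
    ]
    for node in rank_for_scale_down(nodes):
        print(node.name, termination_penalty(node))
```

In this sketch, PDB impact weighs more than StatefulSet membership because it comes first in the sort key. That ordering is an assumption, not something decided in this thread.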

szuecs commented 6 years ago

IMHO this is working by design in Kubernetes: Pods can be terminated at any time. Systems that have no operator taking ownership of failover to a replica, similar to https://github.com/zalando-incubator/postgres-operator, are a bug in themselves if they cannot run with more than 2 replicas and need a single write master. Therefore closing this, because it has to be solved by the redis application owner.