prometheus-operator / runbooks

https://runbooks.prometheus-operator.dev
Apache License 2.0

Wrong information on etcd runbooks? #30

Closed mehyedes closed 2 years ago

mehyedes commented 2 years ago

Hi, I was going through the etcd runbooks and I noticed this: https://github.com/prometheus-operator/runbooks/blob/5ef0792031ab316fb23724e5ab395a175878dd7d/content/runbooks/etcd/etcdInsufficientMembers.md?plain=1#L8-L12. The same appears in https://github.com/prometheus-operator/runbooks/blob/5ef0792031ab316fb23724e5ab395a175878dd7d/content/runbooks/etcd/etcdNoLeader.md?plain=1#L10-L11

As far as I know, etcd only rejects write operations when it loses quorum and still allows reads. This is also highlighted in the etcd docs here:

When the majority members of the cluster fail, the etcd cluster fails and cannot accept more writes.

This is more of a question, as I was a bit confused :sweat_smile: It would be great if you could confirm which information is correct.
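
To make the "majority of members" wording from the etcd docs concrete, here is a minimal sketch of the quorum arithmetic. The function names are made up for illustration and are not part of etcd or any client library; the sketch only assumes that a majority means more than half of the configured voting members.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


def accepts_writes(members: int, healthy: int) -> bool:
    """True while the healthy members still form a majority."""
    return healthy >= quorum_size(members)


if __name__ == "__main__":
    # Print cluster size, quorum size and tolerated failures side by side.
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), failure_tolerance(n))
```

Under this arithmetic a 4-member cluster tolerates no more failures than a 3-member one, which is why odd cluster sizes are usually recommended.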

paulfantom commented 2 years ago

Good questions :thinking: We adapted this from OpenShift runbooks (example: https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdInsufficientMembers.md).

I wonder how the etcd cluster losing quorum affects the Kubernetes API server. There might be some switch in the API server to prevent reads too, and it would be good to investigate this further.

mehyedes commented 2 years ago

Thanks for the quick response. As far as I understand, etcd (or Raft) only requires consensus for write operations. Therefore, any other etcd member (follower) can process read requests.

I did some further googling. Quoting the official k8s docs here:

Performance and stability of the cluster is sensitive to network and disk I/O. Any resource starvation can lead to heartbeat timeout, causing instability of the cluster. An unstable etcd indicates that no leader is elected. Under such circumstances, a cluster cannot make any changes to its current state, which implies no new pods can be scheduled.

and quoting another (not-so-official) source:

Kubernetes relies on etcd for storing the state of the whole cluster. Losing etcd consensus makes the Kubernetes API server essentially read only, i.e. no changes can be performed in the cluster.

Unfortunately, I couldn't find better resources for now.

Unless there is an explicit configuration in k8s that allows read operations only from the etcd leader, only updates to the cluster state would be forbidden, and read requests should continue to be processed (although some members might serve stale data).
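
One detail that bears on the stale-data point: etcd v3 distinguishes linearizable reads (the default), which are confirmed with a quorum, from serializable reads, which a single member answers from its local data and which may be stale. etcdctl exposes the choice through `--consistency`, and the Go client through `WithSerializable()`. The toy model below is only a sketch of that distinction. `ToyCluster` and its methods are hypothetical names, not etcd API, and the sketch does not settle which kind of read the Kubernetes API server issues for a given request.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Simplified model of a cluster; hypothetical, not the etcd client API."""

    members: int    # configured voting members
    reachable: int  # members that can currently talk to each other

    def has_quorum(self) -> bool:
        # A majority is more than half of the configured members.
        return self.reachable >= self.members // 2 + 1

    def write(self) -> str:
        # Writes must be committed by a majority.
        return "ok" if self.has_quorum() else "rejected: no quorum"

    def read(self, serializable: bool = False) -> str:
        if serializable:
            # Answered from one member's local state, so it may be stale.
            if self.reachable > 0:
                return "ok (possibly stale)"
            return "rejected: no member reachable"
        # Linearizable reads are confirmed with a majority first.
        return "ok" if self.has_quorum() else "rejected: no quorum"


if __name__ == "__main__":
    degraded = ToyCluster(members=3, reachable=1)
    print(degraded.write())
    print(degraded.read())
    print(degraded.read(serializable=True))
```

In this model a cluster without quorum rejects writes and default reads, while a serializable read still succeeds, which is one way "some reads work, others do not" can come about.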

nvtkaszpir commented 2 years ago

When etcd does not have a majority of instances available the Kubernetes and OpenShift APIs will reject read and write requests and operations that preserve the health of workloads cannot be performed.

Here, "instances" refers to the etcd instances that form the etcd cluster. In general, losing quorum switches etcd to read-only, which effectively renders the k8s API read-only.

This is also common when 3 addresses are defined as initial nodes but only 2 can communicate, so the etcd cluster has not even formed yet. This can happen when you create or restore a cluster and there are not enough members to form it.

etcd no Leader

This is a bit different, because it can happen if nodes from the cluster are orphaned: they were part of the cluster, but now they are in the minority and thus cannot form a cluster, for example due to a network partition.

I believe this is a good catch. The runbook should be made less ambiguous and updated with a more explicit and simpler explanation.
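
The orphaned-minority case can be sketched as follows. This is an illustrative model only: the member names and the `side_with_quorum` helper are hypothetical, and it assumes that only a side holding a majority of the configured members can elect a leader.

```python
from typing import List, Optional, Set


def side_with_quorum(cluster_size: int, partitions: List[Set[str]]) -> Optional[Set[str]]:
    """Return the partition side that holds a majority, or None if no side does."""
    quorum = cluster_size // 2 + 1
    for side in partitions:
        if len(side) >= quorum:
            return side
    return None


if __name__ == "__main__":
    # Three members split 2/1: only the two-member side keeps a majority.
    print(side_with_quorum(3, [{"etcd-a", "etcd-b"}, {"etcd-c"}]))
    # Four members split 2/2: neither side has a majority.
    print(side_with_quorum(4, [{"etcd-a", "etcd-b"}, {"etcd-c", "etcd-d"}]))
```

Members outside the returned side are the orphaned ones: they stay up but have no leader to follow.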

mehyedes commented 2 years ago

Hi @nvtkaszpir, thanks for the explanation. Doesn't that confirm that the statement below is wrong, then?

When etcd does not have a majority of instances available the Kubernetes and OpenShift APIs will reject read and write requests and operations that preserve the health of workloads cannot be performed.

as the Kubernetes API would still allow read requests.

nvtkaszpir commented 2 years ago

Certain read operations may work, but not all; this may depend on the exposed API. For example (and I totally made this up just now), you can read configmaps but you cannot read pod resources.

mehyedes commented 2 years ago

I see, thanks.

I've always thought that reads wouldn't be affected at all, but it seems I wasn't totally right. Are there any resources where I can read more about this in detail, in the context of "vanilla" k8s?

nvtkaszpir commented 2 years ago

Unfortunately, I cannot recommend anything on that matter; you will have to look into this on your own. I suggest the web archive and the CoreOS docs.