wish / katalog-sync

A reliable node-local mechanism for syncing k8s pods to consul services

[Question] init container Vs sidecar service #33

Closed · ltagliamonte-dd closed this issue 4 years ago

ltagliamonte-dd commented 4 years ago

I was wondering if you have already considered using an init container instead of a sidecar to ensure consul registration before a pod is marked "ready". If you already run a pool of sidecars, this would save a bit of resources.

As it is today, the sidecar code can't be used as an init container, because it also manages the deregister call.

Any thoughts?

jacksontj commented 4 years ago

Thanks for the question! As you mentioned, the goal of the sidecar is to ensure registration of the service in consul (for anyone else checking this out later, this is covered in more detail in our blog post -- https://medium.com/wish-engineering/katalog-sync-reliable-integration-of-consul-and-kubernetes-ebe8aae0852a).

K8s init containers run before any of the other containers in the pod are launched, which means that if we were to run the "sidecar" as an init container it would be unable to determine the "readiness" of the pod, as the rest of the containers won't start until the init container exits.

A couple of other things to note: (1) the sidecar is optional (it just does coordination) and (2) the sidecar should have very low resource requirements (we run it on a lot of our services, and the memory/cpu footprint is tiny).
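
For anyone trying to picture the difference, here is a rough sketch using the k8s Go API types of a pod that runs the sidecar alongside the app container. The image names and the annotation key are illustrative, not taken from the project's manifests:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// examplePod builds a pod spec where the katalog-sync sidecar runs alongside
// the app container for the pod's whole lifetime, instead of before it as an
// init container would.
func examplePod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "my-app",
			Annotations: map[string]string{
				// Illustrative annotation key; check the katalog-sync README
				// for the exact keys your version expects.
				"katalog-sync.wish.com/service-names": "my-app",
			},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "app", Image: "my-app:latest"}, // hypothetical app image
				{
					// Hypothetical sidecar image. The sidecar exposes a
					// readiness endpoint (probe omitted for brevity), so the
					// pod only becomes Ready once registration in consul has
					// succeeded.
					Name:  "katalog-sync-sidecar",
					Image: "wish/katalog-sync-sidecar:latest",
				},
			},
		},
	}
}

func main() {
	fmt.Println(examplePod().Name)
}
```

Because the sidecar is just another container, it keeps running (and keeps reporting readiness) for the pod's whole lifetime, which an init container cannot do.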

Hopefully that covers the question?

ltagliamonte-dd commented 4 years ago

@jacksontj thank you for the reply. I went over the blog post again more carefully and now I fully understand why the sidecar exists: you want to stop a rollout from proceeding if you can't sync to the consul cluster, so you mark the container in the pod as not ready and the rollout stops itself.

Anyway, if this situation happens, you are in a degraded state that is directly proportional to the maxUnavailable configured for the deployment, am I right?

I have another question now :D During a normal rollout, when a pod needs to be removed, what happens if the local daemonset process (the one that queries the kubelet API) can't update the local agent? Kubernetes is going to remove the pod and consul will have a stale entry? Do you have any protection in this scenario?

jacksontj commented 4 years ago

> Anyway, if this situation happens, you are in a degraded state that is directly proportional to the maxUnavailable configured for the deployment, am I right?

Sort of; the issue is not so much how fast k8s rolls out the pods (max surge/unavailable deals with that) but how fast the rollout happens as seen by consul. So the (optional) sidecar lets users tie the consul sync to the k8s rollout state (such that the rollout stays in sync with what consul sees).
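
As a rough illustration (the values are made up), the rollout budget lives in the deployment strategy; since pod readiness includes the sidecar, k8s won't burn through that budget faster than consul registration succeeds:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// rolloutBudget shows the knobs that bound how far a rolling update can get
// ahead of consul: at most maxUnavailable pods down and maxSurge extra pods up
// at any point in time.
func rolloutBudget() appsv1.DeploymentStrategy {
	maxUnavailable := intstr.FromInt(1) // at most 1 pod unavailable at a time
	maxSurge := intstr.FromInt(1)       // at most 1 extra pod during the rollout

	return appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxUnavailable: &maxUnavailable,
			MaxSurge:       &maxSurge,
		},
	}
}

func main() {
	s := rolloutBudget()
	fmt.Println(s.Type, s.RollingUpdate.MaxUnavailable.String(), s.RollingUpdate.MaxSurge.String())
}
```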

> During a normal rollout, when a pod needs to be removed, what happens if the local daemonset process (the one that queries the kubelet API) can't update the local agent? Kubernetes is going to remove the pod and consul will have a stale entry? Do you have any protection in this scenario?

This is actually a great question, and something we have thought about (and have hit before). The recent changes in this area are in https://github.com/wish/katalog-sync/pull/28.

There are basically two failure modes to worry about: (1) the kubelet API missing/unavailable and (2) the consul-agent API missing/unavailable. For both of these there are metrics (error counts as well as latency).

1. In the event that the kubelet API is missing, katalog-sync will continue to sync the last-seen state. Otherwise you could get into a situation where the kubelet restarts and everything gets dropped from consul.

2. In the event of an issue talking to consul, we are unable to update the agent, and the services' TTLs will expire (meaning the services will de-register from consul). This is somewhat expected, as the consul-agent being unavailable usually means the node is bad, or at a minimum that consul is having an issue (such that it can't sync state remotely), which is no worse than running a consul sidecar directly.
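
To make the TTL behaviour in failure mode 2 concrete, here is a minimal sketch against the consul Go API (the service name, ID, and intervals are made up, and this is not katalog-sync's actual registration code): if the sync loop can't reach the agent, the TTL simply stops being refreshed and the entry ages out.

```go
package main

import (
	"log"
	"time"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	agent := client.Agent()

	// Register a service instance with a TTL check. If nothing refreshes the
	// TTL (e.g. the sync loop can't reach the agent), the check goes critical
	// and consul eventually drops the instance on its own.
	reg := &consul.AgentServiceRegistration{
		ID:   "my-app-pod-abc123", // hypothetical pod-derived ID
		Name: "my-app",            // hypothetical service name
		Port: 8080,
		Check: &consul.AgentServiceCheck{
			TTL:                            "30s",
			DeregisterCriticalServiceAfter: "2m",
		},
	}
	if err := agent.ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}

	// Sync loop: keep refreshing the TTL while the pod is still seen as ready.
	for {
		err := agent.UpdateTTL("service:my-app-pod-abc123", "pod ready", consul.HealthPassing)
		if err != nil {
			// Failure mode 2: can't talk to the consul agent. The TTL simply
			// stops being refreshed, so the entry ages out instead of going
			// stale forever.
			log.Printf("failed to update TTL: %v", err)
		}
		time.Sleep(10 * time.Second)
	}
}
```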

Hopefully that answers the questions :)

ltagliamonte-dd commented 4 years ago

Thank you for the great reply @jacksontj. I have another follow-up question :D In a normal setup, services talk to each other using the clusterIP provided by the k8s service. When a deployment is triggered, all traffic keeps going to the clusterIP: new iptables rules get created, existing traffic keeps being forwarded because it still has entries in the conntrack table, and new traffic "automagically" reaches the new pods because of the new iptables rules.

If you index pod IPs instead of services, how do you deal with deployments at the app level? Do you have your apps continuously refresh DNS? Do you use consul discovery? Do you have a service mesh that takes care of routing?

jacksontj commented 4 years ago

> If you index pod IPs instead of services, how do you deal with deployments at the app level? Do you have your apps continuously refresh DNS? Do you use consul discovery? Do you have a service mesh that takes care of routing?

Fundamentally you need to re-discover, whether that is through consul's DNS interface, the consul API, or a service mesh. Any of these mechanisms work; which is preferable will depend on your setup.
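
For example, the API route might look roughly like this (the service name is made up; the consul Go client calls are the standard ones):

```go
package main

import (
	"fmt"
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Ask consul for the currently-healthy instances of a service.
	// passingOnly=true filters out failing instances, which is how new pod IPs
	// show up and old ones disappear after a rollout.
	entries, _, err := client.Health().Service("my-app", "", true, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		addr := e.Service.Address
		if addr == "" {
			addr = e.Node.Address // fall back to the node address
		}
		fmt.Printf("%s:%d\n", addr, e.Service.Port)
	}
}
```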