Options to initiate rescheduling on another node if agent dies

timowevel1 commented 2 years ago

Hey,

do you have any suggestions for how to get a pod rescheduled on another agent if an agent dies? I think that's the idea behind HA, but I haven't figured out yet how to achieve it on k3s with Hetzner.

If I create an nginx pod and kill the node (to simulate node downtime), it won't get rescheduled. Do you have any ideas for this, maybe something to add to this project?

Thanks.

mysticaltech commented 2 years ago

@timowevel1 Just FYI, don't hesitate to use discussions for these kinds of questions!

About your question above, it's best to take a peek at the ingress-nginx helm values file, especially this part:

https://github.com/kubernetes/ingress-nginx/blob/0f5bf530ae677a2f389d6f06de27b6a5cf910621/charts/ingress-nginx/values.yaml#L194

So by default, Nginx launches as a Deployment and not a DaemonSet, but for the behavior you describe, a DaemonSet would suit you best. Please note that this parameter sits under the controller section, and you change the current config via a HelmChartConfig definition YAML file that you simply apply; please check the readme for more details.

If you want to keep using a Deployment, you instead need to change the replica count (search for replicaCount in the values file). You can also set the minUnavailable value, which is 1, which would explain why it did not re-deploy when you killed it. Remember, all of the above values come under controller.
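
To make that concrete, here is a rough sketch of what such a HelmChartConfig could look like. The metadata name and namespace are assumptions on my part: they have to match whatever HelmChart actually deploys ingress-nginx in your cluster, so check that before applying. The replica count of 3 is only an illustrative number.

```yaml
# Sketch only, not the project's shipped config.
# Assumption: the HelmChart that installs ingress-nginx is named
# "ingress-nginx" and lives in kube-system; adjust both to your cluster.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: ingress-nginx        # assumed name, must match the HelmChart
  namespace: kube-system     # assumed namespace, must match the HelmChart
spec:
  valuesContent: |-
    controller:
      # Option A: run one controller pod on every schedulable node.
      kind: DaemonSet
      # Option B: keep the default Deployment and raise the replica count.
      # kind: Deployment
      # replicaCount: 3      # illustrative value
```

With option A, losing a node only removes that node's controller pod while the pods on the other nodes keep serving. With option B, the remaining replicas keep serving as long as they were not all on the node that died.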

timowevel1 commented 2 years ago

Thank you for this clarification. How is Traefik usually deployed? Also as a Deployment? If so, how is it guaranteed that HTTP requests always reach the cluster, given that some nodes might have no Traefik pod running?

Yes, I am sorry, that's why I closed these issues. I will use the discussions, thank you!

mysticaltech commented 2 years ago

That's a good question about Traefik. We rely on the default setup that k3s provides, so the best way to answer it would be to inspect the HelmChart and HelmChartConfig resources with kubectl get and then kubectl describe for those belonging to Traefik, and likewise run describe on its pods.

When you find the answer, please let me know.

mysticaltech commented 2 years ago

@timowevel1 The defaults have now been improved and made HA if possible. Good catch! Just released in v1.5.3.

timowevel1 commented 2 years ago

The reason I use nginx is that I am more familiar with it and couldn't get HTTP to HTTPS redirects to work with Traefik. Normally the controller should get scheduled on every node.

mysticaltech commented 2 years ago

@timowevel1 No, not at all. The controller gets scheduled on every node only when you choose kind=DaemonSet in the helm chart values. See the explanations above.

At least in the new release, it will be HA by default if you have enough nodes.

But YOU CAN set kind=DaemonSet for yourself. If you want to redeploy, you can create a custom nginx values file at the root of your module. See the kube.tf.example for more info. Good luck!

kube-hetzner / terraform-hcloud-kube-hetzner

Options to initiate rescheduling on another node if agent dies #318