hellofresh / eks-rolling-update

EKS Rolling Update is a utility for updating the launch configuration of worker nodes in an EKS cluster.
Apache License 2.0

Could not configure Kubernetes Python Client #97

Open amarbarot opened 3 years ago

amarbarot commented 3 years ago

Hi

I have been testing this tool and have hit a stumbling block: it cannot configure the Kubernetes Python client, even though the client is installed. Is this a known issue? If not, is there a way to dig deeper into what the Kubernetes Python client depends on?

[ root$] docker run -ti --rm -e AWS_DEFAULT_REGION -v "/root/.aws/config" -v "/root/.kube/us-gpd" eks-rolling-update:latest -c gpdeks1
2021-02-25 02:31:16,139 INFO Describing autoscaling groups...
2021-02-25 02:31:16,444 ERROR Could not configure kubernetes python client
2021-02-25 02:31:16,444 ERROR Rolling update of ASG has failed. Exiting
2021-02-25 02:31:16,444 ERROR AWS Auto Scaling Group processes will need resuming manually

Thanks

chadlwilson commented 3 years ago

Looks like you are not using the default KUBECONFIG location (us-gpd vs config) - you may need to override KUBECONFIG or mount it to a different location per the docs and the example?
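For reference, here is a rough sketch of the two options, assuming the kubeconfig lives at $HOME/.kube/us-gpd on the host and the container runs as root. Note that -v with a single path creates an anonymous volume at that container path rather than mounting a host file, so the container in the failing command above most likely never saw the host's kubeconfig. The function names and the $HOME/.aws mount are illustrative assumptions, and each function only prints the command so it can be reviewed before running.

```shell
# Option A: bind-mount the kubeconfig onto the default path (host:container).
mount_to_default() {
  echo docker run -ti --rm -e AWS_DEFAULT_REGION \
    -v "$HOME/.aws:/root/.aws" \
    -v "$HOME/.kube/us-gpd:/root/.kube/config" \
    eks-rolling-update:latest -c gpdeks1
}

# Option B: keep the file name and point KUBECONFIG at it inside the container.
override_kubeconfig() {
  echo docker run -ti --rm -e AWS_DEFAULT_REGION \
    -e KUBECONFIG=/root/.kube/us-gpd \
    -v "$HOME/.aws:/root/.aws" \
    -v "$HOME/.kube/us-gpd:/root/.kube/us-gpd" \
    eks-rolling-update:latest -c gpdeks1
}

# Print both candidate commands for comparison.
mount_to_default
override_kubeconfig
```

Either way, the key point is that each -v takes a host path and a container path separated by a colon.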

amarbarot commented 3 years ago

Thanks @chadlwilson

I followed the docs exactly and got it running. One final question: it gets the right node count, but I see no restarts. When it says the rolling update is complete, does it only roll nodes once we have upgraded the agents? Should we expect it to roll the worker nodes if we have not upgraded them?

[ root@gpd-terraform-builder ~ $] docker run -ti --rm -e AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION} -v "$HOME/.aws":"/root/.aws" -v "$HOME/.kube/us-gpd":"/root/.kube/config" eks-rolling-update:latest -c gpdeks1
2021-02-25 15:16:25,772 INFO Describing autoscaling groups...
2021-02-25 15:16:27,483 INFO Getting k8s nodes...
2021-02-25 15:16:27,677 INFO Current k8s node count is 18
2021-02-25 15:16:27,678 INFO All asgs processed
2021-02-25 15:16:27,679 INFO Rolling update of all asg is complete!

chadlwilson commented 3 years ago

That looks like a "no-op" run, so it probably thinks all nodes are up-to-date. That is, they are already using the latest launch template version specified for the ASG(s), so there are no nodes to roll. There is age-based support with RUN_MODE=4 and MAX_ALLOWABLE_NODE_AGE if that's what you are looking for.

You could try "touching" the launch template by modifying it without making new changes, so it saves a new version. You can then run with -p to "plan" and you should see the tool identifying your nodes as being out of date. If you run it for real by removing -p you'll see it trying to drain and terminate nodes.

Might want to do it on a smaller test cluster first, or a single ASG if you have multiple sharing the same launch template, using ASG_NAMES filter :-)
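A rough sketch of that plan-then-run sequence, assuming the same image, cluster name, and mounts used earlier in this thread. The ASG name is a made-up placeholder and the wrapper function is only illustrative; it prints each command instead of executing it, so nothing is drained by accident.

```shell
# Hypothetical ASG name; replace with one attached to your cluster.
ASG_NAMES="my-test-asg"

# Illustrative wrapper: any extra arguments are appended as tool flags.
rolling_update() {
  echo docker run -ti --rm -e AWS_DEFAULT_REGION \
    -e "ASG_NAMES=${ASG_NAMES}" \
    -v "$HOME/.aws:/root/.aws" \
    -v "$HOME/.kube/us-gpd:/root/.kube/config" \
    eks-rolling-update:latest -c gpdeks1 "$@"
}

# Step 1: plan only, to see which nodes are reported as out of date.
rolling_update -p

# Step 2: the same command without -p, which would drain and terminate nodes.
rolling_update
```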

amarbarot commented 3 years ago

Excellent.

So what I'm after is something like this, since we want a pipeline where we can run this job manually at any time: RUN_MODE=4 and MAX_ALLOWABLE_NODE_AGE=0

Do you have any examples of how we are supposed to pass these variables in? Are they just variables to add to our docker run command?

chadlwilson commented 3 years ago

They are just env vars that need to be available to Python, so when running with Docker you can supply them the same way you supply any env vars to a container (with -e, --env-file, etc.).
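As a minimal sketch of the --env-file route, using the two values proposed above (whether MAX_ALLOWABLE_NODE_AGE=0 forces every node to roll is not confirmed in this thread). The temp file is just for illustration, -ti is dropped on the assumption that a pipeline job has no TTY, and -p is kept so the first run is a plan. The command is printed rather than executed.

```shell
# Write the settings to an env file, one NAME=value pair per line.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
RUN_MODE=4
MAX_ALLOWABLE_NODE_AGE=0
EOF

# Print the command for review; drop the leading `echo` to run it.
echo docker run --rm -e AWS_DEFAULT_REGION \
  --env-file "$env_file" \
  -v "$HOME/.aws:/root/.aws" \
  -v "$HOME/.kube/us-gpd:/root/.kube/config" \
  eks-rolling-update:latest -c gpdeks1 -p
```

The equivalent with individual flags would be -e RUN_MODE=4 -e MAX_ALLOWABLE_NODE_AGE=0 in place of --env-file.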