Open amorey opened 3 years ago
Technically, the next release might help you (the main branch code can do it, it's not yet released).
What you're trying to achieve has never been publicly added to the project (and you might want to document it here, if you feel it's worth sharing). If someone did that, it didn't share the recipe.
Here is what I would do:
There are maybe better ways, but I don't know those, and therefore I can't help more than that.
Thanks for your quick reply! Can you point me to the feature in the next release that might help?
As a variant on your suggestion, I could also create a shell script and make it accessible to kured as an attached volume in the kubernetes manifest.
While it's probably possible, it's not the easiest.
We currently wrap the rebootCommand to escape the container namespace. So basically the command is by default something similar to nsenter ... $rebootCommand
. It's necessary so that you can restart the node (as systemctl reboot is absent from kured container). In your case, you would need to bypass that nsenter
bit, and don't wrap the command. We don't have the code yet for that. (however due to code structure, it's a one liner change!). FYI, assuming you do that one liner change, you would still need to introduce the nsenter bit in your script. I think it's a good way to do it.
Did you think of an init container, which would drop the necessary script on your hosts instead? That seems the easiest: It will be able to create the script easily, and kured can still trigger it easily.
I hadn't thought of an init container but that would work in terms of dropping the script on the host. One disadvantage I can think of is AWS permissions. If the script were to execute inside the container namespace then you could use a tool like kube2iam to set the AWS IAM role for execution but I don't think that's an option on the host.
Would you be open to a command line option to enable reboot-command execution in the container namespace (e.g. --reboot-namespace [host|container]
)? That should give more control to the user over the reboot script.
I don't mind myself (though I would probably name that argument differently). We need to find whether it makes sense to also have the sentinel (check) in the same namespace or not (and therefore add another argument). Alternatively, we could simply expose that we are doing nsenter
instead, by making sure it's included in the default command, and force our users to be explicit, rather than implicit. I am not sure yet.
I would love to discuss this in a next community meeting tbh.
In the meantime, feel free to propose a PR, and we can discuss it there : )
Assuming we go for a cli argument like wrap-commands-with-nsenter
, your change should be a 3 liner: one global var, one argument in the CLI, and one if wrap-commands-with-nsenter {}
(ok that's a multi-line, but you get the gist;) ).
Exposing the use of nsenter
sounds like a great solution.
In addition, it would be useful to have the ability to run a script after the instance restarts (e.g. to add the instance back to its Auto Scaling group). Does it make sense to add this capability to kured? Or does it make more sense to hack this into the init container script?
@amorey The problem with exposing nsenter
is user friendliness: what about the PID if it's not 1 (rancherOS)? Will people remember to put nsenter...
in front of their reboot command? I am not sure it's easily done.
For a script after instance restart, it's effectively the role of your node's init system and/or kubernetes. I wouldn't do that in kured, myself.
@amorey The problem with exposing nsenter is user friendliness: what about the PID if it's not 1 (rancherOS)? Will people remember to put nsenter... in front of their reboot command? I am not sure it's easily done.
All good questions. Let me know what you end up deciding. In the mean time there should be work arounds for the execution context issues.
For a script after instance restart, it's effectively the role of your node's init system and/or kubernetes. I wouldn't do that in kured, myself.
Makes sense. Thanks for the advice.
I just came across this kubernetes blog post from today: https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
If graceful node shutdown works as advertised then, in the future, kured should be able to offload node draining to kubernetes it might also take care of some of the issues I'm dealing with now.
In the mean time, I can trigger ELB de-registration from within kubernetes by adding a "node.kubernetes.io/exclude-from-external-load-balancers" label to the node. Can you recommend a way to do this with kured? In addition, I need to set a grace period to wait until ELB closes connections. Can I use --drain-grace-period
for this?
@evrardjp I noticed that kured sets a node lock annotation which is removed after restart. If it were possible to set a node lock label that behaved in the same way then it would be possible to remove nodes from external load balancers gracefully before restart without having to use a custom script (using the "node.kubernetes.io/exclude-from-external-load-balancers" label). Would you be open to including this feature in kured?
I just came across this kubernetes blog post from today: https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/
If graceful node shutdown works as advertised then, in the future, kured should be able to offload node draining to kubernetes it might also take care of some of the issues I'm dealing with now.
In the mean time, I can trigger ELB de-registration from within kubernetes by adding a "node.kubernetes.io/exclude-from-external-load-balancers" label to the node. Can you recommend a way to do this with kured? In addition, I need to set a grace period to wait until ELB closes connections. Can I use
--drain-grace-period
for this?
Yes, I read this recently. When kured will be supporting 1.21 and above only, we can probably remove a bunch of code.
@evrardjp I noticed that kured sets a node lock annotation which is removed after restart. If it were possible to set a node lock label that behaved in the same way then it would be possible to remove nodes from external load balancers gracefully before restart without having to use a custom script (using the "node.kubernetes.io/exclude-from-external-load-balancers" label). Would you be open to including this feature in kured?
I saw your PR. It's awesome. I like it so far :) Let's discuss it there!
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
/remove stale
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
/remove stale
This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).
/remove stale
Is there a recommended way to initiate a graceful restart of EC2 instances from kured? For my particular use case, I'd like to de-register instances from ELBs before rebooting the OS but I can't see a way to do this with the kured command line options.