kubereboot / kured

Kubernetes Reboot Daemon
https://kured.dev
Apache License 2.0
2.12k stars 201 forks

Cron schedule for node reboots #891

Open robinbobst opened 5 months ago

robinbobst commented 5 months ago

Hi there

I would like to use Kured because it sounds like a useful tool that would be very beneficial in our environment.

However, in our environment node restarts are only allowed during certain maintenance windows, which occur every month at the same time.

Is it planned to extend Kured with a cron schedule in the future, so that these reboots can be scheduled using cron expressions? Example: 5 4 2 * *

At the moment you can only configure on which days kured is allowed to reboot the nodes.

Thank you for your feedback and efforts

ckotzbauer commented 5 months ago

Hi @robinbobst, thanks for your request. There were some similar requests in the past, but this has not been implemented yet; it seemed that the need from the community was not that big. However, it would be possible to implement a cron-like feature.

One important question came to my mind: currently (with the reboot-days and -time flags) there is a timeframe in which kured is allowed to reboot nodes. A cron expression works differently and specifies a recurring timestamp, not a timeframe. As a cluster normally has multiple nodes and a full reboot of all of them can take several hours, a single timestamp described as a cron expression would not be enough...
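To make the timeframe-versus-timestamp distinction concrete, here is a minimal Python sketch. It is not kured's actual (Go) implementation; `in_reboot_window` is a hypothetical helper that only models the idea behind the reboot-days and start/end-time flags. It answers "is a reboot allowed right now?" for any instant inside the window, whereas a cron expression such as `5 4 2 * *` names only single instants (04:05 on the 2nd of each month).

```python
from datetime import datetime, time


def in_reboot_window(now: datetime, days: set[int], start: time, end: time) -> bool:
    """Return True if `now` is on an allowed weekday and inside [start, end].

    `days` uses datetime.weekday() numbering (Monday=0 ... Sunday=6).
    Hypothetical helper for illustration only; windows that cross
    midnight are not handled.
    """
    return now.weekday() in days and start <= now.time() <= end


# Window: Mondays, 02:00 to 07:00. 2024-01-01 was a Monday.
window = ({0}, time(2, 0), time(7, 0))
print(in_reboot_window(datetime(2024, 1, 1, 3, 30), *window))  # inside the window
print(in_reboot_window(datetime(2024, 1, 1, 8, 0), *window))   # after the window
```

Because the check holds for every instant in the window, nodes can be rebooted one after another for as long as the window is open, which a single cron timestamp cannot express.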

robinbobst commented 5 months ago

Hi @ckotzbauer

Thanks for your fast response.

Yes, you are right regarding the timestamp. My idea was that this could be combined with the existing options --start-time & --end-time.

Just as the --reboot-days option specifies on which days the nodes get rebooted, the --start-time & --end-time options specify in which time window this should occur on that specific day.

I would look at the cron feature as an extension of the --reboot-days option.

The config would look something like this:

--cron "5 4 2 * *"
--start-time "18:00:00"
--end-time "23:59:59"
--time-zone "UTC"

ckotzbauer commented 5 months ago

Okay, thanks for the explanations. I'm now sure that the cron pattern is the wrong way to achieve this. If it were used together with the existing flags, it would not be very transparent to the user what would happen and which of the (partially overlapping) behaviours would take effect.

You can't use the reboot-days flag because the maintenance windows are always on the same day of the month (which means they can fall on any weekday). So I think a new flag to specify the reboot-day-of-month instead of the reboot-days would be much more suitable. WDYT?
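As a rough illustration of what such a flag could mean, here is a minimal Python sketch. The flag does not exist in kured at the time of this discussion; `in_monthly_window` and its parameters are hypothetical names, and the example values (day 2, 18:00:00 to 23:59:59) are taken from the configuration proposed earlier in this thread.

```python
from datetime import datetime, time


def in_monthly_window(now: datetime, days_of_month: set[int], start: time, end: time) -> bool:
    """Return True if `now` is on an allowed day of the month and inside [start, end].

    Hypothetical sketch of a "reboot-day-of-month" check. Sub-second
    precision is dropped so that an end time of 23:59:59 covers the
    whole last second of the day.
    """
    current = now.time().replace(microsecond=0)
    return now.day in days_of_month and start <= current <= end


# Maintenance window: the 2nd of every month, 18:00:00 to 23:59:59.
window = ({2}, time(18, 0, 0), time(23, 59, 59))
print(in_monthly_window(datetime(2024, 3, 2, 19, 0), *window))  # 2nd, inside the window
print(in_monthly_window(datetime(2024, 3, 3, 19, 0), *window))  # wrong day of month
```

Unlike a full cron expression, only one field (the day of the month) is added, so there is no ambiguity about how minute or hour fields would interact with the start and end times.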

robinbobst commented 5 months ago

Hi @ckotzbauer

You are right, I was so focused on my use case that I didn't even pay attention to the other fields of the cron expression.

Yes, a new flag with reboot-day-of-month would definitely be more effective in solving the problem.

ant31 commented 5 months ago

We have the same use case, but with slightly different requirements. How about a "max-reboot-frequency: Ndays"? It would play well with the current configuration flags, and in addition it would check that the last reboot was not executed less than "Ndays" ago.

That way it's easy to configure "reboot every Monday, from 2am to 7am, at most every two weeks" or "every 3 months".
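A minimal Python sketch of that frequency check, under stated assumptions: `frequency_allows_reboot` and `min_interval_days` are hypothetical names, the sketch is not kured code and does not describe any existing flag, and how the time of the last reboot would be obtained is left open.

```python
from datetime import datetime, timedelta
from typing import Optional


def frequency_allows_reboot(
    now: datetime, last_reboot: Optional[datetime], min_interval_days: int
) -> bool:
    """Return True if at least `min_interval_days` have passed since `last_reboot`.

    Hypothetical sketch of a "max-reboot-frequency: Ndays" check.
    A node with no recorded reboot is always allowed to reboot.
    """
    if last_reboot is None:
        return True
    return now - last_reboot >= timedelta(days=min_interval_days)


last = datetime(2024, 1, 1, 3, 0)
print(frequency_allows_reboot(datetime(2024, 1, 8, 3, 0), last, 14))   # only 7 days later
print(frequency_allows_reboot(datetime(2024, 1, 15, 3, 0), last, 14))  # 14 days later
```

In this model the check is an additional condition: a reboot would happen only when both the existing day and time window and the frequency check allow it.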

It doesn't collide with day-of-month; both could work together nicely. For our use case, I am more interested in limiting the frequency.

leonnicolas commented 4 months ago

Not exactly --max-reboot-frequency, but I think it is what @ant31 suggests: https://github.com/kubereboot/kured/pull/904

ckotzbauer commented 4 months ago

@robinbobst @ant31 I like the approach of @leonnicolas and I think it might cover your needs as well, what do you think?

ant31 commented 4 months ago

@ckotzbauer, yes, it matches the feature request 100%, thank you for the quick review!

Not sure whether it matches @robinbobst's use case.

github-actions[bot] commented 2 months ago

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

github-actions[bot] commented 2 weeks ago

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).