kube-rs / kube

Rust Kubernetes client and controller runtime
https://kube.rs
Apache License 2.0

Support Leaderelection like client-go #485

Open ZhiHanZ opened 3 years ago

ZhiHanZ commented 3 years ago

Hi there, I am pretty interested in this k8s client rewritten in Rust. Is there a way to do leader election using kube-rs, like https://github.com/kubernetes/client-go/blob/master/tools/leaderelection/leaderelection.go?

clux commented 3 years ago

There's currently no support for that in here. It's a bit of a peripheral api, and it's also marked as alpha.

It looks a bit involved, but not beyond the scope of what we could implement at least in an example initially.

It is specific behaviour against certain Kubernetes objects, and we don't do a lot of that in these crates. In particular, it looks like it's using coordination.k8s.io/v1 leases, pods and endpoints, so I'm not sure it should ever live in here.
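
For context, the coordination.k8s.io/v1 Lease objects involved are ordinary namespaced resources that kube can already read with its typed Api. A minimal sketch (the namespace and lease name below are only illustrative):

```rust
// Minimal sketch: reading a coordination.k8s.io/v1 Lease with kube's typed Api.
use k8s_openapi::api::coordination::v1::Lease;
use kube::{Api, Client};

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    let client = Client::try_default().await?;
    let leases: Api<Lease> = Api::namespaced(client, "kube-system");
    // Control-plane components record their leader election state in Leases here.
    let lease = leases.get("kube-controller-manager").await?;
    let holder = lease.spec.and_then(|s| s.holder_identity);
    println!("current holder: {:?}", holder);
    Ok(())
}
```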

clux commented 3 years ago

On the other hand, if it does, it would be more as part of proposed utilities.

thedodd commented 3 years ago

Implementing this following the same pattern as the K8s core controllers should be rather straightforward. The Go code will translate well enough to Rust (the Rust code will quite likely end up a bit simpler, though).
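
For a sense of what that translation might look like, here is a purely hypothetical sketch of client-go's LeaderElectionConfig fields rendered as a Rust struct; nothing like this exists in kube-rs today:

```rust
// Hypothetical sketch only: client-go's LeaderElectionConfig translated to Rust.
// None of these types exist in kube-rs; the fields mirror the Go configuration.
use std::time::Duration;

pub struct LeaderElectionConfig {
    /// Identity recorded in the lock object (client-go: Lock.Identity()).
    pub identity: String,
    /// How long non-leaders wait before trying to acquire an expired lease.
    pub lease_duration: Duration,
    /// How long the acting leader keeps retrying a renewal before giving up leadership.
    pub renew_deadline: Duration,
    /// Interval between acquire/renew attempts.
    pub retry_period: Duration,
    /// Whether to release the lease on shutdown (client-go: ReleaseOnCancel).
    pub release_on_cancel: bool,
}
```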

@clux @ZhiHanZ I'm going to be implementing this in my application code soon, but I can keep it well isolated enough that I could just copy/paste it over to a new PR here so that other folks can use it. Any preferences on what this should look like, organization of code, etc?

xrl commented 3 years ago

I needed some advisory locks for my web app: I had to make sure a namespace was "locked" to keep users from stepping on each other.

Here's my "lease" code: https://gist.github.com/xrl/3c5727e30e78ae300539fd93defc031b

How it works:

- Poll for the lease then acquire it
- Spawns background task for renewing the lease
- Joining a lease will shutdown the background task and zero out lease timestamps

TODO:

- can I define a async block elsewhere than inline with tokio's spawn?
- If this code seems useful, what's the best way to contribute it?

nightkr commented 3 years ago

@xrl Cool! This seems like something that we ought to have in kube_runtime. Haven't had the time to go through all of it, but a few notes:

Poll for the lease then acquire it

Any reason not to watch instead?

Spawns background task for renewing the lease

This feels a bit dangerous, since renewing the lease is supposed to mean "yes I'm still here and doing useful work", which seems like something that the application ought to signal explicitly.

I'm also personally a bit wary of spawning background tasks without the user's explicit consent (see https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/).

Joining a lease will shutdown the background task and zero out lease timestamps

Looks like this happens asynchronously on Drop as well? Is there any danger in the user trying something like:

loop {
    Lease::acquire_or_create(...).await?
}

can I define a async block elsewhere than inline with tokio's spawn?

Yes, async { ... } is just an ordinary expression (that happens to have an unpronounceable type that implements Future).
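
For example (a trivial sketch, not taken from the gist):

```rust
// An async block is a value like any other: it can be defined away from the
// tokio::spawn call site and spawned (or awaited) later.
async fn example() {
    let renewal_loop = async {
        // ... periodically renew the lease here ...
    };
    // Nothing runs until the block is spawned or awaited.
    tokio::spawn(renewal_loop);
}
```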

If this code seems useful, what's the best way to contribute it?

Feel free to send a PR to kube_runtime, or publish it as an independent crate.

xrl commented 3 years ago

Any reason not to watch instead?

The lease holder could fail to update the lease (they crash, or there's a bug in the lease implementation), and then no updates would propagate through the watcher. I could implement a watcher to see whether a lease is updated before expiration (i.e. the lease holder is done and does the right thing, releasing the lease). Instead, I simplify my implementation by using a relatively short lease duration and just checking more often.
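
Roughly the shape of that polling approach (a sketch with assumed names and intervals, not the gist's actual code):

```rust
// Sketch of the "short lease, poll often" approach: instead of watching, re-check
// the Lease on a timer and treat a stale renewTime as free to acquire.
use std::time::Duration;
use k8s_openapi::api::coordination::v1::Lease;
use kube::Api;

async fn wait_until_acquirable(leases: &Api<Lease>, name: &str) -> Result<(), kube::Error> {
    loop {
        let spec = leases.get(name).await?.spec.unwrap_or_default();
        let expired = match (spec.renew_time, spec.lease_duration_seconds) {
            (Some(renewed), Some(secs)) => {
                chrono::Utc::now() - renewed.0 > chrono::Duration::seconds(secs.into())
            }
            // Never renewed (or no duration set): treat as free.
            _ => true,
        };
        if expired {
            return Ok(()); // caller would now try to take over the lease
        }
        tokio::time::sleep(Duration::from_secs(2)).await;
    }
}
```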

This feels a bit dangerous, since renewing the lease is supposed to mean "yes I'm still here and doing useful work", which seems like something that the application ought to signal explicitly.

I think holding a valid handle on the lease task (similar to a mutex guard) is itself the signal that "I am still here using this". Otherwise the lease holder has to implement timing logic to periodically refresh their handle, and that's work I don't want to sprinkle throughout critical sections.

I'm also personally a bit wary of spawning background tasks without the user's explicit consent (see https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/).

Great point. I would like to move the renewal loop into an async function and perhaps just document it: "if you want to periodically refresh the lease, call tokio::spawn(lease.renewal_loop())". Or perhaps expose lease.spawn_renewal_loop(). What do you think of those two APIs?
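
To make the two options concrete (a hypothetical sketch; neither method exists yet, and the names are just the ones proposed above):

```rust
// Hypothetical sketch of the two API shapes being discussed; the Lease type and
// both method names are placeholders, not real code from the gist or kube.
struct Lease {
    // client handle, lease name, renew interval, ...
}

impl Lease {
    /// Option 1: expose the renewal loop as a plain future and let the caller
    /// decide where (and whether) to spawn it: tokio::spawn(lease.renewal_loop()).
    async fn renewal_loop(self) {
        // loop { renew the Lease object, then sleep until the next renewal }
    }

    /// Option 2: spawn the renewal loop internally and hand back the task handle.
    fn spawn_renewal_loop(self) -> tokio::task::JoinHandle<()> {
        tokio::spawn(self.renewal_loop())
    }
}
```

Option 1 keeps the decision about background tasks in the caller's hands, which would also address the structured-concurrency concern above.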

Looks like this happens asynchronously on Drop as well?

This is a slightly different mechanism: you drop the Sender, and the receiver in the background job sees an "other end of channel dropped" error. To me that is a safeguard for exceptional cases; regular use should explicitly send a signal.
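
Roughly this pattern (a sketch with assumed channel types and intervals, not the gist's code):

```rust
// Sketch of channel closure as an implicit shutdown signal: the background task
// stops either on an explicit send or when the Sender is simply dropped.
use std::time::Duration;
use tokio::sync::oneshot;

async fn run() {
    let (stop_tx, mut stop_rx) = oneshot::channel::<()>();

    let renewal = tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(5));
        loop {
            tokio::select! {
                // Ok(()) on an explicit signal, Err(RecvError) if stop_tx was dropped.
                _ = &mut stop_rx => break, // zero out lease timestamps, then exit
                _ = tick.tick() => { /* renew the Lease object here */ }
            }
        }
    });

    // Explicit shutdown is the intended path; dropping stop_tx has the same effect.
    let _ = stop_tx.send(());
    let _ = renewal.await;
}
```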

loop snippet

I believe this is fine; it will just create a lot of spawned tasks and extra kube API calls.

I just shipped this to my users yesterday, I'll burn it in and then consider how to move it to a PR for kube_runtime. First get it working, then get it pretty! Thanks for the feedback!

clux commented 3 years ago

@xrl did you get somewhere with this? it would still be a nice feature for the runtime :-)

xrl commented 3 years ago

I've been a happy Lease user since opening this issue. I would be happy to contribute it back to the project sometime next week.

Dentrax commented 2 years ago

Looking forward to this feature! Here is an example usage in Go: https://programmer.group/implementation-of-leader-election-in-k8s-client-go.html @xrl Your gist example seems cool! Have you tried it on production clusters?

xrl commented 2 years ago

@Dentrax yes, I use my bit of code in a production app. We have an in-house app release tool, built on actix-web, which uses advisory locks to keep our dev team from running concurrent deploys.

dm3ch commented 11 months ago

As far as I understand, there are already two PRs with two different implementations of this feature.

Is there any chance that one of them would be merged soon?

clux commented 11 months ago

None of the currently suggested implementations were in any condition to be merged, and this has been a very difficult problem to tackle in a way that suits everyone, so I'm going to unmark this as something we plan on doing in the near future.

There were actually four PRs trying to do this (in wildly different ways); they have all now been closed to reduce confusion.

I think we will revisit this later, but this is solvable outside kube. We also have more important issues to solve at the moment, and this has historically been unfruitful for us.

Feel free to link to alternatives below, like kubert's lease impl (though that one is currently out of date with kube).

thedodd commented 11 months ago

Given that this issue is about a client-go-like impl, https://github.com/kube-rs/kube/pull/1056 is quite literally a mirror of client-go's impl (adapted to Rust/kube-rs). It focuses just on the coordination.k8s.io resources, but we could easily expand it to work with other resources (though I don't think there is any reason to do so, given that we have no legacy code which depends on this).

I don't think there is any confusion. The other PRs were trying to do other, seemingly different things. #1056 literally just implements what client-go implements.

That said, I totally respect if folks would like to have #1056 moved to a separate crate for now.

thedodd commented 11 months ago

Ok, moved the impl in #1056 to a dedicated crate: https://github.com/thedodd/kube-coordinate