nolar / kopf

A Python framework to write Kubernetes operators in just a few lines of code
https://kopf.readthedocs.io/
MIT License

How to synchronize request handlers (i.e. create a critical section) #1049

Open fireflycons opened 10 months ago

fireflycons commented 10 months ago


Problem

Hi,

I'm new to kopf framework (but not Python or the concept of operators).

My assumption is that event handlers are dispatched asynchronously and as forks (much like gunicorn), so that events for multiple instances of a watched GVK (in this case, namespaces) can be processed simultaneously.

I've been given an existing kopf operator to optimize, and part of that change is a requirement to cache the Vault token used by the application, to prevent it from performing a Vault login every time the handler is invoked.

Given my assumption, there is going to be a race condition if I need to refresh the Vault token and re-cache it (in a kube secret). Are there any mechanisms within kopf that allow me to create a critical section around the code that refreshes the Vault token, or do I need to roll my own, e.g. with multiprocessing?

Thanks

nolar commented 10 months ago

Hi. By “Vault”, do you mean Kopf’s Vault (an internal class not exposed to users), or some other vault?

If the former (Kopf's vault), the re-authentication process is already guarded, and all other API operations are blocked until it finishes, as far as I remember. Existing connections, e.g. watch-streams, will keep working and calling handlers until they are disconnected.

If the latter (your own vault), a global lock, like any other global object (e.g. a condition, list, dict, or set), can be shared via the global "memo": create the object in the startup handler, then use it in all the per-object handlers. The global (on-startup) "memo" is shallow-copied for each k8s object, so locally assigned keys/attrs remain local, but the initial values and all mutable structures remain shared. See the docs for details and examples.
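
For illustration, a minimal sketch of that memo behavior (the handler and attribute names here are my own, not from the docs): a container created at startup is shared by reference across the per-object memo copies, while attributes rebound inside an object handler stay local to that object's copy.

```python
import kopf

@kopf.on.startup()
async def configure(memo: kopf.Memo, **_):
    # Created once in the operator-wide memo; per-object memos are
    # shallow copies, so this dict is shared by reference.
    memo.shared_cache = {}

@kopf.on.create('namespaces')
async def on_namespace(memo: kopf.Memo, name: str, **_):
    memo.local_note = f"seen {name}"   # rebinding an attribute: local to this object's memo copy
    memo.shared_cache[name] = True     # mutating the shared container: visible to all handlers
```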

fireflycons commented 10 months ago

Hi, thanks for the reply. By "Vault", I mean Hashicorp Vault.

Do you have links to the docs where this sync mechanism is explained? I basically need to block all threads from entering the piece of code that authenticates with Vault while the token is being renewed, so what I need is essentially a mutex.

The lifetime of the Vault token is shorter than that of the operator, so on entry to the handler (it's an on.timer handler) I need to check the validity of the token and possibly renew it, then update the kube secret that stores it.

fireflycons commented 10 months ago

Or are you suggesting that the memo may be used as the storage for the token, meaning it survives evictions and the stored value may be updated at runtime?

That would save me from having to manage a kube secret myself; however, I still need to block during token renewal.

nolar commented 10 months ago

Maybe. I do not know how Hashicorp Vault & its tokens work. The memo's lifecycle is no longer than the operator's lifecycle: when it exits, the memo is lost. If that is okay, then yes, you can use the memo also to store the token.

But mind that you probably want to store it not directly, but in a 1-item list/set/dict or your own mutable object, since you are going to update it from time to time and the handlers of other k8s objects should see the updated value.

What I meant originally was storing an asyncio.Lock/threading.Lock object there, which is then used to prevent multiple handlers from refreshing the token in parallel.
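
Putting both suggestions together, a hedged sketch of that pattern (the `vault_login` stub is hypothetical, standing in for the real Hashicorp Vault call): the lock and a mutable token container live in the memo, and the expiry is re-checked inside the lock so only one handler performs the refresh.

```python
import asyncio
import time
import kopf

def vault_login() -> tuple[str, float]:
    # Hypothetical placeholder for the real Hashicorp Vault login,
    # returning a token and its TTL in seconds.
    return "example-token", 300.0

@kopf.on.startup()
async def setup(memo: kopf.Memo, **_):
    memo.token_lock = asyncio.Lock()
    memo.token_box = {"token": None, "expires_at": 0.0}  # shared mutable container

async def get_token(memo: kopf.Memo) -> str:
    box = memo.token_box
    if box["token"] is None or time.monotonic() >= box["expires_at"]:
        async with memo.token_lock:
            # Re-check inside the lock: another handler may have refreshed it already.
            if box["token"] is None or time.monotonic() >= box["expires_at"]:
                token, ttl = vault_login()
                box["token"] = token
                box["expires_at"] = time.monotonic() + ttl
    return box["token"]

@kopf.on.timer('namespaces', interval=60)
async def refresh_and_use(memo: kopf.Memo, **_):
    token = await get_token(memo)
    # ... use the token, e.g. to update the kube secret that stores it ...
```

The double check (before and inside the lock) keeps the fast path lock-free once the token is valid, while still guaranteeing that only one handler performs the renewal when it expires.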

fireflycons commented 10 months ago

Cool. Makes sense now.

fireflycons commented 10 months ago

Hi @nolar

One last thing. I'd like to expose Prometheus metrics. I found https://github.com/nolar/kopf/issues/289; have you thought any further about this since then?

nolar commented 10 months ago

I was overly busy with work & life & German bureaucracy in the past couple of years, but I am slowly getting back to Kopf (not full-time of course). A long sequence of refactorings and improvements is being prepared, after which I will be able to look at new features. No timeline though.