Looking for documentation for the effects/load of using the Kubernetes resource lock

abatilo commented 4 years ago

Hey all,

This project looks really neat, and conceptually I love the idea of just using a ConfigMap or similar as my locked resource.

Has there been any testing/verification on the scalability of an approach like this? How long does it take to acquire a lock? How does the control plane behave when you have multiple services trying to ask for the same lock?

Has the feature been extensively tested?

Thanks!

distorhead commented 4 years ago

Hi!

The basic mechanism behind a distributed lockgate locker (whether Kubernetes or HTTP) is a key-value storage with optimistic locking. Optimistic locking means that a process that wants to update the value stored under some key:

1. Should first read the current value under this key and handle it.
2. May receive an error like "this key has been changed" on the update operation if the value has changed since that first read. In this case the process should reread the current value and handle the new value appropriately.

In the case of the Kubernetes locker, all lock information is stored in such a storage with optimistic locking support: resource annotations. Kubernetes itself uses etcd to implement this optimistic locking.

We are using this library in the werf project: https://github.com/werf/werf. werf is a CLI tool for CI/CD operations, and it needs distributed locks to implement distributed caching of built images. This library is good enough for such short-lived build processes and does its job well.

Note that a client that uses lockgate distributed locking should implement a procedure to recover from the exceptional situation when a lock lease has been lost. A client that holds a lock should renew its lease periodically, or the lock will be taken over by another client. A client that has lost its lease due to network problems or something else will receive an error response from the server and should handle this situation. In werf we crash the current image build process immediately when such an error is detected.

abatilo commented 4 years ago

If I'm understanding correctly, you wouldn't recommend using the Kubernetes locker for a long-term lock, as would be needed for a case like leader election. Is that right?

distorhead commented 4 years ago

I think for such a task it is better to use etcd itself rather than lockgate.

abatilo commented 4 years ago

Understood! Thank you.
