Documentation Incorrect?

simlu commented 6 years ago

We're using FailOpen and the documentation says:

if process crashes and lock is not released, lock will eventually expire after leaseDurationMs from last heartbeat sent

This is not really true. Rather the it should say

For an unreleased lock, the next request to lock will wait leaseDurationMs to acquire the lock and only acquire it if in that period a heartbeat hasn't altered it.

Very different behavior.

It is very unfortunate since now we can't use this in the way that we wanted to. We might have to implement our own locking library after all :(

Edit: It also seems that the entry is "stuck" in the table for-ever in this case (with a lease duration of 1). Which seems to be a bug I guess? It should be cleaned up?

tristanls commented 6 years ago

Hi @simlu,

The documentation is not wrong, but perhaps it is unclear and doesn't provide enough context. Let me offer a different framing. This is an explanation of what happens at a "logical" or "high level":

If process crashes and lock is not released, lock will eventually expire after leaseDurationMs from last heartbeat sent

What you wrote down is also not wrong, but what you wrote down is the "implementation level" details:

For an unreleased lock, the next request to lock will wait leaseDurationMs to acquire the lock and only acquire it if in that period a heartbeat hasn't altered it.

I think we could reconcile both frames and provide more context by stating:

If process crashes and lock is not released, lock will eventually expire after leaseDurationMs from last heartbeat sent (if any). The way this is implemented is if a client encounters any existing (released or unreleased) lock, the client will wait leaseDurationMs to ensure that the lock is abandoned and then only acquire the lock if in that period a heartbeat hasn't altered it or another client hasn't acquired the lock.

As to the "stuck" entry. What you are seeing is the artifact of the implementation of the fencingToken. From lock.release(callback) documentation:

Fail open lock heartbeats stop, and its leaseDurationMs is set to 1 millisecond so that it expires immediately. The datastructure is left in the datastore in order to provide continuity of fencingToken monotonicity guarantee.

Having said all that, it seems that you are looking for a different expiration implementation? DynamoDB has a TTL functionality, however, it comes with the following caveat:

DynamoDB typically deletes expired items within 48 hours of expiration. The exact duration within which an item truly gets deleted after expiration is specific to the nature of the workload and the size of the table. Items that have expired and not been deleted will still show up in reads, queries, and scans. These items can still be updated and successful updates to change or remove the expiration attribute will be honored.

tristanls commented 6 years ago

More on the "stuck" entry in case my response above didn't address what you've brought up. The 1 millisecond leaseDurationMs is the implementation detail of how lock is marked "released". Every client that will attempt to acquire a lock in this "released" state will have to:

Get current lock state and observe 1 millisecond leaseDurationMs.
Wait 1 millisecond.
Attempt to acquire the lock making sure that nothing changed, that is, no heartbeat updated it and no other client acquired it.

So, every fail open lock acquisition (even if it is considered "released") requires two requests and waiting the entire lease duration.

If you have no use for a fencingToken and your client failures are very rare, then you could probably use the fail closed lock instead. As far as I understand, using a fail open lock without a fencingToken is unsafe.

simlu commented 6 years ago

Apologies for the long silence. We've been busy and I didn't mange to get back to this until now.

I've created a pr to address our use case. This is a proof of concept to see what you think: https://github.com/tristanls/dynamodb-lock-client/pull/10/files

Would much appreciate your input, since we need to push this forward. Cheers, L~

tristanls commented 6 years ago

Thank you @simlu. I think v0.4.0 implements the use case you've demonstrated in #10 ?

simlu commented 5 years ago

@tristanls That is awesome! Thank you very much! I just updated this into dy-alchemy and we are using it in all our projects now. This totally solves our use case 🎉

Closing this.

PS: Would love if this project had a dev branch. Would make reviewing release prs etc much easier. I was struggling to see a full change set of all the changes that made it into the new release.

tristanls / dynamodb-lock-client

Documentation Incorrect? #9