Closed simlu closed 5 years ago
Hi @simlu,
The documentation is not wrong, but perhaps it is unclear and doesn't provide enough context. Let me offer a different framing. This is an explanation of what happens at a "logical" or "high level":
If process crashes and lock is not released, lock will eventually expire after
leaseDurationMs
from last heartbeat sent
What you wrote down is also not wrong, but what you wrote down is the "implementation level" details:
For an unreleased lock, the next request to lock will wait
leaseDurationMs
to acquire the lock and only acquire it if in that period a heartbeat hasn't altered it.
I think we could reconcile both frames and provide more context by stating:
If process crashes and lock is not released, lock will eventually expire after
leaseDurationMs
from last heartbeat sent (if any). The way this is implemented is if a client encounters any existing (released or unreleased) lock, the client will waitleaseDurationMs
to ensure that the lock is abandoned and then only acquire the lock if in that period aheartbeat
hasn't altered it or another client hasn't acquired the lock.
As to the "stuck" entry. What you are seeing is the artifact of the implementation of the fencingToken
. From lock.release(callback) documentation:
Fail open lock heartbeats stop, and its
leaseDurationMs
is set to 1 millisecond so that it expires immediately. The datastructure is left in the datastore in order to provide continuity offencingToken
monotonicity guarantee.
Having said all that, it seems that you are looking for a different expiration implementation? DynamoDB has a TTL functionality, however, it comes with the following caveat:
DynamoDB typically deletes expired items within 48 hours of expiration. The exact duration within which an item truly gets deleted after expiration is specific to the nature of the workload and the size of the table. Items that have expired and not been deleted will still show up in reads, queries, and scans. These items can still be updated and successful updates to change or remove the expiration attribute will be honored.
More on the "stuck" entry in case my response above didn't address what you've brought up. The 1 millisecond leaseDurationMs
is the implementation detail of how lock is marked "released". Every client that will attempt to acquire a lock in this "released" state will have to:
leaseDurationMs
.heartbeat
updated it and no other client acquired it.So, every fail open lock acquisition (even if it is considered "released") requires two requests and waiting the entire lease duration.
If you have no use for a fencingToken
and your client failures are very rare, then you could probably use the fail closed lock instead. As far as I understand, using a fail open lock without a fencingToken
is unsafe.
Apologies for the long silence. We've been busy and I didn't mange to get back to this until now.
I've created a pr to address our use case. This is a proof of concept to see what you think: https://github.com/tristanls/dynamodb-lock-client/pull/10/files
Would much appreciate your input, since we need to push this forward. Cheers, L~
Thank you @simlu. I think v0.4.0 implements the use case you've demonstrated in #10 ?
@tristanls That is awesome! Thank you very much! I just updated this into dy-alchemy and we are using it in all our projects now. This totally solves our use case 🎉
Closing this.
PS: Would love if this project had a dev branch. Would make reviewing release prs etc much easier. I was struggling to see a full change set of all the changes that made it into the new release.
We're using
FailOpen
and the documentation says:This is not really true. Rather the it should say
Very different behavior.
It is very unfortunate since now we can't use this in the way that we wanted to. We might have to implement our own locking library after all :(
Edit: It also seems that the entry is "stuck" in the table for-ever in this case (with a lease duration of 1). Which seems to be a bug I guess? It should be cleaned up?