palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0
52 stars 9 forks source link

Lock refreshing should hold weak references to locks and expire any not strongly held #4350

Open jkozlowski opened 4 years ago

jkozlowski commented 4 years ago

The point would be to guard against people using the raw lock APIs when they're acquiring a few other resources as well. If they acquire the lock, but then something else fails as part of the same initialization procedure, the HeldLocsToken will be leaked and it will just get refreshed until the service is restarted.

We should detect when there are no more references to the lock token and release it, along with a loud WARNING log, so the service can recover appropriately, without causing an immediate issue.

jkozlowski commented 4 years ago

@clockfort

jkozlowski commented 4 years ago

From a cursory look this feels hard, without changing a few things.

  1. All the types (HeldLocksToken, LockRefreshToken) are API types.
  2. In order to do this, we need some type whose strong count can be the same across refreshes; technically this is the BigInteger tokenId, but right now that is a BigInteger. Likely just anchoring that that might just work, but feels icky.