tristanls / dynamodb-lock-client

A general purpose distributed locking library built for AWS DynamoDB.
MIT License

Feature/range key support #26

Closed jepetko closed 4 years ago

jepetko commented 4 years ago

This will add support for range keys as part of the lock.

Use case: The goal is to model a 1:n relationship between entities. Let's say you want to use the product identifier as the hash key, while the mutation of the product serves as the range key.

| HASH_KEY (product) | RANGE_KEY (product mutation) |
| --- | --- |
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | D |
| 2 | E |

Since updating a product mutation requires a remote HTTP call, you will want to hold the lock for that period of time in order to avoid race conditions. Let's say the mutation update concerns mutation 1-B. In that case, neither 1-A nor 1-C needs to be locked.

The PR includes:

Usage without range key:

failOpenClient.acquireLock('myHashKey', (error, lock) => {
    // ...
});

Usage with range key:

failOpenClient.acquireLock(['myHashKey', 'myRangeKey'], (error, lock) => {
    // ...
});

Missing points:

tristanls commented 4 years ago

Hi @jepetko ,

Wow! Tests \(^_^ )/! That's awesome! This module needs some of those, thank you!

With the use case you've outlined, it seems to me that the problem being solved is to guarantee a unique lock per product + product mutation combination. The way I would think of solving this problem is to concatenate product and product mutation to create the lock key. Using your example, if I want to lock product 1 mutation B, I would use 1-B as the unique identifier for the lock.

Distributed locks are advisory only. One implication is that interpretation of the lock name is always up to the clients and depends on an agreed-upon convention. Is there something that prohibits using the convention "concatenate product with product mutation using the - character to create the lock name" in this use case?
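For illustration, that convention could be as simple as the following sketch (`lockNameFor` is a hypothetical helper, not part of this module):

```javascript
// Hypothetical helper illustrating the suggested convention: derive a
// single lock name by concatenating product and product mutation.
function lockNameFor(product, mutation) {
    return `${product}-${mutation}`;
}

// Locking product 1, mutation B would then lock on the name "1-B",
// e.g. failOpenClient.acquireLock(lockNameFor(1, "B"), callback).
const name = lockNameFor(1, "B"); // "1-B"
```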

Cheers,

Tristan

simlu commented 4 years ago

Tests! That is awesome!

When this is in, I'd love to start using node-tdd on top of mocha. I think there is a lot of value here in seeing what requests are actually being made. We can discuss all that separately, though.

TL;DR: I don't need this feature, but having tests is awesome and I can't wait to be doing more with them 👌

jepetko commented 4 years ago

hi @tristanls, @simlu ,

first of all, thank you for your feedback!

@tristanls

What you're basically proposing is that we use a composed key to temporarily create an entirely new record in the table for locking purposes only. The actual record would still be "unlocked", since .acquireLock would only serve as a "programmatic" guard for operations on the table.

However, this wouldn't work because of the NOT NULL constraint on the range key: in a table with a composite primary key, every item must have a range key. In our example, the lock attempt 1_B_LOCK (for locking hash_key=1 + range_key=B) would fail. The attempt 1_LOCK/B_LOCK would succeed.

| HASH_KEY (product) | RANGE_KEY (product mutation) | data | owner | guid | fencingToken | leaseDurationMs | (more columns) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | A | {some: data, ...} | null | null | null | null | ... |
| 1 | B | {some: data, ...} | null | null | null | null | ... |
| 1 | C | {some: data, ...} | null | null | null | null | ... |
| 2 | D | {some: data, ...} | null | null | null | null | ... |
| 2 | E | {some: data, ...} | null | null | null | null | ... |
| 1_B_LOCK | null | null | root | abc | 1 | 10000 | ... |
| 1_LOCK | B_LOCK | null | root | abc | 1 | 10000 | ... |

The exception thrown by DynamoDB (in get) for the request

{
  "TableName": "my-lock-table-name",
  "Key": {
    "myPartitionKey": "lockId"
    // myRangeKey is missing here..
  },
  "ConsistentRead": true
}

is

'The number of conditions on the keys is invalid'.

So, obviously, the range_key needs to be provided (I added a dedicated test for it, even though it doesn't test the actual functionality but rather the behavior in the context of DynamoDB usage).

PS: One possible solution could be to use a dedicated table for locking.

tristanls commented 4 years ago

As far as I understand, PutItem (and in the case of DocumentClient, put) does not do a partial update.

On the first FailOpen lock refresh, the data will be overwritten with nothing (line 211, line 260).

Additionally, the first time the application does a PutItem with data, all of the lock state is overwritten with nothing.

I believe that to get a partial update, usage would need to change to UpdateItem (and in the case of DocumentClient, update).
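As a sketch of the difference (attribute names taken from the example table above, values illustrative; this is not the module's actual code), a partial update touching only the lock attributes could be expressed as an UpdateItem-style request:

```javascript
// Hedged sketch: an update request that sets only lock attributes, so a
// non-lock attribute such as "data" would be left intact. A PutItem with
// the same Key would instead replace the entire item.
const refreshLockParams = {
    TableName: "my-lock-table-name",
    Key: { myPartitionKey: "1", myRangeKey: "B" },
    UpdateExpression: "SET #owner = :owner, guid = :guid, leaseDurationMs = :lease",
    ExpressionAttributeNames: { "#owner": "owner" }, // alias the attribute name
    ExpressionAttributeValues: { ":owner": "root", ":guid": "abc", ":lease": 10000 }
};
// With DocumentClient this would be passed to
// documentClient.update(refreshLockParams, callback).
```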

PS: One possible solution could be to use a dedicated table for locking.

This has been my underlying assumption. I did not realize you wanted to maintain the locks along with the data in the same table. This complicates the design somewhat: it would require the lock refresh heartbeat to PutItem all of the data, not only the lock state, and the same goes for the lock release logic (or a change to UpdateItem).

If the constraint is to maintain data and the lock state in the same table, then yes, that requires range key support.

Unless I am misunderstanding the code changes, they don't seem to return the non-lock data (the entire Item is present in dataBag.lock but is never returned in the callback); in the example, that's the data column. As such, a separate round trip is still required to fetch data once a lock is acquired:

FailOpen.acquireLock(...)
--> getItem (1 request/reply)
--> putItem (1 request/reply)
OtherCode.getDataFromDynamoDB(...) (1 request/reply)

(I am assuming that Conditional Writes are not sufficient for the use cases you have in mind.)

Exposing the original Item in the return callback has the benefit of saving a round trip to DynamoDB:

FailOpen.acquireLock(...)
--> (1 request/reply) getItem
--> (1 request/reply) putItem

At this point, you already have data because of the first getItem(). The non-lock-related attributes of the Item (like data in the example), if they were exposed, would become safe to use once putItem() succeeds and the callback returns the acquired lock.

Even with all of the above, this implies to me that your application code has to be aware of the details of the lock implementation and must never update your data without coordinating with any ongoing lock activity: for every data manipulation such as PutItem, and perhaps DeleteItem, you'd have to update the lock attributes correctly. Otherwise, the application would be restricted to only doing UpdateItem to ensure the lock state is not overwritten, and you'd have to enforce reserved attribute names to prevent collisions with lock attributes.

Disclaimer: It took some time to do this write-up, but I did it mostly in a single pass. If I made a mistake somewhere and my reasoning and conclusions break down because of it, please do point it out.

As a consequence of the above, I recommend keeping locks and the data separate. However, I think I've outlined the changes necessary to get this to work if the same-table constraint is to be retained.

jepetko commented 4 years ago

Hi @tristanls ,

yes, there is additional business logic for the locking on our side (sorry, it's TypeScript):

public async retrieveProductDetails(req: GetProductRequest, retryNumber = 1): Promise<ProductResponse> {
    let lock: Lock;
    try {
        lock = await this.db.getLock(req.product, req.mutation);
        return this._retrieveRemoteProductDetails(req);
    } catch (e) {
        if (!lock) {
            await this.db.blockingSleep(this.config.LOCK_LEASE_DURATION + this.config.LOCK_LEASE_DURATION / 2);

            const record = await this.db.read(req.product, req.mutation) as ProductRecord;
            const dateTime = moment(record.expirationTime).utc();
            if (dateTime.isAfter(moment().utc())) {
                return record.product;
            }
            if (retryNumber === 3) {
                return Promise.reject('max. number of locking trials exhausted');
            }
            return this.retrieveProductDetails(req, ++retryNumber);
        }               
    } finally {
        if (lock) {
            await this.db.releaseLock(lock);
        }
    }
}

Considerations:

So, basically, yes, the actual data and the locks are kept in the same table (business requirement, infrastructural restrictions, etc.).

The extension of dynamodb-lock-client in this PR actually assumes that the user knows the data gets overwritten if lockId === hash key of a saved entity, because the actual data and the lock cannot be combined in one record. The added functionality only adds support for using the range key as such.

tristanls commented 4 years ago

As I understand it, the forcing constraint is:

and with that being the case, and the table used having both a hash and a range key, dynamodb-lock-client is unusable.

Thanks for helping me understand @jepetko .

jepetko commented 4 years ago

Hi @tristanls,

thank you for your support! I don't know whether the fail-closed lock is widely used, but I could extend it accordingly as well. Of course, I can update the README too.

Further ideas:

Awesome teamwork. Thank you for that.

jepetko commented 4 years ago

Hi @tristanls , the PR has already been merged; when do you plan to release it? As a temporary solution I'm using a modified copy of this repo (in production) and would like to switch to the npm package. Thanks a lot. Let me know if I can contribute.

tristanls commented 4 years ago

Thank you for your patience @jepetko . I integrated and published 0.7.1.

Please note that I modified the interface from an array to an object. I opted for explicit parameter naming in the style of DynamoDB.DocumentClient. It's an arbitrary change, but I wanted to call it out. Additionally, I use "sort key" instead of "range key", in line with what I think is the latest DynamoDB vocabulary.
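For illustration only, the object-style call might look roughly like the sketch below; the property names are assumptions for the sketch, not taken from the released 0.7.1 API, and a stub client stands in for the real one just to make the call shape concrete:

```javascript
// Stub standing in for the real dynamodb-lock-client; the real client
// talks to DynamoDB. The stub succeeds immediately.
const failOpenClient = {
    acquireLock(id, callback) {
        callback(undefined, { id, release: cb => cb() });
    }
};

// DocumentClient-style explicit naming: the lock is identified by an
// object carrying partition and sort key values (names hypothetical).
let acquiredId;
failOpenClient.acquireLock({ myPartitionKey: "1", mySortKey: "B" }, (error, lock) => {
    if (error) throw error;
    acquiredId = lock.id;
    // ... critical section ...
    lock.release(err => { if (err) throw err; });
});
```

Check the README for the actual parameter names used by the released version.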

I also used a test framework I'm comfortable with, so the tests now in the module may not be of the quality you expect, since I made them mostly unit tests instead of relying on an in-memory db.

jepetko commented 4 years ago

Hi @tristanls ,

thank you for this release!

I already migrated our code to the released npm package. I'm totally fine with switching from mocha to jest. I'm also fine with providing an object (partition key, sort key) as the configuration for the lock. It's actually much better than an array.

The reason I was using a real database is the higher confidence when writing DynamoDB statements. I have already experienced bugs caused by bad statements even with passing tests, because mocking/stubbing/spying was used instead of an in-memory DB.

simlu commented 4 years ago

@tristanls I'd much prefer mocha, as we could use node-tdd and could easily use recordings, and hence visualize the requests that are actually being made. This would be best combined with an in-memory DynamoDB instance for "integration"-style tests. Any particular reason you prefer jest?

tristanls commented 4 years ago

I created #27 and #28 to track the need to reintroduce the in-memory db and to make recording available.

@jepetko are you experiencing bugs with v0.7.1 presently? Or were you relaying your experience regarding why you prefer an in-memory db?

jepetko commented 4 years ago

Hi @tristanls, so far no suspicious behavior.