tristanls / dynamodb-lock-client

A general purpose distributed locking library built for AWS DynamoDB.
MIT License

Feature/range key support #26

Closed jepetko closed 4 years ago

jepetko commented 4 years ago

This will add support for range keys as part of the lock.

Use case: The goal is to model a 1:n relationship between entities. Let's say you want to use the product identifier as the hash key, while the mutation of the product serves as the range key.

| HASH_KEY (product) | RANGE_KEY (product mutation) |
| --- | --- |
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | D |
| 2 | E |

Since updating a product mutation requires a remote HTTP call, you will want to hold the lock for that period of time in order to avoid race conditions. Let's say the mutation update concerns mutation 1-B. In that case, neither 1-A nor 1-C needs to be locked.

The PR includes:

Usage without range key:

failOpenClient.acquireLock('myHashKey', (error, lock) => {
    // ...
});

Usage with range key:

failOpenClient.acquireLock(['myHashKey', 'myRangeKey'], (error, lock) => {
    // ...
});

Missing points:

tristanls commented 4 years ago

Hi @jepetko ,

Wow! Tests \(^_^ )/! That's awesome! This module needs some of those, thank you!

With the use case you've outlined, it seems to me that the problem being solved is to guarantee a unique lock per product + product mutation combination. The way I would think of solving this problem is to concatenate product and product mutation to create the lock key. Using your example, if I want to lock product 1 mutation B, I would use 1-B as the unique identifier for the lock.

Distributed locks are advisory only. One implication is that interpretation of the lock name is always up to the clients and depends on an agreed-upon convention. Is there something that prohibits using the convention "concatenate product with product mutation using the - character to create the lock name" in this use case?
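For illustration, that convention could be as simple as the following sketch (`lockNameFor` is a hypothetical helper, not part of this module):

```javascript
// Hypothetical helper illustrating the suggested convention: derive a
// single lock name by concatenating product and product mutation.
function lockNameFor(product, mutation) {
    return `${product}-${mutation}`;
}

// Locking product 1, mutation B would then lock on the name "1-B",
// e.g. failOpenClient.acquireLock(lockNameFor(1, "B"), callback).
const name = lockNameFor(1, "B"); // "1-B"
```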

Cheers,

Tristan

simlu commented 4 years ago

Tests! That is awesome!

When this is in, I'd love to start using node-tdd on top of mocha. I think there is a lot of value here in seeing what requests are actually being made. We can discuss all that separately, though.

TL;DR: I don't need this feature, but having tests is awesome and I can't wait to be doing more with them 👌

jepetko commented 4 years ago

hi @tristanls, @simlu ,

first of all, thank you for your feedback!

@tristanls

What you're basically proposing is that we use a composed key to temporarily create an entirely new record in the table for locking purposes only. The actual record would still be "unlocked", since .acquireLock would only serve as a "programmatic" guard for operations on the table.

However, this wouldn't work because of the NOT NULL constraint on the range key: in a table with a composite primary key, every item must have a range key. In our example, the lock attempt 1_B_LOCK (for locking hash_key=1 + range_key=B) would fail. The attempt 1_LOCK/B_LOCK would succeed.

| HASH_KEY (product) | RANGE_KEY (product mutation) | data | owner | guid | fencingToken | leaseDurationMs | (more columns) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | A | {some: data, ...} | null | null | null | null | ... |
| 1 | B | {some: data, ...} | null | null | null | null | ... |
| 1 | C | {some: data, ...} | null | null | null | null | ... |
| 2 | D | {some: data, ...} | null | null | null | null | ... |
| 2 | E | {some: data, ...} | null | null | null | null | ... |
| 1_B_LOCK | null | null | root | abc | 1 | 10000 | ... |
| 1_LOCK | B_LOCK | null | root | abc | 1 | 10000 | ... |

The exception thrown by DynamoDB (in get) for the request

{
  "TableName": "my-lock-table-name",
  "Key": {
    "myPartitionKey": "lockId"
    // myRangeKey is missing here..
  },
  "ConsistentRead": true
}

is

'The number of conditions on the keys is invalid'.

So, obviously, the range_key needs to be provided (I added a dedicated test for it, even though it doesn't test the actual functionality but rather the behavior in the context of DynamoDB usage).

PS: One possible solution could be to use a dedicated table for locking.

tristanls commented 4 years ago

As far as I understand, PutItem (and in the case of DocumentClient, put) does not do a partial update.

On the first FailOpen lock refresh, the data will be overwritten with nothing (line 211, line 260).

Additionally, the first time the application does a PutItem with data, all of the lock state is overwritten with nothing.

I believe that to get a partial update, usage would need to change to UpdateItem (and in the case of DocumentClient, update).
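As a sketch of the difference (attribute names taken from the example table above, values illustrative; this is not the module's actual code), a partial update touching only the lock attributes could be expressed as an UpdateItem-style request:

```javascript
// Hedged sketch: an update request that sets only lock attributes, so a
// non-lock attribute such as "data" would be left intact. A PutItem with
// the same Key would instead replace the entire item.
const refreshLockParams = {
    TableName: "my-lock-table-name",
    Key: { myPartitionKey: "1", myRangeKey: "B" },
    UpdateExpression: "SET #owner = :owner, guid = :guid, leaseDurationMs = :lease",
    ExpressionAttributeNames: { "#owner": "owner" }, // alias the attribute name
    ExpressionAttributeValues: { ":owner": "root", ":guid": "abc", ":lease": 10000 }
};
// With DocumentClient this would be passed to
// documentClient.update(refreshLockParams, callback).
```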

PS: One possible solution could be to use a dedicated table for locking.

This has been my underlying assumption. I did not realize you wanted to maintain the locks along with the data in the same table. This complicates the design somewhat: it would require the lock refresh heartbeat to PutItem all of the data, not only the lock state, and the same goes for the lock release logic (or a change to UpdateItem).

If the constraint is to maintain data and the lock state in the same table, then yes, that requires range key support.

Unless I am misunderstanding the code changes, they don't seem to return the non-lock data (the entire Item is present in dataBag.lock but is never returned in the callback); in the example, that's the data column. As such, a separate round trip is still required to fetch data once a lock is acquired:

FailOpen.acquireLock(...)
--> getItem (1 request/reply)
--> putItem (1 request/reply)
OtherCode.getDataFromDynamoDB(...) (1 request/reply)

(I am assuming that Conditional Writes are not sufficient for the use cases you have in mind.)

Exposing the original Item in the return callback has the benefit of saving a round trip to DynamoDB:

FailOpen.acquireLock(...)
--> (1 request/reply) getItem
--> (1 request/reply) putItem

At this point, you already have data because of the first getItem(). The non-lock-related attributes of the Item (like data in the example), if they were exposed, would become safe to use once putItem() succeeds and the callback returns the acquired lock.

Even with all of the above, this implies to me that your application code has to be aware of the details of the lock implementation and must never update your data without coordinating with any ongoing lock activity: for every data manipulation such as PutItem, and perhaps DeleteItem, you'd have to update the lock attributes correctly. Otherwise, the application would be restricted to only doing UpdateItem to ensure the lock state is not overwritten, and you'd have to enforce reserved attribute names to prevent collisions with lock attributes.

Disclaimer: It took some time to do this write-up, but I did it mostly in a single pass. If I made a mistake somewhere and my reasoning and conclusions break down because of it, please do point it out.

As a consequence of the above, I recommend keeping locks and the data separate. However, I think I've outlined the changes necessary to get this to work if the same-table constraint is to be retained.

jepetko commented 4 years ago

Hi @tristanls ,

yes, there is additional business logic for the locking on our side (sorry, it's TypeScript):

public async retrieveProductDetails(req: GetProductRequest, retryNumber = 1): Promise<ProductResponse> {
    let lock: Lock;
    try {
        lock = await this.db.getLock(req.product, req.mutation);
        return this._retrieveRemoteProductDetails(req);
    } catch (e) {
        if (!lock) {
            await this.db.blockingSleep(this.config.LOCK_LEASE_DURATION + this.config.LOCK_LEASE_DURATION / 2);

            const record = await this.db.read(req.product, req.mutation) as ProductRecord;
            const dateTime = moment(record.expirationTime).utc();
            if (dateTime.isAfter(moment().utc())) {
                return record.product;
            }
            if (retryNumber === 3) {
                return Promise.reject('max. number of locking trials exhausted');
            }
            return this.retrieveProductDetails(req, ++retryNumber);
        }               
    } finally {
        if (lock) {
            await this.db.releaseLock(lock);
        }
    }
}

Considerations:

So, basically, yes, the actual data and the locks are kept in the same table (business requirement, infrastructural restrictions, etc.).

The extension of dynamodb-lock-client in this PR actually assumes that the user knows the data gets overwritten if lockId === hash key of a saved entity, because the actual data and the lock cannot be combined in one record. The added functionality only adds support for using the range key as such.

tristanls commented 4 years ago

As I understand it, the forcing constraint is:

and with that being the case, and the table used having both a hash and a range key, dynamodb-lock-client is unusable.

Thanks for helping me understand @jepetko .

jepetko commented 4 years ago

Hi @tristanls,

thank you for your support! I don't know whether the fail-closed lock is widely used, but I could extend it accordingly as well. Of course, I can update the README too.

Further ideas:

Awesome teamwork. Thank you for that.

jepetko commented 4 years ago

Hi @tristanls , the PR has already been merged; when do you plan to release it? As a temporary solution I'm using a modified copy of this repo (in production) and would like to switch to the npm package. Thanks a lot. Let me know if I can contribute.

tristanls commented 4 years ago

Thank you for your patience @jepetko . I integrated and published 0.7.1.

Please note that I modified the interface from an array to an object. I opted for explicit parameter naming in the style of DynamoDB.DocumentClient. It's an arbitrary change, but I wanted to call it out. Additionally, I use "sort key" instead of "range key", in line with what I think is the latest DynamoDB vocabulary.
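For illustration only, the object-style call might look roughly like the sketch below; the property names are assumptions for the sketch, not taken from the released 0.7.1 API, and a stub client stands in for the real one just to make the call shape concrete:

```javascript
// Stub standing in for the real dynamodb-lock-client; the real client
// talks to DynamoDB. The stub succeeds immediately.
const failOpenClient = {
    acquireLock(id, callback) {
        callback(undefined, { id, release: cb => cb() });
    }
};

// DocumentClient-style explicit naming: the lock is identified by an
// object carrying partition and sort key values (names hypothetical).
let acquiredId;
failOpenClient.acquireLock({ myPartitionKey: "1", mySortKey: "B" }, (error, lock) => {
    if (error) throw error;
    acquiredId = lock.id;
    // ... critical section ...
    lock.release(err => { if (err) throw err; });
});
```

Check the README for the actual parameter names used by the released version.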

I also used a test framework I'm comfortable with, so the tests now in the module may not be of the quality you expect, since I made them mostly unit tests instead of relying on an in-memory db.

jepetko commented 4 years ago

Hi @tristanls ,

thank you for this release!

I already migrated our code to the released npm package. I'm totally fine with switching from mocha to jest. I'm also fine with providing an object (partition key, sort key) as the configuration for the lock. It's actually much better than an array.

The reason I was using a real database is the higher confidence when writing DynamoDB statements. I have already experienced bugs caused by bad statements even with passing tests, because mocking/stubbing/spying was used instead of an in-memory DB.

simlu commented 4 years ago

@tristanls I'd much prefer mocha, as we could use node-tdd and could easily use recordings, and hence visualize the requests that are actually being made. This would be best combined with an in-memory DynamoDB instance for "integration"-style tests. Any particular reason you prefer jest?

tristanls commented 4 years ago

I created #27 and #28 to track the need to reintroduce the in-memory db and to make recording available.

@jepetko are you experiencing bugs with v0.7.1 presently? Or were you relaying your experience regarding why you prefer an in-memory db?

jepetko commented 4 years ago

Hi @tristanls, so far no suspicious behavior.