pynamodb / PynamoDB

A pythonic interface to Amazon's DynamoDB
http://pynamodb.readthedocs.io
MIT License

Incomplete Batch Write - Items Missing Without Errors #1257

Closed: mattomaton closed this issue 4 hours ago

mattomaton commented 1 day ago

Hello!

I'm encountering an issue where not all items are persisted when using batch_write with large collections. My implementation is as follows (note: AuthenticityCodeRepo is an abstract base class):

from datetime import datetime

class AuthenticityCodeDynamoRepo(AuthenticityCodeRepo):
    def persist_codes(self, codes: list[AuthenticityCode]) -> None:
        # persist the codes
        with AuthenticityCodeDynamoModel.batch_write() as batch:
            for code in codes:
                # convert the code to a db model
                db_code: AuthenticityCodeDynamoModel = AuthenticityCodeDynamoModel()
                db_code.PK = AuthenticityCodeDynamoModel.create_hash()
                db_code.SK = code.Code
                db_code.sku = code.Sku
                db_code.code = code.Code
                db_code.created_at = str(datetime.now())

                # save the db model
                batch.save(db_code)

While this closely follows the example in the PynamoDB documentation (link to example), not all items in large batches are being saved. For example, in a recent run with a batch of 1500 items, only 1393 were persisted, and no exceptions were raised.

Potential Cause:

I suspect there may be unprocessed items due to throttling or other factors, but that these unprocessed items are not surfaced or handled by the PynamoDB batch_write context, and there doesn't seem to be a mechanism to retry them.

Additional Information:

  1. DynamoDB Capacity Mode: On-Demand

  2. Partition Key (PK): A constant string returned by create_hash, which results in all items being written to the same partition. I realize this may not be ideal and could contribute to throttling or hot partitioning.
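If hot partitioning turns out to be a factor, one common mitigation is write sharding: deriving the partition key from the item itself so writes spread across several partitions. A minimal sketch (the shard count, key format, and `sharded_pk` helper are hypothetical, not part of my actual code):

```python
import hashlib

# Hypothetical write-sharding sketch: instead of a constant PK from
# create_hash(), derive a deterministic shard from the code so a large
# batch is spread across NUM_SHARDS partition keys.
NUM_SHARDS = 10

def sharded_pk(code: str) -> str:
    # Deterministic hash so the same code always maps to the same shard
    shard = int(hashlib.sha256(code.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"AUTHCODE#{shard}"
```

The trade-off is that reads which previously queried the single constant PK would need to fan out across all shards.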

Is there something in my implementation or configuration that could be causing this? Or is this a limitation of batch_write in PynamoDB that doesn't handle unprocessed items internally?

Thank you for your time, and please let me know if you need more information.

ikonst commented 1 day ago

From a cursory glance, I see the batch context does attempt a certain number of retries, and ultimately provides a list of failed items in the failed_operations attribute: https://github.com/pynamodb/PynamoDB/blob/4b8630409b46cd57ccbb08658c71bda339c9df7e/pynamodb/models.py#L158-L160

This appears to be tested here: https://github.com/pynamodb/PynamoDB/blob/4b8630409b46cd57ccbb08658c71bda339c9df7e/tests/test_model.py#L2039-L2066

Does this not work as expected?
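For reference, the chunk-and-retry behavior described above can be sketched in plain Python. This is a hedged simulation of the general BatchWriteItem pattern (25-item chunks, re-sending unprocessed items), not PynamoDB's actual implementation; `write_batch` and `FlakyBackend` are hypothetical stand-ins for the network call:

```python
def batch_write_with_retries(items, write_batch, max_retry_attempts=3):
    """Send items in chunks of 25 (DynamoDB's BatchWriteItem limit),
    re-sending any the backend reports as unprocessed, and collect
    whatever still fails after the retry budget is exhausted."""
    failed_operations = []
    for i in range(0, len(items), 25):
        pending = items[i:i + 25]
        attempts = 0
        while pending and attempts < max_retry_attempts:
            # write_batch returns the items the backend did not process
            pending = write_batch(pending)
            attempts += 1
        failed_operations.extend(pending)
    return failed_operations


class FlakyBackend:
    """Simulated backend that reports every item as unprocessed the
    first time it is seen, then accepts it on the retry."""
    def __init__(self):
        self.seen = set()

    def __call__(self, chunk):
        unprocessed = [item for item in chunk if item not in self.seen]
        self.seen.update(chunk)
        return unprocessed
```

With a backend that eventually succeeds, `batch_write_with_retries(items, FlakyBackend())` returns an empty list; with one that never succeeds, the leftover items come back as the failed operations, analogous to the `failed_operations` attribute linked above.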

mattomaton commented 1 day ago

Hello @ikonst!

Thank you for taking a look at this issue. I wasn't aware of the failed_operations property, though it only appears to be set when a PutError is raised, which in my case did not happen.

For some added context, my code runs in a Lambda that is triggered via API Gateway. In the case I mentioned before (1393 out of 1500 saved, along with many other similar cases I'm now discovering), no exceptions were thrown, and the full list of 1500 codes was returned successfully via the API in CSV format, even though not all of them had actually made it to DynamoDB.

As far as I can tell, there is a case I'm hitting where items aren't making it to DynamoDB and no exceptions are being raised.

I haven't set max_retry_attempts, so my understanding is that the default is 3.
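If it would help, I could try raising the retry settings per model. My understanding (an assumption on my part, based on PynamoDB's settings documentation) is that these can be overridden via Meta, something like:

```python
from pynamodb.models import Model

# Sketch only: assumes the Meta override names from PynamoDB's settings
# docs; the table name and values here are illustrative, not from my
# actual model definition.
class AuthenticityCodeDynamoModel(Model):
    class Meta:
        table_name = "authenticity_codes"
        max_retry_attempts = 5   # default is 3
        base_backoff_ms = 250    # backoff between retry attempts
```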

Thank you!

ikonst commented 1 day ago

... In CSV format?

mattomaton commented 1 day ago

Sorry @ikonst! I was trying to provide some additional context, but (as is so often the case) I confused the issue by including unnecessary details. The fact that the API endpoint returns a CSV file isn't relevant to the issue at hand; I was just trying to say that no exceptions were thrown even though some records weren't saved.

Thank you!

mattomaton commented 4 hours ago

I've discovered that the issue was on our side, stemming from the initial migration (which did not use PynamoDB). I'm so sorry for the trouble!

ikonst commented 2 hours ago

😅