terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend
Apache License 2.0

Potential data-loss bug in BatchWriter flushing function #246

Closed: rudaporto closed this issue 1 year ago

rudaporto commented 2 years ago

Description

This bug report duplicates one already filed against boto3:

Since the aioboto3.dynamodb.table.BatchWriter._flush is the same as the one in boto3, it has the same bug:

https://github.com/terrycain/aioboto3/blob/c413766fe9cd95a0889d80eb5099d4296ebeaa1a/aioboto3/dynamodb/table.py#L98-L112
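
For reference, the flush logic at that permalink is roughly the following (paraphrased and trimmed here, not a verbatim copy):

```python
async def _flush(self):
    items_to_send = self._items_buffer[:self._flush_amount]
    self._items_buffer = self._items_buffer[self._flush_amount:]
    response = await self._client.batch_write_item(
        RequestItems={self._table_name: items_to_send}
    )
    unprocessed_items = response['UnprocessedItems']
    if unprocessed_items and unprocessed_items[self._table_name]:
        # Requeue anything DynamoDB reported as unprocessed.
        self._items_buffer.extend(unprocessed_items[self._table_name])
    else:
        # Problem: this also discards any items that were appended to the
        # buffer while batch_write_item was being awaited.
        self._items_buffer = []
```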

There is a PR open in boto3 to fix two issues:

We ran into this issue because we share one batch writer between multiple coroutines that add items.

By comparing our old implementation with the new one, which uses the aioboto3 BatchWriter, we discovered that the new one is losing data in some cases: some records were never persisted to DynamoDB.
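
To illustrate the pattern (the table name, key schema, and item contents below are placeholders, not our real code): several coroutines feed one batch writer, so items can land in the buffer while another coroutine is inside `_flush` awaiting `batch_write_item`, and the buffer reset shown above then silently drops them.

```python
import asyncio

import aioboto3


async def worker(batch_writer, start: int, count: int):
    # Each worker appends items to the shared writer's buffer.
    for i in range(start, start + count):
        await batch_writer.put_item(Item={'pk': f'item-{i}'})


async def main():
    session = aioboto3.Session()
    async with session.resource('dynamodb') as dynamo:
        table = await dynamo.Table('example-table')  # placeholder table
        async with table.batch_writer() as batch_writer:
            # Ten coroutines share the same BatchWriter instance.
            await asyncio.gather(
                *(worker(batch_writer, n * 100, 100) for n in range(10))
            )
        # On exit the writer flushes; some of the 1000 items may be lost.


asyncio.run(main())
```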

Fix

It looks like the solution proposed in the boto3 PR could easily be backported here. I volunteer to work on it: I can open a PR with the fix and new unit tests that reproduce the issues.

terricain commented 2 years ago

Yeah, that definitely is a bug. I'd also put the unprocessed items at the start of the items buffer.

I'm in two minds about this. I'd prefer not to diverge too much from the boto3 codebase, but then again, this is clearly a bug.

Yeah, feel free to raise a PR. Can you stick a comment in that function linking to this issue, so there is a record of why the code diverges slightly from the boto3 logic?
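
For the record, a minimal sketch of what the back-port could look like, combining the boto3 PR's approach with the suggestion above to requeue unprocessed items at the front; the actual merged fix may differ:

```python
async def _flush(self):
    items_to_send = self._items_buffer[:self._flush_amount]
    self._items_buffer = self._items_buffer[self._flush_amount:]
    response = await self._client.batch_write_item(
        RequestItems={self._table_name: items_to_send}
    )
    unprocessed_items = response['UnprocessedItems']
    if unprocessed_items and unprocessed_items[self._table_name]:
        # Requeue unprocessed items at the start of the buffer so they are
        # retried first, and never reset the buffer: other coroutines may
        # have appended items while batch_write_item was awaited.
        # This diverges from the boto3 logic on purpose, see issue #246.
        self._items_buffer = (
            unprocessed_items[self._table_name] + self._items_buffer
        )
```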

tibbe commented 1 year ago

Should this be closed? It seems that a fix has been merged.