findepi opened this issue 1 year ago
Hello.
I have seen this behaviour for a while now.
You don't see this behaviour in Spark or delta-rs when you do concurrent appends.
The Delta connector does not seem to lock in the correct metadata Delta version the way other engines do.
Scenario for realizing that there is a conflict and the query needs to be aborted (the error key plays an important role here, to notify the client accordingly):

- Start (`0001.json`): the table has file A with rows 1-10.
- User1 updates row 5 (`0002.json`):
  - remove entry: delete A
  - add entry: add file C (rows 1-4 and 6-10 same, row 5 modified)
- User2 updates row 2 (`0003.json`):
  - remove entry: delete A
  - add entry: add file B (rows 1 and 3-10 same, row 2 modified)
- Realize the conflict: both transactions removed file A, so the second commit must be aborted.
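The scenario above can be sketched as a check on the remove sets of two Delta log entries. This is a minimal illustration, not Trino's actual implementation; the entry layout and the `detect_conflict` helper are assumptions for the example.

```python
# Hypothetical sketch of the conflict above: both 0002.json and 0003.json
# remove file A, so the later commit cannot be applied.

def detect_conflict(committed_entry, attempted_entry):
    """Return True when two Delta log entries remove the same data file."""
    committed_removes = set(committed_entry.get("remove", []))
    attempted_removes = set(attempted_entry.get("remove", []))
    return not committed_removes.isdisjoint(attempted_removes)

# 0002.json: User1 rewrote file A into file C (row 5 modified)
user1 = {"remove": ["A"], "add": ["C"]}
# 0003.json: User2 also rewrote file A, into file B (row 2 modified)
user2 = {"remove": ["A"], "add": ["B"]}

print(detect_conflict(user1, user2))  # True: both updates removed file A
```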
When concurrent writes land on different partitions of the table, the connector should therefore be able to cope with concurrent inserts transparently for the user: there are no shared resources being modified by the queries running concurrently.
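The "no shared resources" condition boils down to the two writes touching disjoint partition sets. A hedged sketch, with illustrative partition names (not the connector's real API):

```python
# Sketch: concurrent inserts into different partitions touch no shared files,
# so a connector could admit both commits without reconciliation work.

def can_commit_concurrently(partitions_a, partitions_b):
    """Concurrent writes are safe when their partition sets are disjoint."""
    return set(partitions_a).isdisjoint(partitions_b)

print(can_commit_concurrently({"date=2023-01-01"}, {"date=2023-01-02"}))  # True
print(can_commit_concurrently({"date=2023-01-01"}, {"date=2023-01-01"}))  # False
```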
Basic algorithm (shared from an answer of @findepi): on a commit collision, check whether the concurrently committed transaction actually conflicts and, if so, abort the query with a `TransactionConflictException`; otherwise retry the commit at the next transaction log version.
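The basic algorithm can be sketched as a commit-retry loop. This is an assumption-laden illustration: the `log` dict stands in for the `_delta_log` directory (a real store would need an atomic put-if-absent), and `commit`/`conflicts_with` are hypothetical helpers, not Trino's API. Only the exception name comes from the thread.

```python
# Illustrative commit-retry loop: write entry as version N; if N is taken,
# either abort on a real conflict or retry at N+1.

class TransactionConflictException(Exception):
    pass

def commit(log, version, entry, conflicts_with, max_attempts=5):
    """Try to write entry as log[version]; on collision, reconcile or abort."""
    for _ in range(max_attempts):
        if version not in log:          # atomic put-if-absent in a real store
            log[version] = entry
            return version
        if conflicts_with(log[version], entry):
            # A concurrent transaction modified the same data: abort the query.
            raise TransactionConflictException(f"conflict at version {version}")
        version += 1                    # no overlap: retry at the next version
    raise TransactionConflictException("too many retries")

log = {1: {"remove": [], "add": ["A"]}}
overlap = lambda a, b: bool(set(a["remove"]) & set(b["remove"]))
v = commit(log, 2, {"remove": [], "add": ["B"]}, overlap)
print(v)  # 2: version 2 was free, the append committed on the first try
```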
Overview of the PRs used to cover the concurrent reconciliation functionality:
- TODO: concurrent reconciliation support for the `optimize` procedure

@findinpath awesome progress!

> TODO: concurrent reconciliation support for the `optimize` procedure

by any chance, is it in the works?
I plan to create a PR for handling `optimize` during this week.
Support transactions that, for example, insert data concurrently, or that modify data within disjoint data sets (e.g. partitions).