ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

DataShard: add stricter cross-shard guarantees in volatile transactions? #2994

Closed snaury closed 5 months ago

snaury commented 6 months ago

Related to issue https://github.com/ydb-platform/jepsen.ydb/issues/5

Since I want to test a stricter model, it made me think about what the actual implications of that testing would be, i.e. how deep the ordering guarantees should go. Even though this wouldn't be directly testable with the jepsen append workload, I think there may be an issue in the future, when we want to guarantee cheap strict serializable isolation. Here are some issues I could think of:

In a strict serializable system we have tx1 -> tx2 -> tx3 -> tx4, but currently tx4 might observe either 1 or null. By contrast, tx5 (with volatile transactions) is guaranteed to observe key2==2: the read from key3 only succeeds once the commit for key3 is confirmed, and that can only happen after readsets from neighboring shards have been received. This implies that the key2 changes are persisted, and all new reads will have to wait for it to resolve.

With volatile transactions, when tx3 reads key3 it will observe tx2 as uncommitted, and tx1 will be hidden below that. There will be a dependency on tx2 for that read, but tx2 may resolve before tx1 is resolved, and consequently even before key1 is written (plans may be delayed at the corresponding shard, so that shard's time is still "in the past" at that point).

To fix this case we may want to resolve "conflicting" volatile transactions in their correct order, i.e. not reply to tx2 before tx1 is also confirmed (as either committed or aborted). Reordered distributed writes at the same shard already have the correct logical order though: when writes are reordered, earlier plan queue transactions are treated as logically complete, and we would include them in all new writes.
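The ordered-resolution idea can be sketched with a toy single-shard model. This is only an illustration under stated assumptions, not DataShard code: `VolatileTx`, `ShardModel`, and the key-overlap conflict rule are hypothetical, and the example assumes that both tx1 and tx2 write key3, which is what "tx1 will be hidden below" tx2 suggests.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class VolatileTx:
    tx_id: int
    keys: Set[str]
    state: str = "pending"  # "pending", "committed" or "aborted"
    replied: bool = False


class ShardModel:
    """Toy single-shard model (hypothetical, not actual DataShard code)."""

    def __init__(self) -> None:
        self.order: List[VolatileTx] = []  # logical order at this shard
        self.replies: List[int] = []       # tx ids in the order replies go out

    def add(self, tx_id: int, keys: Set[str]) -> None:
        self.order.append(VolatileTx(tx_id, set(keys)))

    def resolve(self, tx_id: int, committed: bool) -> None:
        # Record the outcome, then release every reply that is no longer blocked.
        for tx in self.order:
            if tx.tx_id == tx_id:
                tx.state = "committed" if committed else "aborted"
        self._flush_replies()

    def _flush_replies(self) -> None:
        for i, tx in enumerate(self.order):
            if tx.replied or tx.state == "pending":
                continue
            # A resolved tx stays silent while any earlier tx that touches
            # one of the same keys is still unresolved.
            blocked = any(
                prev.state == "pending" and prev.keys & tx.keys
                for prev in self.order[:i]
            )
            if not blocked:
                tx.replied = True
                self.replies.append(tx.tx_id)


# Assumed scenario: tx1 and tx2 both write key3, and tx2 resolves first.
shard3 = ShardModel()
shard3.add(1, {"key3"})
shard3.add(2, {"key3"})
shard3.resolve(2, committed=True)
held_back = list(shard3.replies)   # reply for tx2 is held back
shard3.resolve(1, committed=True)  # now both replies go out, tx1 first
```

In this sketch an aborted tx1 unblocks tx2 just like a committed one, matching the "either committed or aborted" wording above.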

There's a different case, however. Imagine that tx3 was reading from key2 instead and observed key2==2. Does that guarantee that tx4 will observe key1==1? No, because those shards are completely unrelated, and again it is possible that shard1 hasn't even received the plan yet and has no idea whether it will be executing tx1 or not. Since this stricter guarantee doesn't fix all read reorderings, I'm not sure whether it's really useful.

It is possible to move towards strict serializable by including a "minimum version" for immediate writes (and, as it turns out, for immediate reads too).
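One way to picture the "minimum version" idea is the sketch below. It is a guess at the general shape rather than the actual YDB protocol: the `Shard` and `Client` classes, the integer versions, and the "defer until caught up" behaviour are all assumptions made for illustration. The scenario replays the key2/key1 case: the client observes key2==2 on one shard, then reads key1 from a shard that has not received its plan yet.

```python
from typing import Dict, Optional, Tuple


class Shard:
    """Toy shard with a single monotonically growing version (hypothetical)."""

    def __init__(self) -> None:
        self.version = 0
        self.data: Dict[str, int] = {}

    def apply(self, version: int, key: str, value: int) -> None:
        # Apply a planned write and move shard time forward.
        self.data[key] = value
        self.advance(version)

    def advance(self, version: int) -> None:
        self.version = max(self.version, version)

    def immediate_read(
        self, key: str, min_version: int
    ) -> Tuple[bool, Optional[int], int]:
        # Assumption: a shard whose time is still "in the past" relative to
        # min_version must not answer yet, so the read is deferred.
        if self.version < min_version:
            return (False, None, self.version)
        return (True, self.data.get(key), self.version)


class Client:
    """Carries the highest version observed so far into every new read."""

    def __init__(self) -> None:
        self.min_version = 0

    def read(self, shard: Shard, key: str) -> Tuple[bool, Optional[int]]:
        ready, value, version = shard.immediate_read(key, self.min_version)
        if ready:
            self.min_version = max(self.min_version, version)
        return (ready, value)


shard1, shard2 = Shard(), Shard()
client = Client()

shard2.apply(2, "key2", 2)            # tx2 (assumed version 2) is done on shard2
first = client.read(shard2, "key2")   # observes key2==2, min_version becomes 2
deferred = client.read(shard1, "key1")  # shard1 is behind: read is deferred

shard1.apply(1, "key1", 1)            # delayed plan for tx1 (assumed version 1)
shard1.advance(2)                     # shard1 catches up to version 2
final = client.read(shard1, "key1")   # now the read observes key1==1
```

The point of the sketch is that the second read can no longer return null after key2==2 was observed, because the lagging shard either waits or is forced to catch up first.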

snaury commented 5 months ago

I'm closing this issue in favour of #3009.