ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

Got tx with plan step less than last heartbeat in one topic partition stream #6985

Closed · dcherednik closed this issue 1 month ago

dcherednik commented 3 months ago

{"resolved":[1721729882000,0]} {"resolved":[1721729924000,0]} {"resolved":[1721729955000,0]} {"update":{"v":"1MkPAAAAAEjUyQ8AAAAASNTJDwAAAABI1MkPAAAAAA=="},"key":[4899916394580082192],"ts":[1721729950551,281475194075732]} {"update":{"v":"wqMAAAAAAJDCowAAAAAA"},"key":[4899916394579553682],"ts":[1721729950556,281475194075747]} {"resolved":[1721729985000,0]} {"resolved":[1721730001000,0]}

dcherednik commented 3 months ago

The same happens in the case of disabled VolatileTx:

{"resolved":[1721744218000,0]}

{"update":{"v":"xSoIAAAAAOTFKggAAAAA5MUqCAAAAADkxSoIAAAAAOTFKggAAAAA"},"key":[3458764513821096247],"ts":[1721744215418,844425147299155]}

dcherednik commented 3 months ago

https://github.com/ydb-platform/ydb/pull/7051/files works for the case of disabled volatile tx.

snaury commented 2 months ago

I tried adding a changefeed to the Jepsen tables and validating the resulting log, and I can easily see the issue. It reproduces both with and without volatile transactions. It looks like it's somehow related to splits (more splits -> better chance to reproduce), but the exact cause is unclear so far.

snaury commented 2 months ago

Reproduced with a build that logs per-shard heartbeat messages (89d27b20ac5081734898c740846e17c91a30c5fd):

https://gist.github.com/snaury/9245b47b09e7f6c943bfeea134c0111b

The log clearly shows that when 72075186224037898 split into 72075186224037901 and 72075186224037902, both child shards started streaming their updates and heartbeats. However, while the PQ partition did wait for new heartbeats from the second shard, those heartbeats appear to be ignored when the resolved messages are actually produced:

{"shard_id":72075186224037898,"resolved_private":[1725549507000,0]}
{"resolved":[1725549507000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549507000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549508000,0]}
...
{"shard_id":72075186224037901,"resolved_private":[1725549514000,0]}
{"shard_id":72075186224037902,"resolved_private":[1725549507000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549515000,0]}
{"shard_id":72075186224037902,"resolved_private":[1725549508000,0]}
{"resolved":[1725549515000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549516000,0]}
{"shard_id":72075186224037902,"resolved_private":[1725549509000,0]}
{"resolved":[1725549516000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549517000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549518000,0]}
{"shard_id":72075186224037902,"resolved_private":[1725549510000,0]}
{"resolved":[1725549518000,0]}

It is very suspicious that a new resolved message is always generated right after a new resolved_private from 72075186224037902, yet the actual value is taken from the last resolved_private of 72075186224037901. The same kept happening after 72075186224037902 split into 72075186224037903 and 72075186224037904.
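
The expected behaviour here, as I read the trace (this is a sketch of the invariant, not the actual YDB implementation), is that the partition-level resolved timestamp is the minimum over the latest resolved_private of every shard currently feeding the partition, so the lagging child 72075186224037902 should hold the emitted value back. A minimal Python sketch of that aggregation:

def aggregate_resolved(per_shard_resolved):
    # per_shard_resolved: shard_id -> latest [plan_step_ms, tx_id] heartbeat
    if not per_shard_resolved:
        return None
    return min(per_shard_resolved.values())

# With the per-shard state from the log above, the emitted resolved should be
# [1725549508000, 0] (the lagging child), not [1725549515000, 0]:
print(aggregate_resolved({
    72075186224037901: [1725549515000, 0],
    72075186224037902: [1725549508000, 0],
}))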

It gets even weirder when the resolved timestamp backtracks:

{"resolved":[1725549546000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549547000,0]}
{"shard_id":72075186224037903,"resolved_private":[1725549548000,0]}
{"shard_id":72075186224037901,"resolved_private":[1725549548000,0]}
{"shard_id":72075186224037904,"resolved_private":[1725549534000,0]}
{"resolved":[1725549548000,0]}
{"shard_id":72075186224037904,"resolved_private":[1725549535000,0]}
{"resolved":[1725549547000,0]}

Notice how it first produced [1725549548000,0], but then, after a new heartbeat from 72075186224037904 arrived, it produced [1725549547000,0], an earlier value!
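
Whatever the aggregation produces, the resolved timestamps emitted into a single partition stream should be monotonically non-decreasing. A minimal Python sketch (again, not YDB code) of the guard the trace above violates:

class ResolvedEmitter:
    def __init__(self):
        self.last_emitted = None  # [plan_step_ms, tx_id]

    def emit(self, candidate):
        # Only emit when the candidate strictly advances past the last value;
        # otherwise emitting it would make the stream backtrack, as in the log above.
        if self.last_emitted is not None and candidate <= self.last_emitted:
            return None
        self.last_emitted = candidate
        return candidate

e = ResolvedEmitter()
print(e.emit([1725549548000, 0]))  # emitted
print(e.emit([1725549547000, 0]))  # suppressed -> None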