ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

Lost CDC heartbeat after cluster restart #6403

Open dcherednik opened 2 months ago

dcherednik commented 2 months ago

2024-07-05T15:40:29.971237Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038185 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971243Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038057 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971248Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit ex4

2024-07-05T15:40:29.971251Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit version, sourceIds1720174735000 0 - - 1

2024-07-05T15:40:29.971253Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit break 1

2024-07-05T15:40:29.971253Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit nothing 1

2024-07-05T15:40:29.971254Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit ex7

2024-07-05T15:40:29.971306Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038186 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971309Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038165 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971311Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038173 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971314Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038188 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971316Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038189 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971318Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038181 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971321Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038172 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971323Z :PERSQUEUE WARN: THeartbeatEmitter::Process ^@72075186224038179 Step: 1720194030000 TxId: 0 Data: "P\000\372\007\036{\"resolved\":[1720194030000,0]}"

2024-07-05T15:40:29.971326Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit version, sourceIds1720174735000 0 - - 1

2024-07-05T15:40:29.971326Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit break 1

2024-07-05T15:40:29.971327Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit nothing 1

2024-07-05T15:40:29.971328Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit ex7

Here "break 1" followed by "ex7" means that we break out of the loop at https://github.com/ydb-platform/ydb/blob/main/ydb/core/persqueue/sourceid.cpp#L560 and after that newVersion is not set. So this happens each time.

The "THeartbeatEmitter::CanEmit ex4" entry is a successful heartbeat for another, empty table present on the cluster.
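To make traces like the one above easier to scan, here is a minimal sketch of a helper that splits fused log text into individual entries and extracts the CanEmit markers. It assumes every entry starts with a timestamp in the format shown in the trace. The markers (ex4, ex7, break, nothing) are the ad hoc debug output quoted in this issue, and their meaning is taken from the explanation above. The function names are hypothetical and not part of YDB.

```python
import re

# Every entry in the trace starts with a timestamp such as
# "2024-07-05T15:40:29.971248Z :". A zero-width lookahead lets us split
# before each timestamp without consuming it.
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z :)")

# First alphanumeric word after "CanEmit", e.g. "ex4", "version", "break".
CAN_EMIT = re.compile(r"THeartbeatEmitter::CanEmit ([A-Za-z0-9]+)")


def split_entries(text):
    """Split log text (possibly fused onto one line) into separate entries."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def can_emit_markers(text):
    """Return the CanEmit debug markers in the order they appear."""
    markers = []
    for entry in split_entries(text):
        match = CAN_EMIT.search(entry)
        if match:
            markers.append(match.group(1))
    return markers


def broke_out_of_loop(markers):
    """True if a 'break' marker is followed by 'ex7'.

    Assumption from this issue: that sequence means CanEmit left the loop
    without setting newVersion.
    """
    if "break" not in markers:
        return False
    return "ex7" in markers[markers.index("break"):]


if __name__ == "__main__":
    sample = (
        "2024-07-05T15:40:29.971326Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit version, sourceIds1720174735000 0 - - 1 "
        "2024-07-05T15:40:29.971326Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit break 1 "
        "2024-07-05T15:40:29.971327Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit nothing 1 "
        "2024-07-05T15:40:29.971328Z :PERSQUEUE WARN: THeartbeatEmitter::CanEmit ex7"
    )
    markers = can_emit_markers(sample)
    print(markers, broke_out_of_loop(markers))
```

Running it over a full log and counting how often the break/ex7 sequence appears, compared with ex4, should show whether the affected topic ever gets past the loop after the restart.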

I am trying to collect more helpful debug data

dcherednik commented 1 month ago

https://pastebin.com/zraJd2uk

dcherednik commented 1 month ago

https://pastebin.com/tT9zQ0yv