Open petuhovskiy opened 4 days ago
(notes from chatting with Arthur)
Impact: this interferes with writing clean tests. Currently, if a safekeeper has a stale remote_consistent_lsn for long enough, it will remain active and the pageserver will eventually connect to it. Once the pageserver connects, the safekeeper eventually learns the up-to-date remote_consistent_lsn.
More generally: should we reconsider using remote_consistent_lsn in the safekeeper's condition for broker_is_active?
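To make the failure mode concrete, here is a minimal Rust sketch of an activity check that depends on remote_consistent_lsn. The `TimelineView` type, the `Lsn = u64` alias and the exact condition (`remote_consistent_lsn < last_record_lsn`) are assumptions for illustration, not the safekeeper's actual code; any other terms of the real condition are omitted. It shows how a peer that missed the final update keeps reporting itself active.

```rust
/// Hypothetical simplification: an LSN as a plain u64.
type Lsn = u64;

/// Hypothetical per-timeline state as seen by one safekeeper.
struct TimelineView {
    /// Latest remote_consistent_lsn this safekeeper has heard of
    /// (from pageserver feedback or from peers via the broker).
    remote_consistent_lsn: Lsn,
    /// LSN of the last WAL record on the timeline.
    last_record_lsn: Lsn,
}

impl TimelineView {
    /// Assumed shape of the condition: stay active while the pageserver
    /// has not yet made everything durable remotely.
    fn broker_is_active(&self) -> bool {
        self.remote_consistent_lsn < self.last_record_lsn
    }
}

fn main() {
    // This safekeeper saw remote_consistent_lsn catch up to last_record_lsn.
    let up_to_date = TimelineView { remote_consistent_lsn: 100, last_record_lsn: 100 };
    // A peer that never received that update still holds an older value.
    let stale = TimelineView { remote_consistent_lsn: 80, last_record_lsn: 100 };
    println!("up_to_date active: {}", up_to_date.broker_is_active());
    println!("stale active: {}", stale.broker_is_active());
}
```

The stale peer stays active purely because of old information, which is what eventually draws the pageserver to it.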
I saw this happening in tests:
1. remote_consistent_lsn is updated to match last_record_lsn
2. broker_is_active switches to false
3. the other safekeepers never receive the updated remote_consistent_lsn from the broker

The fix is to delay timeline deactivation for some time (30s), so that safekeepers have a chance to broadcast the remote_consistent_lsn update to their peers. It doesn't cover 100% of cases, but it should work well enough.
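A minimal Rust sketch of the delayed deactivation described above, assuming a hypothetical `ActivityTracker` that wraps the raw activity condition. Only the 30s delay comes from the issue; the names and structure are illustrative. Time is passed in explicitly so the grace period can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Grace period from the issue (30s); the constant name is hypothetical.
const DEACTIVATION_DELAY: Duration = Duration::from_secs(30);

/// Hypothetical helper that delays the active -> inactive transition.
struct ActivityTracker {
    /// Last moment the raw condition said the timeline must be active.
    last_needed_active: Option<Instant>,
}

impl ActivityTracker {
    fn new() -> Self {
        Self { last_needed_active: None }
    }

    /// `wants_active` is the raw condition (e.g. remote_consistent_lsn lagging).
    /// Returns whether the timeline should still be reported active.
    fn update(&mut self, wants_active: bool, now: Instant) -> bool {
        if wants_active {
            self.last_needed_active = Some(now);
            return true;
        }
        match self.last_needed_active {
            // Keep broadcasting for a while so peers see the final value.
            Some(t) => now.duration_since(t) < DEACTIVATION_DELAY,
            // Never needed to be active: stay inactive.
            None => false,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut tracker = ActivityTracker::new();
    // Lagging: active.
    assert!(tracker.update(true, start));
    // Caught up 10s later: still inside the grace period.
    assert!(tracker.update(false, start + Duration::from_secs(10)));
    // 31s after the last lag: deactivated.
    assert!(!tracker.update(false, start + Duration::from_secs(31)));
    println!("ok");
}
```

As the issue notes, a fixed delay is a heuristic: a peer that misses every broadcast within the window can still end up stale.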