arssher closed 3 weeks ago
Got a test for this?
This is no different from upstream, right? Can you prepare a patch for pgsql-hackers, please?
Why doesn't the publisher send the keepalive message for a long time?
(discussed verbally a bit)
Got a test for this?
No, it looks possible but nontrivial. However, we observed latest_end_lsn being stuck for at least 2 hours while received_lsn progressed, and deploying this patch helped. Normally walsender sends a keepalive once it hasn't heard from the subscriber for more than half of wal_sender_timeout, which triggers a reply, but here that didn't happen. (I rechecked: wal_sender_timeout had been changed, but only to 5 minutes, so that can't explain 2 hours of being stuck.) Probably it's because the publisher was AlloyDB rather than vanilla Postgres, but I'm not sure.
This is no different from upstream, right? Can you prepare a patch for pgsql-hackers, please?
Given the above, in theory this shouldn't be a problem in vanilla, but yes, it makes sense for the apply worker to reply periodically regardless of publisher behaviour, so I can do that.
Otherwise, if the publisher doesn't send a keepalive ('k') message, it won't be able to advance the slot for a long time.
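
To make the idea from the thread concrete, here is a minimal sketch of the decision "reply when asked, but also reply on a timer even if no keepalive arrives". It is not the actual patch and not PostgreSQL's code: `FeedbackState`, `should_send_feedback`, and `record_feedback_sent` are hypothetical names, and the interval parameter is an assumption (in PostgreSQL something like wal_receiver_status_interval would be the natural source for it, but the thread doesn't say so).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the apply worker's status updates. */
typedef struct FeedbackState
{
    int64_t last_sent_ms;   /* time of the last status update we sent */
} FeedbackState;

/*
 * Decide whether a status update should be sent now.
 *
 * reply_requested: the publisher's keepalive asked for an immediate reply.
 * interval_ms:     assumed periodic reporting interval; <= 0 disables the
 *                  timer-driven path, leaving only requested replies.
 */
bool
should_send_feedback(const FeedbackState *st, int64_t now_ms,
                     int64_t interval_ms, bool reply_requested)
{
    /* Always answer an explicit request from the publisher. */
    if (reply_requested)
        return true;

    /* Periodic reporting disabled: nothing else can trigger a reply. */
    if (interval_ms <= 0)
        return false;

    /* Timer path: reply even if the publisher never sent a keepalive. */
    return now_ms - st->last_sent_ms >= interval_ms;
}

/* Remember when the last status update went out. */
void
record_feedback_sent(FeedbackState *st, int64_t now_ms)
{
    st->last_sent_ms = now_ms;
}
```

The point of the timer path is that the subscriber's progress reaches the publisher within a bounded time no matter how the publisher behaves, so the slot can keep advancing.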