neondatabase / postgres

PostgreSQL in Neon
https://neon.tech/docs/reference/compatibility

logical worker: respond to publisher even under dense stream. #492

Closed arssher closed 3 weeks ago

arssher commented 3 weeks ago

Otherwise, if the publisher doesn't send a keepalive ('k') message, it won't be able to advance the slot for a long time.

hlinnaka commented 3 weeks ago

Got a test for this?

This is no different from upstream, right? Can you prepare a patch for pgsql-hackers, please?

hlinnaka commented 3 weeks ago

Why doesn't the publisher send the keepalive message for a long time?

arssher commented 3 weeks ago

(discussed verbally a bit)

> Got a test for this?

No; it looks possible but nontrivial. However, we observed latest_end_lsn being stuck for at least 2 hours while received_lsn progressed, and deploying this patch helped. Normally the walsender sends a keepalive after it hasn't heard from the subscriber for more than wal_sender_timeout, which triggers a reply, but that didn't happen here. (I rechecked: wal_sender_timeout had been changed, but only to 5 minutes, so that can't explain being stuck for 2 hours.) Probably it's because the publisher was AlloyDB, not vanilla Postgres, but I'm not sure.

> This is no different from upstream, right? Can you prepare a patch for pgsql-hackers, please?

Given the above, in theory this shouldn't be a problem in vanilla Postgres, but yes, it makes sense for the apply worker to reply from time to time regardless of publisher behaviour, so I can do that.
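
For readers following along, here is a rough Python simulation of the idea discussed above: a receive loop that sends feedback on a timer even when the stream is dense and the publisher never sends a keepalive. This is not the patch itself. `ApplyLoop`, `feedback_interval`, `run`, and the message tuples are hypothetical names made up for illustration, and the real apply worker is C code with considerably more state.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Message tags borrowed from the replication protocol: 'w' = WAL data, 'k' = keepalive.
WAL_DATA = "w"
KEEPALIVE = "k"


@dataclass
class ApplyLoop:
    """Toy model of a subscriber's receive loop (hypothetical, for illustration only)."""
    feedback_interval: float          # assumed knob: max seconds between replies
    last_feedback_at: float = 0.0
    received_lsn: int = 0
    feedback_sent: List[Tuple[float, int]] = field(default_factory=list)

    def send_feedback(self, now: float) -> None:
        # Record (time, lsn) instead of writing to a socket.
        self.feedback_sent.append((now, self.received_lsn))
        self.last_feedback_at = now

    def handle(self, now: float, kind: str, lsn: int, reply_requested: bool = False) -> None:
        # Both message kinds carry an LSN that can move our position forward.
        self.received_lsn = max(self.received_lsn, lsn)
        if kind == KEEPALIVE and reply_requested:
            # Publisher explicitly asked for a reply.
            self.send_feedback(now)
            return
        # Time-based reply: do not rely on the publisher sending a keepalive.
        if now - self.last_feedback_at >= self.feedback_interval:
            self.send_feedback(now)


def run(messages: Iterable[Tuple[float, str, int]], feedback_interval: float) -> List[Tuple[float, int]]:
    """Feed (time, kind, lsn) messages through the loop and return the replies sent."""
    loop = ApplyLoop(feedback_interval=feedback_interval)
    for now, kind, lsn in messages:
        loop.handle(now, kind, lsn)
    return loop.feedback_sent


if __name__ == "__main__":
    # Dense stream: one WAL message per second for 100 seconds, no keepalives at all.
    dense = [(float(t), WAL_DATA, t * 100) for t in range(1, 101)]
    print(len(run(dense, feedback_interval=10.0)), "replies with a timer")
    print(len(run(dense, feedback_interval=float("inf"))), "replies without one")
```

With an infinite interval the loop only replies when a keepalive requests it, so a dense stream without keepalives gets no feedback, which is the stuck latest_end_lsn symptom described above; with a finite interval the publisher hears back regularly.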