sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

Make max-rate configurable when running pg_basebackup #649

Open aswinkarthik opened 5 years ago

aswinkarthik commented 5 years ago

Submission type

Enhancement

Description

pg_basebackup has a flag, --max-rate, that limits the transfer rate of the data directory. This is useful to limit the load put on the primary server when resyncing.

We have had a couple of incidents where a slave doing a resync caused huge load on the primary server. What is your opinion on making this configurable?

I would like to contribute this as a PR.

If you are okay with this, could you review this implementation idea?

As per PostgreSQL docs,

-r rate
--max-rate=rate

    The maximum transfer rate of data transferred from the server. Values are in kilobytes per second. Use a suffix of M to indicate megabytes per second. A suffix of k is also accepted, and has no effect. Valid values are between 32 kilobytes per second and 1024 megabytes per second.

    The purpose is to limit the impact of pg_basebackup on the running server.

    This option always affects transfer of the data directory. Transfer of WAL files is only affected if the collection method is fetch.

source
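For illustration, the flag can be passed directly on a manual invocation. This is a hedged sketch only; the host, user, and target directory below are placeholders, not values from this thread:

```shell
# Hypothetical example: clone the primary's data directory while capping
# the transfer at 32 MB/s to limit load on the primary.
# Host, user, and paths are placeholders.
pg_basebackup \
  --pgdata=/var/lib/postgresql/standby \
  --host=primary.example.com \
  --username=replication \
  --wal-method=stream \
  --progress \
  --max-rate=32M
```

Note that with --wal-method=stream the rate limit applies only to the data directory transfer; per the docs above, WAL transfer is throttled only with the fetch method.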

maksm90 commented 5 years ago

Hi @aswinkarthik !

I think integrating pg_basebackup options into the cluster config is a bad idea because it clutters up the config. IMO it's more appropriate to let the config specify a custom script that recovers a node from some external place (e.g. from a backup, or via pg_basebackup with custom options against the master), as described in #389.

--max-rate is not the only option people would like to incorporate into stolon; --checkpoint=fast|spread is another one that is not covered. A custom recovery script is the only way to avoid moving pg_basebackup options into the cluster config one by one.
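A custom recovery script along those lines might look like the sketch below. To be clear, stolon does not ship such a hook today (that is what #389 proposes); the argument convention and all option values here are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical recovery script a keeper could invoke in place of its
# built-in pg_basebackup call. The argument convention ($1 = target data
# directory, $2 = primary host) is an assumption, not an existing stolon API.
set -eu

DATA_DIR="$1"        # target data directory for the resyncing node
PRIMARY_HOST="$2"    # current primary to clone from

# Any pg_basebackup knobs the operator needs can live here instead of
# in the cluster config: rate limiting, checkpoint mode, etc.
exec pg_basebackup \
  --pgdata="$DATA_DIR" \
  --host="$PRIMARY_HOST" \
  --username=replication \
  --wal-method=stream \
  --checkpoint=spread \
  --max-rate=64M
```

The design point is that the cluster config then needs only one setting (the script path) rather than a mapping for every pg_basebackup flag.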

halfa commented 3 years ago

This bit us today. We use small workstations, and the almost-1-Gbps recovery rate was pushing latencies through the roof and causing timeouts, on top of putting a very heavy burden on the node we were replicating from. We had to manually patch Stolon to rate-limit the recovery and allow the replica to initialize the 40 GB database without timeouts during pg_basebackup replication. The error message in the keeper's log was:

pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

The solution ultimately involved:

As much as I agree that adding more configuration options is not something to do lightly, pg_basebackup is the de facto replication method right now: not exposing those knobs can lead to unrecoverable situations without a custom build of the software, and #739 is perhaps setting a very high bar to clear by trying to solve this with a generalized mechanism.

mikkorantalainen commented 2 years ago

This bit us today. We use small workstations, and the almost-1-Gbps recovery rate was pushing latencies through the roof and causing timeouts, on top of putting a very heavy burden on the node we were replicating from.

I think the real fix would be to configure the network so it does not stall even when one process writes a big stream at full speed. In practice, this might mean using fq_codel combined with a better congestion-control algorithm such as vegas or cdg.

If you're running an older distro with defaults such as a pfifo tc qdisc and cubic tcp_congestion_control, the network is tuned for maximum throughput regardless of latency, which is the actual cause of your problem. pg_basebackup is simply fast enough to fill your pipes so you feel the results, but any other process pushing lots of data over a single TCP/IP socket would do exactly the same.

Note that the congestion-control algorithm of the sender is the important part of this equation. If the sender fills the pipe with cubic, it doesn't help that the receiver runs vegas or cdg, unless the receiver is saturating its upstream too.
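One possible way to apply this tuning, assuming a Linux sender (the node pg_basebackup copies from) with an interface named eth0; the interface name is a placeholder and the commands require root:

```shell
# Inspect the current settings first.
sysctl net.ipv4.tcp_congestion_control
tc qdisc show dev eth0

# Replace the root qdisc with fq_codel to keep queueing delay low
# even while a bulk transfer saturates the link.
tc qdisc replace dev eth0 root fq_codel

# Switch to a delay-sensitive congestion-control algorithm.
# The kernel module must be available (tcp_vegas here; tcp_cdg is
# an alternative where the kernel provides it).
modprobe tcp_vegas
sysctl -w net.ipv4.tcp_congestion_control=vegas
```

Since the sender's algorithm is what matters, these commands belong on the primary; changing only the replica would not help while the primary keeps pushing with cubic.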