postgrespro / pg_probackup

Backup and recovery manager for PostgreSQL
https://postgrespro.github.io/pg_probackup/

s3 traffic / request during delta backup #230

Open mkdel opened 4 years ago

mkdel commented 4 years ago

Hi,

There are a number of things you have to pay for when using Amazon services, and two of them are outbound traffic from an S3 bucket and change requests (PUT/COPY/POST/LIST). We use AWS Storage Gateway to store our pg_probackup backups in S3. Under our current plan, storing 1 TB of data costs about $10 and reading 1 TB from S3 costs about $100. Our sysadmin has just given me the figures for last week's full database backups (with continuous archiving) for one of our data centers: [image]

We didn't account for requests in our calculations, but they are actually the biggest part of the total price: $31.5 of $59, more than half.

We have databases with more than 1 million files, so being able to store a database backup as a tar archive looks much cheaper. Otherwise it costs $10 for every 1 million requests to save the files.

The other thing mentioned was outbound traffic from S3. During incremental backups (DELTA/PAGE) we have to read what is already in the bucket, so the question is: which backup type causes the least read traffic when building an incremental backup? It looks like a full backup is cheaper than a DELTA backup because we don't have to read the stored data. For example, with a 100 GB backed-up database ($1 per month to store), taking a DELTA backup means reading those 100 GB ($10 for reading the backup).

I know that with PAGE we read only the WAL, so its cost will depend on WAL generation activity and other factors.
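The arithmetic in the example above can be written out as a rough sketch. It uses only the prices quoted in this post ($10 per TB stored per month, $100 per TB read) and treats 1 TB as 1000 GB, as the post's figures do. The function names are hypothetical, and whether a DELTA backup really reads the whole stored backup is the assumption made in the post, not something confirmed here.

```python
# Prices quoted in the post (USD per TB); 1 TB is treated as 1000 GB,
# matching the post's own rounding. These are not official AWS prices.
STORE_PRICE_PER_TB = 10.0   # storage, per month
READ_PRICE_PER_TB = 100.0   # outbound reads from S3


def monthly_storage_cost(size_gb: float) -> float:
    """Cost in USD of keeping size_gb in the bucket for one month."""
    return size_gb * STORE_PRICE_PER_TB / 1000


def read_cost(size_gb: float) -> float:
    """Cost in USD of reading size_gb out of the bucket once."""
    return size_gb * READ_PRICE_PER_TB / 1000


if __name__ == "__main__":
    backup_gb = 100  # the 100 GB example from the post
    print(f"store for a month: ${monthly_storage_cost(backup_gb):.2f}")
    print(f"read once:         ${read_cost(backup_gb):.2f}")
```

With these prices, one full read of the stored backup costs ten times a month of storage, which is why the amount of data read per incremental backup matters so much here.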

Please consider this info for cloud backup development/improvements.

Thanks, Mikhail

gsmolk commented 4 years ago

Hello! Could you please provide the pg_probackup show output for the backup instance located on S3?

mkdel commented 4 years ago

show.txt

averemee-si commented 4 years ago

Mikhail,

You need to change the storage class from S3-IA to S3 Standard first. The S3-IA storage class is meant for infrequently accessed data, but your bill shows millions of requests. It also assumes objects are stored for at least 30 days, but you delete objects from S3-IA before 30 days have passed. That is the EarlyDelete line on the bill.
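A minimal sketch of why early deletion shows up on the bill, assuming the 30-day minimum storage duration mentioned above. The function name and the 7-day retention in the usage example are hypothetical and are not taken from the actual backup policy.

```python
MINIMUM_BILLED_DAYS = 30  # minimum storage duration assumed for S3-IA


def billed_days(days_stored: int, minimum_days: int = MINIMUM_BILLED_DAYS) -> int:
    """Days of storage charged for an object kept for days_stored days."""
    # An object deleted early is still charged for the full minimum period.
    return max(days_stored, minimum_days)


if __name__ == "__main__":
    # Hypothetical retention: a backup deleted after one week.
    print(billed_days(7))
    # An object kept longer than the minimum is billed for its real age.
    print(billed_days(45))
```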

Hope this helps, Alexey

averemee-si commented 4 years ago

Mikhail,

Additional question: which type of AWS Storage Gateway (cached or stored) are you using?

Regards, Alexey

mkdel commented 4 years ago

Hi Alexey and Grigory, thank you for the recommendations!

Sorry for the delay. Our sysadmin says it's File Storage Gateway, so I can't tell whether it's cached or stored, but I know it caches files.

Moving to S3 Standard cuts the cost only by a factor of 2 (for PUT/change requests), and using DELTA backups halves the number of requests as well. In our particular case we have a 500 GB database with 1.3 million files, so we would have about 650 thousand requests per week for DELTA backups, which costs $3.25 (650 thousand requests * $0.005 per thousand). Unfortunately, for now we have decided to use WAL-G for cloud backups because it writes tarred archives (it made just 601 requests per backup, which is under $0.01).

That said, for backups on our own NAS storage, pg_probackup is the best solution ever! :)
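The request-cost figures above can be reproduced with a short sketch. The prices come from this thread only: $0.005 per thousand requests on S3 Standard, and $10 per million (that is, $0.01 per thousand) from the original post, which is assumed here to be the S3-IA rate. The function and constant names are hypothetical.

```python
# Request prices taken from this thread (USD per 1,000 PUT-type requests).
PRICE_PER_1K_S3_IA = 0.01         # "$10 for every 1m requests" (assumed S3-IA)
PRICE_PER_1K_S3_STANDARD = 0.005  # figure used in the comment above


def request_cost(requests: int, price_per_1k: float) -> float:
    """Cost in USD of issuing the given number of requests."""
    return requests / 1000 * price_per_1k


if __name__ == "__main__":
    files = 1_300_000
    delta_requests = files // 2  # the thread's estimate: DELTA halves requests
    walg_requests = 601          # requests per WAL-G backup, as reported

    delta_cost = request_cost(delta_requests, PRICE_PER_1K_S3_STANDARD)
    walg_cost = request_cost(walg_requests, PRICE_PER_1K_S3_STANDARD)
    print(f"DELTA, {delta_requests} requests: ${delta_cost:.2f}")
    print(f"WAL-G, {walg_requests} requests: ${walg_cost:.4f}")
```

The per-file request count dominates the result, so a tool that packs files into a few tar archives ends up far cheaper per backup regardless of the storage class.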

averemee-si commented 4 years ago

Hi Mikhail,

Is this AWS Storage Gateway mounted over NFS or over iSCSI?

It looks like you are using AWS File Storage Gateway. However, for backups AWS recommends either Tape Storage Gateway for VTL-compatible workloads (https://aws.amazon.com/storagegateway/vtl/?nc=sn&loc=2&dn=3) or Volume Storage Gateway in stored or cached mode (https://aws.amazon.com/storagegateway/volume/). If you need more help with AWS storage configuration, drop me an email at Aleksej.Veremeev@a2-solutions.eu

Regards, Alexey