tizbac / pmoxs3backuproxy

Proxy written in golang that will emulate a PBS server and work on one or more S3 buckets
GNU General Public License v3.0
66 stars, 4 forks

Retention and garbage collection #16

Closed tizbac closed 3 months ago

tizbac commented 4 months ago

Easy part:

- Add a parameter for X days of retention.
- Delete snapshots older than X days every 24 hours or so.

Tricky part (garbage collection + checking):

- Implement a parameter to choose when to run GC.
- While GC is running, no backup writes must be possible: respond with 503 Service Unavailable (concurrent writes might be supported in the future, but keep it simple for now).

GC will do the following:

- Download all FIDX and DIDX files one by one and build a hashmap of all referenced chunks.
- List all S3 objects under the chunks/ prefix.
- Delete the unreferenced ones.
- Send an email or log a critical warning about referenced chunks that are missing, and ideally move the affected backup into a folder such as "corrupted", so that later incremental backups do not assume those chunks are OK.
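The steps above amount to a mark-and-sweep over chunk digests. The sketch below shows only the set logic, assuming the index files have already been parsed into lists of chunk digests and the chunks/ prefix has already been listed. The `gcPlan` type, the `planGC` function and the sample keys are hypothetical; parsing FIDX/DIDX files and calling S3 are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// gcPlan is the outcome of one GC pass (hypothetical type).
type gcPlan struct {
	Orphans []string            // stored chunks that no index references: safe to delete
	Damaged map[string][]string // index key -> referenced chunks missing from storage
}

// planGC takes the chunk digests referenced by each index file and the
// digests actually stored under chunks/, and works out what to delete
// and which backups reference missing chunks.
func planGC(indexes map[string][]string, stored []string) gcPlan {
	present := make(map[string]struct{}, len(stored))
	for _, c := range stored {
		present[c] = struct{}{}
	}

	referenced := make(map[string]struct{})
	plan := gcPlan{Damaged: make(map[string][]string)}
	for idx, chunks := range indexes {
		for _, c := range chunks {
			referenced[c] = struct{}{}
			if _, ok := present[c]; !ok {
				plan.Damaged[idx] = append(plan.Damaged[idx], c)
			}
		}
	}

	for _, c := range stored {
		if _, ok := referenced[c]; !ok {
			plan.Orphans = append(plan.Orphans, c)
		}
	}
	sort.Strings(plan.Orphans)
	return plan
}

func main() {
	indexes := map[string][]string{
		"vm/100/a/drive.fidx": {"aa", "bb"},
		"vm/100/b/drive.fidx": {"bb", "cc"},
	}
	stored := []string{"aa", "bb", "dd"}

	plan := planGC(indexes, stored)
	fmt.Println("delete:", plan.Orphans)
	for idx, missing := range plan.Damaged {
		fmt.Println("corrupted:", idx, "missing:", missing)
	}
}
```

Here chunk `dd` is unreferenced and would be deleted, while the second index references `cc`, which is not stored, so that backup would be the one to flag or move aside.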

Checksumming chunks the way PBS does is, imho, useless, because all S3 services checksum data internally. When uploads go over SSL, corruption will in 99.99% of cases just drop the connection. Without SSL I still have to check, but I think S3 also uses a checksum there. Truncation cannot happen because of Content-Length. If the data itself is damaged before upload, well, you've got a bigger problem at hand :)

Any suggestions before I start working on this are welcome.

tizbac commented 3 months ago

I've implemented this as a separate process meant to be scheduled with crontab; testing is welcome.