tiredofit / docker-db-backup

Backup multiple database types on a scheduled basis with many customizable options
MIT License

Old backups not cleaned up with DEFAULT_CLEANUP_TIME set to 3 and split_db option set. #370

Open usma0118 opened 2 months ago

usma0118 commented 2 months ago

Summary

Old backups not cleaned up with DEFAULT_CLEANUP_TIME set to 3 and split_db option set.

Steps to reproduce

What is the expected correct behavior?

          - name: TEMP_PATH
            value: "/data"
          - name: DEFAULT_BACKUP_LOCATION
            value: "S3"
          - name: DEFAULT_S3_PROTOCOL
            value: "http"
          - name: DEFAULT_CLEANUP_TIME
            value: "4320" # 3 days
          - name: SCHEDULE
            value: "@daily"
          - name: DEFAULT_TYPE
            value: "Postgresql"
          - name: DEFAULT_USER
            valueFrom:
              secretKeyRef:
                name: &db-secret postgres
                key: username
          - name: DEFAULT_PASS
            valueFrom:
              secretKeyRef:
                name: *db-secret
                key: password
          - name: DEFAULT_NAME
            value: "ALL"
          - name: DEFAULT_NAME_EXCLUDE
            value: "postgres"
          - name: DEFAULT_BACKUP_GLOBALS
            value: "false"
          - name: DEFAULT_SPLIT_DB
            value: "true"
          - name: DEFAULT_HOST
            valueFrom:
              secretKeyRef:
                name: *app
                key: DB_HOST
          - name: DEFAULT_EXTRA_OPTS
            value: "--clean --if-exists"
          - name: CONTAINER_ENABLE_MONITORING
            value: "false"

Relevant logs and/or screenshots

2024-09-20.16:45:31 [INFO] ** [01-postgres-rw.database.svc.cluster.local__ALL] DB Backup of 'pgsql_[redacted]_postgres-rw.database.svc.cluster.local_20240920-164530.sql.zst' completed successfully
2024-09-20.16:45:31 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Encrypting with GPG Passphrase
2024-09-20.16:45:33 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Generating MD5 sum for 'pgsql_[redacted]_postgres-rw.database.svc.cluster.local_20240920-164530.sql.zst.gpg'
2024-09-20.16:45:34 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Backup of 'pgsql_[redacted]_postgres-rw.database.svc.cluster.local_20240920-164530.sql.zst.gpg' created with the size of 891111 bytes
2024-09-20.16:45:43 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] DB Backup for '[redacted]' time taken: Hours: 0 Minutes: 00 Seconds: 13
2024-09-20.16:45:43 [INFO] ** [01-postgres-rw.database.svc.cluster.local__ALL] Cleaning up old backups on S3 storage
2024-09-20.16:45:46 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Dumping PostgresSQL globals: with 'pg_dumpall -g' and compressing with 'zstd'
2024-09-20.16:45:46 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Encrypting with GPG Passphrase
2024-09-20.16:45:48 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Generating MD5 sum for 'pgsql_globals_postgres-rw.database.svc.cluster.local_20240920-164546.sql.zst.gpg'
2024-09-20.16:45:49 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Backup of 'pgsql_globals_postgres-rw.database.svc.cluster.local_20240920-164546.sql.zst.gpg' created with the size of 1233 bytes
2024-09-20.16:45:56 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] DB Backup for 'globals' time taken: Hours: 0 Minutes: 00 Seconds: 10
2024-09-20.16:45:56 [INFO] ** [01-postgres-rw.database.svc.cluster.local__ALL] Cleaning up old backups on S3 storage
2024-09-20.16:45:58 [INFO] ** [01-postgres-rw.database.svc.cluster.local__ALL] Backup 01 routines finish time: 2024-09-20 16:45:58 CEST with exit code 0
2024-09-20.16:45:58 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Backup 01 routines time taken: Hours: 0 Minutes: 00 Seconds: 46
2024-09-20.16:45:58 [NOTICE] ** [01-postgres-rw.database.svc.cluster.local__ALL] Sleeping for another 86354 seconds. Waking up at 2024-09-21 16:45:12 CEST

Environment

Kubernetes


Possible fixes

pimjansen commented 1 month ago

@usma0118 I got something similar:

[DB] Moving backup to external storage with blobxfer
mv: cannot stat '/tmp/backups/01_dbbackup.FrSQds/*.': No such file or directory
mv: preserving times for '/backup/myname_20241022-085939.sql.gz': Operation not permitted
mv: preserving permissions for ‘/backup/myname.sql.gz’: Operation not permitted

and my env vars:

CONTAINER_ENABLE_MONITORING : false
DB_CLEANUP_TIME : 10080
DB_HOST : mysql-xxx.mysql.database.azure.com
DB_NAME : myname01,myname02
DB_PASS : secret(backup-settings)[DB_PASS] 
DB_TYPE : mysql
DB_USER : backup
DEFAULT_BACKUP_BEGIN : 0130
DEFAULT_BACKUP_LOCATION : blobxfer
DEFAULT_BLOBXFER_MODE : file
DEFAULT_BLOBXFER_REMOTE_PATH : my-backup-path
DEFAULT_BLOBXFER_STORAGE_ACCOUNT : myaccount-dev001
DEFAULT_BLOBXFER_STORAGE_ACCOUNT_KEY : secret(backup-settings)[BLOBXFER_STORAGE_ACCOUNT_KEY] 
DEFAULT_CHECKSUM : NONE
DEFAULT_COMPRESSION : GZ
DEFAULT_DEBUG_MODE : false
DEFAULT_EXTRA_OPTS : --complete-insert --no-create-db
DEFAULT_MYSQL_CLIENT : mysql
DEFAULT_SPLIT_DB : true
TIMEZONE : Europe/Amsterdam

As far as I can see there is nothing fancy here, so I have no idea what goes wrong. I end up with broken backups every time: the backup itself is fine, but the copy goes completely wrong and the remote receives a 0-byte file, while the job still exits with code 0.
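
To show how I checked the remote side, a rough sketch, assuming the blobxfer CLI with the same storage account settings as above (account, path and file name are just examples taken from my setup and the log):

# rough check, assuming the blobxfer CLI; account, path and file name are examples
blobxfer download \
  --mode file \
  --storage-account myaccount-dev001 \
  --storage-account-key "$BLOBXFER_STORAGE_ACCOUNT_KEY" \
  --remote-path my-backup-path/myname_20241022-085939.sql.gz \
  --local-path ./check
ls -l ./check   # the downloaded file comes back as 0 bytes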

@tiredofit got an idea what this could be? It seems user related? Yet when I go inside the container, the process starts by default as "root".

UPDATE: What I do notice is this line:

if [ "${backup_job_checksum}" != "none" ] ; then run_as_user mv "${temporary_directory}"/*."${checksum_extension}" "${backup_job_filesystem_path}"/; fi

While there is no checksum_extension set it will move *. which ofc causes an error i think. When digging further i noticed the main issue is the permissions and the user. The scripts runs things as a different user and the Azure volume in this case is SMB. Therefor it cannot access the files generated. I now set the user DBBACKUP_USER to be root and all works fine (and the checksum is just throwing an error and therefor skipped, guess that needs to be addressed).