Open vfedotovs opened 3 months ago
Containerised solution proposal:
The following code will create a PostgreSQL dump and upload it to an S3 bucket:
# backup_upload.py
import boto3
import subprocess
import datetime
import os


def backup_postgres():
    # Define backup filename with current date
    date_str = datetime.datetime.now().strftime("%Y_%m_%d")
    backup_filename = f"/tmp/pg_backup_{date_str}.sql"

    # Run pg_dump to create the backup; if the server requires a password,
    # pg_dump reads it from the PGPASSWORD environment variable or ~/.pgpass
    subprocess.run([
        "pg_dump",
        "-U", "DB-USER",
        "-h", "db",  # Assuming the db container is named `db` in the network
        "-d", "DB-NAME",
        "-f", backup_filename
    ], check=True)
    return backup_filename


def upload_to_s3(file_path, bucket_name, object_name=None):
    # Initialize the boto3 client
    s3_client = boto3.client('s3', region_name='your-region')  # Replace 'your-region' with your S3 region

    # Define object name in S3 if not provided
    if not object_name:
        object_name = os.path.basename(file_path)

    # Upload the file to the specified S3 bucket
    s3_client.upload_file(file_path, bucket_name, object_name)
    print(f"Uploaded {file_path} to S3 bucket {bucket_name}")


if __name__ == "__main__":
    # Generate the backup
    backup_file = backup_postgres()

    # Upload backup to S3
    s3_bucket_name = "bucket-name-pg-backups"  # Replace with your bucket name
    upload_to_s3(backup_file, s3_bucket_name)
Dockerfile for the container:
# Dockerfile
FROM python:3.9
# Install PostgreSQL client
RUN apt-get update && apt-get install -y postgresql-client && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy the Python script into the container
COPY backup_upload.py /app
# Install Python dependencies
RUN pip install boto3
# Placeholder AWS credentials (the real values are passed in at runtime via docker-compose)
ENV AWS_ACCESS_KEY_ID=your-access-key-id
ENV AWS_SECRET_ACCESS_KEY=your-secret-access-key
# Run the Python script
CMD ["python", "backup_upload.py"]
Docker compose example
version: '3.8'

services:
  db-backup:
    build: .
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    depends_on:
      - db
    restart: on-failure
    entrypoint: ["cron", "-f"]
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:
    external: true
# cronfile
0 6 * * * /usr/local/bin/python /app/backup_upload.py
Run docker compose
docker-compose up --build -d
Updated Dockerfile that will run a scheduled cron backup:
FROM python:3.9
# Install PostgreSQL client and cron
RUN apt-get update && apt-get install -y postgresql-client cron && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy application code and cronfile into the container
COPY backup_upload.py /app
COPY cronfile /etc/cron.d/db-backup-cron
# Install Python dependencies
RUN pip install boto3
# Set permissions for cronfile and give execution permission
RUN chmod 0644 /etc/cron.d/db-backup-cron && crontab /etc/cron.d/db-backup-cron
# Create a log file for cron output
RUN touch /var/log/cron.log
# Run the cron daemon in foreground (for Docker compatibility)
CMD cron -f
New docker compose file:
version: '3.8'

services:
  db-backup:
    build: .
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    depends_on:
      - db
    restart: on-failure
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:
    external: true
cronfile
0 6 * * * /usr/local/bin/python /app/backup_upload.py >> /var/log/cron.log 2>&1
How the Cron Job Works Inside the Container

Container Starts and Runs cron: When the container starts, the CMD cron -f in the Dockerfile starts the cron daemon in the foreground, allowing it to continue running as the container's main process.
Cron Reads the cronfile: The crontab loads the cronfile with the scheduled task(s), setting it to execute the Python backup script at the specified time.
Daily Execution: At the scheduled time, cron triggers the backup_upload.py script inside the container. The Python script runs, generates a backup, and uploads it to S3, as defined in your script.
Log Output: The cron task's output is logged to /var/log/cron.log inside the container, which can be reviewed for success or errors.
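If it helps to have timestamped entries from the script itself in that log, backup_upload.py could switch from print() to Python's logging module. This is an optional sketch, not part of the original proposal:

import logging
import sys

# Write timestamped log lines to stdout; the ">> /var/log/cron.log 2>&1"
# redirection in the cronfile then captures them in the log file.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("Starting PostgreSQL backup")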
Current behavior: backup files do not contain hour_min_sec in the filename:

122712 Aug 8 08:05 pg_backup_2024_08_08.sql
122689 Aug 7 08:05 pg_backup_2024_08_07.sql
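If the time of day should be part of the filename, one option is to extend the strftime format used in backup_postgres(); a minimal sketch (the exact format string is only a suggestion):

import datetime

# Include hour, minute and second so several backups per day get distinct names,
# e.g. /tmp/pg_backup_2024_08_08_08_05_31.sql
date_str = datetime.datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
backup_filename = f"/tmp/pg_backup_{date_str}.sql"
print(backup_filename)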
The backup job also has no logic to upload the latest backup file to S3 (see the sketch at the end of this post).
Possible solution
Improved cron job version example:

5 6 * * * docker exec -t $(docker ps --filter "name=db-1" --format "{{.Names}}") pg_dump -U DB-USER -d DB-NAME | gzip > /tmp/pg_backup_$(date +\%Y\%m\%d).sql.gz 2>> /var/log/pg_backup_error.log && echo "$(date +\%Y\%m\%d_\%H:%M:%S) Backup successful" >> /var/log/pg_backup_success.log
Notes:

$(docker ps --filter "name=db-1" --format "{{.Names}}"): This refines the docker ps command to specifically target the container name that matches "db-1", which is more reliable than using grep.
gzip: The backup is compressed with gzip to save space.
/tmp/pg_backup_$(date +\%Y\%m\%d).sql.gz: The backup file is saved with a .sql.gz extension to indicate that it's compressed.
Error Logging: 2>> /var/log/pg_backup_error.log redirects any errors to a specific log file.
Success Logging: echo "$(date +\%Y\%m\%d_\%H:%M:%S) Backup successful" >> /var/log/pg_backup_success.log logs a success message along with a timestamp if the backup completes successfully.
Backup Retention Policy: Consider setting up a job to remove old backups after a certain period (e.g., 30 days).
0 0 * * * find /tmp/pg_backup_*.sql.gz -mtime +30 -exec rm {} \;
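To cover the missing "upload the latest file" logic mentioned above, a small companion script could pick the newest local backup and hand it to S3. This is only a rough sketch: the glob pattern, region and bucket name follow the naming used earlier in this post and would need to be adjusted.

# upload_latest.py -- sketch: upload the most recent local backup to S3
import glob
import os

import boto3


def find_latest_backup(pattern="/tmp/pg_backup_*.sql.gz"):
    # Pick the newest file that matches the backup naming convention
    candidates = glob.glob(pattern)
    if not candidates:
        raise FileNotFoundError(f"No backups matching {pattern}")
    return max(candidates, key=os.path.getmtime)


def upload_to_s3(file_path, bucket_name, object_name=None):
    # Same helper as in backup_upload.py above
    s3_client = boto3.client("s3", region_name="your-region")  # Replace with your S3 region
    object_name = object_name or os.path.basename(file_path)
    s3_client.upload_file(file_path, bucket_name, object_name)
    print(f"Uploaded {file_path} to S3 bucket {bucket_name}")


if __name__ == "__main__":
    latest = find_latest_backup()
    upload_to_s3(latest, "bucket-name-pg-backups")  # Replace with your bucket name

A job like this could run from the same cronfile a few minutes after the pg_dump entry.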