maxisam / mgob

MongoDB dockerized backup agent. Runs schedule backups with retention, S3 & SFTP upload, notifications, instrumentation with Prometheus and more.
https://maxisam.github.io/mgob/
MIT License
123 stars 16 forks source link

improve docs and error handling #111

Open madhums opened 1 month ago

madhums commented 1 month ago

Hi, I am running this as a docker swarm service on the manager node. Nothing really happens after it says "backup has started".

Here's the service I create:

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e backup__TARGET_URI=mongodb://user:password@192.168.124.111:27017,192.168.124.140:27027,192.168.124.139:27037/mongo_staging?replicaSet=mongo_staging_rs \
  -e backup__S3_BUCKET=backups \
  -e backup__S3_ACCESS_KEY=xxx \
  -e backup__S3_SECRET_KEY=xxx \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:1.10.14-aws -LogLevel=info

Config file:

# mgob/config/backup.yml
# 
scheduler:
  cron: "*/2 * * * *" # run every two minutes for testing
  retention: 14 # Retains 14 local backups
  timeout: 60 # Operation timeout: 60 minutes

target:
  host: "192.168.124.111,192.168.124.140"
  database: "mongo_staging"
  port: 27017
  # params: "--ssl --authenticationDatabase admin" # Additional mongodump params, leave blank if not needed
  noGzip: false # Disable gzip compression (false means compression is enabled)

s3:
  # accessKey and secretKey are optional for AWS, if your Docker image has awscli
  accessKey:
  secretKey:
  api: "S3v4"

Here are the logs:

[vagrant@manager1 ~]$ docker service logs -f mgob
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="mgob 1.10.14"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="starting with config: &{LogLevel:info JSONLog:false Host: Port:8090 ConfigPath:/config StoragePath:/storage TmpPath:/tmp DataPath:/data Version:1.10.14 UseAwsCli:false HasGpg:false}"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="mongodump version: 100.6.1 git version: 6d9d341edd33b892a2ded7bae529b0e2a96aae01 Go version: go1.19.3    os: linux    arch: amd64    compiler: gc "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:54Z" level=info msg="Minio Client is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="aws-cli/1.22.46 Python/3.9.16 Linux/6.5.6-300.fc39.x86_64 botocore/1.23.46 "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="gpg (GnuPG) 2.2.31 libgcrypt 1.9.4 Copyright (C) 2021 Free Software Foundation, Inc. License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.  Home: /root/.gnupg Supported algorithms: Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,         CAMELLIA128, CAMELLIA192, CAMELLIA256 Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224 Compression: Uncompressed, ZIP, ZLIB, BZIP2 "
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Google Storage is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Azure Storage is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="RClone is disabled."
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Next tmp cleanup run at 2024-05-08 13:00:00 +0000 UTC"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="New job found, saving to store" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="Next run at 2024-05-08 12:14:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:13:55Z" level=info msg="starting http server on port 8090"
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:14:00Z" level=info msg="Backup started" plan=backup

mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:16:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:18:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:20:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:22:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:24:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:26:00Z" level=info msg="Backup started" plan=backup

I see nothing in /storage nor any errors. I don't receive any mongodb requests to my mongodb service.

When I docker exec -it 23a bash into the mgob container and run a mongodump --uri=<uri>, that works. Also, going to the url 192.168.124.111:8090/backup/backup, I'd expect a request to run an ondemand backup - which also does not happen. I receive a 405 with nothing in the logs.

Is this the right image? maxisam/mgob:1.10.14-aws, I am using mongodb version 5.

madhums commented 1 month ago

Ok now I get these but no clue why it is failing!!

mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:28:00Z" level=info msg="Next run at 2024-05-08 12:30:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:30:00Z" level=info msg="Next run at 2024-05-08 12:32:00 +0000 UTC" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="Backup started" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="new dump" archive= err="after 3 retries, mongodump log : exit status 1" mlog=
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=error msg="BACKUP FAILED: after 3 retries, mongodump log : exit status 1" plan=backup
mgob.1.mgrroaog849a@manager1    | time="2024-05-08T12:32:00Z" level=info msg="Next run at 2024-05-08 12:34:00 +0000 UTC" plan=backup

What can I do?

madhums commented 1 month ago

Trying maxisam/mgob:2.0.20-aws but same issue.

madhums commented 1 month ago

I got it to work with this config

# mgob/config/backup.yml
# 
scheduler:
  cron: "*/2 * * * *" # run every day at 6:00 and 18:00 UTC
  retention: 14 # Retains 14 local backups
  timeout: 60 # Operation timeout: 60 minutes

target:
  uri:
  noGzip: false # Disable gzip compression (false means compression is enabled)

retry:
  attempts: 3 # number of retries
  backoffFactor: 60 # backoff factor  * (2 ^ attemptCount) seconds

validation:
  database:
    uri:
    noGzip: false

s3:
  url:
  bucket:
  accessKey:
  secretKey:
  api: "S3v4"

and changing the environment variables like so

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e BACKUP__TARGET_URI=xxx \
  -e BACKUP__VALIDATION_DATABASE_URI=xxx \
  -e BACKUP__S3_BUCKET=xxx \
  -e BACKUP__S3_ACCESSKEY=xxx \
  -e BACKUP__S3_SECRETKEY=xxx \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:2.0.20-aws -LogLevel=debug

Readme uses lowercase for plan names however, the PLAN.md file suggests using all caps. So some discrepency in the docs

madhums commented 1 month ago

I was having some issues with the validation step, so I removed validation block completely and now I get this weird error which I can't figure out

level=error msg="BACKUP FAILED: mc config host for plan backup failed /bin/sh: mc: not found : exit status 127" plan=backup

what is mc?

madhums commented 1 month ago

Can we improve documentation? It looks like there's one more issue #108 where validation isn't that clear and I couldn't find anything in the documentation about this step.

madhums commented 1 month ago

Looks like minioUpload function is being called here

https://github.com/maxisam/mgob/blob/0bf29263228455012e473aa7e61aedc1f62ba782/pkg/backup/s3.go#L89

which is weird because I am using s3 in my config and I am using maxisam/mgob:2.0.20-aws image.

Should mc be included in this image by default?

madhums commented 1 month ago

It looks like after including BACKUP__S3_URL environment variable, s3Upload is called and we don't have the mc not found error anymore.

However, now we get

level=error msg="BACKUP FAILED: S3 uploading /tmp/backup-1715193000.gz to backup/backup-mongo-staging-backups failed : exit status 1" plan=backup

for

docker service create --name mgob \
  -p 8090:8090 \
  --mount type=bind,src=./mgob/config,dst=/config \
  --mount type=bind,src=./mgob/storage,dst=/storage \
  -e BACKUP__TARGET_URI=xxx \
  -e BACKUP__S3_BUCKET=xxx \
  -e BACKUP__S3_ACCESSKEY=xxx \
  -e BACKUP__S3_SECRETKEY=xxx \
  -e BACKUP__S3_URL=https://s3.amazonaws.com \
  --constraint node.labels.mongo.replica==1 \
  maxisam/mgob:2.0.20-aws -LogLevel=debug

is the url supposed to be https://s3.amazonaws.com? and can the error be any more verbose than "exit status 1"??

madhums commented 1 month ago

I believe two things can help me:

madhums commented 1 month ago

Digging further, the error seems to be

An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied`

When I try to execute aws s3 cp /tmp/planxxx.gz s3://bucket_name/ I get the above.

This is an issue with the configured AWS S3 bucket policy. However, in mgob, I get the output to be simply exit status 1. It would be nice to show the actual error message.

maxisam commented 1 month ago

Thanks for all your feedback. I will try to get to them when I have time. Welcome to make a PR as well. validation step is just for testing the backup is correct or not, it basically just restore the backup to a temporary collection if I remember correctly.

You should check the test file and how it works from the Github Action. There are integration tests, which means it works for real environment.

MC is https://min.io/docs/minio/linux/reference/minio-mc.html