maxisam / mgob

MongoDB dockerized backup agent. Runs schedule backups with retention, S3 & SFTP upload, notifications, instrumentation with Prometheus and more.
https://maxisam.github.io/mgob/
MIT License
133 stars 19 forks source link

The `/status` endpoint provides incorrect information. #57

Open bodsch opened 1 year ago

bodsch commented 1 year ago

I force a backup via the API:

# curl -XPOST http://localhost:8090/backup/coremedia
{"plan":"coremedia","file":"coremedia-1681481854.gz","duration":"73.335066ms","size":"1.9 kB","timestamp":"2023-04-14T14:17:34.175514639Z"}

The dump was - for me - created correctly:

# ls -l /opt/mgob/storage/coremedia/coremedia-1681481854*
-rw-r--r-- 1 root root 1876 Apr 14 16:17 /opt/mgob/storage/coremedia/coremedia-1681481854.gz
-rw-r--r-- 1 root root 1134 Apr 14 16:17 /opt/mgob/storage/coremedia/coremedia-1681481854.log

But the call to /status returns a 500:

# curl -s http://localhost:8090/status/ | jq
[
  {
    "plan": "coremedia",
    "next_run": "2023-04-14T18:10:00Z",
    "last_run": "2023-04-12T18:10:00.071630282Z",
    "last_run_status": "500",
    "last_run_log": "BACKUP FAILED: after 0 retries, mongodump log : exit status 1"
  }
]

The log file shows no problems:

# cat /opt/mgob/storage/coremedia/coremedia-1681481854.log 
2023-04-14T14:17:34.202+0000    writing admin.system.users to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.206+0000    done dumping admin.system.users (3 documents)
2023-04-14T14:17:34.206+0000    writing admin.system.version to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.209+0000    done dumping admin.system.version (2 documents)
2023-04-14T14:17:34.209+0000    writing cms_internal_models.processes to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.216+0000    writing cms_internal_models.processes_deleted to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.221+0000    writing cms_internal_tasks.counter to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.230+0000    done dumping cms_internal_models.processes (0 documents)
2023-04-14T14:17:34.231+0000    writing cms_internal_tasks.default to archive '/tmp/coremedia-1681481854.gz'
2023-04-14T14:17:34.232+0000    done dumping cms_internal_models.processes_deleted (0 documents)
2023-04-14T14:17:34.233+0000    done dumping cms_internal_tasks.counter (0 documents)
2023-04-14T14:17:34.236+0000    done dumping cms_internal_tasks.default (0 documents)

Docker output also looks good so far:

{"level":"info","msg":"Clean up temp finished Temp folder cleanup finished, `/tmp/coremedia-1681481718.gz` is removed.","plan":"coremedia","time":"2023-04-14T14:15:18Z"}
{"level":"info","msg":"On demand backup finished in 81.955669ms archive coremedia-1681481718.gz size 1.9 kB","plan":"coremedia","time":"2023-04-14T14:15:18Z"}
2023/04/14 14:15:18 "POST http://localhost:8090/backup/coremedia HTTP/1.1" from 172.17.0.1:35980 - 200 140B in 83.319544ms
2023/04/14 14:15:34 "GET http://localhost:8090/status/ HTTP/1.1" from 172.17.0.1:49964 - 200 204B in 78.22µs
{"level":"info","msg":"On demand backup started","plan":"coremedia","time":"2023-04-14T14:17:34Z"}
{"level":"debug","msg":"dump cmd: mongodump --archive=/tmp/coremedia-1681481854.gz --gzip --host mongodb --port 27017 -u \"xxx\" -p xxxx --authenticationDatabase admin ","time":"2023-04-14T14:17:34Z"}
{"archive":"/tmp/coremedia-1681481854.gz","err":null,"level":"info","mlog":"/tmp/coremedia-1681481854.log","msg":"new dump","time":"2023-04-14T14:17:34Z"}
ls: *.gz.encrypted: No such file or directory
{"level":"debug","msg":"apply retention","time":"2023-04-14T14:17:34Z"}
{"level":"info","msg":"Local backup finished Local backup finished `/tmp/coremedia-1681481854.gz` -\u003e `/storage/coremedia/coremedia-1681481854.gz` Duration: 7.381566ms","plan":"coremedia","time":"2023-04-14T14:17:34Z"}
{"level":"info","msg":"Clean up temp finished Temp folder cleanup finished, `/tmp/coremedia-1681481854.gz` is removed.","plan":"coremedia","time":"2023-04-14T14:17:34Z"}
{"level":"info","msg":"On demand backup finished in 73.335066ms archive coremedia-1681481854.gz size 1.9 kB","plan":"coremedia","time":"2023-04-14T14:17:34Z"}
2023/04/14 14:17:34 "POST http://localhost:8090/backup/coremedia HTTP/1.1" from 172.17.0.1:33498 - 200 140B in 73.68587ms
2023/04/14 14:17:40 "GET http://localhost:8090/status/ HTTP/1.1" from 172.17.0.1:33512 - 200 204B in 65.491µs
jamesholcomb commented 1 year ago

Hitting the same issue as well on 1.9.15-gcloud. Happens intermittently, sometimes the backup is successful and pushed to gcloud, sometimes it fails abut the error message does not include why.