ngosang / restic-exporter

Prometheus exporter for the Restic backup system
MIT License

Add support for reporting number of (stale) locks #10

Closed: rmi1974 closed this issue 1 year ago

rmi1974 commented 1 year ago

Hi, first of all, thanks for this great exporter.

Sometimes I get stale locks during container maintenance or other activities. I'm using this Docker container to back up multiple machines to a MinIO instance:

https://github.com/djmaze/resticker/blob/master/docker-compose.example.yml

...
Backup successful
Forget about old snapshots based on RESTIC_FORGET_ARGS = --prune --keep-last 10 --keep-daily 7 --keep-weekly 5 --keep-monthly 12
Fatal: unable to create lock in backend: repository is already locked by PID 839 on restic-magician by root (UID 0, GID 0)
lock was created at 2023-03-14 18:00:00 (134h1m34.969888434s ago)
storage ID 288af877

Apart from the output above, stale locks can be detected by running:

$ restic -q --no-lock list locks

288af8777712c23b6e4268bc20249d5a8d18c00c0b7d1ef4034dc474ba1af727

--no-lock must be passed because otherwise the list command creates a lock of its own.
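A collector could wrap that same command and count the lock IDs it prints, one per line. The following is a minimal Python sketch, assuming restic is on PATH and the repository and password come from restic's usual environment variables (RESTIC_REPOSITORY, RESTIC_PASSWORD); the function names are hypothetical and not taken from the exporter:

```python
import subprocess


def count_locks(output: str) -> int:
    """Count lock IDs in the output of `restic -q --no-lock list locks`.

    restic prints one lock ID per line; blank lines are ignored.
    """
    return sum(1 for line in output.splitlines() if line.strip())


def read_lock_count() -> int:
    """Run restic and return the number of locks in the repository.

    Assumes `restic` is on PATH and the repository and credentials are
    supplied through the environment (RESTIC_REPOSITORY, RESTIC_PASSWORD).
    --no-lock keeps the listing itself from creating a lock.
    """
    result = subprocess.run(
        ["restic", "-q", "--no-lock", "list", "locks"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if restic exits non-zero
    )
    return count_locks(result.stdout)
```

Splitting the parsing from the subprocess call keeps the counting logic testable without a real repository; read_lock_count() returns 1 for the single lock listed above.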

See:

Then I remove them manually:

$ restic unlock

repository 6239aadd opened (version 2, compression level auto)
successfully removed 1 locks

$ restic -q --no-lock list locks

It doesn't happen frequently, so I haven't bothered modifying the Docker image for the backup job to remove locks automatically, which would essentially mean running restic unlock before the prune command.
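The workaround described here, running restic unlock before the forget/prune step, could look roughly like this Python sketch. It assumes the forget arguments arrive as a single string like RESTIC_FORGET_ARGS in the log above; the function names are hypothetical and this is not how resticker itself implements its jobs:

```python
import shlex
import subprocess


def maintenance_commands(forget_args: str) -> list[list[str]]:
    """Build the restic invocations: unlock first, then forget/prune.

    Plain `restic unlock` removes only stale locks, so a lock held by a
    backup that is still running is left in place.
    """
    return [
        ["restic", "unlock"],
        ["restic", "forget", *shlex.split(forget_args)],
    ]


def run_maintenance(forget_args: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in maintenance_commands(forget_args):
        subprocess.run(cmd, check=True)
```

For example, run_maintenance("--prune --keep-last 10 --keep-daily 7 --keep-weekly 5 --keep-monthly 12") would first clear stale locks and then apply the retention policy from the log.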

Still, I would like to track and detect them with a Prometheus alert rule. Would it be possible to expose the number of locks as a metric?
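Exposed as a metric, the lock count is a gauge, since it can go up and down. A minimal sketch of rendering it in the Prometheus text exposition format, using only the standard library; the metric name restic_locks is a hypothetical placeholder and the exporter may choose a different one:

```python
def render_lock_metric(lock_count: int, metric: str = "restic_locks") -> str:
    """Render a lock count in the Prometheus text exposition format.

    `metric` is a hypothetical name; a gauge is used because the number
    of locks can decrease (e.g. after `restic unlock`).
    """
    return (
        f"# HELP {metric} Number of locks in the restic repository.\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {lock_count}\n"
    )
```

Because a running backup legitimately holds a lock, an alert rule on such a metric would probably need a condition like restic_locks > 0 combined with a for: duration longer than a normal backup run, so that only locks that stick around trigger it.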

ngosang commented 1 year ago

Yes, it's a good idea and it could be implemented. I'm accepting PRs, but with some considerations:

ngosang commented 1 year ago

Included in release 1.3.0. It's enabled by default, but it can be disabled with an environment variable. I will update the Grafana dashboard soon.
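An on-by-default feature with an environment-variable opt-out is typically read along these lines. This Python sketch uses the hypothetical variable name NO_LOCKS; check the project's README for the name the exporter actually uses:

```python
import os
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def lock_collection_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True unless the hypothetical NO_LOCKS variable is truthy.

    Lock collection stays enabled when the variable is unset or set to a
    non-truthy value such as "false".
    """
    value = env.get("NO_LOCKS", "")
    return value.strip().lower() not in TRUTHY
```

Passing the environment as a parameter makes the default-on behaviour easy to check without touching the real process environment.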