pgstef / check_pgbackrest

pgBackRest backup check plugin for Nagios
PostgreSQL License
35 stars 14 forks source link

New release planned soon? #23

Closed anayrat closed 2 years ago

anayrat commented 2 years ago

Hello!

Any new release planned soon since support of multi-repo feature? :)

Thanks

pgstef commented 2 years ago

Hi,

Thanks for asking :-) Have you tried the dev version yet?

I'm currently working on improving the multi-repo support. I'm looking how to operate on all repo defined dynamically. But it has some major code impacts mostly for the archives service.

I'm aiming to get a new release ready by the end of this month, with or without that improvement.

Kind regards

pgstef commented 2 years ago

I've just pushed the initial rework of that implementation. You already can try the dev version of this PR :-) I'll look at adding extra tests for this feature next week.

anayrat commented 2 years ago

Hi,

Hi!

Thanks for asking :-) Have you tried the dev version yet?

Yes, I tried dev version and one from #24.

I like the idea to not make --repo mandatory. This avoid forgetting to add a new check service when we add a new repository for backup. I just wonder if the perdata are meaningful as the counter aggregates all repositories. Maybe we should have separate counters:

Before:

full=5 diff=12 incr=0 latest=diff,20210906-170207F_20210913-060002D latest_age=27631s latest_full=20210912-050002F latest_full_age=113017s

After:

repo1-full=1  repo1-diff=6 repo1-incr=0 repo1-latest=diff,20210906-170207F_20210913-060002D repo1-latest_age=27631s repo1-latest_full=20210912-050002F repo1-latest_full_age=113017s
repo2-full=4  repo2-diff=6 repo2-incr=0 repo2-latest=diff,20210912-050002F_20210913-050001D repo2-latest_age=27631s repo2-latest_full=20210912-050002F repo2-latest_full_age=113017s

I also tried archive service and I think we should separate counters or make --repo mandatory. Actual counters are wrong:

check_pgbackrest-dev -S archive  -s archives  --repo 2
WAL_ARCHIVES OK - 2512 WAL archived, latest archived since 7h35m44s | latest_archive_age=27344s num_archives=2512
check_pgbackrest-dev -S archive  -s archives  --repo 1
WAL_ARCHIVES OK - 8009 WAL archived, latest archived since 7h37m17s | latest_archive_age=27437s num_archives=8009

# Without repo option
check_pgbackrest-dev -S archive  -s archives 
WAL_ARCHIVES OK - 8009 WAL archived, latest archived since 7h40m6s | latest_archive_age=27606s num_archives=8009

I'm currently working on improving the multi-repo support. I'm looking how to operate on all repo defined dynamically. But it has some major code impacts mostly for the archives service.

I'm aiming to get a new release ready by the end of this month, with or without that improvement.

Thanks!

pgstef commented 2 years ago

I just wonder if the perdata are meaningful as the counter aggregates all repositories.

I believe there's a purpose to keep the counters aggregated. If you want to know how many backups your have inside all your repos and which one is the latest. If you want to enforce a retention check on a specific repo, you'll need the --repo option.

So it's not really for cases where a user would "forget" to add the extra service but more about checking the consistency accros the repos.

I also tried archive service and I think we should separate counters or make --repo mandatory. Actual counters are wrong:

check_pgbackrest-dev -S archive  -s archives  --repo 2
WAL_ARCHIVES OK - 2512 WAL archived, latest archived since 7h35m44s | latest_archive_age=27344s num_archives=2512
check_pgbackrest-dev -S archive  -s archives  --repo 1
WAL_ARCHIVES OK - 8009 WAL archived, latest archived since 7h37m17s | latest_archive_age=27437s num_archives=8009

# Without repo option
check_pgbackrest-dev -S archive  -s archives 
WAL_ARCHIVES OK - 8009 WAL archived, latest archived since 7h40m6s | latest_archive_age=27606s num_archives=8009

Actually, I don't believe the counter is wrong. It's the number of "unique" wal archives you have accros your repos. --repo allow to check the repo content for each repo, but without the repo option, it allows to check the consistency accross all of them.

Imagine you miss 1 wal archive in repo1 but it is actually present in repo2. You'd have to check that manually and ack the alert. Where here, since you have the archive, archive-get will find it and allow the recovery.

I strongly believe having the "per repo" and the "accros repos" checks are complimentary, not exclusives. However, I agree the readme or at least the output messages could be improved to reflect that behavior.

Perhaps "8009 WAL archived" -> "8009 unique WAL archived", and num_archives -> num_unique_archives ?

anayrat commented 2 years ago

Indeed, it is more clear. +1 to your proposal for README and output message.

pgstef commented 2 years ago

Indeed, it is more clear. +1 to your proposal for README and output message.

Changed in https://github.com/pgstef/check_pgbackrest/commit/2b8d950b5aca72cf617a0cbffa3f5d0bb1012eda. Thanks for your feedback ;-)

I'll work on the tests and prepare a new release asap then :-)

pgstef commented 2 years ago

Hi, 2.1 has just been released ;-)

anayrat commented 2 years ago

Thanks! I will try.