Closed karlism closed 3 years ago
Can you check the results for the following query: SELECT j.Name, j.Type, j.ClientId, j.FileSetId, COALESCE(c.Name, ''), COALESCE(f.FileSet, ''), COUNT(*), SUM(j.JobBytes), SUM(j.JobFiles) FROM job j LEFT JOIN client c ON c.ClientId = j.ClientId LEFT JOIN fileset f ON f.FileSetId = j.FileSetId GROUP BY j.Name, j.Type, j.ClientId, j.FileSetId, c.Name, f.FileSet HAVING MAX(j.SchedTime) >= '2021-07-15 00:00:00'
For my case, the first column (j.Name
) (the job ID) is unique, but from the query output you posted, it looks like you have multiple different filesets for the same job name, which I did not know could happen during normal operations.
To determine how bareos_exporter can handle this situation, it would be helpful to know how this situation occured.
It looks like you were trying to get a job working, and trying to get it working. So I wonder if you modified the fileset at som
e point?
In that case, the failed jobs are just not the actual problem, but multiple rows for a job name being returned. As a quick fix, you can delete the failed jobs with delete jobid=?
in bconsole
. Or you can wait for 7 days (default, or the number specified with the -job-discovery-days
argument) until the jobs with an obsolete fileset ID are outside the window considered by bareos_exporter.
Hmm, I tried to reproduce this case, where a job has different filesets, and that appears to work.
I have no clue what exactly goes wrong here.
@karlism if you can run the query from my previous comment, that would be very helpful.
I don't know what to look for exactly yet, but the combination (j.Name, j.Type, c.Name, f.FileSet)
should be unique in the results.
Although it is not enforced in the database schema, I believe j.ClientId
and c.Name
(client name) map one-to-one to each other, as well as j.FileSetId
and f.FileSet
. It might be interesting to look at your client
and fileset
tables to see if my assumption there is correct.
@vierbergenlars, thanks for looking into this!
echo "SELECT j.Name, j.Type, j.ClientId, j.FileSetId, COALESCE(c.Name, ''), COALESCE(f.FileSet, ''), COUNT(*), SUM(j.JobBytes), SUM(j.JobFiles) FROM job j LEFT JOIN client c ON c.ClientId = j.ClientId LEFT JOIN fileset f ON f.FileSetId = j.FileSetId GROUP BY j.Name, j.Type, j.ClientId, j.FileSetId, c.Name, f.FileSet HAVING MAX(j.SchedTime) >= '2021-07-15 00:00:00';" | psql bareos
name | type | clientid | filesetid | coalesce | coalesce | count | sum | sum
-----------------------------------------+------+----------+-----------+----------------+--------------------------+-------+-------------+---------
backup-xxxdeploytest1-vm-xxxdeploytest1 | B | 11 | 21 | xxxdeploytest1 | vm-xxxdeploytest1 | 6 | 8274234063 | 10
backup-xxxdevketcd1-etc | B | 2 | 2 | xxxdevketcd1 | LinuxEtc | 23 | 11123529 | 1682
backup-xxxawx1-postgresql | B | 13 | 28 | xxxawx1 | PostgreSQL_13_RedHat | 4 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 17 | xxxdevdb2 | PostgreSQL | 2 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 27 | xxxdevdb2 | PostgreSQL | 1 | 0 | 0
backup-yyydevketcd1-etc | B | 3 | 2 | yyydevketcd1 | LinuxEtc | 22 | 11122419 | 1681
backup-xxxdevdb2-postgresqlbasebackup | B | 4 | 3 | xxxdevdb2 | PostgreSQL_pg_basebackup | 29 | 160831503 | 36801
backup-xxxdevdb2-postgresql | B | 4 | 26 | xxxdevdb2 | PostgreSQL | 1 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 14 | xxxdevdb2 | PostgreSQL | 9 | 5516504 | 1287
backup-xxxdevdb2-postgresql | B | 4 | 24 | xxxdevdb2 | PostgreSQL | 1 | 0 | 0
backup-xxxexchange1-exchangeconf | B | 8 | 10 | xxxexchange1 | Exchange_Conf | 7 | 10254884770 | 133607
backup-yyyexchange1-windowsc | B | 5 | 9 | yyyexchange1 | WindowsC | 14 | 61807025613 | 1053778
backup-zzzdeploytest3-etc | B | 6 | 2 | zzzdeploytest3 | LinuxEtc | 17 | 539737 | 1315
backup-xxxexchange1-windowsc | B | 8 | 9 | xxxexchange1 | WindowsC | 9 | 28717961249 | 526179
backup-xxxdevdb2-postgresql | B | 4 | 16 | xxxdevdb2 | PostgreSQL | 1 | 0 | 0
backup-yyydeploytest1-etc | B | 12 | 2 | yyydeploytest1 | LinuxEtc | 3 | 5424299 | 795
backup-xxxdeploytest1-etc | B | 11 | 2 | xxxdeploytest1 | LinuxEtc | 5 | 5436981 | 829
backup-xxxdevdb2-postgresql | B | 4 | 25 | xxxdevdb2 | PostgreSQL | 1 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 19 | xxxdevdb2 | PostgreSQL | 8 | 27681914 | 6439
backup-xxxad1-windowsc | B | 9 | 9 | xxxad1 | WindowsC | 7 | 17714683477 | 397606
RestoreFiles | R | 4 | 0 | xxxdevdb2 | | 2 | 107559404 | 2559
backup-yyydevketcd1-etcd | B | 3 | 13 | yyydevketcd1 | etcd_all | 13 | 132666174 | 76
backup-yyygit1-etc | B | 7 | 2 | yyygit1 | LinuxEtc | 17 | 7283331 | 5728
backup-xxxdevdb2-postgresql | B | 4 | 18 | xxxdevdb2 | PostgreSQL | 3 | 0 | 0
backup-yyyexchange1-exchangedb | B | 5 | 11 | yyyexchange1 | Exchange_DB | 38 | 3148882950 | 67267
backup-yyyexchange1-exchangeconf | B | 5 | 10 | yyyexchange1 | Exchange_Conf | 15 | 21539936089 | 251007
backup-yyydeploytest1-vm-yyydeploytest1 | B | 12 | 23 | yyydeploytest1 | vm-yyydeploytest1 | 8 | 8725194082 | 10
backup-xxxexchange1-exchangedb | B | 8 | 11 | xxxexchange1 | Exchange_DB | 10 | 1954248567 | 42427
backup-yyygit1-gitlab_backup | B | 7 | 6 | yyygit1 | GitLab_backup | 17 | 42480888703 | 37
backup-yyyad1-windowsc | B | 10 | 9 | yyyad1 | WindowsC | 7 | 16640927093 | 391981
RestoreFiles | R | 12 | 0 | yyydeploytest1 | | 9 | 13031844378 | 6
backup-yyybackup1-mycatalog | B | 1 | 1 | yyybackup1 | Catalog | 22 | 1469221104 | 3303
backup-xxxdeploytest1-vm-xxxdeploytest1 | B | 11 | 20 | xxxdeploytest1 | vm-xxxdeploytest1 | 1 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 28 | xxxdevdb2 | PostgreSQL_13_RedHat | 4 | 0 | 0
backup-yyydeploytest1-vm-yyydeploytest1 | B | 12 | 22 | yyydeploytest1 | vm-yyydeploytest1 | 1 | 0 | 0
backup-xxxdevdb2-postgresql | B | 4 | 15 | xxxdevdb2 | PostgreSQL | 2 | 0 | 0
(36 rows)
Yes, it is possible that fileset has been changed at some point as we are now in process of switching from different backup solution to BareOS so me and my colleagues are doing lots of tests and adjustments to existing BareOS configuration.
According to this, bareos can have multiple entries with the same fileset name. Bareos will select the newest one.
When creating a new fileset record, bareos will first check if one with the same name and MD5 sum exists (https://github.com/bareos/bareos/blob/a89617abc0f40425a1a9faa00c1825200be6777a/core/src/cats/sql_create.cc#L702-L705), if it does not, it will insert a new record (https://github.com/bareos/bareos/blob/a89617abc0f40425a1a9faa00c1825200be6777a/core/src/cats/sql_create.cc#L747-L750).
So, multiple filesets with the same name and different ids should be handled. Ideally, they are handled as the single fileset they are. The difference between these is the contents of include/exclude rules, which we don't care about as a metrics exporter.
@vierbergenlars, thanks a lot for fixing this! I will try it out as soon as 0.5.7 binaries are published.
BareOS exporter (version 0.5.6) stops working when there are failed backup jobs:
Here's database entries related to this backup job:
Judging by Prometheus
up
metric, I can see that exporter stopped working around 10:08, thus I conclude that this failure was triggered by jobid 243. I'm using BareOS 20.0.1 and PostgreSQL 13 backend database.