pgstef / check_pgbackrest

pgBackRest backup check plugin for Nagios
PostgreSQL License

--service=archives list missing archives but it's not true #15

Closed sebastienruiz closed 3 years ago

sebastienruiz commented 3 years ago

Hi,

(I'm French)

This is my check_pgbackrest (v1.9) command:

check_pgbackrest --stanza=IPMUTX6 --service=archives --output=human --ignore-archived-before='900s' --debug --repo-path=/mnt/backup_postgresql_production/MUT/IPMUTX6/archive

This is the output:

DEBUG: pgBackRest info command was : 'pgbackrest info --stanza=IPMUTX6 --output=json --log-level-console=error'
DEBUG: !> pgBackRest info took 0s
DEBUG: archives_dir: /mnt/backup_postgresql_production/MUT/IPMUTX6//archive/IPMUTX6/11-1
DEBUG: Get all the WAL archives and history files...
DEBUG: pgBackRest version command was : 'pgbackrest version'
DEBUG: history file to open : 00000011.history
DEBUG: ignored file 000000110000008A00000000-7bcad28777aaac37c0d195b974e12c52280d67fc.gz as interval since epoch : 21h47m31s
DEBUG: ignored file 000000110000008A00000001-b0657c4b22dff3c995dc5dd27112e596c8ce1ea2.gz as interval since epoch : 21h44m15s
etc...
DEBUG: ignored file 0000001100000089000000FC-64f4a3d396b2838d845cfe545ab5972ef5a2c624.gz as interval since epoch : 21h55m39s
DEBUG: ignored file 0000001100000089000000FD-5b43e547aba7d5b7169407dd02ad6a3c48114327.gz as interval since epoch : 21h55m27s
DEBUG: ignored file 0000001100000089000000FE-c69c730adb4a9aa9bfb8605815bd16cf541a417b.gz as interval since epoch : 21h54m45s
DEBUG: ignored file 0000001100000089000000FF-167f9b5f590ab6f64aba04fe0515934168f00c90.gz as interval since epoch : 21h51m31s
DEBUG: !> Get all the WAL archives and history files took 1s
DEBUG: min_wal changed to 000000110000008A00000095
DEBUG: Get all the needed wal archives...
DEBUG: !> Get all the needed wal archives took 0s
DEBUG: !> Go through needed wal list and check took 0s
DEBUG: Get all the needed wal archives for 20201108-220002F...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201109-222502I...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201110-222503I...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201111-222501I...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201112-222602I...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201113-222602I...
DEBUG: Get all the needed wal archives for 20201108-220002F_20201114-222601I...
DEBUG: Get all the needed wal archives for 20201115-220102F...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201116-222602I...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201117-222603I...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201118-222602I...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201119-222602I...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201120-222602I...
DEBUG: Get all the needed wal archives for 20201115-220102F_20201121-222602I...
DEBUG: Get all the needed wal archives for 20201122-220102F...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201123-222602I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201124-112313I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201124-222502I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201125-222501I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201126-222502I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201127-222502I...
DEBUG: Get all the needed wal archives for 20201122-220102F_20201128-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201130-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201201-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201202-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201203-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201204-222502I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201205-170350I...
DEBUG: Get all the needed wal archives for 20201129-220002F_20201205-222502I...
DEBUG: Get all the needed wal archives for 20201206-220003F...
DEBUG: Get all the needed wal archives for 20201206-220003F_20201207-222502I...
DEBUG: !> Go through each backup, get the needed wal and check took 0s
DEBUG: missing 0000000F00000074000000F1
DEBUG: missing 0000000F000000750000009B
DEBUG: missing 0000000F000000760000004E
DEBUG: missing 0000000F0000007600000094
DEBUG: missing 0000000F000000770000003D
DEBUG: missing 0000000F0000007800000009
DEBUG: missing 0000000F0000007800000058
DEBUG: missing 0000000F0000007800000089
DEBUG: missing 0000000F000000780000008A
DEBUG: missing 0000000F000000780000008B
DEBUG: missing 0000000F0000007900000041
DEBUG: missing 0000000F0000007A0000001A
DEBUG: missing 0000000F0000007A000000E4
DEBUG: missing 0000000F0000007B000000A9
DEBUG: missing 0000000F0000007C0000006D
DEBUG: missing 0000000F0000007C000000C3
DEBUG: missing 0000000F0000007C000000F8
DEBUG: missing 0000000F0000007C000000F9
DEBUG: missing 0000000F0000007C000000FA
DEBUG: missing 0000000F0000007D000000BB
DEBUG: missing 0000000F000000810000004A
DEBUG: missing 0000000F000000810000004B
DEBUG: missing 0000000F000000810000004C
DEBUG: missing 0000000F000000810000004D
DEBUG: missing 0000000F000000810000004E
DEBUG: missing 0000000F00000081000000E8
DEBUG: missing 0000000F00000082000000B6
DEBUG: missing 0000000F000000830000007F
DEBUG: missing 0000000F000000840000003A
DEBUG: missing 0000000F000000840000008B
DEBUG: missing 0000000F00000084000000BC
DEBUG: missing 0000000F00000084000000BD
DEBUG: missing 0000000F00000084000000BE
DEBUG: missing 0000000F0000008500000074
DEBUG: missing 0000000F0000008600000052
DEBUG: missing 0000000F0000008700000049
DEBUG: missing 0000000F000000880000002A
DEBUG: missing 0000000F00000088000000FD
DEBUG: missing 000000100000008900000049
DEBUG: missing 00000010000000890000004A
DEBUG: missing 00000010000000890000004B
DEBUG: missing 00000010000000890000005E
DEBUG: missing 00000010000000890000008F
DEBUG: missing 000000100000008900000090
DEBUG: missing 000000100000008900000091
DEBUG: missing 000000110000008A00000058
Service : WAL_ARCHIVES
Returns : 2 (CRITICAL)
Message : wrong sequence, 46 missing file(s) (0000000F00000074000000F1 / 000000110000008A00000058)
Long message : latest_archive_age=2s
Long message : num_archives=14
Long message : num_missing_archives=46
Long message : oldest_missing_archive=0000000F00000074000000F1
Long message : latest_missing_archive=000000110000008A00000058
Long message : archives_dir=/mnt/backup_postgresql_production/MUT/IPMUTX6//archive/IPMUTX6/11-1
Long message : min_wal=000000110000008A00000095
Long message : max_wal=000000110000008A000000A2
Long message : latest_archive=000000110000008A000000A2
Long message : latest_bck_archive_start=000000110000008A00000058
Long message : latest_bck_type=incr
Long message : oldest_archive=000000110000008A00000095
Long message : oldest_bck_archive_start=0000000F00000074000000F1
Long message : oldest_bck_type=full

But that's not true: the files reported as missing actually exist. For example:

DEBUG: missing 0000000F000000750000009B

-rw-r----- 1 postgres dba 608K Nov 9 22:27 0000000F000000750000009B-b7b4b8416ee840cb3d18692223fbd1f1021d4fd6.gz

or

DEBUG: missing 0000000F00000084000000BD

-rw-r----- 1 postgres dba 17K Nov 29 22:05 0000000F00000084000000BD-998288ec10b0e82ecd993294665ac3c8bb533dee.gz

So what's wrong?

Thanks for your help.

pgstef commented 3 years ago

Hello,

Could you provide the output of pgbackrest info --stanza=IPMUTX6 --output=json?

My impression is that the archives reported as missing were generated more than "900s" ago, so the --ignore-archived-before='900s' option does not quite do its job here. Since every archive needed for the backups to be consistent must be present, those archives are checked regardless, but they are absent from the list of archives found because of this option.
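The interaction described here can be reproduced with a small sketch. It is not check_pgbackrest's code (the plugin is written in Perl); the helper names `visible_archives` and `missing` are hypothetical, and the second file's checksum is a placeholder. It only illustrates how an age filter applied to the on-disk listing makes a WAL that a backup still needs look missing.

```python
import re

# Hypothetical helpers for illustration; not check_pgbackrest's actual code.
# Matches archive names such as <24 hex digits>[.partial]-<sha1>.gz
WAL_FILE = re.compile(r"^([0-9A-F]{24})(?:\.partial)?-[0-9a-f]{40}\.gz$")

def visible_archives(files, now, ignore_older_than=None):
    """Return the WAL names kept once the age filter has been applied.

    files: iterable of (filename, mtime_epoch) pairs found in the repository.
    ignore_older_than: age in seconds; older files are skipped, mimicking the
    effect attributed to --ignore-archived-before in this thread.
    """
    kept = set()
    for name, mtime in files:
        match = WAL_FILE.match(name)
        if not match:
            continue
        if ignore_older_than is not None and now - mtime > ignore_older_than:
            continue  # too old: dropped from the listing
        kept.add(match.group(1))
    return kept

def missing(needed, files, now, ignore_older_than=None):
    """WALs required by the backups that are absent from the filtered listing."""
    present = visible_archives(files, now, ignore_older_than)
    return [wal for wal in needed if wal not in present]

now = 1_000_000  # arbitrary epoch used for the example
files = [
    # Real file name from the report, archived about 30 days earlier
    ("0000000F000000750000009B-b7b4b8416ee840cb3d18692223fbd1f1021d4fd6.gz", now - 30 * 86400),
    # Recent archive; the checksum part is a placeholder
    ("000000110000008A000000A2-" + "0" * 40 + ".gz", now - 2),
]
needed = ["0000000F000000750000009B", "000000110000008A000000A2"]

print(missing(needed, files, now))                         # nothing missing
print(missing(needed, files, now, ignore_older_than=900))  # old WAL now "missing"
```

With the filter enabled, the old WAL is still on disk but never enters the listing, so the consistency check reports it as missing.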

Do you really need this option? I am seriously considering deprecating it in version 2.

sebastienruiz commented 3 years ago

Hello,

Here it is:

[{"archive":[{"database":{"id":1},"id":"11-1","max":"000000110000008A000000B2","min":"0000000F00000074000000F1"}],"backup":[{"archive":{"start":"0000000F00000074000000F1","stop":"0000000F00000074000000F1"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":24328283589,"repository":{"delta":8373400243,"size":8373400243},"size":24328283589},"label":"20201108-220002F","prior":null,"reference":null,"timestamp":{"start":1604869202,"stop":1604869492},"type":"full"},{"archive":{"start":"0000000F000000750000009B","stop":"0000000F000000750000009B"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":12112650613,"repository":{"delta":2790748869,"size":8715231956},"size":24791918021},"label":"20201108-220002F_20201109-222502I","prior":"20201108-220002F","reference":["20201108-220002F"],"timestamp":{"start":1604957102,"stop":1604957234},"type":"incr"},{"archive":{"start":"0000000F000000760000004E","stop":"0000000F000000760000004E"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":12298609013,"repository":{"delta":2888260142,"size":8811349160},"size":24969848261},"label":"20201108-220002F_20201110-222503I","prior":"20201108-220002F_20201109-222502I","reference":["20201108-220002F","20201108-220002F_20201109-222502I"],"timestamp":{"start":1605043503,"stop":1605043639},"type":"incr"},{"archive":{"start":"0000000F0000007600000094","stop":"0000000F0000007600000094"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":10349683061,"repository":{"delta":1928293108,"size":8691907654},"size":24838170053},"label":"20201108-220002F_20201111-222501I","prior":"20201108-220002F_20201110-222503I","reference":["20201108-220002F","20201108-220002F_20201109-222502I","20201108-220002F_20201110-222503I"],"timestamp":{"start":1605129901,"stop":1605130012},"type":"incr"},{"archive":{"start":"0000000F000000770000003D","stop":"0000000F000000770000003D"},"backrest":{"format":5,"version":"2.27"},"datab
ase":{"id":1},"info":{"delta":12492374389,"repository":{"delta":2981651006,"size":8904245168},"size":25157166533},"label":"20201108-220002F_20201112-222602I","prior":"20201108-220002F_20201111-222501I","reference":["20201108-220002F","20201108-220002F_20201109-222502I","20201108-220002F_20201110-222503I","20201108-220002F_20201111-222501I"],"timestamp":{"start":1605216362,"stop":1605216501},"type":"incr"},{"archive":{"start":"0000000F0000007800000009","stop":"0000000F0000007800000009"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":14119903603,"repository":{"delta":3969439575,"size":9263757139},"size":25717843395},"label":"20201108-220002F_20201113-222602I","prior":"20201108-220002F_20201112-222602I","reference":["20201108-220002F","20201108-220002F_20201109-222502I","20201108-220002F_20201110-222503I","20201108-220002F_20201111-222501I","20201108-220002F_20201112-222602I"],"timestamp":{"start":1605302762,"stop":1605302919},"type":"incr"},{"archive":{"start":"0000000F0000007800000058","stop":"0000000F0000007800000058"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":11025801589,"repository":{"delta":2339669421,"size":9108086518},"size":25548408261},"label":"20201108-220002F_20201114-222601I","prior":"20201108-220002F_20201113-222602I","reference":["20201108-220002F","20201108-220002F_20201109-222502I","20201108-220002F_20201110-222503I","20201108-220002F_20201111-222501I","20201108-220002F_20201112-222602I","20201108-220002F_20201113-222602I"],"timestamp":{"start":1605389161,"stop":1605389281},"type":"incr"},{"archive":{"start":"0000000F0000007800000089","stop":"0000000F000000780000008B"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":25232680389,"repository":{"delta":8822613924,"size":8822613924},"size":25232680389},"label":"20201115-220102F","prior":null,"reference":null,"timestamp":{"start":1605474062,"stop":1605474364},"type":"full"},{"archive":{"start":"0000000F000000790000
0041","stop":"0000000F0000007900000041"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":14318231925,"repository":{"delta":4079909513,"size":9378128486},"size":25938822597},"label":"20201115-220102F_20201116-222602I","prior":"20201115-220102F","reference":["20201115-220102F"],"timestamp":{"start":1605561962,"stop":1605562123},"type":"incr"},{"archive":{"start":"0000000F0000007A0000001A","stop":"0000000F0000007A0000001A"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":14561812853,"repository":{"delta":4214148820,"size":9511053761},"size":26176423365},"label":"20201115-220102F_20201117-222603I","prior":"20201115-220102F_20201116-222602I","reference":["20201115-220102F","20201115-220102F_20201116-222602I"],"timestamp":{"start":1605648363,"stop":1605648524},"type":"incr"},{"archive":{"start":"0000000F0000007A000000E4","stop":"0000000F0000007A000000E4"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":14654095733,"repository":{"delta":4323743128,"size":9504525648},"size":26258220485},"label":"20201115-220102F_20201118-222602I","prior":"20201115-220102F_20201117-222603I","reference":["20201115-220102F","20201115-220102F_20201116-222602I","20201115-220102F_20201117-222603I"],"timestamp":{"start":1605734762,"stop":1605734929},"type":"incr"},{"archive":{"start":"0000000F0000007B000000A9","stop":"0000000F0000007B000000A9"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13943161205,"repository":{"delta":3859405422,"size":9743465399},"size":26628335045},"label":"20201115-220102F_20201119-222602I","prior":"20201115-220102F_20201118-222602I","reference":["20201115-220102F","20201115-220102F_20201116-222602I","20201115-220102F_20201117-222603I","20201115-220102F_20201118-222602I"],"timestamp":{"start":1605821162,"stop":1605821317},"type":"incr"},{"archive":{"start":"0000000F0000007C0000006D","stop":"0000000F0000007C0000006D"},"backrest":{"format":5,"version":"2.27"}
,"database":{"id":1},"info":{"delta":14581113205,"repository":{"delta":4471971713,"size":9733555908},"size":26698311109},"label":"20201115-220102F_20201120-222602I","prior":"20201115-220102F_20201119-222602I","reference":["20201115-220102F","20201115-220102F_20201116-222602I","20201115-220102F_20201117-222603I","20201115-220102F_20201118-222602I","20201115-220102F_20201119-222602I"],"timestamp":{"start":1605907562,"stop":1605907731},"type":"incr"},{"archive":{"start":"0000000F0000007C000000C3","stop":"0000000F0000007C000000C3"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":11141767541,"repository":{"delta":2316009720,"size":9765874020},"size":26736805317},"label":"20201115-220102F_20201121-222602I","prior":"20201115-220102F_20201120-222602I","reference":["20201115-220102F","20201115-220102F_20201116-222602I","20201115-220102F_20201117-222603I","20201115-220102F_20201118-222602I","20201115-220102F_20201119-222602I","20201115-220102F_20201120-222602I"],"timestamp":{"start":1605993962,"stop":1605994081},"type":"incr"},{"archive":{"start":"0000000F0000007C000000F8","stop":"0000000F0000007C000000FA"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":26028189125,"repository":{"delta":9117366778,"size":9117366778},"size":26028189125},"label":"20201122-220102F","prior":null,"reference":null,"timestamp":{"start":1606078862,"stop":1606079177},"type":"full"},{"archive":{"start":"0000000F0000007D000000BB","stop":"0000000F0000007D000000BB"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":14004101493,"repository":{"delta":3765404912,"size":9652125038},"size":26704217541},"label":"20201122-220102F_20201123-222602I","prior":"20201122-220102F","reference":["20201122-220102F"],"timestamp":{"start":1606166762,"stop":1606166921},"type":"incr"},{"archive":{"start":"0000000F000000810000004A","stop":"0000000F000000810000004E"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":
25376334197,"repository":{"delta":8771385840,"size":10282442809},"size":29421561285},"label":"20201122-220102F_20201124-112313I","prior":"20201122-220102F_20201123-222602I","reference":["20201122-220102F","20201122-220102F_20201123-222602I"],"timestamp":{"start":1606213393,"stop":1606213906},"type":"incr"},{"archive":{"start":"0000000F00000081000000E8","stop":"0000000F00000081000000E8"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":23915258229,"repository":{"delta":9505855710,"size":10304828173},"size":29492807109},"label":"20201122-220102F_20201124-222502I","prior":"20201122-220102F_20201124-112313I","reference":["20201122-220102F","20201122-220102F_20201123-222602I","20201122-220102F_20201124-112313I"],"timestamp":{"start":1606253102,"stop":1606253381},"type":"incr"},{"archive":{"start":"0000000F00000082000000B6","stop":"0000000F00000082000000B6"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":12541133173,"repository":{"delta":2936988962,"size":10184984262},"size":29428295109},"label":"20201122-220102F_20201125-222501I","prior":"20201122-220102F_20201124-222502I","reference":["20201122-220102F","20201122-220102F_20201123-222602I","20201122-220102F_20201124-112313I","20201122-220102F_20201124-222502I"],"timestamp":{"start":1606339501,"stop":1606339640},"type":"incr"},{"archive":{"start":"0000000F000000830000007F","stop":"0000000F000000830000007F"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13133685108,"repository":{"delta":3017479051,"size":10182725297},"size":29500253636},"label":"20201122-220102F_20201126-222502I","prior":"20201122-220102F_20201125-222501I","reference":["20201122-220102F","20201122-220102F_20201123-222602I","20201122-220102F_20201124-112313I","20201122-220102F_20201124-222502I","20201122-220102F_20201125-222501I"],"timestamp":{"start":1606425902,"stop":1606426046},"type":"incr"},{"archive":{"start":"0000000F000000840000003A","stop":"0000000F000000840000
003A"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13153001845,"repository":{"delta":2990430727,"size":10157049171},"size":29530236357},"label":"20201122-220102F_20201127-222502I","prior":"20201122-220102F_20201126-222502I","reference":["20201122-220102F","20201122-220102F_20201123-222602I","20201122-220102F_20201124-112313I","20201122-220102F_20201124-222502I","20201122-220102F_20201125-222501I","20201122-220102F_20201126-222502I"],"timestamp":{"start":1606512302,"stop":1606512448},"type":"incr"},{"archive":{"start":"0000000F000000840000008B","stop":"0000000F000000840000008B"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":12050497909,"repository":{"delta":2548682227,"size":9859131164},"size":29200172485},"label":"20201122-220102F_20201128-222502I","prior":"20201122-220102F_20201127-222502I","reference":["20201122-220102F","20201122-220102F_20201123-222602I","20201122-220102F_20201124-112313I","20201122-220102F_20201124-222502I","20201122-220102F_20201125-222501I","20201122-220102F_20201126-222502I","20201122-220102F_20201127-222502I"],"timestamp":{"start":1606598702,"stop":1606598840},"type":"incr"},{"archive":{"start":"0000000F00000084000000BC","stop":"0000000F00000084000000BE"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":29200467397,"repository":{"delta":9859125551,"size":9859125551},"size":29200467397},"label":"20201129-220002F","prior":null,"reference":null,"timestamp":{"start":1606683602,"stop":1606683945},"type":"full"},{"archive":{"start":"0000000F0000008500000074","stop":"0000000F0000008500000074"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13311140213,"repository":{"delta":3066800077,"size":10232809605},"size":29680903621},"label":"20201129-220002F_20201130-222502I","prior":"20201129-220002F","reference":["20201129-220002F"],"timestamp":{"start":1606771502,"stop":1606771648},"type":"incr"},{"archive":{"start":"0000000F000000860
0000052","stop":"0000000F0000008600000052"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13252460916,"repository":{"delta":3139726823,"size":10367189831},"size":29975250372},"label":"20201129-220002F_20201201-222502I","prior":"20201129-220002F_20201130-222502I","reference":["20201129-220002F","20201129-220002F_20201130-222502I"],"timestamp":{"start":1606857902,"stop":1606858052},"type":"incr"},{"archive":{"start":"0000000F0000008700000049","stop":"0000000F0000008700000049"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13560283509,"repository":{"delta":3068196073,"size":10232054030},"size":29917185477},"label":"20201129-220002F_20201202-222502I","prior":"20201129-220002F_20201201-222502I","reference":["20201129-220002F","20201129-220002F_20201130-222502I","20201129-220002F_20201201-222502I"],"timestamp":{"start":1606944302,"stop":1606944453},"type":"incr"},{"archive":{"start":"0000000F000000880000002A","stop":"0000000F000000880000002A"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13659922805,"repository":{"delta":3106987420,"size":10271407256},"size":30020044229},"label":"20201129-220002F_20201203-222502I","prior":"20201129-220002F_20201202-222502I","reference":["20201129-220002F","20201129-220002F_20201130-222502I","20201129-220002F_20201201-222502I","20201129-220002F_20201202-222502I"],"timestamp":{"start":1607030702,"stop":1607030855},"type":"incr"},{"archive":{"start":"0000000F00000088000000FD","stop":"0000000F00000088000000FD"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13074743669,"repository":{"delta":3000522544,"size":10273968514},"size":30070998469},"label":"20201129-220002F_20201204-222502I","prior":"20201129-220002F_20201203-222502I","reference":["20201129-220002F","20201129-220002F_20201130-222502I","20201129-220002F_20201201-222502I","20201129-220002F_20201202-222502I","20201129-220002F_20201203-222502I"],"timestamp":{"start"
:1607117102,"stop":1607117249},"type":"incr"},{"archive":{"start":"000000100000008900000049","stop":"00000010000000890000004B"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":12845516434,"repository":{"delta":2954907378,"size":9712267720},"size":25638192645},"label":"20201129-220002F_20201205-170350I","prior":"20201129-220002F_20201204-222502I","reference":["20201129-220002F","20201129-220002F_20201130-222502I","20201129-220002F_20201201-222502I","20201129-220002F_20201202-222502I","20201129-220002F_20201203-222502I","20201129-220002F_20201204-222502I"],"timestamp":{"start":1607184230,"stop":1607184387},"type":"incr"},{"archive":{"start":"00000010000000890000005E","stop":"00000010000000890000005E"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":5383848308,"repository":{"delta":1142433342,"size":9442042983},"size":25339307524},"label":"20201129-220002F_20201205-222502I","prior":"20201129-220002F_20201205-170350I","reference":["20201129-220002F","20201129-220002F_20201130-222502I","20201129-220002F_20201201-222502I","20201129-220002F_20201202-222502I","20201129-220002F_20201203-222502I","20201129-220002F_20201204-222502I","20201129-220002F_20201205-170350I"],"timestamp":{"start":1607203502,"stop":1607203561},"type":"incr"},{"archive":{"start":"00000010000000890000008F","stop":"000000100000008900000091"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":25305572869,"repository":{"delta":9411224497,"size":9411224497},"size":25305572869},"label":"20201206-220003F","prior":null,"reference":null,"timestamp":{"start":1607288403,"stop":1607288738},"type":"full"},{"archive":{"start":"000000110000008A00000058","stop":"000000110000008A00000058"},"backrest":{"format":5,"version":"2.27"},"database":{"id":1},"info":{"delta":13849715053,"repository":{"delta":3193296422,"size":9799427954},"size":25790146053},"label":"20201206-220003F_20201207-222502I","prior":"20201206-220003F","reference":["20201
206-220003F"],"timestamp":{"start":1607376302,"stop":1607376468},"type":"incr"}],"cipher":"none","db":[{"id":1,"system-id":6702023997101941935,"version":"11"}],"name":"IPMUTX6","status":{"code":0,"lock":{"backup":{"held":false}},"message":"ok"}}]

sebastienruiz commented 3 years ago

And regarding the question: "Do you really need this option? I am seriously considering deprecating it in version 2."

Yes, I used this option because I was trying to avoid other false positives related to a timeline change (master/slave switchover).

pgstef commented 3 years ago

Indeed, there seem to have been 2 timeline changes:
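One way to read the timeline changes out of the posted JSON, as a rough sketch: the first 8 hex digits of a WAL segment name are the timeline ID, so collecting them from each backup's archive start/stop values shows the sequence of timelines. The sample below keeps only three of the backups posted above, with most fields removed, and the function name `timelines_seen` is hypothetical.

```python
import json

# Trimmed-down sample shaped like `pgbackrest info --output=json`;
# only three of the backups posted above are kept, most fields removed.
info_json = """
[{"name": "IPMUTX6",
  "backup": [
    {"label": "20201108-220002F",
     "archive": {"start": "0000000F00000074000000F1", "stop": "0000000F00000074000000F1"}},
    {"label": "20201129-220002F_20201205-170350I",
     "archive": {"start": "000000100000008900000049", "stop": "00000010000000890000004B"}},
    {"label": "20201206-220003F_20201207-222502I",
     "archive": {"start": "000000110000008A00000058", "stop": "000000110000008A00000058"}}
  ]}]
"""

def timelines_seen(info):
    """Return timeline IDs (first 8 hex digits of each WAL name) in first-seen order."""
    seen = []
    for stanza in info:
        for backup in stanza["backup"]:
            for key in ("start", "stop"):
                tli = backup["archive"][key][:8]
                if tli not in seen:
                    seen.append(tli)
    return seen

tlis = timelines_seen(json.loads(info_json))
print(tlis)                                 # ['0000000F', '00000010', '00000011']
print(len(tlis) - 1, "timeline change(s)")  # 2 timeline change(s)
```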

Clearly, the erroneous behavior here comes from this option. It ignores archives that are present on disk, yet those archives are needed for backup consistency. So it is clearly not designed for this specific case. It is hard to think of a concrete use case for it at all, hence the idea of deprecating it in the next version.

In your case, it would therefore be better to drop this option and work on those possible false positives related to the timeline change. That is actually a very good use case, and it should work without false positives.

The idea behind the probe is to fetch the .history file of the latest timeline (here 00000011), extract from it the "boundary" WAL of each timeline switch, and thereby walk the chain up from the starting timeline 0000000F. In debug mode, the message found a boundary.. should then appear.
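As an illustration of what extracting a boundary WAL from a .history file can look like, here is a minimal sketch. It relies on the PostgreSQL timeline history format (one line per parent timeline: timeline ID, switch LSN, reason) and assumes the default 16 MB WAL segment size. The function names and the sample LSNs are hypothetical, and this is not the plugin's own implementation.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumes the default 16 MB WAL segment size

def wal_name(tli, lsn):
    """Name of the WAL segment that contains LSN 'X/Y' on timeline tli."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    return "%08X%08X%08X" % (tli, hi, lo // SEGMENT_SIZE)

def boundaries(history_text):
    """Map each parent timeline to the segment in which it was left."""
    result = {}
    for line in history_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tli, lsn = line.split()[:2]
        result[int(tli)] = wal_name(int(tli), lsn)
    return result

# Hypothetical content of a 00000003.history file; the LSNs are made up
sample = (
    "1\t0/17000098\tno recovery target specified\n"
    "\n"
    "2\t0/1B0000A0\tno recovery target specified\n"
)
print(boundaries(sample))
# {1: '000000010000000000000017', 2: '00000002000000000000001B'}
```

Each value is the last segment the check should expect on the parent timeline before continuing on the next one.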

Could you try again without the ignore-archived-before option? If the probe still raises alerts, we can look into them.

sebastienruiz commented 3 years ago

When I don't use this option, the command fails (out of memory), which is why I don't use the --extended-check option.

Now, for the timeline false positive, here is an example on another, smaller instance:

check_pgbackrest --stanza=IPHDRX1 --service=archives --output=human --debug --repo-path=/mnt/backup_postgresql_production/HDR/IPHDRX1/archive

DEBUG: pgBackRest info command was : 'pgbackrest info --stanza=IPHDRX1 --output=json --log-level-console=error'
DEBUG: !> pgBackRest info took 0s
DEBUG: archives_dir: /mnt/backup_postgresql_production/HDR/IPHDRX1/archive/IPHDRX1/11-1
DEBUG: min_wal changed to 000000030000000B0000008D
DEBUG: Get all the WAL archives and history files...
DEBUG: pgBackRest version command was : 'pgbackrest version'
DEBUG: history file to open : 00000004.history
DEBUG: ignored file 000000030000000B0000008B older than 000000030000000B0000008D
DEBUG: ignored file 000000030000000B0000008C older than 000000030000000B0000008D
DEBUG: !> Get all the WAL archives and history files took 1s
DEBUG: Get all the needed wal archives...
DEBUG: found a boundary @ '000000030000000E000000F9' !
DEBUG: !> Get all the needed wal archives took 0s
DEBUG: !> Go through needed wal list and check took 0s
DEBUG: Get all the needed wal archives for 20201118-144551F...
DEBUG: Get all the needed wal archives for 20201118-144551F_20201118-150045I...
DEBUG: Get all the needed wal archives for 20201118-144551F_20201118-212502I...
DEBUG: Get all the needed wal archives for 20201118-144551F_20201119-212502I...
DEBUG: Get all the needed wal archives for 20201118-144551F_20201120-212502I...
DEBUG: Get all the needed wal archives for 20201118-144551F_20201121-212501I...
DEBUG: Get all the needed wal archives for 20201122-210002F...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201123-212502I...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201124-212502I...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201125-212502I...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201126-212502I...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201127-212502I...
DEBUG: Get all the needed wal archives for 20201122-210002F_20201128-212502I...
DEBUG: Get all the needed wal archives for 20201129-210002F...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201130-212502I...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201201-212502I...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201202-212501I...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201203-212502I...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201204-212503I...
DEBUG: Get all the needed wal archives for 20201129-210002F_20201205-212502I...
DEBUG: Get all the needed wal archives for 20201206-210002F...
DEBUG: Get all the needed wal archives for 20201206-210002F_20201207-212502I...
DEBUG: !> Go through each backup, get the needed wal and check took 0s
DEBUG: missing 000000030000000E000000F9
Service : WAL_ARCHIVES
Returns : 1 (WARNING)
Message : wrong sequence, 1 missing file(s) (000000030000000E000000F9 / 000000030000000E000000F9)
Long message : latest_archive_age=57m6s
Long message : num_archives=1022
Long message : archives_dir=/mnt/backup_postgresql_production/HDR/IPHDRX1/archive/IPHDRX1/11-1
Long message : min_wal=000000030000000B0000008D
Long message : max_wal=000000040000000F0000008A
Long message : latest_archive=000000040000000F0000008A
Long message : latest_bck_archive_start=000000040000000F0000006A
Long message : latest_bck_type=incr
Long message : oldest_archive=000000030000000B0000008D
Long message : oldest_bck_archive_start=000000030000000B0000008D
Long message : oldest_bck_type=full

total 188K
drwxr-x--- 2 postgres dba 20K Nov 20 17:05 000000030000000B
drwxr-x--- 2 postgres dba 40K Nov 25 17:05 000000030000000C
drwxr-x--- 2 postgres dba 40K Nov 30 18:05 000000030000000D
drwxr-x--- 2 postgres dba 40K Dec 5 16:20 000000030000000E
-rw-r----- 1 postgres dba 127 Dec 5 16:20 00000004.history
drwxr-x--- 2 postgres dba 4.0K Dec 5 20:05 000000040000000E
drwxr-x--- 2 postgres dba 24K Dec 8 14:05 000000040000000F

cd 000000040000000E
ll
total 140K

The file is indeed there, but on a different timeline (4 instead of 3, which is what check_pgbackrest seems to be looking for):

-rw-r----- 1 postgres dba 17K Dec 5 17:05 000000040000000E000000F9-f1c388a60c4c141858d7f4cdf2bd0e6044620868.gz
-rw-r----- 1 postgres dba 17K Dec 5 17:05 000000040000000E000000FA-dd0d80ca3c8deb40c4547ecc98724be728cad48f.gz
-rw-r----- 1 postgres dba 17K Dec 5 18:05 000000040000000E000000FB-8d4f865ae599249520299322d2847814395d4382.gz
-rw-r----- 1 postgres dba 17K Dec 5 18:05 000000040000000E000000FC-56b4628a4bda36248596f4801e72d345858fb472.gz
-rw-r----- 1 postgres dba 17K Dec 5 19:05 000000040000000E000000FD-50f98042d0dc5da375ab784ad081c60ed918b45e.gz
-rw-r----- 1 postgres dba 17K Dec 5 19:05 000000040000000E000000FE-d076dfc8cd789d7c5a581653203a4e449aa2be97.gz
-rw-r----- 1 postgres dba 17K Dec 5 20:05 000000040000000E000000FF-4a873d6d2d808178f09836877f24b6f84d1266ab.gz

pgstef commented 3 years ago

First of all, --extended-check would not help here. It extends the check to archives found on disk that are older than the start WAL of the first backup, as well as those newer than the max WAL reported by the pgbackrest info command.

An "infinite loop" is possible in the generate_needed_wal_archives_list function if the ending timeline is greater than the starting one and no "boundary" WAL linking those timelines is found in the .history files. I could add a maximum number of WALs to check to avoid that.
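To make the failure mode concrete, here is a minimal sketch of such a walk with a safety cap. It is not the code of generate_needed_wal_archives_list: the names `next_wal`, `needed_wals` and `max_wals` are hypothetical, segments are assumed to be 16 MB (so the last 8 hex digits run from 00 to FF), and each boundary is assumed to lead to the next timeline number.

```python
def next_wal(wal):
    """Next segment name on the same timeline (assumes 16 MB segments)."""
    tli, log, seg = wal[:8], int(wal[8:16], 16), int(wal[16:], 16)
    seg += 1
    if seg > 0xFF:
        log, seg = log + 1, 0
    return "%s%08X%08X" % (tli, log, seg)

def needed_wals(start, end, boundaries, max_wals=100000):
    """List WAL names from start to end, hopping timelines at each boundary.

    boundaries: set of boundary WAL names taken from the .history files.
    max_wals: hypothetical cap so that a missing boundary cannot make the
    walk run forever.
    """
    needed, current = [], start
    while True:
        if len(needed) >= max_wals:
            raise RuntimeError("gave up after %d WALs; boundary missing?" % max_wals)
        needed.append(current)
        if current == end:
            return needed
        if current in boundaries:
            # Same segment position, next timeline (assumed to be +1)
            current = "%08X%s" % (int(current[:8], 16) + 1, current[8:])
        else:
            current = next_wal(current)

demo = needed_wals(
    "000000010000000000000015",
    "00000003000000000000001B",
    {"000000010000000000000017", "00000002000000000000001B"},
)
print("\n".join(demo))
```

Without any boundary, the walk stays on the starting timeline and never reaches an end WAL on a later one, so the list grows without limit, which would also be consistent with the out-of-memory failure reported above. The cap turns that into an explicit error.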

Sinon avec pgBackRest >= 2.28, il est désormais possible d'utiliser les commandes internes repo-ls et repo-get qui s'avèrent plus rapide que d'utiliser Perl pour ça. En cas de très nombreuses archives, utiliser le mode --enable-internal-pgbr-cmds est en fait beaucoup plus performant. (cela sera d'ailleurs le seul mode supporté pour la prochaine version de check_pgbackrest)

Coming back to the "boundary" WAL in your example: after a promotion, it should normally exist on both timelines.

Here is an example on a demo cluster:

DEBUG: 000000010000000000000015 WAL needed
DEBUG: 000000010000000000000016 WAL needed
DEBUG: 000000010000000000000017 WAL needed
DEBUG: found a boundary @ '000000010000000000000017' !
DEBUG: 000000020000000000000017 WAL needed
DEBUG: 000000020000000000000018 WAL needed
DEBUG: 000000020000000000000019 WAL needed
DEBUG: 00000002000000000000001A WAL needed
DEBUG: 00000002000000000000001B WAL needed
DEBUG: found a boundary @ '00000002000000000000001B' !
DEBUG: 00000003000000000000001B WAL needed
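
As an illustration of where such boundaries can come from, here is a minimal Python sketch that derives boundary WAL names from the content of a timeline .history file, where each line holds a parent timeline ID, a switchpoint LSN and a reason. It is not necessarily how check_pgbackrest does it: the function names are hypothetical, the default 16 MB segment size is assumed, and the sample content is made up so that it matches the segments 17 and 1B of the demo output.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def lsn_to_wal_name(timeline, lsn):
    """Name of the WAL segment on `timeline` that contains `lsn` ('X/Y' in hex)."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    segno = ((high << 32) | low) // WAL_SEGMENT_SIZE
    segments_per_log = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segments_per_log, segno % segments_per_log)


def boundary_wals(history_text):
    """Return, for each parent timeline, the WAL name in which it was left."""
    boundaries = []
    for line in history_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parent_timeline, switch_lsn = line.split()[:2]
        boundaries.append(lsn_to_wal_name(int(parent_timeline), switch_lsn))
    return boundaries


# Hypothetical content of 00000003.history for the demo cluster;
# the LSN offsets are invented, only the segments (17 and 1B) match the output above.
sample = (
    "1\t0/17000028\tno recovery target specified\n"
    "2\t0/1B0000A0\tno recovery target specified\n"
)
print(boundary_wals(sample))
```

With this sample the sketch prints `['000000010000000000000017', '00000002000000000000001B']`, the two boundaries reported in the debug output. A switchpoint that falls exactly on a segment start is an edge case this sketch does not treat specially.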

These boundary WALs generally exist at least as a .partial:

$ find /var/lib/pgbackrest/archive/my_stanza/13-1/ -name *0000000000000017*
/var/lib/pgbackrest/archive/my_stanza/13-1/0000000100000000/000000010000000000000017-a85e296c05bf75e62dcab3ed8128b03ae6fbb07b.gz
/var/lib/pgbackrest/archive/my_stanza/13-1/0000000200000000/000000020000000000000017-f6e6a6cb1e16ebde1054ad234621c8182f88cbbc.gz
$ find /var/lib/pgbackrest/archive/my_stanza/13-1/ -name *000000000000001B*
/var/lib/pgbackrest/archive/my_stanza/13-1/0000000200000000/00000002000000000000001B.partial-5a0e63dec3a8a7a1195b22b4aa0e78de7a06e3ba.gz
/var/lib/pgbackrest/archive/my_stanza/13-1/0000000300000000/00000003000000000000001B-d4eba3a7431438bba70d1509175c0da3407a5a7a.gz
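
The same lookup can be sketched in Python: parse the archive file names seen in the listings (WAL name, optional .partial, a 40-digit checksum, optional compression extension) and report on which timelines a given segment exists. The pattern is inferred from the file names shown in this thread only, and `parse_archive` and `timelines_holding` are hypothetical helper names.

```python
import re

# <24 hex digits>[.partial]-<40 hex digit checksum>[.<extension>]
ARCHIVE_RE = re.compile(r"^([0-9A-F]{24})(\.partial)?-([0-9a-f]{40})(?:\.\w+)?$")


def parse_archive(filename):
    """Return (wal_name, is_partial) for an archive file name, or None."""
    match = ARCHIVE_RE.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2) is not None


def timelines_holding(segment, filenames):
    """Map timeline -> is_partial for archives whose last 16 digits equal `segment`."""
    found = {}
    for filename in filenames:
        parsed = parse_archive(filename)
        if parsed is not None and parsed[0][8:] == segment:
            found[int(parsed[0][:8], 16)] = parsed[1]
    return found


demo_files = [
    "00000002000000000000001B.partial-5a0e63dec3a8a7a1195b22b4aa0e78de7a06e3ba.gz",
    "00000003000000000000001B-d4eba3a7431438bba70d1509175c0da3407a5a7a.gz",
]
print(timelines_holding("000000000000001B", demo_files))
```

For the demo files this reports segment 1B as a .partial on timeline 2 and as a complete archive on timeline 3. Applied to the reporter's listing, segment 0000000E000000F9 would only be found on timeline 4.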

So this may not be a false positive: WAL 000000030000000E000000F9 may indeed be required to restore from a backup older than it. Have you tried a pgbackrest restore with a --set older than 000000030000000E000000F9? We would then need to work out why this WAL does not exist, at least as a .partial.

Since at least one backup more recent than this missing WAL exists, a "WARNING" is raised rather than a "CRITICAL". This lets you acknowledge the alert and still be notified if another WAL goes missing.

pgstef commented 3 years ago

After further checking, there are indeed cases where the .partial may not be generated. In your example, recovery should manage with only 000000040000000E000000F9.

I am re-running a few tests on the ignore options, so that backups generated within that window are ignored as well, before publishing the new version.

Until then, you can adapt the generate_needed_wal_archives_list function with the code that will be in the next version:

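        # $curr is a boundary WAL: the timeline switches on this segment
        if ( grep /$curr/, @branch_wals ) {
            dprint("found a boundary @ '$curr' !\n");
            $timeline++;    # continue on the next timeline
            $j--;           # presumably so the same segment is built again on the new timeline
            next;           # the boundary WAL on the lower timeline is not required

        }else{
            # dprint("$curr WAL needed\n");
            push @needed_wal_archives_list, $curr;
        }

sebastienruiz commented 3 years ago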

Hello. Unfortunately I am on pgBackRest version 2.27. I am trying to restore my backup on another server, so I copied my instance's pgBackRest repository elsewhere and configured everything properly in a test instance. But during the restore I get the error "ERROR: [075]: no backup sets to restore". You don't have to help me with this part, but do you know how to make pgBackRest recognize a repository copied from another location, which would make it possible to copy an instance to another server?

pgstef commented 3 years ago

Hello,

v2.27 is already more than 6 months old. 2.31 was released this week. There have been many improvements and fixes since 2.27. I can only encourage you to upgrade promptly and regularly (much as with PG minor releases, for that matter).

To find out whether your repo was copied correctly and completely, run the info command first. There are a great many links between the various manifest files. It is generally possible to duplicate a repo by copying the entire "stanza" directory (since all the files are tied to the stanza). After the copy, it then all comes down to permissions.

In any case, your scenario has prompted a rethink of the ignore options in check_pgbackrest, as well as improved detection of boundary WALs between timelines. I will release v2 as soon as possible. Until then, you can directly modify the generate_needed_wal_archives_list function with the code mentioned above.

Best regards

sebastienruiz commented 3 years ago

Thanks for all this information. Well, I can't manage to sort out my repo/stanza copy. But since it doesn't directly concern check_pgbackrest, I'll look elsewhere so as not to take up more of your time ;-) In any case, thank you and congratulations on your work and on your check_pgbackrest tool, which is very useful!

sebastienruiz commented 3 years ago

Hello Stéphane, I have some news that may help dig into a potential bug in check_pgbackrest. As you'll recall, the original problem was that an archive seemed to be missing at the timeline change (message: wrong sequence, 1 missing file(s) (000000030000000E000000F9 / 000000030000000E000000F9)). Yet we saw that the archive carries the name of the new timeline, 000000040000000E000000F9. You told me this was not normal and that there should be "partial" archive files. I have none. So I tried restoring this production instance to a point before and then after 000000030000000E000000F9, to see whether a missing-archive error would occur. The restore succeeded in both cases:

PITR ATTEMPT Dec 5 16:05

2020-12-10 15:42:40 CET [68387]: [2-1] user=,db= LOG: starting point-in-time recovery to 2020-12-05 16:05:00+01
2020-12-10 15:42:40.890 P00 INFO: archive-get command begin 2.27: [00000003.history, pg_wal/RECOVERYHISTORY] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:40.893 P00 INFO: unable to find 00000003.history in the archive
2020-12-10 15:42:40.893 P00 INFO: archive-get command end: completed successfully (4ms)
2020-12-10 15:42:40.900 P00 INFO: archive-get command begin 2.27: [000000030000000E000000D0, pg_wal/RECOVERYXLOG] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:40.962 P00 INFO: found 000000030000000E000000D0 in the archive
2020-12-10 15:42:40.962 P00 INFO: archive-get command end: completed successfully (63ms)
2020-12-10 15:42:40 CET [68387]: [3-1] user=,db= LOG: restored log file "000000030000000E000000D0" from archive
2020-12-10 15:42:40 CET [68387]: [4-1] user=,db= LOG: redo starts at E/D0000028
2020-12-10 15:42:40 CET [68387]: [5-1] user=,db= LOG: consistent recovery state reached at E/D00000F8
2020-12-10 15:42:40 CET [68385]: [7-1] user=,db= LOG: database system is ready to accept read only connections
2020-12-10 15:42:40.987 P00 INFO: archive-get command begin 2.27: [000000030000000E000000D1, pg_wal/RECOVERYXLOG] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:41.035 P00 INFO: found 000000030000000E000000D1 in the archive
2020-12-10 15:42:41.036 P00 INFO: archive-get command end: completed successfully (50ms)
2020-12-10 15:42:41 CET [68387]: [6-1] user=,db= LOG: restored log file "000000030000000E000000D1" from archive
2020-12-10 15:42:41.054 P00 INFO: archive-get command begin 2.27: [000000030000000E000000D2, pg_wal/RECOVERYXLOG] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:41.103 P00 INFO: found 000000030000000E000000D2 in the archive
2020-12-10 15:42:41.103 P00 INFO: archive-get command end: completed successfully (50ms)
2020-12-10 15:42:41 CET [68387]: [7-1] user=,db= LOG: restored log file "000000030000000E000000D2" from archive

......
2020-12-10 15:42:44 CET [68387]: [49-1] user=,db= LOG: restored log file "00000004.history" from archive
2020-12-10 15:42:44.132 P00 INFO: archive-get command begin 2.27: [00000005.history, pg_wal/RECOVERYHISTORY] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:44.134 P00 INFO: unable to find 00000005.history in the archive
2020-12-10 15:42:44.134 P00 INFO: archive-get command end: completed successfully (2ms)
2020-12-10 15:42:44 CET [68387]: [50-1] user=,db= LOG: selected new timeline ID: 5
2020-12-10 15:42:44 CET [68387]: [51-1] user=,db= LOG: archive recovery complete
2020-12-10 15:42:44.162 P00 INFO: archive-get command begin 2.27: [00000003.history, pg_wal/RECOVERYHISTORY] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 15:42:44.164 P00 INFO: unable to find 00000003.history in the archive
2020-12-10 15:42:44.164 P00 INFO: archive-get command end: completed successfully (2ms)
2020-12-10 15:42:44 CET [68390]: [1-1] user=,db= LOG: checkpoint starting: end-of-recovery immediate wait
2020-12-10 15:42:44 CET [68390]: [2-1] user=,db= LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 41 recycled; write=0.002 s, sync=0.000 s, total=0.035 s; sync files=1, longest=0.000 s, average=0.000 s; distance=671744 kB, estimate=671744 kB
2020-12-10 15:42:44 CET [68385]: [8-1] user=,db= LOG: database system is ready to accept connections

PITR ATTEMPT Dec 5 18:05:

2020-12-10 16:06:45 CET [71058]: [1-1] user=,db= LOG: database system was interrupted; last known up at 2020-12-04 21:25:03 CET
2020-12-10 16:06:45 CET [71058]: [2-1] user=,db= LOG: starting point-in-time recovery to 2020-12-05 18:05:00+01
2020-12-10 16:06:45.361 P00 INFO: archive-get command begin 2.27: [00000003.history, pg_wal/RECOVERYHISTORY] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 16:06:45.365 P00 INFO: unable to find 00000003.history in the archive
2020-12-10 16:06:45.365 P00 INFO: archive-get command end: completed successfully (4ms)
2020-12-10 16:06:45.371 P00 INFO: archive-get command begin 2.27: [000000030000000E000000D0, pg_wal/RECOVERYXLOG] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 16:06:45.420 P00 INFO: found 000000030000000E000000D0 in the archive
2020-12-10 16:06:45.421 P00 INFO: archive-get command end: completed successfully (51ms)
2020-12-10 16:06:45 CET [71058]: [3-1] user=,db= LOG: restored log file "000000030000000E000000D0" from archive
2020-12-10 16:06:45 CET [71058]: [4-1] user=,db= LOG: redo starts at E/D0000028
2020-12-10 16:06:45 CET [71058]: [5-1] user=,db= LOG: consistent recovery state reached at E/D00000F8
2020-12-10 16:06:45 CET [71056]: [7-1] user=,db= LOG: database system is ready to accept read only connections
2020-12-10 16:06:45.444 P00 INFO: archive-get command begin 2.27: [000000030000000E000000D1, pg_wal/RECOVERYXLOG] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 16:06:45.493 P00 INFO: found 000000030000000E000000D1 in the archive
2020-12-10 16:06:45.494 P00 INFO: archive-get command end: completed successfully (50ms)
2020-12-10 16:06:45 CET [71058]: [6-1] user=,db= LOG: restored log file "000000030000000E000000D1" from archive
...
2020-12-10 16:06:48 CET [71058]: [50-1] user=,db= LOG: selected new timeline ID: 5
2020-12-10 16:06:48 CET [71058]: [51-1] user=,db= LOG: archive recovery complete
2020-12-10 16:06:48.265 P00 INFO: archive-get command begin 2.27: [00000003.history, pg_wal/RECOVERYHISTORY] --log-level-console=info --pg1-path=/pgbd/IPHDRX1/admin --process-max=3 --repo1-path=/mnt/backup_postgresql_production/HDR/IPHDRX1 --stanza=IPHDRX1
2020-12-10 16:06:48.268 P00 INFO: unable to find 00000003.history in the archive
2020-12-10 16:06:48.268 P00 INFO: archive-get command end: completed successfully (3ms)
2020-12-10 16:06:48 CET [71061]: [1-1] user=,db= LOG: checkpoint starting: end-of-recovery immediate wait
2020-12-10 16:06:48 CET [71061]: [2-1] user=,db= LOG: checkpoint complete: wrote 1 buffers (0.0%); 0 WAL file(s) added, 0 removed, 41 recycled; write=0.002 s, sync=0.000 s, total=0.038 s; sync files=1, longest=0.000 s, average=0.000 s; distance=671744 kB, estimate=671744 kB
2020-12-10 16:06:48 CET [71056]: [8-1] user=,db= LOG: database system is ready to accept connections

We seem to be facing a special case where check_pgbackrest fails to resolve the timeline change and the WAL sequence. I am running PG 11.10 (Red Hat 7).

pgstef commented 3 years ago

Hello,

Thanks for these details. Indeed, as I mentioned in an earlier comment, there are several scenarios in which this WAL (partial or not) will not be generated.

The probe does detect the correct boundary WAL, but until now it checked both sides of the boundary (lower timeline and upper timeline). However, as you have just confirmed, the one on the lower timeline can be ignored. This will be fixed in the next version with the code shared above, in case you want to test the fix already (check_pgbackrest being just a plain Perl script, you can copy/edit it without trouble).
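
The effect of that change on the case reported here can be illustrated with a small Python sketch. It is not the tool's code: `missing_wals` and `skip_lower_boundary` are hypothetical names, and the WAL names are the ones quoted in this thread.

```python
def missing_wals(needed, archived, boundaries, skip_lower_boundary):
    """Return the needed WALs that are absent from the archive.

    With skip_lower_boundary=True, a boundary WAL on the timeline being left
    is not required, mirroring the fix described above.
    """
    missing = []
    for wal in needed:
        if skip_lower_boundary and wal in boundaries:
            continue  # lower side of the boundary: not required
        if wal not in archived:
            missing.append(wal)
    return missing


# Names taken from this thread: the boundary segment F9 is archived on timeline 4 only.
needed = [
    "000000030000000E000000F9",
    "000000040000000E000000F9",
    "000000040000000E000000FA",
]
archived = {"000000040000000E000000F9", "000000040000000E000000FA"}
boundaries = {"000000030000000E000000F9"}

print(missing_wals(needed, archived, boundaries, skip_lower_boundary=False))
print(missing_wals(needed, archived, boundaries, skip_lower_boundary=True))
```

Checking both sides reports 000000030000000E000000F9 as missing, which is the alert from the original report. Skipping the lower side of the boundary reports nothing missing.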

I will close this issue as soon as v2 is available.

Best regards

sebastienruiz commented 3 years ago

Excellent! All the best. I look forward to v2 so we can deploy it in our production. Thanks again for your availability.

pgstef commented 3 years ago

Hello,

For your information, I have just pushed the change (both ignoring the unneeded WAL on a timeline jump and skipping the backup consistency check when a WAL is ignored via the ignore-archived-* options) to the v2 development branch.

Feel free to test it before the official release ;-)

pgstef commented 3 years ago

Hello,

v2.0 is finally available and should therefore resolve this discussion. Feel free to reopen it if that is not the case. Best regards,