olahallengren / sql-server-maintenance-solution

SQL Server Maintenance Solution
https://ola.hallengren.com
MIT License
2.88k stars 740 forks source link

Missing entries in "msdb.dbo.backupset" for user databases in an AG #642

Open denniskrispliebherr opened 2 years ago

denniskrispliebherr commented 2 years ago

Hey Ola,

hope you are doing fine :)

So my problem is that our monitoring system "Checkmk" is not able to determine the date & time of user databases that are in an AG because there are no corresponding entries in the "msdb.dbo.backupset" table after the Full- and Log backup jobs ran.

Backup preference for the AG is set to "Prefer Secondary".

When I run the selects that are being executed via the monitoring system I get the following results: PRIMARY NODE: last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-13 12:07:57 D COBSVSQLAAG01 True 1
2022-04-24 07:45:01 L COBSVSQLAAG01 True 1

last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-27 04:45:00 D COBSVSQLAAG01 0 NULL
2022-04-27 04:45:01 D COBSVSQLAAG01 0 NULL
2022-04-27 04:45:02 D COBSVSQLAAG01 0 NULL
2022-04-13 12:07:57 D COBSVSQLAAG01 1 1 7BA734C9-BA78-4318-8B62-7FA215725BB6 2022-04-24 07:45:01 L COBSVSQLAAG01 1 1 7BA734C9-BA78-4318-8B62-7FA215725BB6

SECONDARY NODE: last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-27 05:00:01 D COBSVSQLAAG02 True 1
2022-04-27 11:00:01 L COBSVSQLAAG02 True 1

last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-27 04:45:01 D COBSVSQLAAG02 0 NULL
2022-04-27 04:45:02 D COBSVSQLAAG02 0 NULL
2022-04-27 04:45:02 D COBSVSQLAAG02 0 NULL

The entries for the time 04:45 a.m. are for the full backups of the system databases. FULL backups of the user database are created every day at 05:00 a.m. LOG backups of the user database are created every 15 minutes.

The first two entries for the first select from the PRIMARY NODE are probably from a manual failover where it became the secondary node, thus creating the backups from this one because of the "Prefer Secondary" value. Same goes for the last two entries in the secondary select for the PRIMARY NODE.

The first two entries for the first select from the SECONDARY NODE show Full and Log backups correctly, but they are completely missing from the secondary select and that is what I am assuming my main problem.

As for comparison, this is what it looks like on a SQL instance with no AG: last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-27 09:00:13 I COBSVBDESQL01 True 1
2022-04-27 11:15:00 L COBSVBDESQL01 True 1
2022-04-27 04:45:02 D COBSVBDESQL01 True 1

last_backup_date type machine_name is_primary_replica is_local replica_id 2022-04-27 03:21:44 D COBSVBDESQL01 0 NULL
2022-04-27 04:45:00 D COBSVBDESQL01 0 NULL
2022-04-27 04:45:01 D COBSVBDESQL01 0 NULL
2022-04-27 04:45:02 D COBSVBDESQL01 0 NULL
2022-04-27 09:00:13 I COBSVBDESQL01 0 NULL
2022-04-27 11:15:00 L COBSVBDESQL01 0 NULL

The select statements are as follows: --HADRStatus IS NULL OR HADRStatus <> 1 SELECT CONVERT(VARCHAR, DATEADD(s, DATEDIFF(s, '19700101', MAX(backup_finish_date)), '19700101'), 120) AS last_backup_date, type, machine_name, 'True' as is_primary_replica, '1' as is_local, '' as replica_id FROM msdb.dbo.backupset WHERE machine_name = SERVERPROPERTY('Machinename') GROUP BY type, machine_name

--HADRStatus = 1 SELECT CONVERT(VARCHAR, DATEADD(s, DATEDIFF(s, '19700101', MAX(b.backup_finish_date)), '19700101'), 120) AS last_backup_date,
b.type, b.machine_name, isnull(rep.is_primary_replica,0) as is_primary_replica, rep.is_local, isnull(convert(varchar(40), rep.replica_id), '') AS replica_id FROM msdb.dbo.backupset b
LEFT OUTER JOIN sys.databases db ON b.database_name = db.name
LEFT OUTER JOIN sys.dm_hadr_database_replica_states rep ON db.database_id = rep.database_id
WHERE (rep.is_local is null or rep.is_local = 1) AND (rep.is_primary_replica is null or rep.is_primary_replica = 'True') and machine_name = SERVERPROPERTY('Machinename') GROUP BY type, rep.replica_id, rep.is_primary_replica, rep.is_local, b.database_name, b.machine_name, rep.synchronization_state, rep.synchronization_health

SQL Server version and edition Microsoft SQL Server 2019 (RTM-CU15) (KB5008996) - 15.0.4198.2 (X64) - Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2022 Standard 10.0 (Build 20348: ) (Hypervisor) => same issue on Windows Server 2019 Standard with SQL Server 2019 Enterprise Edition

Version of the script 2022-01-02 13:58:13

What command are you executing? EXECUTE [dbo].[DatabaseBackup] @Databases = 'USER_DATABASES', @Directory = NULL, @BackupType = 'FULL', @Verify = 'Y', @CleanupTime = 47, @CheckSum = 'Y', @LogToTable = 'Y', @CopyOnly='Y'

...and...

EXECUTE [dbo].[DatabaseBackup] @Databases = 'USER_DATABASES', @Directory = NULL, @BackupType = 'LOG', @Verify = 'Y', @CleanupTime = 47, @CheckSum = 'Y', @LogToTable = 'Y', @ChangeBackupType = 'Y'

What output are you getting? See statements above. Side note: Full- and Log-Backups are created all created successfully on the file system, but as I described before, I am missing the entries in the "msdb.dbo.backupset" table. Log entries for backup jobs where the databases are NOT running in an AAG are all working perfectly fine.

Unfortunately, altering the selects of the script executed from the monitoring system won't be that easy because it is implemented in their standard rollout. Thanks in advance for your help :)

Greetings from Germany Dennis

denniskrispliebherr commented 2 years ago

Hi Ola,

still having the issue. Any help or advice would help tremendously.

Thanks again in advance Dennis

pbarryuk commented 2 years ago

I would suggest you capture the backup_restore_progress_trace Extended Event (see https://techcommunity.microsoft.com/t5/sql-server-blog/new-extended-event-to-track-backup-and-restore-progress/ba-p/384447 for more details) as it captures the part where the back history is attempted to be written and potential reasons why that step failed.

denniskrispliebherr commented 2 years ago

pbarryuk

Hi pbarryuk,

Thanks for your reply. Unfortunately, as I wrote in my original post, altering the selects of the script executed from the monitoring system won't be that easy because it is implemented in their standard rollout and we are using this one for hundreds of systems.

Regards Dennis

pbarryuk commented 2 years ago

Sorry if it wasn’t clear , I wasn’t suggesting changing the selects of the monitoring system. I was suggesting that you capture the xEvent for backups as there are entries of that xEvent that are fired when the backup tries to write the backup history to msdb and if it was successful or not or, if it was not successful, then it can give you an idea of why (e.g. could not obtain an update lock). As you mentioned there were no records in msdb this is to get to the root cause of the issue - it is outside of the Ola H scripts as backup history is a SQL Server activity.

Otherwise, if you have a support contract with Microsoft then open a case with them for them to help in the investigation.

denniskrispliebherr commented 2 years ago

Hi pbarryuk,

maybe I was writing your first reply not clearly enough. Yesterday was my first day back from the holidays and there was so much work on the table :D I'll try your suggestions now. Thanks again.

Regards Dennis