olahallengren / sql-server-maintenance-solution

SQL Server Maintenance Solution
https://ola.hallengren.com
MIT License
2.91k stars 756 forks source link

IndexOptimize job randomly reporting failure even when the log file says the Outcome succeeded #807

Open gkshitij opened 5 months ago

gkshitij commented 5 months ago

Description of the issue IndexOptimize Job randomly reports failures in SQL Agent history even when it runs all the Index maintenance commands successfully. I checked the logs.txt files for a successful run and failed run and all the steps look similar beside the different in statistics. For example, last successful run completed in around 3:46 hours and the recent 'failed' job ran for 3:40 hours. Upon checking the logs in command log table and the .TXT file, the commands seems to be running successfully until the end. One job was reported failed whereas the other was reported successful

SQL Server version and edition Execute SELECT @@VERSION SQL Server Standard 2018 14.0.2052.1

Version of the script Check the header of the stored procedure Version: 2019-12-31 15:48:33

What command are you executing? EXECUTE [dbo].[IndexOptimize] @Databases = 'USER_DATABASES', @FragmentationLow = NULL, @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @FragmentationLevel1 = 5, @FragmentationLevel2 = 30, @UpdateStatistics = 'ALL', @OnlyModifiedStatistics = 'Y', @LogToTable = 'Y', @Indexes = 'ALL_INDEXES, -%.%.%.IX_PerfSample, and bunch other specific indexes'

What output are you getting?

This is the end of .txt log file

"ate and time: 2024-05-27 03:58:27 [SQLSTATE 01000] Command: UPDATE STATISTICS [redacted].[VeeamOD].[Staging] [PK_OD_Staging] [SQLSTATE 01000] Comment: ObjectType: Table, IndexType: Index, IndexType: Clustered, Incremental: N, RowCount: 235, ModificationCounter: 658 [SQLSTATE 01000] Outcome: Succeeded [SQLSTATE 01000] Duration: 00:00:00 [SQLSTATE 01000] Date and time: 2024-05-27 03:58:27 [SQLSTATE 01000] [SQLSTATE 01000] Date and time: 2024-05-27 03:58:27 [SQLSTATE 01000] [SQLSTATE 01000]"

This is the end of Message in SQL Agent Job history, which I believe is truncated

" Date and time: 2024-05-27 00:18:25 [SQLSTATE 01000] (Message 50000) [SQLSTATE 01000] (Message 50000) Date and time: 2024-05-27 00:18:25 [SQLSTATE 01000] (Message 50000) Command: UPDATE STATISTICS [redacted].[dbo].[DatabaseStatus] [_WA_Sys_00000002_5AEE82B9] [SQLSTATE 01000] (Message 50000) Comment: ObjectType: Table, IndexType: Column, Incremental: N, RowCount: 352, ModificationCounter: 33 [SQLSTATE 01000] (Message 50000) Outcome: Succeeded [SQLSTATE 01000] (Message 50000) Duration: 00:00:00 [SQLSTATE 01000] (Message 50000) Date and time: 2024-05-27 00:18:25 [SQLSTATE 01000] (Message 50000) [SQLSTATE 01000] (Message 50000) Date and time: 2024-05-27 00:18:25 [SQLS... The step failed."

ampetelin commented 4 months ago

Dear @gkshitij

Have you found a solution? I ran into the same problem, but with a backup script. Everything is successful in the logs, but in the history of the SQL agent [SQLSTATE 01000... The step failed.

Stiffboard commented 4 months ago

Search for msg in the output file. Or "come: F" as in Outcome: Failed and you will find a deadlock or something similar.

drstonephd commented 4 months ago

The step did not fail so perhaps it's not the step. Did you check the agent error log? Perhaps the agent is failing to do a final activity like a notification. I've seen similar in the past when the agent could not log to the file, but you are looking at the file so.... Is there a message at the end of the log file that's missing when it does not fail? Is there another process running that can lock your log files? Are the recommended virus scan exclusions for SQL exclusions in place?

There is a subtle difference in "non-error" severity levels in jobs. Severities 0-10 are informational in the script. However, severities 2-9 cause the job step to fail. I don't think the step is failing, so this would not be the cause. Besides, I doubt there are messages with levels 2-9. It would not be random unless the message was random.

https://chadbaldwin.net/2021/01/15/raiserror-cheatsheet.html

ampetelin commented 4 months ago

Search for msg in the output file. Or "come: F" as in Outcome: Failed and you will find a deadlock or something similar.

Hmm, you were right. I thought that if an error occurs on some database, then the script is not executed on the following databases and stops.

I did find an error message on the RESTORE VERIFYONLY FROM DISK command: Command: RESTORE VERIFYONLY FROM DISK = N'\my_path\mydatabase.bak' WITH CHECKSUM [SQLSTATE 01000] Msg3203, Sev 16, State 1, Line 1 : Read on "\my_path\mydatabase.bak" failed: 13(The data is invalid.) [SQLSTATE 42000]

I will deal with it, thanks for the help

Stiffboard commented 4 months ago

I did find an error message on the RESTORE VERIFYONLY FROM DISK command: Command: RESTORE VERIFYONLY FROM DISK = N'\my_path\mydatabase.bak' WITH CHECKSUM [SQLSTATE 01000] Msg3203, Sev 16, State 1, Line 1 : Read on "\my_path\mydatabase.bak" failed: 13(The data is invalid.) [SQLSTATE 4200

Well, this is not from the index job, this is output file from the backup job.

ampetelin commented 4 months ago

I did find an error message on the RESTORE VERIFYONLY FROM DISK command: Command: RESTORE VERIFYONLY FROM DISK = N'\my_path\mydatabase.bak' WITH CHECKSUM [SQLSTATE 01000] Msg3203, Sev 16, State 1, Line 1 : Read on "\my_path\mydatabase.bak" failed: 13(The data is invalid.) [SQLSTATE 4200

Well, this is not from the index job, this is output file from the backup job.

Yep, I initially said that I encountered a similar problem, only with a backup script