trimble-oss / dba-dash

DBA Dash - SQL Server Monitoring Tool
MIT License
254 stars 58 forks source link

IOStats_Upd issue #973

Closed DavidWiseman closed 4 days ago

DavidWiseman commented 3 weeks ago

This issue was reported on SQL slack

Version 3.2.0.0

Microsoft.Data.SqlClient.SqlException (0x80131904): The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at DBADash.DBImporter.Update(String tableName) in D:\a\dba-dash\dba-dash\DBADash\DBImporter.cs:line 513
   at DBADash.DBImporter.<>c__DisplayClass17_0.<TryUpdate>b__0(Context context) in D:\a\dba-dash\dba-dash\DBADash\DBImporter.cs:line 463
   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)
   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider)
   at Polly.Retry.RetryPolicy.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken)
   at Polly.Policy.Implementation(Action`2 action, Context context, CancellationToken cancellationToken)
   at Polly.Policy.Execute(Action`2 action, Context context, CancellationToken cancellationToken)
   at DBADash.DBImporter.TryUpdate(String tableName, List`1& exceptions) in D:\a\dba-dash\dba-dash\DBADash\DBImporter.cs:line 462
ClientConnectionId:2aa55f3d-e85d-49f9-aa20-53e06e5861a4
Error Number:8672,State:1,Class:16

source is IOStats  Error Context :  Import

image Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64) Oct 8 2022 05:58:25 Copyright (C) 2022 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2016 Standard 10.0 (Build 14393: ) (Hypervisor)

Note: IOStats_Upd runs every 1min by default so the issue doesn't occur each time the SP is called. When the issue occurs it fails the retry operations. This is an older version of DBA Dash and a RTM version of SQL 2022. There doesn't seem to be any fixes in DBA Dash or SQL 2022 that might address this issue.

There is a primary key on the temp table used by the merge on: InstanceID,DatabaseID,Drive,FileID,SnapshotDate The join condition is: ON S.InstanceID= T.InstanceID AND S.DatabaseID = T.DatabaseID AND S.Drive = T.Drive AND S.FileID = T.FileID AND S.SnapshotDate = T.SnapshotDate

The SnapshotDate is changed changed to the nearest 60min with: CROSS APPLY [dbo].[DateGroupingMins](SnapshotDate,60) DG

This would be the most obvious place an issue could occur. It's just converting the date to the nearest hour which could result in some duplicates - but it's expected that SnapshotDate is the same value in all rows.

MERGE is known to have some bugs so it's possible the issue is related to a MERGE bug. Although the issue is occurring frequently for the user, the circumstances that trigger it must be quite rare as this issue hasn't previously been observed.

Replacing the MERGE statement with a separate INSERT/UPDATE might not be a bad idea.

DavidWiseman commented 3 weeks ago

Evaluation of new IOStats_Upd:

Validation check:

SELECT *
FROM [DBADashDB].dbo.DBIOStats_60MIN
WHERE SnapshotDate >= '20240819 10:00'
EXCEPT
SELECT *
FROM [DBADashDB_Clone].dbo.DBIOStats_60MIN
WHERE SnapshotDate >= '20240819 10:00';

SELECT *
FROM [DBADashDB_Clone].dbo.DBIOStats_60MIN
WHERE SnapshotDate >= '20240819 10:00'
EXCEPT
SELECT *
FROM [DBADashDB].dbo.DBIOStats_60MIN
WHERE SnapshotDate >= '20240819 10:00';

Performance check: image

The new version appears to be more efficient. Lower duration, CPU & logical reads.

DavidWiseman commented 4 days ago

Included in 3.9 🚀