marcingminski / sqlwatch

SQL Server Performance Monitor
https://docs.sqlwatch.io
Other
432 stars 170 forks source link

SQLWATCH_health Extended Event cause CPU related performance issues #403

Open mmbednarz opened 3 years ago

mmbednarz commented 3 years ago

The problem has been noticed for sql instances hosting system databases where many queries use scalar UDF functions. After installing SQLWATCH, the CPU load increased approximately 10 times. After excluding sql_statement_completed from SQL_WATCH_health extended events, the problem was resolved. The easiest way to replicate the problem:

  1. SQLWATCH installation.
  2. Build extensive (lots of SQL to compile) scalar UDF.
  3. Running multiple function calls.

Results for:

declare @result varchar(1000);
select top 1000 @result = dbo.UDFFunc('997#texten$$pl$$textpl','pl')
from sys.objects a  CROSS JOIN sys.objects b
GO

XE SQLWATCH_health disabled

SQL Server parse and compile time:
   CPU time = 10 ms, elapsed time = 10 ms.

 SQL Server Execution Times:
   CPU time = 78 ms, elapsed time = 86 ms.

Completion time: 2021-07-21T14: 25: 54.9491058 + 02: 00

XE SQLWATCH_health enabled:

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 1000 ms, elapsed time = 1012 ms.

Completion time: 2021-07-21T14: 27: 21.2272292 + 02: 00

Function definition:

/****** Object:  UserDefinedFunction [dbo].[UDFFunc]    Script Date: 21.07.2021 14:27:59 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[UDFFunc] (
    @Value VARCHAR(1000)
    ,@Language VARCHAR(10) = ''
    )
RETURNS VARCHAR(1000)
AS
BEGIN
    SET @language = '$$' + lower(left(@Language, 2)) + '$$'

    DECLARE @indexstart INT
    DECLARE @indexend INT
    DECLARE @output VARCHAR(1000)
    DECLARE @out VARCHAR(1000)

    SET @output = ''
    SET @indexstart = 1

    WHILE (len(@value) > 0)
    BEGIN
        SET @indexstart = patindex('%#%', @Value)
        SET @indexend = patindex('%;%', @Value)

        IF @indexend <= 0
            SET @indexend = 1000

        IF @indexstart <= 0
            SET @indexstart = @indexend;

        IF @indexstart >= @indexend
        BEGIN
            IF @indexend > 1
                SET @out = substring(@value, 1, @indexend - 1)
        END
        ELSE
            SET @out = substring(@value, @indexstart + 1, @indexend - @indexstart - 1)

        SET @value = substring(@value, @indexend + 1, 1000)
        SET @indexstart = patindex('%' + @Language + '%', @out)

        IF @indexstart > 0
        BEGIN
            SET @indexend = patindex('%$$%', SUBSTRING(@out, @indexstart + len(@language), 1000))

            IF @indexend <= 0
                SET @indexend = 1000
            SET @out = SUBSTRING(@out, @indexstart + len(@language), @indexend - 1)
        END
        ELSE
        BEGIN
            SET @indexend = patindex('%$$%', @out)

            IF @indexend > 0
                SET @out = SUBSTRING(@out, 0, @indexend)
        END

        SET @output = @output + @out + ';'
    END

    IF @output <> ''
        RETURN substring(@output, 0, len(@output))

    RETURN @value
END
GO

SQL Server and OS version:

Microsoft SQL Server 2019 (RTM-CU8) (KB4577194) - 15.0.4073.23 (X64)   Sep 23 2020 16:03:08   Copyright (C) 2019 Microsoft Corporation  Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor) 

SQLWATCH Version: https://github.com/marcingminski/sqlwatch/releases/download/4.2/SQLWATCH.4.2.0.37410.20210518204739.zip

marcingminski commented 3 years ago

The session is only meant to monitor SQLWATCH objects that run for longer than 1 second. I am surprised that the session even evaluates your UDF at all based on the criteria. Does the session log anything? Is the CPU purely caused by evaluation criteria? You can safely disable this session.

mmbednarz commented 3 years ago

The session is logging no data, all SQL WATCH jobs are disabled for the test. In my opinion, the CPU load is caused by evaluating the conditions.

marcingminski commented 3 years ago

In that case, how does the SQLWATCH_long_queries] behave? I would expect similar evaluation and similar cpu increase?

mmbednarz commented 3 years ago

There is no problem with that one. The main difference is that health one uses sqlserver.sql_text and the long_queries not.

Edit: The query_problems use sqlserver.sql_text but it is enabled and there is no perf issue.. Edit2: But not for statement_completed so maybe the problem is evaluating sqlserver.sql_text on statement_completed.

marcingminski commented 3 years ago

Yeah, I wonder if there is a bug in SQL 2019. I will do some more research. Thank you for raising this.

mmbednarz commented 3 years ago

I tested on SQL Server version: Microsoft SQL Server 2016 (SP2-CU17) (KB5001092) - 13.0.5888.11 (X64) Mar 19 2021 19:41:38 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Datacenter 6.3 <X64> (Build 9600: ) (Hypervisor) The problem also occurs.

marcingminski commented 3 years ago

Thank you for letting me know.

WinterIssue commented 3 years ago

Have you thought about filtering on database name as well.

see attached. SQLWatch Health2 XES.txt

mmbednarz commented 3 years ago

I checked and the above XE causes only a twofold extension of the UDF function execution time. Better but still unacceptable with my workload.

WinterIssue commented 3 years ago

Might have messed up with the brackets. This might work better.

SQLWatch Health2 XES - v2.txt

WinterIssue commented 3 years ago

You can give that last version a try but I am not seeing any improvement over the previous version. I guess twice as slow is better than ten times slower for the original XE. I am also seeing it twice as slow when the SQLWATCH_long_queries XE session is enabled as well. Hopefully, someone can advance this further.

marcingminski commented 3 years ago

Thank you both for your input. I will seek improvements. You can disable both sessions as they are not required to run, you will just miss the long queries graphs but likely not a big deal since your queries seem to be in milliseconds rather than seconds.

marcingminski commented 3 years ago

Hi @mmbednarz, can you show me baseline time stats with all SQLWATCH XE sessions disabled please?

mmbednarz commented 3 years ago

Yep,

All XE disabled (for 1000 function execs):

SQL Server parse and compile time: 
   CPU time = 11 ms, elapsed time = 11 ms.

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 38 ms.

Completion time: 2021-07-28T11:51:28.8599997+02:00
OliverUweHahn commented 2 years ago

Hi @marcingminski , you should set the numeric filters at the top. This will reduce the impact of the Extended Events session for workloads with many small CPU intensive queries.

If you change the definition like this, the impact nearly disappears:

CREATE EVENT SESSION [SQLWATCH_health] ON SERVER ADD EVENT sqlserver.error_reported( ACTION(sqlserver.sql_text) WHERE [sqlserver].like_i_sql_unicode_string AND (NOT [sqlserver].like_i_sql_unicode_string)),

ADD EVENT sqlserver.sp_statement_completed(SET collect_statement=(1) ACTION(sqlserver.sql_text) WHERE ( --log any sqlwatch procedure running for longer than 1 second (there will be genuine exceptions) [package0].greater_than_int64 AND [sqlserver].like_i_sql_unicode_string AND (NOT [sqlserver].like_i_sql_unicode_string)) OR (

    -- log performnace logger if it runs for over 500ms - it should take approx 100ms on average.
   [package0].[greater_than_int64]([duration],(500000)) 
   AND    [sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%usp_sqlwatch_logger_performance%') 
))

ADD TARGET package0.event_file(SET filename=N'SQLWATCH_internal_performance',max_file_size=(50),max_rollover_files=(1)) WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF); GO

HTH,

Best regards, Oliver

OliverUweHahn commented 2 years ago

Without Extended Events Session: CPU time = 31 ms, elapsed time = 47 ms.

With default Extended Events Session: CPU time = 735 ms, elapsed time = 754 ms.

With optimized Extended Events Session: CPU time = 63 ms, elapsed time = 61 ms.

marcingminski commented 2 years ago

Thanks, @OliverUweHahn I thought that was already the case. I will take a look at it. You are more than welcome to submit PR to change it.

OliverUweHahn commented 2 years ago

OK. I will learn how to do that. :-)