spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License

Unable to capture Baseline data #122

Closed: rreggie-rmd closed 1 year ago

rreggie-rmd commented 1 year ago

While replaying in our test environment, we are unable to capture the baseline data for comparison with our production load. Can you elaborate on where exactly the baseline data gets written? Even though we created the schema ahead of time, it is still not capturing the data.

rreggie-rmd commented 1 year ago

In the link above (https://spaghettidba.com/2019/02/15/benchmarking-with-workloadtools/), I am trying to understand the difference between SqlWorkload A and SqlWorkload B. Following the example JSON config file, I am unable to write to the baseline schema. Any help is greatly appreciated.

spaghettidba commented 1 year ago

Can you please share your (redacted) JSON files? It would also be helpful if you could describe what you are trying to achieve. What I understand from your messages is that you want to replay a (previously captured?) workload to a test environment, to compare the workload analysis data with what you already have captured in production. Is this correct? The workload analysis data gets written wherever you find appropriate: the production server (I would not do that), the test server (a possible good choice), your laptop (also a good choice). Where are you attempting to save the workload analysis data?

rreggie-rmd commented 1 year ago

{ "Controller": {

    "Listener":
    {
        "__type": "ExtendedEventsWorkloadListener",
        "ConnectionInfo":
        {
            "ServerName": "ProdServer",
            "DatabaseName": "ProdDB",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        "DatabaseFilter": "SourceDatabase"
    },

    "Consumers":
    [
        {
            "__type": "ReplayConsumer",
            "ConnectionInfo":
            {
                "ServerName": "TestServer",
                "DatabaseName": "ProdDB_Copy",
                "UserName": "sa",
                "Password": "Pa$$w0rd!"
            }
        },
        {
            "__type": "AnalysisConsumer",
            "ConnectionInfo":
            {
                "ServerName": "TestServer",
                "DatabaseName": "SQLWorkLoad",
                "SchemaName": "baseline",
                "UserName": "sa",
                "Password": "P4$$w0rd!"
            },
            "UploadIntervalSeconds": 60
        }
    ]
}

}

If you look at the above config, which is SqlWorkload A: the Listener component is my production server, and in the ReplayConsumer I have a copy of the prod backup restored and am running the replay against it. The AnalysisConsumer, if I understand correctly, captures the stats of the production workload. Is my understanding correct?

rreggie-rmd commented 1 year ago

[image: diagram of the intended capture/replay scenario]

Or can you give me sample config files for the scenario in the picture I pasted?

spaghettidba commented 1 year ago

The AnalysisConsumer, if I understand correctly, captures the stats of the production workload. Is my understanding correct?

Yes, this is correct. If you're not seeing the data on the test server, in the SqlWorkload database and baseline schema, then something is wrong and the logs might have some relevant info.

If you want to record the performance of the replay on the test server, then you need a second sqlworkload.exe running at the same time as the capture/replay (the one for which you already posted the JSON configuration). The configuration of the second sqlworkload should look as described in this blog post: https://spaghettidba.com/2020/03/03/performing-a-real-time-replay-with-workloadtools/
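Roughly, the second instance pairs a listener pointed at the test server with a single AnalysisConsumer writing to a different schema, something like this (server, database and credential values are placeholders; see the blog post for the exact properties):

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "TestServer"
            },
            "DatabaseFilter": "ProdDB_Copy"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "TestServer",
                    "DatabaseName": "SQLWorkLoad",
                    "SchemaName": "replay",
                    "UserName": "sa",
                    "Password": "*****"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Keep UploadIntervalSeconds the same in both instances, so that the baseline and replay schemas can be compared interval by interval (for example in WorkloadViewer).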

Hope this makes sense!

rreggie-rmd commented 1 year ago

I am able to capture the replay analysis via the second sqlworkload.exe. But I am not able to write the prod baseline with the first sqlworkload.exe, whose configuration is below:

"Listener": { "__type": "ExtendedEventsWorkloadListener", "ConnectionInfo": { "ServerName": "ProdServer", "DatabaseName": "ProdDB", "UserName": "sa", "Password": "P4$$w0rd!" }, "DatabaseFilter": "SourceDatabase" },

"Consumers":
[
    {
        "__type": "ReplayConsumer",
        "ConnectionInfo":
        {
            "ServerName": "TestServer",
            "DatabaseName": "ProdDB_Copy",
            "UserName": "sa",
            "Password": "Pa$$w0rd!"
        }
    },
    {
        "__type": "AnalysisConsumer",
        "ConnectionInfo":
        {
            "ServerName": "TestServer",
            "DatabaseName": "SQLWorkLoad",
            "SchemaName": "baseline",
            "UserName": "sa",
            "Password": "P4$$w0rd!"
        },
        "UploadIntervalSeconds": 60
    }
]

}
}

spaghettidba commented 1 year ago

What happens? Do you get an error? Do you have a log file?

rreggie-rmd commented 1 year ago

Nothing happens. I even see a connection, but nothing gets written. I restarted the server where the workload tool is installed, and it still does not work.

rreggie-rmd commented 1 year ago

"__type": "AnalysisConsumer", "ConnectionInfo": { "ServerName": "TestServer", "DatabaseName": "SQLWorkLoad", "SchemaName": "baseline", "UserName": "sa", "Password": "P4$$w0rd!" }, "UploadIntervalSeconds": 60

This part does not work.

spaghettidba commented 1 year ago

So sqlworkload.exe outputs absolutely nothing on stdout? That's weird. The only thing I can think of is that it can't cope with the load. Is the process eating up a lot of RAM/CPU?

rreggie-rmd commented 1 year ago

Nope. Memory utilization is only 30 GB tops, and CPU is about 30-50% utilization. Do you want me to try running the Analysis component separately for the prod load as a separate command line?

spaghettidba commented 1 year ago

Nope. Memory utilization is only 30 GB tops, and CPU is about 30-50% utilization.

sqlworkload.exe uses 30 GB of RAM? That's a lot of memory! How many batches per second?

Do you want me to try running the Analysis component separately for the prod load as a separate command line?

This may help. I would rather focus on why you get so much load and maybe add some filters to the capture, in order to reduce it.
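If you do try that, the analysis-only instance would keep the same Listener and drop the ReplayConsumer, roughly (credentials redacted):

{
    "Controller": {

        "Listener":
        {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo":
            {
                "ServerName": "ProdServer",
                "DatabaseName": "ProdDB",
                "UserName": "sa",
                "Password": "*****"
            },
            "DatabaseFilter": "SourceDatabase"
        },

        "Consumers":
        [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo":
                {
                    "ServerName": "TestServer",
                    "DatabaseName": "SQLWorkLoad",
                    "SchemaName": "baseline",
                    "UserName": "sa",
                    "Password": "*****"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}

Bear in mind that this runs a second capture session against production.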

rreggie-rmd commented 1 year ago

The load is pretty high on prod. The whole reason we are doing this is that we want to analyze the impact of moving from local NVMe storage to SAN.

spaghettidba commented 1 year ago

OK, makes sense. However, capturing everything will likely choke sqlworkload to death. I would start with a subset of the workload (filter by host or application or whatever makes sense) and work from there. I'm pretty sure you will get the numbers you're looking for even with a filtered workload.
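To give an idea, the listener accepts filter properties alongside DatabaseFilter; something along these lines (the filter values are placeholders):

"Listener":
{
    "__type": "ExtendedEventsWorkloadListener",
    "ConnectionInfo":
    {
        "ServerName": "ProdServer"
    },
    "DatabaseFilter": "SourceDatabase",
    "ApplicationFilter": "MyCriticalApp",
    "HostFilter": "AppServer01"
}

Each filter keeps only the events matching that value, so the capture volume drops accordingly.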

rreggie-rmd commented 1 year ago

Still the same, bud... no output. I even tried just capturing the baseline load directly, and it still does not work.

spaghettidba commented 1 year ago

Have you tried to filter the workload? Or did you try to capture the whole thing?

rreggie-rmd commented 1 year ago

I tried both, on a specific DB and on the whole workload... still the same.

rreggie-rmd commented 1 year ago

Does it require CLR to be enabled?

spaghettidba commented 1 year ago

Does it require CLR to be enabled?

Nope

spaghettidba commented 1 year ago

I tried both, on a specific DB and on the whole workload... still the same.

OK, I would try to limit the workload as much as possible to understand what is going on. Try limiting it to a single client host, like your computer. That capture will not be useful in itself, but it will at least confirm whether the problem is related to the size of the workload.
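In JSON terms, that diagnostic capture could be as small as (the hostname is a placeholder):

"DatabaseFilter": "SourceDatabase",
"HostFilter": "MyWorkstation"

If even that trickle of events produces no output, the problem is not the workload size.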

ddubick commented 1 year ago

Hi @spaghettidba, I'm helping @rreggie-rmd with this problem.

It seems that the issue is that, whenever the Analysis consumer is used, some of our queries are choking on a regex that was recently added to SqlTextNormalizer.cs (_paramNameValue). I discovered this by adding logging at points throughout the code; my C# isn't the best ;).

It appears that when we run sqlworkload, none or very few records are added to the workload database. When I take that regex definition and the Replace call out and recompile, the Analysis consumers process fine.

Here is an example of the SQL code that it was hanging on; possibly the large number of parameters is confusing the regex:

    declare @p2 int
    set @p2=2
    declare @p13 int
    set @p13=1234567
    exec Invoice_Insert @ClientNo=123456,@SeqNo=@p2 output,@Desc=N'2022 XX',@Status=N'D',
        @InvoiceNo=0,@Disburse=12.34,@GST=1.32,@PST=3.00,@PSTProv=N'SK',@UpdateBy=1234,
        @HST=0,@IsReversal=0,@InvoiceKey=@p13 output,@DatePrinted=N'',@Interim=0,@Locked=0,
        @FromProfile=0,@Retainer=0,@Address=N'',@HSTBalance=0,@GSTBalance=0,
        @ReversedInvoiceKey=0,@Language=N'en',@TotalForMiscDisb=250.00,@ParentInvoiceKey=default,
        @eInvoice=0,@ClientProv=N'SK',@TaxOverrideReason=N'',@TaxOverrideFlag=0,@SensitiveFlag=0,
        @AcknowlegdeRealization=default,@AcknowlegdeRealizationReason=default,
        @AcknowlegdeRealizationDateTime=default,@AcknowlegdeRealizationCompletedBy=default
    select @p2, @p13

I still see a few discrepancies between a live capture (XE listener to workload file and analysis consumer) and another (workload file listener to analysis consumer), but one thing at a time...

spaghettidba commented 1 year ago

Hey @ddubick, thank you very much for pointing this out! I'm working on a fix right now and will keep you posted. Thanks a lot!

spaghettidba commented 1 year ago

OK, could you please test this release and see if it works for you? https://github.com/spaghettidba/WorkloadTools/releases/tag/v1.6.2

Please keep me posted! :)

ddubick commented 1 year ago

result.NormalizedText = _paramNameValueStr.Replace(result.NormalizedText, "@${paramname} = {NUM}");

Was that supposed to be using the _paramNameValueNum that you added?

spaghettidba commented 1 year ago

Whoops... Thanks for spotting it. Release updated

rreggie-rmd commented 1 year ago

Great collaboration, guys, thank you!!!! @ddubick @spaghettidba

spaghettidba commented 1 year ago

Hey! Any updates? Is this working for you now? If so, I could release the fix to the public.

ddubick commented 1 year ago

Yes, sorry, it seems to be working fine. Thank you!