spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License
230 stars 51 forks source link

ReplayConsumer not updating database #148

Open SimonBillinghurst opened 6 months ago

SimonBillinghurst commented 6 months ago

I have been reading your blog post 'Workload replay with workload tools', specifically the part 'Recording multiple benchmarks for the same workload'

The issue I am having is the Replay Consumer is not updating the target database, some updates seem to be happening but 99% of them are not, I cannot work out what I am doing wrong.

The process I undertook was:

  1. Taken the application offline which communicates with the database.
  2. Backup database of production database on production SQL server
  3. Used a ExtendedEventWorkloadListener and recorded events in Workload FileWriterConsumer, which worked and when I query the sqlite file I can see approx 55000 events returned for this required database.
  4. Bring application online
  5. ExtendedEventWorkListeneer ran for 8 hours in our core hours while application was used by clients.
  6. Stopped EventWorkListener
  7. Restored backup of the database to a test database (named the same as prod database) and test SQL server
  8. Using SQLWorkload I replayed the events from the sqlite file
{
    "Controller": {
        "Listener":
        {
            "__type": "FileWorkloadListener",
            "Source": "E:\\Workload Tools Files\\SqlWorkload.sqlite",
            // in this case you want to simulate the original query rate
            "SynchronizationMode": "true" 
        },
        "Consumers":
        [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": 
                {
                    "ServerName": "TesServerName",
                    "DatabaseName": "DatabaseName",
            "UserName":"SQLServerLoginName",
            "Password":"SQLServerLoginPassword"
                },
            "ConsumeResults": "true",
            "QueryTimeoutSeconds": 60,
        "DisplayWorkerStats": "true"
            }
        ]
    }
}

The log file of the output of replay consumer

2024-01-08 14:44:05.9879 - Info - SqlWorkload.Program : Reading configuration from 'E:\Workload Tools Files\workload.json'
2024-01-08 14:44:06.0973 - Info - WorkloadTools.Listener.File.FileWorkloadListener : The source file contains 101432 events
2024-01-08 14:44:06.1129 - Info - WorkloadTools.Listener.File.FileWorkloadListener : Reading the full data for every event that matches filters. This may take awhile on large trace files please be patient
2024-01-08 14:44:06.1129 - Info - WorkloadTools.WorkloadController : Listener of type "FileWorkloadListener" initialized correctly
2024-01-08 14:44:06.1129 - Info - WorkloadTools.WorkloadController : Event collection starts at 8/01/2024 2:44:06 PM
2024-01-08 14:44:06.1129 - Info - WorkloadTools.WorkloadController : Waiting for events
2024-01-08 14:44:06.1442 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : All future delays will be calculated from this point + 1s, triggered by event {"Text":"use [PROD DATABASE NAME]\n", "SPID":346, "ApplicationName":".Net SqlClient Data Provider", "DatabaseName":"[PROD DATABASE NAME]", "LoginName":"[PROD SQL Login Name]", "HostName":"[PROD APPS SERVER NAME]", "Reads":0, "Writes":0, "CPU":0, "Duration":123, "EventSequence":1, "ReplayOffset":0, "StartTime":"2023-12-17T10:49:49.6990329Z", "Type":"RPCStarting"}
2024-01-08 14:44:06.1910 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 101 (0.10%) events replayed - 57 events buffered
2024-01-08 14:44:06.2223 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 202 (0.20%) events replayed - 33 events buffered
.
.
.
2024-01-08 14:44:25.6132 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 101000 (99.57%) events replayed - 101 events buffered
2024-01-08 14:45:25.6125 - Info - SqlWorkload.Program : Controller stopped.
2024-01-08 14:45:25.6125 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : Disposing ReplayConsumer
2024-01-08 14:45:25.6125 - Info - WorkloadTools.Consumer.Replay.ReplayWorker(720_TMS) : Disposing ReplayWorker
2024-01-08 14:45:27.7790 - Info - WorkloadTools.Consumer.Replay.ReplayWorker(416_TMS) : Disposing ReplayWorker
.
.
.
2024-01-08 16:17:58.6225 - Info - WorkloadTools.Consumer.Replay.ReplayWorker(260_TMS) : Disposing ReplayWorker
2024-01-08 16:18:06.0259 - Info - WorkloadTools.Consumer.Replay.ReplayWorker(437_TMS) : Disposing ReplayWorker
2024-01-08 16:18:07.0896 - Info - SqlWorkload.Program : Controller disposed.

No errors were displayed

I want to test performance changes when changing database compatibility mode from 2008 to 2016, so this is only part of the process I wanted to undertake, I wanted to then use the analysis consumer for this, and then repeat the process for 2016 and then compare.

Any ideas why the replayconsumer isn't working for me?

P.S. Noticed an error in the documentation for the ReplayConsumer, think this should be an int not bool [bool QueryTimeoutSeconds = 30]

Appreciate any help

spaghettidba commented 5 months ago

Hi Simon, Your process looks ok, so I'm not really sure what could be the problem there. Does the sqlite database contain the commands you expect to run? Maybe they were not captured. Just a guess. Can you please check?

SimonBillinghurst commented 5 months ago

Yes it does, the 55000 events are queries associated with the required databases, think there were double that when I included other databases.

When I run this query on the sqlite database I get 10888 records, so it has recorded the details. SELECT COUNT(*) FROM EVents WHERE sql_text like '%INSERT%' AND database_name = 'TMS'

Question, this is recorded about the ReplayConsumer

WorkloadTools.Consumer.Replay.ReplayConsumer : All future delays will be calculated from this point + 1s, triggered by event {"Text":"use [PROD DATABASE NAME]\n", "SPID":346, "ApplicationName":".Net SqlClient Data Provider", "DatabaseName":"[PROD DATABASE NAME]", "LoginName":"[PROD SQL Login Name]", "HostName":"[PROD APPS SERVER NAME]", "Reads":0, "Writes":0, "CPU":0, "Duration":123, "EventSequence":1, "ReplayOffset":0, "StartTime":"2023-12-17T10:49:49.6990329Z", "Type":"RPCStarting"}

What does this mean, as it is listing production details, which do not exist in the Target, e.g. the production user does not exist on the target server.

Grasping at straws here.