spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License

Workload Replay hangs with analysis consumer #78

Closed KitLemmonds closed 4 years ago

KitLemmonds commented 4 years ago

I've found that workload replay tends to hang 30-50% of the time during large (1 GB+) replays to an analysis consumer. Instead of terminating at the end of the workload file, the consumer keeps "collecting" nonexistent data, printing "0 rows aggregated" and "0 rows written" until I shut it down manually with Ctrl+C.

To salvage at least part of the comparison, it would be helpful if the workload viewer had a "compare first n events" control. If I know I have around 110 million events in my playback and the second playback hangs for 2 days, I could still compare the first 109 million events and ignore the empty collection time.

I found the following lines containing an unhandled exception in the log of my last hung replay (perhaps the exception is causing a worker to remain active, preventing the replay from ending?):

2020-03-04 16:12:50.0613 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 1100000 events queued for replay
2020-03-04 16:13:09.2510 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 858 rows aggregated
2020-03-04 16:13:09.2799 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 131 rows written
2020-03-04 16:13:12.0545 - Error - WorkloadTools.Consumer.Replay.ReplayConsumer : Unhandled exception in TraceManager.RunWorkers
2020-03-04 16:13:12.0545 - Warn - WorkloadTools.Consumer.Replay.ReplayConsumer : Sequence contains no elements
2020-03-04 16:13:41.7411 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : Worker [79] - Starting
2020-03-04 16:14:09.3710 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 6 rows aggregated
2020-03-04 16:14:09.3710 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 3 rows written
2020-03-04 16:15:09.4378 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 0 rows aggregated
2020-03-04 16:15:09.4378 - Info - WorkloadTools.Consumer.Analysis.WorkloadAnalyzer : 0 rows written

spaghettidba commented 4 years ago

Hi Kit, the problem here is probably that you are trying to replay and analyze with the same instance of SqlWorkload. While that is certainly feasible, I suspect it is not what you want to do. I could be wrong, but I would have to see your .json files to be sure.

When you perform a replay, you generally want to have two instances of SqlWorkload running: one that performs the replay and one that analyzes the replayed workload against the target server.

This image summarizes what it should look like:

Replay Architecture

In this case, the events are collected directly from a live SQL Server, but the source of "SqlWorkload1" can be a listener attached to a .sqlite file that you collected upfront. Is this your case?
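For reference, the replay side of that setup could look something like the following .json sketch, modeled loosely on the example configuration files shipped with WorkloadTools. The file path, server name, and database name here are placeholders, not values from this thread:

```json
{
    "Controller": {
        "Listener": {
            "__type": "FileWorkloadListener",
            "Source": "C:\\temp\\capture.sqlite",
            "SynchronizationMode": "true"
        },
        "Consumers": [
            {
                "__type": "ReplayConsumer",
                "ConnectionInfo": {
                    "ServerName": "TARGETSRV",
                    "DatabaseName": "DS3"
                }
            }
        ]
    }
}
```

The key point is that this instance only replays: the listener reads the previously captured .sqlite file and the single consumer sends the events to the destination server, with no AnalysisConsumer attached.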

KitLemmonds commented 4 years ago

@spaghettidba: Yes, I'm running the replay and analysis from the same instance of SqlWorkload. The .json file is identical to the example file here, with one listener and two consumers (replay and analysis). If I'm understanding you correctly, I should separate the analysis consumer into its own session and .json file, which I would run at the same time as the replay session. Is that correct?

spaghettidba commented 4 years ago

Correct. In the example file you linked, the replay and analysis are in the same instance of SqlWorkload, but the analysis consumer receives the same events that are being replayed, so the analysis reflects the events in the source .sqlite file. To analyze the events at the destination server (the one where the events are replayed), you need to set up a separate instance of SqlWorkload that listens for events on the destination server. See this post for guidance. Hope this helps.
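Concretely, the separate analysis instance would be a second .json file whose listener is attached to the destination server rather than to the .sqlite file. The following is only a sketch following the shape of the example configs in the repository; all server, database, and schema names are placeholders:

```json
{
    "Controller": {
        "Listener": {
            "__type": "ExtendedEventsWorkloadListener",
            "ConnectionInfo": {
                "ServerName": "TARGETSRV",
                "DatabaseName": "DS3"
            },
            "DatabaseFilter": "DS3"
        },
        "Consumers": [
            {
                "__type": "AnalysisConsumer",
                "ConnectionInfo": {
                    "ServerName": "ANALYSISSRV",
                    "DatabaseName": "WorkloadAnalysis",
                    "SchemaName": "replay"
                },
                "UploadIntervalSeconds": 60
            }
        ]
    }
}
```

Run this instance alongside the replay instance: it captures what actually executes on the destination server and uploads the aggregated statistics to a separate analysis database, which is what the workload viewer then compares against the baseline.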