spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License
232 stars 52 forks source link

Possible incorrect avgduration values #88

Closed fvanderhaegen closed 4 years ago

fvanderhaegen commented 4 years ago

Hi,

I'm testing the realtime replay functionality and when I checked the queries with WorkloadViewer I noticed something strange.

I had a query like 'select * from dbo.table1 where id=@id', on the source database it took 4 avgreads (execution count=1) but on the target it took 187 avgreads (execution count=1) . This looked like an incorrect result so I ran the query in SSMS on the target with 'set statistics io,time on' and this showed also 4 logical reads.

Next I watched the live data of the Exteneded Event that is created by WorkloadTools and here also it shows 4 logical reads.

Where do the 187 avgreads come from? Is this a incorrect result or am I missing something? Maybe the same occurs for avgduration, but I couldn't validate this because the two machines are quite different.

spaghettidba commented 4 years ago

Hi Frederik,

The tool takes the data from the underlying extended events session / trace. I'm quite confident that if you got this number it is the exact number that you got from the replay. Some checks that you could perform: 1) Look up the data in the workload analysis database. If it correct in the database and it is wrong in WorkloadViewer, then we have a bug in the viewer 2) If the data is correct in the database (187 reads), add a WorkloadFileWriterConsumer to the instance of SqlWorkload that connects to the instance where you're replaying. This should also produce a .sqlite file that you can use to validate what happened for each query being replayed. If you get 187 reads here as well, we could have a bug in AnalysisConsumer 3) You could also add a File target to the sqlworkload xe session on the fly, while it is running. This could be used as one more source of validation. If you get 187 reads also here, there is no bug. If you get 4 reads, there we have a bug. 4) SSMS could be using different SET options from the replay and you could be getting different plans, one with 4 reads and one with 187 reads. For this specific query it is quite unlikely, but it is worth checking (is table1 really a table or is it a view?). Enabling Query Store on the target server could help you with this.

Can you do these checks for me please? Thank you

fvanderhaegen commented 4 years ago

Hi,

Here are some findings of the tests I did:

It has something to do with the fact the page is already is present in the buffer or not. I'll do some more research about the cause of this.

spaghettidba commented 4 years ago

Excellent, thank you for the quick turnaround. I'll close this issue for the moment. If you find more evidence of something getting captured or interpreted wrong please drop a comment here and we can reopen it.