spaghettidba / WorkloadTools

A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
MIT License
232 stars 52 forks source link

Power BI Report Calculation - possible errors #83

Closed cairneym closed 1 year ago

cairneym commented 4 years ago

From my workload, the summary calculations for the averages appear to use the total number of times the query has executed between BOTH the baseline and the benchmark rather than working out the average per workload.

The attached shows the Query Detail from drillthrough for a selected query hash with only a single execution for each of the baseline and benchmark traces, but the values shown in the table in the Queries summary is 50% of what would be expected, suggesting that both entries from each trace are used to calculate the average.

query_summary

query detail

Trovalo commented 4 years ago

Hi, Each measure filters only a single workload, therefore this should not happen.

Can you share your pbix file or the data related to that Sql hash so I can have a better look at your issue?

cairneym commented 4 years ago

I started digging in further to trace this back to source. The SQLite database DID actually have 2 queries that matched this query text pattern - it just so happened that both of them were executed within 1 second of each other.

The aggregation in the analysis therefore put them in the same bucket and hence the charts "appearing" to only show 1 value while making averages.

A suggestion to make this more obvious would be to include the number of executions over time as an additional chart on the Drillthrough report.

cairneym commented 4 years ago

I'm assuming that the number of executions feeds into the TOTALS calculations as well? If I copy the data from the Table from the "Queries" report tab and paste into Excel, I see what appears to be some rounding differences in the "Total" type columns, but I cannot make the "Average" type columns get even close to the same values. Using the (Total Duration/CPU/Reads)/(Number of Batches) gets closer values but still quite a way out.

Comparison: Averages.pdf

Trovalo commented 4 years ago

Glad to hear that you have some additional info. About the first suggestion, the number of executions can be put in the chart itself as a tooltip (it will show when you hover the value).

It's still not easy to check it without the data, I should be able to replicate it just by using the data related to that "sql hash", a raw extraction from the database should be enough, I don't need any dimensional data (no db name , app name, query text, etc), just the metric values and their "sql hash" from the "PowerBI_WorkloadData" view. Can you share those data?

Trovalo commented 4 years ago

I had a look at some of my data and didn't notice any anomaly between the data and the calculation values.

But I noticed that the averages might be unprecise. As an example, [Average Duration - Baseline] is calculated by computing the average of the column "Avg Duration (µs)" where the data series is "baseline". But that column is already a pre-aggregation of the time interval (ie: 1min) if in the same minute the query runs 10 times then "Avg Duration (µs)" is the average of 10 executions and this is not considered.

I will fix the averages soon, by:

  1. calculating a "weighted average" or
  2. calculating the average using "Sum Duration (µs)" / "Execution Count", which will probably be even more precise since it will have the decimals (we are talking about decimals of µs)

Update: Here you can find the new template. It will be official as soon as the PR is approved WorkloadTools Report - Template.zip

spaghettidba commented 4 years ago

PR approved 😄