marcingminski / sqlwatch

SQL Server Performance Monitor
https://docs.sqlwatch.io
Other
432 stars 170 forks source link

Start time in PowerBi (SQLWATCH Dashboard 20210720.pbix) Blocking report is wrong #406

Open jmccandrade opened 3 years ago

jmccandrade commented 3 years ago

The Blocking start time and end time in the report, are not beeing calculated correctely. Start time in the report, is coming from timestamp of the extend event, that it is end time of the extend event.

I also think that in the database, it shoud be log only the last extend event of blocking issue (as it was in version 3.0) and not all the the extend events.

I'm using SQL Server 2019 and SQLWATCH 4.2

marcingminski commented 3 years ago

Please see the https://github.com/marcingminski/sqlwatch/issues/401 for details of the fix.

Neither will be 100% correct. When monitor loop detects blocking, it dumps it into the extended session after the amount of time specified in the sys.configurations has passed. So, if you have your blocking threshold set to 5 seconds, the timestamp will be 5 seconds after the blocking has started. And the same blocking chain will be dumped into the xes every 5 seconds with a new timestamp as long as the blocking is happening, so the timestamp is not the end of the blocking either. To make it more accurate, I would have to cross check with the blocking threshold to calculate the actual start time. As for the end time - this is also unknown as per the above, we can only assume that the blocking has ended if the next iteration of monitor loop has not logged it.

The problem with keeping only the latest record is the ability to identify what is the latest dump of the blocking chain. If you could suggest a reliable and fast primary key for comparison that would be great. Previous version where such comparison took place (and it was on a best guess) was CPU heave on few occasions.

I am open for ideas how to make this reliable and fast at the same time.

I should also mention that this is one of the reasons we keep the actual xml from xes in table so you have access to the details as logged in xes.

Another thing is that a blocking chain could start with one session blocking another, it could go up to 100 sessions being blocked by the head blocker only to then drop down 20 sessions at the end of the blocking chain (that could be because some had timed out and dropped off) so the picture will keep changing during the duration of the blocking which makes it difficult to simply pick "the final version".

jmccandrade commented 3 years ago

Ok, I agree that the timestamp is not the end of the blocking, and is more the moment of the registration of the event that is being occurred for the some time time - blocked waittime, and we don't know when it will end.

marcingminski commented 3 years ago

Yes, exactly. The best we can do is to be aware of what the data means and how it is being logged and what is being shown on the dashboard. Perhaps a note on the dashboard or in the documentation would help clarify it. I will try to add something in the next release.

marcingminski commented 3 years ago

Hi @jmccandrade I have had a look at version 3.x and the blocking_start_time was calculated as follows:

blocking_start_time = dateadd(ms,-bps.[blocking_duration_ms],bps.[report_end_time])

which was the event_time (when the XE was logged) minus the blocking duration. The first blocking duration will match the blocking threshold (as this is when it would trigger the XE), so this makes sense to me and is pretty reliable start_time.

As for the end_time, it was the event_time but because we were storing only last record, the end time was also realistic and for the last record, the blocking duration was the longest. I will try to bring those back into the next release.