microsoft / Application-Insights-Workbooks

Templates for Azure Monitor Workbooks

IoT Hub Strange drops in Workbook #1761

Open bqstony opened 2 years ago

bqstony commented 2 years ago

Hi, I see some strange drops in the Workbook of my IoT Hub.

@micahl, you told me about some new UX polish. Could this behavior have started with that update?

See 31.01: [screenshot: 2022-01-31 08_38_06-Window]

When I zoom in on this range: [screenshot: 2022-01-31 08_40_09-Window]

I can see these drops on all charts, sometimes at the same time, sometimes not 🤔

[screenshots: 2022-01-31 08_46_31-Window, 2022-01-31 08_47_02-Window, 2022-01-31 08_47_43-Window]

I am not sure whether this is now the real behavior, caused by missing data, or whether the charts are now wrong.

micahl commented 2 years ago

@bqstony, at what frequency do you have the device set to emit metrics? The default is 5min.

There was a recent fix for the graphs in the 'Host' tab to address the value always dropping to zero at the end. See https://github.com/microsoft/Application-Insights-Workbooks/pull/1753.

For other situations where you may see the graph reporting zero... The Hub device details workbook uses a 5min interval by default in the underlying queries when chunking up the data to generate a time series for visualization (starting from the beginning of your selected time range). That value can be modified in the 'Settings' tab. If no data was reported within a given interval (5 min by default) then the graphs will report 0. That is likely what you're seeing. But to be sure you can query the InsightsMetrics table from the Logs section of your Hub to see if something was reported for a given metric during a given time period.
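The chunking described above can be sketched in a few lines of Python. This is only an illustration of fixed-interval binning with zero fill, not the workbook's actual query: the function name, the choice of averaging within a bin, and the sample timestamps are all made up for the example.

```python
from datetime import datetime, timedelta

def bin_with_zero_fill(samples, start, end, interval=timedelta(minutes=5)):
    """Average samples per fixed interval counted from `start`; empty intervals become 0."""
    n_bins = int((end - start) / interval)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ts, value in samples:
        if start <= ts < end:
            idx = int((ts - start) / interval)
            if idx < n_bins:  # ignore a trailing partial interval
                sums[idx] += value
                counts[idx] += 1
    return [
        (start + i * interval, sums[i] / counts[i] if counts[i] else 0.0)
        for i in range(n_bins)
    ]

# Hypothetical samples: the reports are roughly 5 minutes apart, but none
# of them falls inside the 08:05 to 08:10 interval.
start = datetime(2022, 1, 31, 8, 0)
end = datetime(2022, 1, 31, 8, 15)
samples = [
    (datetime(2022, 1, 31, 8, 0, 10), 12.0),
    (datetime(2022, 1, 31, 8, 4, 50), 14.0),
    (datetime(2022, 1, 31, 8, 10, 5), 13.0),
]
series = bin_with_zero_fill(samples, start, end)
for bin_start, value in series:
    print(bin_start.time(), value)
```

With these made-up timestamps the middle interval receives no sample and is reported as 0, which would show up as a drop in a chart even though the device never stopped reporting.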

[image]

bqstony commented 2 years ago

I use the default 5 min interval and also the defaults in the metrics collector. The container version is mcr.microsoft.com/azureiotedge-metrics-collector:1.0.2

In short, I tested your query over the range 30.01 to 31.01: [image]. So when we calculate: 2304 / 8 = 288

The same for the half-hour query (executed on 01.02): [image]
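For reference, 288 is also the number of 5-minute intervals in 24 hours. The sketch below checks that arithmetic. It assumes the query window was exactly one day and that the 8 in the division stands for the number of rows recorded per report; the thread states neither, and the function name is hypothetical.

```python
from datetime import timedelta

def intervals_per_window(window, interval=timedelta(minutes=5)):
    """Number of whole reporting intervals that fit into the window."""
    return window // interval

expected_per_day = intervals_per_window(timedelta(days=1))

# Figures from the comment above: 2304 rows divided by 8.
observed = 2304 // 8

print(expected_per_day, observed, expected_per_day == observed)
```

Under those assumptions the row count matches one report per interval for the whole day, which would point away from missing data for that metric.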

micahl commented 2 years ago

@bqstony try this query. It should list the N largest time differences between successively recorded samples for CPU usage. This should help us highlight if there are gaps and when they occurred.

// List the N largest gaps between successive CPU-usage samples.
let N = 5;
InsightsMetrics
| where TimeGenerated between (ago(2d) .. now())
    and Name == "edgeAgent_used_cpu_percent"
    and Tags has "host"
    and Tags matches regex ".quantile.:.0.9[^9]"  // keep the 0.9 quantile, not 0.99 or 0.999
| extend device = extractjson("$.edge_device", Tags, typeof(string))
| order by TimeGenerated, device  // order by defaults to descending, so next() is the older row
| extend diff = TimeGenerated - next(TimeGenerated)
| top-nested N of TimeGenerated by gap=max(diff), top-nested 1 of device by Ignored=max(1)
| project device, gap, TimeGenerated
| order by device, TimeGenerated, gap;
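
The same idea can be tried offline on exported timestamps. The following is a minimal Python sketch of "largest gaps between successive samples" for a single device; it is not a translation of the query above, and the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta

def largest_gaps(timestamps, n=5):
    """Return the n largest (gap, later_timestamp) pairs between successive samples."""
    ordered = sorted(timestamps)
    gaps = [(later - earlier, later) for earlier, later in zip(ordered, ordered[1:])]
    return sorted(gaps, reverse=True)[:n]

# Made-up samples: regular 5-minute reports with one 15-minute hole.
timestamps = [
    datetime(2022, 1, 31, 8, 0),
    datetime(2022, 1, 31, 8, 5),
    datetime(2022, 1, 31, 8, 10),
    datetime(2022, 1, 31, 8, 25),
    datetime(2022, 1, 31, 8, 30),
]
for gap, later in largest_gaps(timestamps, n=2):
    print(gap, "ending at", later)
```

Each gap is attached to the later of the two samples, so the timestamp marks when reporting resumed. Gaps noticeably larger than the reporting interval would indicate missing data rather than a charting problem.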

bqstony commented 2 years ago

@micahl, here is the result for the time range we discussed above: [image]

And here is the current top 5 from today: [image]