microsoft / ApplicationInsights-node.js

Microsoft Application Insights SDK for Node.js
MIT License

Server Response Time Chart rendering incorrect data with version 2.2.0 #882

Open vinnieking06 opened 2 years ago

vinnieking06 commented 2 years ago

I upgraded from 1.8.10 to 2.2.0, and in the Application Insights Dashboard overview our server response times then dropped to about 1ms. They are normally about 200ms, but since the upgrade they show as about 1ms.

Here is the 6-hour view. The upgrade to 2.2.0 was deployed around 1pm: [Screenshot: Screen Shot 2021-12-17 at 2 28 20 PM]

1-hour view: [Screenshot: Screen Shot 2021-12-17 at 2 27 49 PM]

hectorhdzg commented 2 years ago

@vinnieking06 is this still happening on your side? Are you sampling any data? If you are not, please disable pre-aggregated metrics to make sure they are not causing the issue. You can use setAutoCollectPreAggregatedMetrics(false) (more info here). Please let us know the results.
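For readers following along, the diagnostic step suggested above could look roughly like the sketch below. It relies only on the SDK's chained setup() configuration and the setAutoCollectPreAggregatedMetrics method named in the comment. The helper name startWithoutPreAggregatedMetrics and the choice to pass the SDK module in as a parameter are assumptions made for illustration, not part of the SDK.

```javascript
// Sketch: start the SDK with pre-aggregated metrics turned off, as a
// diagnostic step. `appInsights` is expected to be the module returned by
// require("applicationinsights"); it is passed in as a parameter
// (a hypothetical design choice) so the helper has no hard dependency.
function startWithoutPreAggregatedMetrics(appInsights, connectionString) {
  appInsights
    .setup(connectionString)
    .setAutoCollectPreAggregatedMetrics(false) // the switch named in the thread
    .start();
  // The default client is available once setup() has run.
  return appInsights.defaultClient;
}

// Possible usage in an app (assumes the package is installed):
// const client = startWithoutPreAggregatedMetrics(
//   require("applicationinsights"),
//   process.env.APPLICATIONINSIGHTS_CONNECTION_STRING
// );
```

This only makes sense as a test when sampling is off; the setting changes where the metrics are computed, not how Request telemetry is recorded.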

vinnieking06 commented 2 years ago

@hectorhdzg Yes, it's still happening, and yes, we are sampling. Here are our sampling settings: appInsights.defaultClient.config.samplingPercentage = 33; Since we are sampling at 33%, do you still recommend disabling pre-aggregated metrics?

hectorhdzg commented 2 years ago

@vinnieking06 no, you should not turn it off, because your metrics would then cover only the data that is not sampled out. That was the case with 1.8.10: the pre-aggregation was calculated on the backend using only the Request telemetry you actually sent. Now we calculate the metrics over the sampled-out data as well, which could cause some differences in your numbers. You should be able to query the Request telemetry duration to see whether there is an inconsistency. It is also possible that the response time has actually improved for other reasons, such as less traffic. Maybe leave it running for a while to see whether the data starts to look similar.
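To make the distinction above concrete, here is a small arithmetic sketch of an average computed over every request versus one computed only over the requests that survive sampling. The durations are hypothetical values, not data from this issue, and the "keep every third item" rule is a stand-in for roughly 33% sampling rather than how the SDK actually selects items.

```javascript
// Hypothetical request durations in milliseconds (not taken from this issue).
const durations = [100, 120, 90, 110, 95, 105, 130, 85, 115];

// Keep every third request to mimic roughly 33% sampling.
// This selection rule is illustrative only.
const sampled = durations.filter((_, i) => i % 3 === 0);

// Arithmetic mean of a non-empty array of numbers.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Metric computed over every request (what SDK-side pre-aggregation can see).
const allMean = mean(durations);

// Metric computed only from the telemetry that was actually sent.
const sampledMean = mean(sampled);

console.log({ total: durations.length, sent: sampled.length, allMean, sampledMean });
```

The two averages differ somewhat, which is the kind of discrepancy described above, but both stay in the same range as the underlying durations. Sampling alone would not be expected to turn values near 100ms into values near 1ms.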

vinnieking06 commented 2 years ago

@hectorhdzg A response time of 0-2ms would be impossible for our app, since we make a call to an API that takes at least 90ms. Also, when you click on the performance graph so that it goes full screen and then refresh the data, the data looks good, as shown here: [Screenshot: Screen Shot 2022-01-26 at 11 31 06 AM]

hectorhdzg commented 2 years ago

@vinnieking06 can you query your Request data and take a look at the duration there? I would like to understand where the incorrect data could be generated. Also, are you using multiple TelemetryClients in your app?

vinnieking06 commented 2 years ago

@hectorhdzg, it looks good when I query it. Low 100ms, which is what we expect. [Screenshot: Screen Shot 2022-01-27 at 2 37 37 PM]

vinnieking06 commented 2 years ago

Here is the raw data. Most are in the 100ms range, but I did see a couple that were around 40ms and one that was about 1ms. But the average, as the graph above shows, is about 105ms. [Screenshot: Screen Shot 2022-01-27 at 2 41 35 PM]

vinnieking06 commented 2 years ago

@hectorhdzg As for using multiple TelemetryClients, yes, we also send logs to Splunk.