microsoft / ApplicationInsights-ServiceFabric

ApplicationInsights SDK for ServiceFabric projects
MIT License

Appinsights logging affecting performance on SF nodes #107

Closed mwonger closed 4 years ago

mwonger commented 4 years ago

We are investigating a performance issue with a set of customer applications that use Application Insights. We see the following callstack:

```
System.Diagnostics.Activity.get_TraceStateString() [/_/src/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs @ 543]
Microsoft.ApplicationInsights.Extensibility.OperationCorrelationTelemetryInitializer+<>c__DisplayClass1_0.b__0() [E:\A_work\248\s\src\Microsoft.ApplicationInsights\Extensibility\OperationCorrelationTelemetryInitializer.cs @ 32]
Microsoft.ApplicationInsights.ActivityExtensions.TryRun(System.Action) [E:\A_work\248\s\src\Microsoft.ApplicationInsights\ActivityExtensions.cs @ 34]
Microsoft.ApplicationInsights.Extensibility.OperationCorrelationTelemetryInitializer.Initialize(Microsoft.ApplicationInsights.Channel.ITelemetry) [E:\A_work\248\s\src\Microsoft.ApplicationInsights\Extensibility\OperationCorrelationTelemetryInitializer.cs @ 29]
Microsoft.ApplicationInsights.TelemetryClient.Initialize(Microsoft.ApplicationInsights.Channel.ITelemetry) [E:\A_work\248\s\src\Microsoft.ApplicationInsights\TelemetryClient.cs @ 532]
Microsoft.ApplicationInsights.TelemetryClientExtensions.StartOperation[[System.__Canon, System.Private.CoreLib]](Microsoft.ApplicationInsights.TelemetryClient, System.__Canon) [E:\A_work\248\s\src\Microsoft.ApplicationInsights\TelemetryClientExtensions.cs @ 128]
Microsoft.ApplicationInsights.ServiceFabric.Module.ServiceRemotingClientEventListener.ServiceRemotingClientEvents_SendRequest(System.Object, System.EventArgs)
```

TraceStateString shows up as a hot spot for us in performance testing. To confirm, we disabled Application Insights, specifically the OperationCorrelationTelemetryInitializer, and performance has been consistently much better. With Application Insights enabled, the normal behavior is that performance gradually deteriorates, possibly because of the Activity depth described below.
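For reference, a minimal sketch of the mitigation, assuming the default `TelemetryConfiguration.Active` is used (the exact mechanism will depend on how telemetry is configured in the service, so this is illustrative rather than exactly what we ran):

```csharp
using System.Linq;
using Microsoft.ApplicationInsights.Extensibility;

static class TelemetryMitigation
{
    // Sketch: remove OperationCorrelationTelemetryInitializer from the active
    // configuration at service startup, before any TelemetryClient is created.
    // This trades away operation correlation for the CPU cost observed above.
    public static void DisableOperationCorrelation()
    {
        TelemetryConfiguration configuration = TelemetryConfiguration.Active;

        OperationCorrelationTelemetryInitializer initializer = configuration.TelemetryInitializers
            .OfType<OperationCorrelationTelemetryInitializer>()
            .FirstOrDefault();

        if (initializer != null)
        {
            configuration.TelemetryInitializers.Remove(initializer);
        }
    }
}
```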

In PerfView we consistently see hot spots with the same callstack as above:

[PerfView flame graph screenshots]

The top of each "flame" in the flame graphs above is the callstack shown earlier.

If we look at the source (thanks Michael Lopez for pointing me at it), we see https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L542:

```csharp
/* W3C support functionality (see https://w3c.github.io/trace-context) */

/// <summary>
/// Holds the W3C 'tracestate' header as a string.
///
/// Tracestate is intended to carry information supplemental to trace identity contained
/// in traceparent. List of key value pairs carried by tracestate convey information
/// about request position in multiple distributed tracing graphs. It is typically used
/// by distributed tracing systems and should not be used as a general purpose baggage
/// as this use may break correlation of a distributed trace.
///
/// Logically it is just a kind of baggage (if flows just like baggage), but because
/// it is expected to be special cased (it has its own HTTP header), it is more
/// convenient/efficient if it is not lumped in with other baggage.
/// </summary>
public string? TraceStateString
{
    get
    {
        for (Activity? activity = this; activity != null; activity = activity.Parent)
        {
            string? val = activity._traceState;
            if (val != null)
                return val;
        }
        return null;
    }
}
```

We need help understanding why each Activity chain ends up so deep before Parent == null is reached. It looks like every Activity all the way up the chain maps to the same trace id?
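To make the chain-walking behavior concrete, here is a small standalone sketch (assuming the W3C id format; the deliberately un-stopped activities are an artificial way to build a deep chain, not a claim about how our services create it). Every activity in the chain reports the same TraceId, and because no activity has tracestate set, the TraceStateString getter visits every parent before returning null:

```csharp
using System;
using System.Diagnostics;

class ActivityDepthDemo
{
    static void Main()
    {
        // Assumption: W3C ids, as in the customer applications.
        Activity.DefaultIdFormat = ActivityIdFormat.W3C;
        Activity.ForceDefaultIdFormat = true;

        Activity root = new Activity("root").Start();

        // Start many nested activities and never stop them, so each new one
        // becomes a child of the previous one and the Parent chain keeps growing.
        for (int i = 0; i < 1000; i++)
        {
            new Activity("operation").Start();
        }

        Activity leaf = Activity.Current;
        Console.WriteLine($"leaf TraceId: {leaf.TraceId}");
        Console.WriteLine($"root TraceId: {root.TraceId}");   // same value as the leaf

        // No activity in the chain has tracestate set, so this getter walks
        // all ~1000 parents before returning null.
        string traceState = leaf.TraceStateString;
        Console.WriteLine($"TraceStateString: {traceState ?? "<null>"}");
    }
}
```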

System.Diagnostics.Activity depth averages around 354, with a maximum of 1148 activities to walk before Parent == null is reached. We are trying to get some understanding of the above.
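The depth numbers above can be reproduced with a trivial helper that walks the Parent chain the same way the getter does (a sketch; `GetActivityDepth` is our own name, not an existing API):

```csharp
using System.Diagnostics;

static class ActivityDiagnostics
{
    // Counts how many activities the TraceStateString getter would visit,
    // starting from the given activity and following Parent up to the root.
    public static int GetActivityDepth(Activity leaf)
    {
        int depth = 0;
        for (Activity current = leaf; current != null; current = current.Parent)
        {
            depth++;
        }
        return depth;
    }
}

// Usage, e.g. logged from a hot path such as ServiceRemotingClientEvents_SendRequest:
// int depth = ActivityDiagnostics.GetActivityDepth(Activity.Current);
```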

yantang-msft commented 4 years ago

@mwonger Please open the issue in this repo instead: https://github.com/microsoft/ApplicationInsights-dotnet, as this is a general AppInsights issue and not Service Fabric specific.