microsoft / ApplicationInsights-dotnet

MIT License

Dependency calls not being logged by App Insights when being performed as part of a Polly retry #1923

Open guysteel opened 4 years ago

guysteel commented 4 years ago

Describe your environment. Describe any aspect of your environment relevant to the problem:

Since upgrading from v2.8.2, when using Polly to automatically retry a failing HTTP request, only the first failing call to the dependency is being logged. Subsequent dependency calls no longer appear to be logged.

Steps to reproduce. In ConfigureServices, configure Application Insights and set up a named HTTP client:

  services.AddApplicationInsightsTelemetry();

  ...

  var httpClientBuilder = services.AddHttpClient("TestClient", (provider, client) =>
  {
      client.Timeout = TimeSpan.FromSeconds(120);
  });

  var retryPolicy = Policy.Handle<HttpRequestException>()
        .OrResult<HttpResponseMessage>(response => (int)response.StatusCode >= 500)
        .WaitAndRetryAsync(5, retryAttempt => TimeSpan.FromSeconds(1), (result, span) =>
                {
                    Console.WriteLine(result.Result.StatusCode);
                });

  httpClientBuilder.AddPolicyHandler(retryPolicy);

In Configure add a failing call to demonstrate the behaviour:


  ...
  var httpClientFactory = app.ApplicationServices.GetService<IHttpClientFactory>();
  var httpClient = httpClientFactory.CreateClient("TestClient");
  var request = new HttpRequestMessage(HttpMethod.Get, new Uri("http://httpstat.us/500"));
  var result = httpClient.SendAsync(request).Result;

What is the expected behavior? I would expect each call to the failing endpoint to be logged as a separate dependency call.

What is the actual behavior? I see only a single dependency call logged - the initial call. In the previous version of App Insights all of the calls were being logged.

Additional context. The above uses Polly 3.1.3

karun-verghese commented 4 years ago

I am experiencing a very similar issue. The setup is similar to how @guysteel has it above. On retries, I do not see a dependency call being logged in App Insights. I know the retries are happening because I see the info logs that I write within the retry policy handler. The packages/SDK I am using are: .NET Core SDK 3.1.402, Microsoft.ApplicationInsights.AspNetCore 2.14.0, Microsoft.Extensions.Http.Polly 3.1.8.

cijothomas commented 4 years ago

The retry calls will likely not be captured if the retries are transparent to the caller. Application Insights collects HTTP calls by subscribing to the DiagnosticSource events emitted by a DiagnosticHandler inside the HttpClient. If the retry happens without going through the full pipeline of handlers (especially DiagnosticHandler), then the retries will not be captured by Application Insights.

I don't think I can suggest a workaround at this stage: the DiagnosticHandler is part of the HttpClient itself, so you may or may not have the ability to make sure it is called when using the Polly package to do the retry.
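To illustrate the mechanism being described (a minimal sketch, not SDK code — though "HttpHandlerDiagnosticListener" and the `HttpRequestOut` event names are what .NET Core's HttpClient diagnostics actually emit under), a collector only ever sees one Start/Stop event pair per request that reaches the inner DiagnosticHandler; retries performed by a handler layered above it produce no additional events:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Minimal observer for the DiagnosticSource events HttpClient emits.
// A Polly handler retrying above DiagnosticHandler produces no extra
// events here, which is why retries are invisible to the SDK.
public class HttpListenerObserver : IObserver<DiagnosticListener>
{
    public void OnNext(DiagnosticListener listener)
    {
        if (listener.Name == "HttpHandlerDiagnosticListener")
        {
            listener.Subscribe(new EventObserver());
        }
    }

    public void OnCompleted() { }
    public void OnError(Exception error) { }

    private class EventObserver : IObserver<KeyValuePair<string, object>>
    {
        public void OnNext(KeyValuePair<string, object> evt)
        {
            // e.g. "System.Net.Http.HttpRequestOut.Start" / ".Stop"
            Console.WriteLine(evt.Key);
        }

        public void OnCompleted() { }
        public void OnError(Exception error) { }
    }
}

// Hook it up once at startup:
// DiagnosticListener.AllListeners.Subscribe(new HttpListenerObserver());
```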

karun-verghese commented 4 years ago

@cijothomas I tried the same scenario but with an older version of our service. Basically, I checked out an older branch of my service and ran the same scenario. In App Insights I can see the following:

| timestamp [UTC] | itemType | resultCode |
| --- | --- | --- |
| 9/24/2020, 8:26:48.803 AM | dependency | 500 |
| 9/24/2020, 8:26:32.737 AM | dependency | 500 |
| 9/24/2020, 8:26:24.641 AM | dependency | 500 |
| 9/24/2020, 8:26:20.550 AM | dependency | 500 |
| 9/24/2020, 8:26:18.425 AM | dependency | 500 |

I know these are my retry calls, as there are no other services running in this environment at the moment, and I can see from the timestamp that the calls are due to the exponential backoff algorithm I have configured for Polly. The packages I am using in this version are :

.Net Core SDK 2.2.0 Microsoft.ApplicationInsights.AspNetCore (2.7.1) Microsoft.Extensions.Http.Polly (2.2.0)

So it appears that this behaviour was available earlier but has since changed for some reason.

My App Insights config is as follows:

  var aiOptions = new Microsoft.ApplicationInsights.AspNetCore.Extensions.ApplicationInsightsServiceOptions();
  aiOptions.EnableAdaptiveSampling = false;
  aiOptions.InstrumentationKey = Configuration.GetSection("ApplicationInsights").GetValue<string>("InstrumentationKey");
  services.AddApplicationInsightsTelemetry(aiOptions);

My app insights config has not changed with my new version, except that now we configure the connection string instead of the instrumentation key and of course we use the newer packages which I mentioned in my earlier message.

piotrzbyszynski commented 4 years ago

Also, Application Insights seems to log exceptions handled by Polly.

Policy.Handle<HttpRequestException>() — after the exception occurs, it is still logged, although IMO it should be regarded as a handled exception. @cijothomas Any thoughts?

cijothomas commented 4 years ago

Also, Application Insights seems to log exceptions handled by Polly.

Policy.Handle<HttpRequestException>() — after the exception occurs, it is still logged, although IMO it should be regarded as a handled exception. @cijothomas Any thoughts?

In the ExceptionTelemetry, what is the SDKVersion field value? If it is something starting with "rdd...", then it's captured by the DependencyCollectionModule, which I would consider a bug and provide a fix for. If it is something starting with "il...", then the exception is captured by the Microsoft.Extensions.Logging ApplicationInsightsLoggerProvider. That works independently of whether the DependencyCollectionModule is enabled or not. By default, this logging provider captures all logs of Warning severity or above. This is by design, and you can configure logging to not capture these by following the instructions at https://docs.microsoft.com/en-us/azure/azure-monitor/app/ilogger#control-logging-level
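For example (a sketch based on the linked docs; the empty category filter shown here is just one choice), the provider can be restricted to Error and above so that Warning-level handled-exception logs are not sent to Application Insights:

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.ApplicationInsights;

public static class LoggingConfig
{
    public static void Configure(IServiceCollection services)
    {
        services.AddLogging(builder =>
        {
            // Only send Error and above to Application Insights for all
            // categories (empty string); other providers are unaffected.
            builder.AddFilter<ApplicationInsightsLoggerProvider>(
                category: string.Empty, level: LogLevel.Error);
        });
    }
}
```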

piotrzbyszynski commented 4 years ago

@cijothomas SDK Version is azurefunctions: 3.0.14785.0 (forgot to mention that it is Azure Function, sorry for that)

cijothomas commented 4 years ago

@pzbyszynski Ok. For functions, you need to raise this issue in https://github.com/Azure/azure-webjobs-sdk repo. Functions + ApplicationInsights integration is in that repo.

(But if you see same behaviour outside of functions, let us know right here, we can help)

karun-verghese commented 4 years ago

@cijothomas Any update on the original question? I had posted some tests that I did in the message below:

@cijothomas I tried the same scenario but with an older version of our service. Basically, I checked out an older branch of my service and ran the same scenario. In App Insights I can see the following:

| timestamp [UTC] | itemType | resultCode |
| --- | --- | --- |
| 9/24/2020, 8:26:48.803 AM | dependency | 500 |
| 9/24/2020, 8:26:32.737 AM | dependency | 500 |
| 9/24/2020, 8:26:24.641 AM | dependency | 500 |
| 9/24/2020, 8:26:20.550 AM | dependency | 500 |
| 9/24/2020, 8:26:18.425 AM | dependency | 500 |

I know these are my retry calls, as there are no other services running in this environment at the moment, and I can see from the timestamp that the calls are due to the exponential backoff algorithm I have configured for Polly. The packages I am using in this version are :

.Net Core SDK 2.2.0 Microsoft.ApplicationInsights.AspNetCore (2.7.1) Microsoft.Extensions.Http.Polly (2.2.0)

So it appears that this behaviour was available earlier but has since changed for some reason.

My App Insights config is as follows:

  var aiOptions = new Microsoft.ApplicationInsights.AspNetCore.Extensions.ApplicationInsightsServiceOptions();
  aiOptions.EnableAdaptiveSampling = false;
  aiOptions.InstrumentationKey = Configuration.GetSection("ApplicationInsights").GetValue<string>("InstrumentationKey");
  services.AddApplicationInsightsTelemetry(aiOptions);

My app insights config has not changed with my new version, except that now we configure the connection string instead of the instrumentation key and of course we use the newer packages which I mentioned in my earlier message.

cijothomas commented 4 years ago

@karun-verghese Is the only difference the SDK version? I need to double-confirm this, as you said you "checked out an older branch of your service". Please share a minimal repro showing this behavior and I will investigate.

(From what I see, in .NET Core we always relied on DiagnosticSource callbacks to collect HTTP dependencies. In very old versions, especially on .NET Framework, there were alternate options to collect HTTP calls.)

karun-verghese commented 4 years ago

@cijothomas @guysteel I have invited you to be collaborators on a small repo that I set up to investigate this issue. The repo is simple: it contains a very basic .NET Core web service with a couple of health-check operations. On the 'main' branch, the setup is as follows: .NET Core SDK 3.1.403, Microsoft.ApplicationInsights.AspNetCore 2.15.0, Microsoft.Extensions.Http.Polly 3.1.9.

The configuration for Polly and App insights can be found in the Startup class and is exactly the same as we have in our actual services.

On the branch 'OldSDK', I performed the same tests with the following setup: .NET Core SDK 2.2.208, Microsoft.ApplicationInsights.AspNetCore 2.7.1, Microsoft.Extensions.Http.Polly 2.2.0.

The tests performed are as follows: I have set up an 'isOperable' endpoint such that when I host the service locally using Kestrel, I can call the endpoint http://localhost:5001/isOperable

The isOperable endpoint intentionally calls a non-existent 'dependent API', which is a variation of some publicly available dummy test APIs I found in the following repo: https://github.com/public-apis/public-apis#animals. Thus, the expected status code is 404 (Not Found).

The test is very simple and has just three steps:
1) Launch the service
2) Call the isOperable endpoint
3) Check the logs in App Insights

When I run the test with the latest SDK and packages, I see the following in my logs:

| timestamp [UTC] | message | data | resultCode | itemType |
| --- | --- | --- | --- | --- |
| 10/29/2020, 7:53:17.863 AM | - | https://cat-fact.herokuapp.com/factsWrongUrl | 404 | dependency |
| 10/29/2020, 7:53:19.447 AM | Delaying for 2.096 seconds and retrying. Retry attempt :1. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
| 10/29/2020, 7:53:22.807 AM | Delaying for 4.084 seconds and retrying. Retry attempt 2 . Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
| 10/29/2020, 7:53:28.127 AM | Delaying for 8.08 seconds and retrying. Retry attempt :3. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
When I run the test with the old 2.2 SDK and the old Polly and App Insights packages, I see the following:

| timestamp [UTC] | message | data | resultCode | itemType |
| --- | --- | --- | --- | --- |
| 10/29/2020, 9:09:32.865 AM | - | https://cat-fact.herokuapp.com/factsWrongUrl | 404 | dependency |
| 10/29/2020, 9:09:34.313 AM | Delaying for 2.059 seconds and retrying. Retry attempt :1. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
| 10/29/2020, 9:09:36.381 AM | - | https://cat-fact.herokuapp.com/factsWrongUrl | 404 | dependency |
| 10/29/2020, 9:09:37.616 AM | Delaying for 4.02 seconds and retrying. Retry attempt 2 . Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
| 10/29/2020, 9:09:41.636 AM | - | https://cat-fact.herokuapp.com/factsWrongUrl | 404 | dependency |
| 10/29/2020, 9:09:42.981 AM | Delaying for 8.096 seconds and retrying. Retry attempt :3. Request uri : https://cat-fact.herokuapp.com/factsWrongUrl | - | - | trace |
| 10/29/2020, 9:09:51.089 AM | - | https://cat-fact.herokuapp.com/factsWrongUrl | 404 | dependency |

The app insights instance I am pointing to is a private instance I created.

As you can see, with a really stripped-down service containing nothing except the App Insights and Polly configuration, the behaviour is the same as reported earlier. With the old SDK, I see the dependencies being logged for the retries, but with the latest .NET Core SDK that functionality has stopped working. The difference between the two branches is now really only the SDK, the packages, and small code changes which you can easily check for yourself.

karun-verghese commented 3 years ago

@cijothomas I also gave this a try with the new .NET 5 SDK and packages. It's the same behaviour as described above. The configuration is as follows: .NET Core SDK 5.0.100, Microsoft.ApplicationInsights.AspNetCore 2.16.0, Microsoft.Extensions.Http.Polly 5.0.0. Even now, the dependencies due to retries are not logged.

cijothomas commented 3 years ago

There is no special handling in .NET 5 that I am aware of which would help with this issue.

sebader commented 3 years ago

I came across the same (or very similar?) issue. Running in Azure Functions v3 (netcore 3.1)

        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services
                .AddHttpClient<HttpClient>("HttpClient", client => client.Timeout = TimeSpan.FromSeconds(120)) // default overall request timeout (includes all Polly retries)
                .AddPolicyHandler(GetRetryWithTimeoutPolicy());
        }

        static IAsyncPolicy<HttpResponseMessage> GetRetryWithTimeoutPolicy()
        {
            var retryPolicy = HttpPolicyExtensions
                .HandleTransientHttpError()
                .Or<TimeoutRejectedException>()
                .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)))
                .WrapAsync(Policy.TimeoutAsync(30)); // per request timeout

            return retryPolicy;
        }

Some of the calls to an external API fail (usually timing out after the 30 seconds). They are correctly retried by Polly, but App Insights does not log another dependency call. You can actually see it in the gap between the failed call and the next one. Based on the trace logs I can see that the call is executed again and succeeds.

I would expect that the retry call shows up as another dependency.

The SDK version on the dependency telemetry is recorded as rdddsc:2.14.0-17971

kAleksei commented 3 years ago

Any updates? Maybe someone found a temporary solution for this issue?

karun-verghese commented 3 years ago

Dear all,

It's been a long time since we've had any update on this issue. Has any investigation been done? We even provided a minimal repro (as asked) for you to investigate.

Looking forward to hearing from you guys. Thanks.

sekiyak commented 3 years ago

We experience the original issue described, with the following versions: .NET Core SDK 5.0.202, Microsoft.ApplicationInsights.AspNetCore 2.17.0, Microsoft.Extensions.Http.Polly 3.1.7.

Hollister commented 3 years ago

We experience the same behaviour: Microsoft.ApplicationInsights.AspNetCore (2.17.0) Microsoft.Extensions.Http.Polly (5.0.1)

It is disappointing to read through this thread, see the complete reproducible example and full detail provided by @karun-verghese and then the complete silence afterwards.

SoucianceEqdamRashti commented 3 years ago

We experience the same behavior with Polly version 3.1.13 and Azure Functions 3.0.13. It's frustrating that no workaround exists for this.

cijothomas commented 2 years ago

https://github.com/microsoft/ApplicationInsights-dotnet/issues/2556 is the same/a duplicate issue.

devlie commented 2 years ago

The retry calls will likely not be captured if the retries are transparent to the caller. Application Insights collects HTTP calls by subscribing to the DiagnosticSource events emitted by a DiagnosticHandler inside the HttpClient. If the retry happens without going through the full pipeline of handlers (especially DiagnosticHandler), then the retries will not be captured by Application Insights.

I don't think I can suggest a workaround at this stage: the DiagnosticHandler is part of the HttpClient itself, so you may or may not have the ability to make sure it is called when using the Polly package to do the retry.

@cijothomas Should this be a runtime bug then? Perhaps they can reorder the DiagnosticHandler? This issue, combined with #2556, makes DependencyTelemetry pretty useless when there are retries involved.

cijothomas commented 2 years ago

@devlie No. I do not see an issue with the runtime. #2556 shows it is an Application Insights SDK behavior.

(Separate note: OpenTelemetry is fixing it.)

diyan-penkov-sciant commented 1 year ago

We experience the same behavior: Microsoft.ApplicationInsights 2.21.0, Microsoft.Extensions.Http.Polly 6.0.11, Microsoft.NET.Sdk.Functions 4.1.1, Microsoft.NETCore.App 6.0.11.

Any news on resolving it?

andrasg commented 1 year ago

Is there anyone who has found a workaround on this? Wrapping retries somehow and manually generating dependency telemetry?

Krusty93 commented 1 year ago

Is there anyone who has found a workaround on this? Wrapping retries somehow and manually generating dependency telemetry?

We do it using DelegatingHandlers (the ordering of the `Add` calls is important):

services.AddTransient<ApplicationInsightsPayloadTracerHandler>();

services
  .AddHttpClient(..)
  .AddPolicyHandler(..)
  .AddHttpMessageHandler<ApplicationInsightsPayloadTracerHandler>();

The custom DelegatingHandler that logs requests and responses:

    public class ApplicationInsightsPayloadTracerHandler : DelegatingHandler
    {
        private readonly IServiceProvider _serviceProvider;
        private readonly ILogger<ApplicationInsightsPayloadTracerHandler> _logger;

        public ApplicationInsightsPayloadTracerHandler(...) { ... }

        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request,
            CancellationToken cancellationToken)
        {
            HttpResponseMessage response = null;
            var requestTimestamp = DateTime.UtcNow;

            try
            {
                response = await base.SendAsync(request, cancellationToken);
            }
            finally
            {
                await TraceDependencyAsync(request, response, requestTimestamp);
            }

            return response;
        }

        private async Task TraceDependencyAsync(
            HttpRequestMessage request,
            HttpResponseMessage response,
            DateTimeOffset traceTimestamp)
        {
            string traceCorrelationId = ActivityTraceHelper.BuildTraceCorrelationId();

            await TraceRequestAsync(request, traceCorrelationId, traceTimestamp);
            await TraceResponseAsync(request, response, traceCorrelationId);
        }

        private async Task TraceRequestAsync(
            HttpRequestMessage request,
            string traceCorrelationId,
            DateTimeOffset traceTimestamp)
        {
            try
            {
                using var requestScope = _serviceProvider.CreateScope();
                var telemetryClient = requestScope.ServiceProvider.GetRequiredService<TelemetryClient>();

                ...

                var requestTelemetry = new TraceTelemetry
                {
                    SeverityLevel = SeverityLevel.Verbose,
                    Timestamp = traceTimestamp,
                };

                requestTelemetry.Properties.Add("TraceCorrelationId", traceCorrelationId);
                // log headers, status code, etc.
                telemetryClient.TrackTrace(requestTelemetry);
            }
            catch (Exception) { /* swallow: tracing must never fail the request */ }
        }

        private async Task TraceResponseAsync(...)
        {
           // very likely as above
        }
    }

Note: To get request headers, use request.Headers.ToString().Split(Environment.NewLine), because the `ToString` method is internally overridden with a special implementation.

With this setup, if the first attempt fails, Polly triggers the retry, which will end up in this handler again, so the request will be logged twice.
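A related approach (a hedged sketch, not from the thread — the handler name is illustrative, though `TelemetryClient.TrackDependency(DependencyTelemetry)` is a real SDK overload) is to emit one DependencyTelemetry per attempt from a handler registered after the policy handler, so every Polly retry passes through it:

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

// Hypothetical handler: emits one DependencyTelemetry per attempt.
// Register it with .AddHttpMessageHandler<PerAttemptDependencyHandler>()
// AFTER .AddPolicyHandler(...) so each retry flows through it.
public class PerAttemptDependencyHandler : DelegatingHandler
{
    private readonly TelemetryClient _telemetryClient;

    public PerAttemptDependencyHandler(TelemetryClient telemetryClient)
        => _telemetryClient = telemetryClient;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var start = DateTimeOffset.UtcNow;
        var timer = Stopwatch.StartNew();
        HttpResponseMessage response = null;
        try
        {
            response = await base.SendAsync(request, cancellationToken);
            return response;
        }
        finally
        {
            // One telemetry item per attempt, including retries.
            _telemetryClient.TrackDependency(new DependencyTelemetry
            {
                Type = "Http",
                Name = $"{request.Method} {request.RequestUri.AbsolutePath}",
                Data = request.RequestUri.ToString(),
                Target = request.RequestUri.Host,
                Timestamp = start,
                Duration = timer.Elapsed,
                ResultCode = response != null
                    ? ((int)response.StatusCode).ToString()
                    : "Exception",
                Success = response?.IsSuccessStatusCode ?? false,
            });
        }
    }
}
```

Note that the SDK's built-in collector will still record the first attempt, so with this in place you may see that attempt twice unless you disable HTTP collection in the dependency module or filter the duplicates.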