jbogard / NServiceBus.Extensions.Diagnostics

Apache License 2.0
16 stars 7 forks source link

Exception Handling #6

Open jamesfera opened 3 years ago

jamesfera commented 3 years ago

I am unable to find any mention of exception handling using the Activity API. I am curious about your thoughts on this. As it stands, Incoming/Outgoing Message failures are not captured and are therefore unavailable to downstream telemetry.

https://github.com/jbogard/NServiceBus.Extensions.Diagnostics/blob/b7ef20b5f579809c5ed69ac7518c9d1ddae89884/src/NServiceBus.Extensions.Diagnostics/OutgoingPhysicalMessageDiagnostics.cs#L21-L37

Something like this would allow the Activity to propagate the Exception (admittedly in an awkward way):

using (var activity = StartActivity(context))
{
  if (activity != null)
  {
      InjectHeaders(activity, context);
  }
  try
  {
    await next().ConfigureAwait(false);
  }
  catch(Exception ex)
  {
    activity.SetCustomProperty("Exception", ex);
  }
  finally
  {
    if (_diagnosticListener.IsEnabled(EventName))
    {
        _diagnosticListener.Write(EventName, context);
    }
  }
}

Have you thought about this? Is there a better way?

DorianGreen commented 3 years ago

I guess you'd want to re-throw the exceptions recoverability can kick in.

jbogard commented 2 years ago

Yeah I think the problem is there's no easy way to detect when a message is sent to the error queue. It's just not exposed today. I can do the exception but I think the better route would be to hook directly into the NSB error processing, which is also not exposed today.

jgo-bvb commented 2 years ago

And it is not feasible to do something with this kind of hook in the Feature.Setup method?

var recoverability = context.Settings.Get<RecoverabilitySettings>();
recoverability.Failed(settings => settings.OnMessageSentToErrorQueue(message =>
{
    System.Diagnostics.Activity.Current?.SetCustomProperty("exception", message.Exception);
    return Task.CompletedTask;
}));
RemyDuijkeren commented 2 years ago

Also setting the Activity Status is important for Application Insights Telemetry.

activity.SetStatus(ActivityStatusCode.Ok);
activity.SetStatus(ActivityStatusCode.Error, "error description"); // exception stacktrace as description?

"Yeah I think the problem is there's no easy way to detect when a message is sent to the error queue." I don't think this should be definition of an error. I don't care if the message is being retried or ends op in the error queue: it failed and I want to see that :-).

I want to see the retries in the Telemetry: I think this is valuable information to show (it helped me multiple times to identify issues). The only downside is that Exceptions are overly reported, because of the retries, but I prefer this, then to hide them.

jbogard commented 2 years ago

I don't think I saw that failure config, this should be easy enough to add.

lailabougria commented 2 years ago

From my tests, just setting the status on the activity doesn't propagate that information into the exporter tool.

To make that work, you need to set specific Otel tags:

activity?.SetTag("otel.status_code", "ERROR");
activity?.SetTag("otel.status_description", ex.Message);
RemyDuijkeren commented 2 years ago

Reading this issue here https://github.com/open-telemetry/opentelemetry-dotnet/issues/2569 , its not being doing on purpose. It's saying it's the responsibility of the exporter(s) to generate those tags.