open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0
3.23k stars 764 forks source link

Using ILogger instead of EventSource for logging exceptions #3881

Open timmydo opened 2 years ago

timmydo commented 2 years ago

Is your feature request related to a problem?

We're running the prometheus exporter middleware in production container environment where we have a setup with ILogger exporting to Geneva. We don't have the means to connect perfview or read files on disk to diagnose why the prometheus exporter is returning 500.

Describe the solution you'd like:

What do you want to happen instead? What is the expected behavior?

https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.Prometheus.AspNetCore/PrometheusExporterMiddleware.cs#L92

            catch (Exception ex)
            {
                PrometheusExporterEventSource.Log.FailedExport(ex);
                if (!response.HasStarted)
                {
                    response.StatusCode = 500;
                }
            }

something like this

                LogFailedExportException(LogLevel.Error, this.logger);

    public partial class PrometheusMiddlewareLogs
    {
        [LoggerMessage(EventId = 1, Message = "ExportFailed exception")]
        static partial void LogFailedExportException(LogLevel logLevel, ILogger logger);
    }

I looked at replacing the middleware class but exporter.CollectionManager is internal so I might need to use some reflection to get around that if we have to hack a fix together.

@cijothomas

cijothomas commented 2 years ago

Linking previous attempts https://github.com/open-telemetry/opentelemetry-dotnet/pull/1867

cijothomas commented 1 year ago

Tagging for consideration in 1.6, as that is the milestone when we expect to finalize Logging changes which are currently worked on in main-logs branch.

cijothomas commented 1 year ago

Linking to https://github.com/open-telemetry/opentelemetry-dotnet/issues/3447#issuecomment-1185329104

pharring commented 8 months ago

If you do this (replace internal EventSources with ILogger), please make sure that ActivityStarted/ActivityStopped events are turned into logger scopes (BeginScope/Dispose).

cijothomas commented 7 months ago

If you do this (replace internal EventSources with ILogger), please make sure that ActivityStarted/ActivityStopped events are turned into logger scopes (BeginScope/Dispose).

I am not sure why is that relevant/needed? Could you elaborate...

Code-Grump commented 7 months ago

I am somewhat confused - if I'm routing my logging via OTel and I'm having problems with an exporter, writing the problems back into that system feels like it could make the problem worse, or otherwise make it harder to diagnose. Writing to an ILogger is definitely a lot easier to work with than an EventSource, I'd just like to see an example of how this is expected to work.

cijothomas commented 7 months ago

I am somewhat confused - if I'm routing my logging via OTel and I'm having problems with an exporter, writing the problems back into that system feels like it could make the problem worse, or otherwise make it harder to diagnose. Writing to an ILogger is definitely a lot easier to work with than an EventSource, I'd just like to see an example of how this is expected to work.

It is a valid point, and yes, solving that (i.e avoiding dead-loop situations) is part of implementation details. See https://github.com/open-telemetry/opentelemetry-dotnet/pull/1867#discussion_r588066689

cijothomas commented 2 months ago

@Yun-Ting is still working on it targeting 1.10 milestone.

Yun-Ting commented 1 month ago

I am somewhat confused - if I'm routing my logging via OTel and I'm having problems with an exporter, writing the problems back into that system feels like it could make the problem worse, or otherwise make it harder to diagnose.

While trying to implement the to ILogger migration for OTel internal diagnostics events, I hit the above mentioning issue.

The crux of the problem we want to solve here is the inability to access the file-on-disk in containerized environments. I'm thinking we can consider the below alternative solution—write diagnostics logs to the console.

How to enable OTel self diagnostics

By setting environment variables for the at fault process or by adding key-value pair entries to appsettings.

For example,

OTel_selfDiagnostics_target = console; Otel_selfDiagnostcs_level= warning;

{ "OTel_selfDiagnostics_target": "console", "Otel_selfDiagnostcs_level": "warning" }

The assumption here is console logs will be accessible for environments where file-on-disk is not accessible. (For example: https://learn.microsoft.com/en-us/azure/container-apps/log-streaming?tabs=bash#view-log-streams-via-the-azure-portal)

Implementation

Use the existing EventSource diagnostics infrastructure, and add support for write self diagnostics log to console when user specifies to direct logs there. Adding support to write to console here: https://github.com/open-telemetry/opentelemetry-dotnet/blob/a55a5ac997204e571a41c495f83de828d3e07685/src/OpenTelemetry/Internal/SelfDiagnosticsConfigRefresher.cs#L37C1-L43C57

cijothomas commented 1 month ago

While trying to implement the to ILogger migration for OTel internal diagnostics events, I hit the above mentioning issue.

Isn't that solvable with suppression scope? We prevent SDK's own http spans from being looped back in already. We can use the same mechanism for logs too.

I'm thinking we can consider the below alternative solution—write diagnostics logs to the console.

Console maybe a viable alternative, given stdout is a very common supported solution in many hosting environents, but I think we should still send logs to ILogger, and let user decide if they want ILogger to route it to console or someplace else. Sometimes users want the Ilogger piped via OTel SDK + OTLP itself (eg: Metrics cardinality limit is hit - user can chose to send such logs to their OTLP Logs etc.)

CodeBlanch commented 1 month ago

@cijothomas We've been prototyping ILogger. But it seems to introduce more challenges/questions than solutions 🤣 Planning to discuss this on the SIG today if you want to join.

Here's some thoughts in no particular order...

What I want to do is step back a bit. Instead of jumping into a solution like "use ILogger for internal logging" I think we should focus first on the scenario(s) we want to enable:

When @Yun-Ting and I discussed the scenario we felt a much simpler thing to do would be augment the feature we already have to support a console target. It will enable the scenario(s) and avoid all the problems we were running into with ILogger 🤔

Code-Grump commented 1 month ago

I feel using ILogger is attractive, but not correct. It promises the ability to write diagnostic information to any compatible sink, using an established configuration system, but it's intended to deliver the application logs via OpenTelemetry. The kind of logging we're interested in is the "telemetry isn't working" kind that we turn on because the systems connected to ILogger aren't working.

Being able to route the existing system to the console, enabled via an environment variable meets that need without a great deal of thinking. I've implemented it myself three times now and would love to not have to make it a fourth.

CodeBlanch commented 1 month ago

Obligatory

image

noahfalk commented 1 month ago

+1 on nailing down the scenario goal above all else :)

If its helpful it sounds like the issues you are hitting includes this one: https://github.com/dotnet/runtime/issues/50777

There is lots of discussion on that thread, but the currently suggested workaround is to create your own ILogger and configure it internal to your library in situations where you can't (or don't want to) have the user pass one in through your API surface. Doing that implies taking an assembly dependency on M.E.L and perhaps particular providers like console. Depending on your scenario goal you might also decide that these dependencies aren't justified relative to the functionality you need to meet your goal.

alanwest commented 1 month ago

When @Yun-Ting and I discussed the scenario we felt a much simpler thing to do would be augment the feature we already have to support a console target. It will enable the scenario(s) and avoid all the problems we were running into with ILogger

I support this approach. However, I'm not sure there'd be anything inherently wrong with using ILogger either. To me it seems like folks are conflating a number of things together which caused the complications perceived with using ILogger. Namely, OpenTelemetry .NET's support for ILogger is orthogonal to the SDK's self diagnostic logging. In my opinion, it was never a requirement to integrate the self diagnostics into the SDK's logging pipeline. As such, the SDK could create a configure a LoggerFactory expressly for the purpose of self diagnostics that is isolated from the SDK's pipeline.

That said, either ILogger or EventSource is fine. They're both logging frameworks within the .NET ecosystem, so they're both fine options. Every other language SDK has done similar in adopting a logging framework from their ecosystem. For example, OpenTelemetry Java uses java.util.logging. To @noahfalk's point, maybe EventSource is preferable in order to avoid additional dependencies?

Irrespective of logging framework chosen, here are the requirements I'd like us to consider:

  1. Default to writing self diagnostics to stdout. The option to write to a file is nice, but I have found it to be an additional hurdle for supporting end users.
  2. Allow for self diagnostics to be enabled/disabled.
  3. Enable self diagnostics by default with error (or maybe warn) severity being the default.

There was some additional discussion in the SIG meeting about using the OTLP format. I believe this too is orthogonal and should not be a requirement. Later we could entertain a separately configurable export pipeline that could send these logs over OTLP, but I suspect using log forwarders like fluent bit to scrape console logs is the more common pattern for sending these kind of logs to a backend.

Lastly, in the SIG meeting, there was some concern expressed about multi-line log message (e.g., a log message containing a stack trace). I'd be fine with a requirement that self diagnostics be configurable to emit messages on a single line, but I do not think this should be the default. Multiline log messages are very common, and log forwarders like fluent bit can be configured to handle them pretty seamlessly.

CodeBlanch commented 1 month ago

EventSource is preferable in order to avoid additional dependencies?

Yes. We have some ILogger dependencies. But we don't have ILogger Console dependencies. If we want console-by-default we would have to take a reference on Microsoft.Extensions.Logging.Console and that feels like a non-starter. Could implement our own ILoggerProvider which writes to Console to avoid that but there is also the size of the effort to switch from EventSource to ILogger. It is big so we better be sure there is a really compelling reason to take it on 😄

  1. Default to writing self diagnostics to stdout. The option to write to a file is nice, but I have found it to be an additional hurdle for supporting end users.
  2. Allow for self diagnostics to be enabled/disabled.
  3. Enable self diagnostics by default with error (or maybe warn) severity being the default.

👍 from me

There was some additional discussion in the SIG meeting about using the OTLP format. I believe this too is orthogonal and should not be a requirement. Later we could entertain a separately configurable export pipeline that could send these logs over OTLP, but I suspect using log forwarders like fluent bit to scrape console logs is the more common pattern for sending these kind of logs to a backend.

That's certainly not what I was suggesting. Maybe someone else did 😄 What I was suggesting was, when it comes to writing to console/stdout, we should do single-line JSON using OTLP JSON schema. Doesn't have to be that, but if we can avoid inventing some new format great!