open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0

Review and audit of EventSource logging for self-diagnostics #2543

Open alanwest opened 3 years ago

alanwest commented 3 years ago

OpenTelemetry .NET uses EventSource for internal logging. Each component defines an EventSource including the API, SDK, and each of the exporters and instrumentation components.

The purpose of this issue is threefold:

  1. Review the usage of these EventSources for correctness in what gets logged and where it gets logged.

Example of incorrectness: BaseExporter.Shutdown logs a span-processor-related error when there is a failure. An error raised from an exporter should not be about spans or processors: https://github.com/open-telemetry/opentelemetry-dotnet/blob/635028834c7d435bc64dd64510e4f7b7ec4207a4/src/OpenTelemetry/BaseExporter.cs#L123-L127

  2. Improve coverage of diagnostic logging by identifying gaps.

Example: In some places (but not all), TODO comments in the code mark gaps where a diagnostic log may be useful but has not been implemented yet. Here is one such TODO, at a point where an attempt to update a metric has failed: https://github.com/open-telemetry/opentelemetry-dotnet/blob/db8f0e712555716932e323f5fc7c301b17ec4c11/src/OpenTelemetry/Metrics/AggregatorStore.cs#L292

  3. Aim to make diagnostic logging actionable.

Based on this comment https://github.com/open-telemetry/opentelemetry-dotnet/pull/2525#discussion_r737777141, we should seek to provide actionable guidance when there are errors. For example, if an error is due to misconfiguration, then the log message should indicate how to resolve it.
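
To make the distinction concrete, here is a rough sketch of what a non-actionable versus an actionable event might look like. The EventSource name, class name, event IDs, and message texts are all hypothetical, invented for illustration; none of them exist in the repository. Only the System.Diagnostics.Tracing types are real.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical EventSource for illustration only; not part of OpenTelemetry .NET.
[EventSource(Name = "Example-Exporter-Diagnostics")]
internal sealed class ExampleExporterEventSource : EventSource
{
    public static readonly ExampleExporterEventSource Log = new ExampleExporterEventSource();

    // Non-actionable: reports that something failed, but not what to do about it.
    [Event(1, Message = "Exporter failed: '{0}'.", Level = EventLevel.Error)]
    public void ExporterFailed(string exception)
    {
        this.WriteEvent(1, exception);
    }

    // Actionable: names the offending value and says how to correct it.
    [Event(2, Message = "Endpoint '{0}' is not a valid absolute URI. Configure the exporter endpoint with an absolute URI, including the scheme.", Level = EventLevel.Error)]
    public void InvalidEndpoint(string endpoint)
    {
        this.WriteEvent(2, endpoint);
    }
}
```

A caller that detects the misconfiguration would then write something like `ExampleExporterEventSource.Log.InvalidEndpoint(configuredValue);`, where `configuredValue` is again a placeholder. The second event costs little more to write than the first, but it gives the reader of the log a next step.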

Related in-flight work

The following PR was held off until the post-1.0 release. It may be beneficial to land this work prior to the review of self-diagnostics:

https://github.com/open-telemetry/opentelemetry-dotnet/pull/1529

References

OpenTelemetry general error handling guidelines/self-diagnostics
OpenTelemetry .NET self-diagnostics guide

github-actions[bot] commented 2 months ago

This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.