open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.94k stars 1.32k forks source link

[otelcol] Allow confmap to write logs using configured logger #10008

Closed TylerHelmuth closed 2 weeks ago

TylerHelmuth commented 1 month ago

Description

This PR allows confmap to create logs, and then actually writes the logs out after the collector's real logger is instantiated.

Example of the logger in action

receivers:
  nop:

exporters:
  otlphttp:
    endpoint: http://0.0.0.0:4317
    headers:
      # Not set
      x-test: ${env:TEMP3}
  debug:
    # set to "detailed"
    verbosity: $TEMP

service:
  pipelines:
    traces:
      receivers:
        - nop
      exporters:
        - debug

image

Alternative to https://github.com/open-telemetry/opentelemetry-collector/pull/10007

Link to tracking issue

Related to https://github.com/open-telemetry/opentelemetry-collector/issues/9162 Related to https://github.com/open-telemetry/opentelemetry-collector/issues/5615

Testing

If we like this approach I'll add tests

TylerHelmuth commented 1 month ago

A downside to this approach is that the logs come in a weird order. This is highlighted in the description's screenshot.

evan-bradley commented 1 month ago

@TylerHelmuth I think this is a more scalable solution given that providers may want to do additional logging in the future (e.g. an HTTP provider that polls an endpoint can't reach the endpoint during runtime) and it would be nice if these logging settings are in line with the rest of the Collector.

Regarding the log ordering, this highlights the fact that, in my opinion, otelcol.Collector and service have a lot of weird overlap. We could realistically do various odd things to fix the ordering (e.g. passing the logs into the service), but I don't think it's such a big deal.

mx-psi commented 4 weeks ago

A downside to this approach is that the logs come in a weird order. This is highlighted in the description's screenshot.

I don't feel like this is a big deal. As part of #4970 we could explore moving the logger initialization outside of service, that would give us more flexibility with how to do this.

Note that service.Logger() has this comment:

This is a temporary API that may be removed soon after investigating how the collector should record different events.

I am fine with using it here for now, but I guess we'll have to work on refactoring this part at some point (it does feel a bit weird that that method exists).

cc @ankitpatel96, this is a rework of #9908

TylerHelmuth commented 4 weeks ago

Another potential downside I need to investigate: if there is an error during confmap resolution will the logs still be written

TylerHelmuth commented 4 weeks ago

if there is an error during confmap resolution will the logs still be written

The answer is no, the logs will not be written. So if we wanted to add debug logs to help users while troubleshooting configuration resolution that wouldn't work - they'd need to entirely depend on the errors returned.

With this solution any logs from confmap will only be written on a successful collector configuration resolution AND service creation.

evan-bradley commented 4 weeks ago

With this solution any logs from confmap will only be written on a successful collector configuration resolution AND service creation.

Could we circumvent this by creating an "error only" logger in otelcol.Collector that replays the logs from the observer with a configuration similar to the one in https://github.com/open-telemetry/opentelemetry-collector/pull/10007?

mx-psi commented 4 weeks ago

The answer is no, the logs will not be written. So if we wanted to add debug logs to help users while troubleshooting configuration resolution that wouldn't work - they'd need to entirely depend on the errors returned.

The alternative (without changing public API) would be to create the logger twice: once before creating the service to log these, and once inside the server.

I am okay with dealing with this problem later if we don't like that though, this is a net improvement IMO and we could merge after adding some tests

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.56%. Comparing base (31528ce) to head (8e9bce4).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #10008 +/- ## ======================================= Coverage 91.56% 91.56% ======================================= Files 360 360 Lines 16698 16703 +5 ======================================= + Hits 15289 15294 +5 Misses 1073 1073 Partials 336 336 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

TylerHelmuth commented 3 weeks ago

I think this is a more scalable solution given that providers may want to do additional logging in the future (e.g. an HTTP provider that polls an endpoint can't reach the endpoint during runtime) and it would be nice if these logging settings are in line with the rest of the Collector.

@evan-bradley to address this concern in https://github.com/open-telemetry/opentelemetry-collector/pull/10007 could the confmap provider's logger be updated with a new logger once the actual one is instantiated?

evan-bradley commented 3 weeks ago

@evan-bradley to address this concern in https://github.com/open-telemetry/opentelemetry-collector/pull/10007 could the confmap provider's logger be updated with a new logger once the actual one is instantiated?

I would prefer not to mess around with the logger instance if we can get away with it. Additionally, that may be a challenge: once we pass the logger in and the providers store the reference, I don't think we have any real ability to update it.

If we want to avoid using the observer, I would say we should do one of these:

  1. Refactor the otelcol<->service interface to move config resolution responsibilities to the service. This needs a lot of discussion and may not be the right move, but I can bring this one up at the next SIG if there aren't any major objections to it.
  2. Have otelcol.Collector instantiate the logger and pass it to the service.

In both of these cases we would ideally find a way to delay actually outputting any logs until we've resolved a log level. Otherwise we would just have to use a default log level until we get the real one.

Actually, looking at this a second time, I think we may have to move away from the observer regardless if we want logs at runtime. It doesn't look like it has any methods that would support "streaming" the logs to a real logger.

evan-bradley commented 3 weeks ago

once we pass the logger in and the providers store the reference, I don't think we have any real ability to update it.

We could get around this by passing a wrapper struct that allows us to update the instance reference so long as providers access the logger only through the wrapper. It would be nice to see if we could solve this some other way first.

TylerHelmuth commented 3 weeks ago

@evan-bradley I don't think either option 1 or 2 will solve the problem of trying to have an the actual user-configured logger during config resolution since they don't solve the problem that the user-configured logger depends on config resolution.

TylerHelmuth commented 3 weeks ago

We could get around this by passing a wrapper struct that allows us to update the instance reference so long as providers access the logger only through the wrapper. It would be nice to see if we could solve this some other way first.

This would work and would be enforceable if ProviderSettings.Logger used that wrapper instead of *zap.Logger. I believe we'd also need a new method on ConfigProvider that would allow updating the logger, or we'd need to keep around a reference to the original wrapper passed into ProviderSettings that could be updated.

evan-bradley commented 3 weeks ago

@evan-bradley I don't think either option 1 or 2 will solve the problem of trying to have an the actual user-configured logger during config resolution since they don't solve the problem that the user-configured logger depends on config resolution.

Both of those options are intended to keep a single instance of the logger and just update it in-place after the config is resolved. I think you may be right that they don't solve the problem though, since it appears the zap APIs all return a new instance when you update a logger.

This would work and would be enforceable if ProviderSettings.Logger used that wrapper instead of *zap.Logger.

Providers could still store the *zap.Logger reference and just use that; any updates to the wrapper would then go unused. I think to make it fully enforceable we would need to fully wrap *zap.Logger and keep the instance reference hidden. Realistically we could also rely on documentation and just say to only store references to the wrapper since the instance may change.

TylerHelmuth commented 3 weeks ago

I think to make it fully enforceable we would need to fully wrap *zap.Logger and keep the instance reference hidden.

Agreed, ProviderSettings would need to look something like

// ProviderSettings are the settings to initialize a Provider.

type OurCustomLoggingInterface {
  # expose the zap.Logger functions here and the ability to update the underlying zap logger.
}

type ProviderSettings struct {
    Logger OurCustomLoggingInterface
}

I don't like it.

mx-psi commented 3 weeks ago

Re: how to update the logger at runtime. I don't get why we need to wrap *zap.Logger into something else. Why not just build a custom zapcore.Core that we can update at runtime? It's an interface already so we can do whatever we want there, including replacing its internals after the fact, can't we?

TylerHelmuth commented 3 weeks ago

Why not just build a custom zapcore.Core that we can update at runtime?

I can investigate this idea

TylerHelmuth commented 2 weeks ago

Tried out the idea here: https://github.com/open-telemetry/opentelemetry-collector/pull/10056

TylerHelmuth commented 2 weeks ago

Closing this for now in favor of https://github.com/open-telemetry/opentelemetry-collector/pull/10056