open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.93k stars 1.32k forks source link

[otelcol] Add a custom zapcore.Core for confmap logging #10056

Closed TylerHelmuth closed 1 week ago

TylerHelmuth commented 2 weeks ago

Description

Provides a logger to confmap that buffers logs in memory until the primary logger can be used. Once the primary logger exists, places that used the original logger are given the updated Core.

If an error occurs that would shut down the collector before the primary logger could be created, the logs are written to stdout/err using a fallback logger.

Alternative to https://github.com/open-telemetry/opentelemetry-collector/pull/10008

I've pushed the testing I did to show how the logger successfully updates. Before config resolution the debug log in confmap is not printed, but afterwards it is.

test config:

receivers:
  nop:

exporters:
  otlphttp:
    endpoint: http://0.0.0.0:4317
    headers:
      # Not set
      x-test: ${env:TEMP3}
  debug:
    # set to "detailed"
    verbosity: $TEMP

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    traces:
      receivers:
        - nop
      exporters:
        - debug

image

Link to tracking issue

Related to https://github.com/open-telemetry/opentelemetry-collector/issues/9162 Related to https://github.com/open-telemetry/opentelemetry-collector/issues/5615

Testing

If we like this approach I'll add tests

Documentation

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 89.79592% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 91.67%. Comparing base (c6b70a7) to head (c3b7488). Report is 3 commits behind head on main.

Files Patch % Lines
otelcol/collector.go 72.22% 7 Missing and 3 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #10056 +/- ## ======================================== Coverage 91.67% 91.67% ======================================== Files 362 367 +5 Lines 16754 16878 +124 ======================================== + Hits 15359 15473 +114 - Misses 1056 1063 +7 - Partials 339 342 +3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

djaglowski commented 2 weeks ago

Generally looks good to me

TylerHelmuth commented 2 weeks ago

@mx-psi I believe I've address all feedback

mx-psi commented 1 week ago

@TylerHelmuth can you address the lint error?

Error: collector.go:289:10: indent-error-flow: if block ends with a return statement, so drop this else and outdent its block (revive)

I can merge after that

mx-psi commented 1 week ago

The Windows test failure seems to be legit:

    collector_windows_service_test.go:146: 
            Error Trace:    D:/a/opentelemetry-collector/opentelemetry-collector/otelcol/collector_windows_service_test.go:146
            Error:          Should be true
            Test:           TestCollectorAsService/ConfigFileNotFound

It's failing here

https://github.com/open-telemetry/opentelemetry-collector/blob/7b63bfc9a46b692451c838bb7acd2e3692580bf4/otelcol/collector_windows_service_test.go#L146

with this test

https://github.com/open-telemetry/opentelemetry-collector/blob/7b63bfc9a46b692451c838bb7acd2e3692580bf4/otelcol/collector_windows_service_test.go#L54-L58

The Windows service uses a special zap core https://github.com/open-telemetry/opentelemetry-collector/blob/7b63bfc9a46b692451c838bb7acd2e3692580bf4/otelcol/collector_windows.go#L203

I am wondering if the fallback logger is somehow messing with this

TylerHelmuth commented 1 week ago

Agreed that the failure is legit, looking into it. I'll be using CI as the runner lol

TylerHelmuth commented 1 week ago

@pjanotti I need some help here, why can the windows service not handle instantiating a fallback logger before returning the error on start? If I use zap.NewNOP it works, but something about the builder causes problems.