open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/docs/languages/go
Apache License 2.0
5.22k stars 1.05k forks source link

Make it easier to add a resource resource.Detector to the SDKs #5422

Open dashpole opened 4 months ago

dashpole commented 4 months ago

Problem Statement

There is a fairly common use-case which users need to do when configuring the (trace/metric/logs) SDK, in which they want to add a resource detector for the environment they are running in. For example, they start with:

// This gives the user resource.Default()
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
)

They want to add a detector. Say, for example, they want to add go.opentelemetry.io/contrib/detectors/gcp.NewDetector(). Many users will do:

res, err := resource.New(ctx,
    resource.WithDetectors(gcp.NewDetector()),
)
// handle err
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
    sdktrace.WithResource(res),
)

This gives them the attributes from the detector they want, but removes the default set. If they want to add the default set back in, they need to do:

res, err := resource.New(ctx,
    resource.WithDetectors(gcp.NewDetector()),
)
// handle err
res, err = resource.Merge(res, resource.Default())
// handle err
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
    sdktrace.WithResource(res),
)

This is quite a bit of extra code just to add a detector, and in my experience has been error prone for users.

Proposed Solution

Add a resource.WithDefault() Option to resource. When used, it is equivalent to merging the resource with the default resource, or adding resource.WithFromEnv() and resource.WithTelemetrySDK() (and adding something similar to the defaultServiceNameDetector). Users would then be able to do:

res, err := resource.New(ctx,
    resource.WithDefault(),
    resource.WithDetectors(gcp.NewDetector()),
)
// handle err
tp := trace.NewTracerProvider(
    trace.WithBatcher(exporter),
    sdktrace.WithResource(res),
)

Alternatives Considered

Add trace/metric/log SDK WithResourceDetector(resource.Detector) option

If the SDK had a WithResourceDetector option, that would be the simplest possible configuration for users (a single option). However, that would mean the error returned by the resource detector, or the merge cannot be returned to users, and would need to be handled by the global error handler. I think this will be surprising to users whose applications occasionally don't have resource attributes attached to telemetry.

Add resource.WithResource(*resource.Resource) option

This would marginally simplify the above by being able to pass resource.WithResource(resource.Default()) to resource.New(). However, it is complex enough i'm not sure users would figure it out. It is unclear if this is worth it.

dashpole commented 4 months ago

I was reading through the spec for resource, and it seems like our SDKs aren't compliant with the resource spec (which isn't part of the SDK specification).

The SDK MUST extract information from the OTEL_RESOURCE_ATTRIBUTES environment variable and merge this, as the secondary resource, with any resource information provided by the user, i.e. the user provided resource information has higher priority.

The spec for the default service name has similar language (also not in the SDK spec): https://github.com/open-telemetry/semantic-conventions/blob/61870711f9e7c13c3a058215a15e50877dd92d77/docs/attributes-registry/service.md?plain=1#L47

[service.name] MUST be the same for all instances of horizontally scaled services. If the value was not specified, SDKs MUST fallback to unknown_service: concatenated with process.executable.name, e.g. unknown_service:bash. If process.executable.name is not available, the value MUST be set to unknown_service.

Same for the telemetry detector (also not in the SDK specification): https://github.com/open-telemetry/semantic-conventions/blob/61870711f9e7c13c3a058215a15e50877dd92d77/docs/attributes-registry/telemetry.md

The OpenTelemetry SDK MUST set the telemetry.sdk.name attribute to opentelemetry. If another SDK, like a fork or a vendor-provided implementation, is used, this SDK MUST set the telemetry.sdk.name attribute to the fully-qualified class or module name of this SDK's main entry point or another suitable identifier depending on the language. The identifier opentelemetry is reserved and MUST NOT be used in this case. All custom identifiers SHOULD be stable across different versions of an implementation.

The above seems to imply SDKs MUST include our Default resource, and merge it with the user-provided resource. It seems like the correct resolution here is to not add an additional option, but instead to perform that merge within the SDK initialization. Thoughts?

MrAlias commented 4 months ago

The SDK provides the Default function to comply with this requirement.

There are a few ways create a resource. I'm not sure we want to impose that a default resource be included in all of them by default. It seems like it would impose potentially unnecessary composition on all use-cases.

MrAlias commented 4 months ago

We want to look at other language implementation to determine if we are following what they do for our defaults.

Holding until that work is done.