open-telemetry / opentelemetry-java-contrib

https://opentelemetry.io
Apache License 2.0

Add resource providers for common cloud providers #1074

Open SylvainJuge opened 8 months ago

SylvainJuge commented 8 months ago

Most cloud providers expose a metadata endpoint from which resource information can be built; however, in the Java contrib repo we only have an implementation for AWS.

For example, the js contrib repo has other implementations in https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/detectors/node: alibaba, gcp and aws (I haven't looked at their respective implementations though).

The goal here is to add implementations for the most common cloud providers.

Initially the focus will be on the following cloud providers: AWS, GCP and Azure with the following task breakdown

Other cloud providers can of course be added later, but should be tracked independently.

Collector implementations that can be used for reference: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
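To make the idea of building resource information from a metadata endpoint concrete, here is a minimal sketch for GCP. It is only an illustration, not the design of any existing detector: the class name and the injected `fetch` function are hypothetical, and a real implementation would perform HTTP GET requests with the `Metadata-Flavor: Google` header and plug into the SDK's ResourceProvider SPI. The attribute keys follow the resource semantic conventions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: maps GCE metadata values to resource attributes.
final class GcpMetadataSketch {

  // Base URL of the GCE metadata server.
  static final String BASE = "http://metadata.google.internal/computeMetadata/v1/";

  // `fetch` is an assumed helper that returns the body for a URL, or null on failure.
  static Map<String, String> detect(Function<String, String> fetch) {
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("cloud.provider", "gcp");
    putIfPresent(attrs, "cloud.account.id", fetch.apply(BASE + "project/project-id"));
    putIfPresent(attrs, "host.id", fetch.apply(BASE + "instance/id"));

    // The zone is returned as "projects/<number>/zones/<zone>"; keep the last segment.
    String zone = fetch.apply(BASE + "instance/zone");
    if (zone != null) {
      putIfPresent(attrs, "cloud.availability_zone", zone.substring(zone.lastIndexOf('/') + 1));
    }
    return attrs;
  }

  private static void putIfPresent(Map<String, String> attrs, String key, String value) {
    if (value != null && !value.isBlank()) {
      attrs.put(key, value);
    }
  }
}
```

Injecting `fetch` keeps the mapping logic testable without network access, and it also makes it easy to apply a short timeout so that detection does not delay startup when the endpoint is unreachable.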


for triage: This issue can be assigned to me.

punya commented 5 months ago

Hi @SylvainJuge, thanks for starting the initiative to gather implementations for all cloud providers.

Once all the resource detectors are added to the contrib repository, what will be the recommended way for users of the Java agent to incorporate these detectors? Naively, I would expect the workflow to look like:

  1. Find or build shaded jars for all the detectors needed.
  2. Add them to the command line using -Dotel.javaagent.extensions

This seems pretty inconvenient, especially because these steps require the use of advanced features of Maven/Gradle to automate.

I know that there have been earlier discussions about incorporating detectors for common cloud platforms into the default agent distribution. In at least one case, we decided to exclude the detector because it added startup latency. I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

This would parallel the approach used in the Collector, where the contrib distribution includes many detectors, but a given detector is only invoked if it's explicitly enabled in the Collector configuration file.

SylvainJuge commented 4 months ago

Hi @punya, sorry for the late reply on this.

So far I haven't really thought about "making it convenient to use them", but that's a very good point here. I agree with you that shading or using the command line option is not really practical for most users and doing that for every agent distribution would be wasteful.

Having them included and disabled by default in the agent would definitely be a good option:

In order to implement the "included but disabled by default", what we did on our side so far is the following:

This strategy is complex and can't be reused when using those resource providers directly as SDK extensions.

On the code side, I think that keeping them in the contrib repo rather than directly in the agent allows them to be reused as SDK extensions without an agent, but in practice I really don't know how popular or how relevant this option would be. Given that support for Java agents in native images like GraalVM is clearly not coming in the short term, that's still something to keep in mind.

So here I would be in favor of keeping the code in the contrib repo and adding them (but disabled) to the agent. However, what I am not 100% clear about is the best option to implement the "included but disabled by default" behavior:

I think that we might need an agent-only configuration option here, for example otel.instrumentation.optional.resource.providers, to implement the opt-in behavior, as we can't alter the semantics of the existing SDK autoconfig options. The agent would contain a hard-coded list of the FQNs of the included optional providers, and unless a provider's FQN is added to this option, the agent would add it to otel.java.disabled.resource.providers at startup.
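A rough sketch of that startup logic, to make the proposal easier to discuss. Everything here is an assumption for illustration: the class name, the two example FQNs, and the choice that an explicit user-provided disable wins over an opt-in. It only computes the effective value of otel.java.disabled.resource.providers and does not touch any agent or SDK API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the agent-side opt-in logic.
final class OptionalResourceProviders {

  // Assumed FQNs of providers bundled in the agent but disabled by default.
  static final List<String> BUNDLED_OPTIONAL = List.of(
      "com.example.resources.GcpResourceProvider",
      "com.example.resources.AzureResourceProvider");

  // Parses a comma-separated option value into an ordered set, ignoring blanks.
  static Set<String> parseList(String value) {
    Set<String> result = new LinkedHashSet<>();
    if (value == null) {
      return result;
    }
    for (String item : value.split(",")) {
      String trimmed = item.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  // userDisabled: user value of otel.java.disabled.resource.providers (may be null).
  // optIn: user value of the proposed agent-only option (may be null).
  // Returns the effective value to pass to the SDK autoconfiguration.
  static String effectiveDisabled(String userDisabled, String optIn) {
    Set<String> disabled = parseList(userDisabled);
    Set<String> enabled = parseList(optIn);
    for (String fqn : BUNDLED_OPTIONAL) {
      if (!enabled.contains(fqn)) {
        disabled.add(fqn);
      }
    }
    return String.join(",", disabled);
  }
}
```

With this shape, providers that are not in the bundled list are never affected, and the existing SDK options keep their current semantics.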

trask commented 4 months ago

I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

this makes sense to me

On the code side, I think that keeping it in the contrib repo and not directly into the agent allows to reuse them as SDK extensions without an agent, but in practice I really don't know how popular or how relevant this option would be.

even if we moved them to the instrumentation repo, we would still publish them as standalone artifacts, e.g. https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/resources/library

but I agree with keeping them here in the contrib repo where the cloud vendors can have ownership of them, and we can still pull them into the Java agent.

zeitlinger commented 4 months ago

otel.instrumentation.optional.resource.providers

I think this is a good idea :smile:

SylvainJuge commented 4 months ago

  • why use a FQN here instead of the short names as for other providers?

What I meant here is that we should use the same values as the ones accepted by the otel.java.{enabled,disabled}.resource.providers options, as the agent will probably copy/append/modify the provided values to those existing SDK options.

I wasn't aware of the "short names" that we can use with other providers. Is there any documentation or list of them somewhere? Currently the SDK documentation only refers to FQNs.

zeitlinger commented 4 months ago

Sorry, I confused that with exporters...

zeitlinger commented 4 months ago

I was wondering if we could get the best of both worlds by

  1. Including the detector code in the agent
  2. Keeping it disabled by default

this makes sense to me

@trask what about otel.java.additional.resource.providers=<FQN1,FQN2> to enable resource providers without affecting resource providers that are not mentioned in this list?

trask commented 4 months ago

something similar to otel.instrumentation.<>.enabled=true? (could be done entirely in the agent, without impacting resource providers themselves)

jack-berg commented 4 months ago

something similar to otel.instrumentation.<>.enabled=true? (could be done entirely in the agent, without impacting resource providers themselves)

I was wondering if this makes sense given that resource providers can also be used without the otel java agent. Would a user using the resource providers as library instrumentation expect to have the notion of default enabled providers? I think the answer is no. Users have to manually add a dependency on the resource provider, and it makes sense to interpret this as wanting to enable that resource provider by default. In contrast, when the agent is installed, (most) users don't have a say on which resource providers are included, so it makes sense to have an additional configuration knob.

If something like otel.instrumentation.<>.enabled=true was introduced, we could:

zeitlinger commented 4 months ago

Would a user using the resource providers as library instrumentation expect to have the notion of default enabled providers? I think the answer is no.

In the case of a spring boot starter it would also make sense - but I think it doesn't change the proposed solution.

If I understand the proposal correctly, it could be implemented with a new NamedResourceProvider in the SDK

NamedResourceProvider:

Suggestions

  1. use otel.java.resource.provider.<>.enabled to align with the existing providers
    • or otel.resource.provider.<>.enabled if we use gcp instead of FQN
  2. javaagent could create wrapper resource providers for contrib if there's some reason not to implement NamedResourceProvider in contrib
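To illustrate the suggestions above, here is a minimal sketch of a per-provider enabled flag. It is hypothetical and is not the SDK's ResourceProvider SPI: the interface shape, the `attributes()` method and the `defaultEnabled` parameter are assumptions, and the property name uses the otel.resource.provider.<>.enabled variant with a short name such as gcp. The caller supplies the default, so the agent can pass false (opt-in) while library usage can pass true.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of name-based enabling of resource providers.
final class NamedProviderSketch {

  // Assumed stand-in interface, for illustration only.
  interface NamedResourceProvider {
    String name();

    Map<String, String> attributes();
  }

  // Reads otel.resource.provider.<name>.enabled, falling back to the caller's default.
  static boolean isEnabled(
      NamedResourceProvider provider, Map<String, String> config, boolean defaultEnabled) {
    String value = config.get("otel.resource.provider." + provider.name() + ".enabled");
    return value == null ? defaultEnabled : Boolean.parseBoolean(value);
  }

  // Merges the attributes of every enabled provider, in order.
  static Map<String, String> buildResource(
      List<NamedResourceProvider> providers, Map<String, String> config, boolean defaultEnabled) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (NamedResourceProvider provider : providers) {
      if (isEnabled(provider, config, defaultEnabled)) {
        merged.putAll(provider.attributes());
      }
    }
    return merged;
  }
}
```

A wrapper in the javaagent could adapt existing contrib providers to such an interface if they cannot implement it directly.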

zeitlinger commented 4 months ago

@trask @jack-berg I've created a PR that implements this proposal: https://github.com/open-telemetry/opentelemetry-java/pull/6250

zeitlinger commented 4 months ago

@trask here's the ticket for the Azure resource provider: https://github.com/open-telemetry/opentelemetry-java-contrib/issues/1214