open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Instrumentation support selector #2744

Open crossoverJie opened 3 months ago

crossoverJie commented 3 months ago

Component(s)

auto-instrumentation

Is your feature request related to a problem? Please describe.

Allow an Instrumentation to apply only to specified Pods, similar to this: https://skywalking.apache.org/docs/skywalking-swck/latest/java-agent-injector/#1-label-selector-and-container-matcher

Describe the solution you'd like

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  selector:
    app: appname
```

Describe alternatives you've considered

No response

Additional context

No response

pavolloffay commented 3 months ago

This approach has one considerable downside compared to the annotation. The annotation changes the pod spec, therefore it triggers a re-deployment, which is needed by the OTel operator to inject the auto-instrumentation (the operator uses a pod mutating webhook).
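For reference, the annotation-driven flow described above looks roughly like this (a minimal sketch; the Deployment name is illustrative, the annotation key is the operator's existing one):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        # Adding or editing this annotation changes the pod template, which
        # rolls the pods; the operator's mutating webhook then injects the
        # auto-instrumentation agent when the new pods are created.
        instrumentation.opentelemetry.io/inject-java: "true"
```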

crossoverJie commented 3 months ago

> therefore it triggers re-deployment which is needed by the OTEL operator to inject the auto-instrumentation (the operator uses pod mutating webhook)

Yeah, you're right.

However, we hope that all configuration associated with OTel can be handed over to the operator; if annotations are used, we also have to maintain them ourselves during deployment.

crossoverJie commented 3 months ago

This is our usage scenario; I'll submit a PR if appropriate.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  selector:
    app: appname
  propagators:
    - tracecontext
  java:
    image: image
```

jaronoff97 commented 1 month ago

@crossoverJie we discussed this issue at the SIG (and realized we should have probably done this prior to your PR) and had a few follow up questions.

I saw the link you shared for how SkyWalking handles this, and at first glance it seemed very similar; however, there's a key difference. SkyWalking appears to be a Java-only auto-instrumentation solution, whereas OpenTelemetry can inject instrumentation for multiple languages. That results in what we agreed is a confusing experience for users, where a cluster admin needs to set one piece of configuration AND a tenant in the cluster needs to set configuration as well.

  1. What are the user stories for this request and how do they differ from today? With the change you've made, the end state seems very similar: both the cluster admin and the tenant need to take action, which works against the goal of the issue, namely putting the onus of instrumentation solely in the hands of the cluster admin.

  2. What are the reasons for having multiple instrumentation resources for the same language in the same cluster? I'm not sure I understand the exact issue today that the current solution solves.

I was imagining it would be more beneficial for both personas if the cluster admin could define a set of rules, each with a selector targeting applications for injection and specifying which languages (and optionally which container names) to inject. Let me know your thoughts here, and if you'd like to meet to discuss the above I'd be happy to chat.

crossoverJie commented 1 month ago

@jaronoff97

Thank you for following up on this issue.

> That results in what we agreed to be a confusing experience for users where a cluster admin needs to set a piece of configuration AND a tenant in the cluster needs to set configuration as well.

Can you explain the relationship between cluster admin and tenants here? In my scenario, I'm the only one maintaining Instrumentation.

> 2. What are the reasons for having multiple instrumentation resources for the same language in the same cluster? I'm not sure I understand the exact issue today that the current solution solves.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sit-grpc-consumer
  namespace: sit
spec:
  env:
    - name: OTEL_SERVICE_NAME
      value: grpc-consumer
  selector:
    matchLabels:
      app: grpc-consumer
  java:
    image: autoinstrumentation-java:v1
    extensions:
      - image: extensions:v1
        dir: /extensions
    env:
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.name=grpc-consumer
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sit-grpc-provider
  namespace: sit
spec:
  env:
    - name: OTEL_SERVICE_NAME
      value: grpc-provider
  selector:
    matchLabels:
      app: grpc-provider
  java:
    image: autoinstrumentation-java:v1
    extensions:
      - image: extensions:v2
        dir: /extensions
    env:
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.name=grpc-provider
```

This is our usage scenario: some application-specific configuration (such as OTEL_SERVICE_NAME and the extension image) needs to be defined in a separate Instrumentation per application.

There are some other examples:

Therefore we would like each application to be able to maintain its own Instrumentation independently.

jaronoff97 commented 1 month ago

Thanks for adding more context! Given that you want each application to maintain its own Instrumentation, it's unclear to me why adding the name of the Instrumentation in the application is more effort than it is today. All of the use cases you mentioned are solved by the existing annotation system, which lets users specify which Instrumentation they want to use.

I still think the broader use case you describe is valuable, however, and it would be much simpler if the Instrumentation selected its applications too. I think an Instrumentation selector only works (or works best) when the user doesn't need to modify their application at all.

crossoverJie commented 1 month ago

Thanks for your reply; the following points may require further discussion.

> it's unclear to me why adding the name of the instrumentation in the application is more effort than today?

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
  name: order
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/container-names: "order"
        instrumentation.opentelemetry.io/inject-java: "true"
        # associate instrumentation
        instrumentation.opentelemetry.io/instrumentation-name: "instrumentation-prod"
# ...
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-prod
  namespace: prod
# ...
```

Do you mean associating the Instrumentation in the Deployment like this?


> All of the use cases you mentioned are solved with the existing system of annotations by allowing users to specify which instrumentation they want to use.

I don't fully understand this point.

As in my previous comment, we currently create an Instrumentation for each application and define the environment variables in that Instrumentation.

Is there another way to configure these environment variables? We don't want to configure startup parameters in the application; we want all configuration to be handled by the operator.


> I think that instrumentation selector only works (or works best) when the user doesn't need to modify their application whatsoever to do so.

I'm not sure I understand what you mean. Currently there is no need to modify the application; we just associate it via the selector in the Instrumentation.
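The selector association being discussed can be sketched as follows (an illustrative Python sketch, not the operator's actual code; Kubernetes `matchLabels` semantics are a subset check: every selector key/value pair must be present in the object's labels):

```python
def matches(selector_labels: dict, pod_labels: dict) -> bool:
    """Return True if every selector label is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in selector_labels.items())

def select_instrumentation(instrumentations: list, pod_labels: dict):
    """Return the name of the first Instrumentation whose selector matches."""
    for inst in instrumentations:
        if matches(inst["selector"]["matchLabels"], pod_labels):
            return inst["name"]
    return None

# Two Instrumentations as in the example above, each targeting one app label.
instrumentations = [
    {"name": "instrumentation-sit-grpc-consumer",
     "selector": {"matchLabels": {"app": "grpc-consumer"}}},
    {"name": "instrumentation-sit-grpc-provider",
     "selector": {"matchLabels": {"app": "grpc-provider"}}},
]

print(select_instrumentation(instrumentations, {"app": "grpc-provider"}))
```

Under this model the webhook would decide which Instrumentation to inject from the pod's labels alone, with no annotation on the workload.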

jaronoff97 commented 1 month ago

The example you linked demonstrates what I was referring to, i.e. creating an Instrumentation object and then referencing it directly by name in the application.


> Currently there is no need to modify the application, just associate the application through the selector in the Instrumentation.

In the world where we apply a selector as your PR does today, the application still needs to be modified with at least the following. Per your example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
  name: order
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/container-names: "order"
        instrumentation.opentelemetry.io/inject-java: "true" # <<< This is always needed today
        # associate instrumentation
        instrumentation.opentelemetry.io/instrumentation-name: "instrumentation-prod" # <<< This is still needed in your PR
```

What I'm thinking of is a world where a cluster admin like yourself would define instrumentation rules like so:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: InstrumentationRules
metadata:
  name: instrumentation-prod
  namespace: prod
spec:
  rules:
  - name: inject-java-app-prod
    java: <java-config>
    env: []
    selector:
      matchLabels:
        app: order
        env: prod
  - name: inject-java-app-dev
    java: <java-config>
    env: []
    selector:
      matchLabels:
        app: order
        env: dev
```

This would then inject configuration for the following deployments without requiring any modifications in the deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
    env: prod
  name: prod-order
spec:
  # ...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: order
    env: dev
  name: dev-order
spec:
  # ...
```
This, to me, would be closer to what SkyWalking can accomplish today and would allow cluster admins to entirely separate injection from application developers. Let me know what you think, thank you!
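The rule evaluation sketched above could work roughly like this (a hypothetical Python sketch of the proposed InstrumentationRules CRD, which does not exist in the operator today; rule names and labels are taken from the example):

```python
# Each rule carries its own selector; the first rule whose matchLabels are a
# subset of the workload's labels decides which configuration to inject.
rules = [
    {"name": "inject-java-app-prod",
     "selector": {"matchLabels": {"app": "order", "env": "prod"}}},
    {"name": "inject-java-app-dev",
     "selector": {"matchLabels": {"app": "order", "env": "dev"}}},
]

def match_rule(rules: list, labels: dict):
    """Return the name of the first matching rule, or None."""
    for rule in rules:
        want = rule["selector"]["matchLabels"]
        if all(labels.get(k) == v for k, v in want.items()):
            return rule["name"]
    return None

# The prod-order and dev-order Deployments each match a different rule,
# without carrying any annotation themselves.
print(match_rule(rules, {"app": "order", "env": "prod"}))
print(match_rule(rules, {"app": "order", "env": "dev"}))
```

One design note: with overlapping selectors, first-match ordering (as sketched here) makes the outcome deterministic but order-sensitive; the actual CRD would need to define precedence.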

crossoverJie commented 3 days ago

Sorry for the delayed response.

> What i'm thinking is a world where a cluster admin like yourself would define an instrumentation like so:

Using a new CRD, InstrumentationRules, seems like a good approach. We would just need to move the configuration from Instrumentation to InstrumentationRules.

One more question: if we use InstrumentationRules, is it still necessary to keep Instrumentation, or will Instrumentation be removed in a future release?