kubewarden / helm-charts

Helm charts for the Kubewarden project

Allow more customization in the OpenTelemetry collector configuration #573

Open jvanz opened 4 weeks ago

jvanz commented 4 weeks ago

The Helm charts need to be updated to allow more customization of the OpenTelemetry (OTel) collector configuration, making it as flexible as possible. This will allow users to configure pipelines and exporters to send data to a StackState cluster.

This is necessary because the main integration point with StackState is the Kubewarden OTel collector sending data to the StackState OTel collector. To accomplish this, the Kubewarden collector must be configured to receive data, pass it through a pipeline, and export the final data to the StackState collector. This will require configuration changes to the receivers, processors, exporters, and pipelines. The exporter can be one of the OTLP exporters (gRPC or HTTP) available in OTel, as sketched below.
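
A minimal sketch of the two stock OTLP exporter flavors; the endpoint values here are illustrative placeholders, not real StackState endpoints:

```yaml
exporters:
  # OTLP over gRPC (the collector's `otlp` exporter, default port 4317)
  otlp:
    endpoint: stackstate-collector.example:4317
  # OTLP over HTTP (the collector's `otlphttp` exporter, default port 4318)
  otlphttp:
    endpoint: http://stackstate-collector.example:4318
```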

In previous experiments, it was necessary to update the collector configuration manually after the Kubewarden installation; this issue should be addressed. The configuration used in those experiments, based on the StackState documentation, is provided below as an example:


```yaml
image:
  repository: "otel/opentelemetry-collector-k8s"
extraEnvsFrom:
  - secretRef:
      name: open-telemetry-collector
mode: deployment
ports:
  metrics:
    enabled: true
presets:
  kubernetesAttributes:
    enabled: true
    extractAllPodLabels: true
config:
  extensions:
    bearertokenauth:
      scheme: StackState
      token: "${env:API_KEY}"
  exporters:
    otlp/stackstate:
      auth:
        authenticator: bearertokenauth
      endpoint: <otlp-stackstate-endpoint>:443
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
      - name: rate-limited-composite
        type: composite
        composite:
          max_total_spans_per_second: 500
          policy_order: [errors, slow-traces, rest]
          composite_sub_policy:
          - name: errors
            type: status_code
            status_code: 
              status_codes: [ ERROR ]
          - name: slow-traces
            type: latency
            latency:
              threshold_ms: 1000
          - name: rest
            type: always_sample
          rate_allocation:
          - policy: errors
            percent: 33
          - policy: slow-traces
            percent: 33
          - policy: rest
            percent: 34
    resource:
      attributes:
      - key: k8s.cluster.name
        action: upsert
        value: <your-cluster-name>
      - key: service.instance.id
        from_attribute: k8s.pod.uid
        action: insert
    filter/dropMissingK8sAttributes:
      error_mode: ignore
      traces:
        span:
          - resource.attributes["k8s.node.name"] == nil
          - resource.attributes["k8s.pod.uid"] == nil
          - resource.attributes["k8s.namespace.name"] == nil
          - resource.attributes["k8s.pod.name"] == nil
  connectors:
    spanmetrics:
      metrics_expiration: 5m
      namespace: otel_span
    routing/traces:
      error_mode: ignore
      match_once: false
      table: 
      - statement: route()
        pipelines: [traces/sampling, traces/spanmetrics]
  service:
    extensions:
      - health_check
      - bearertokenauth
    pipelines:
      traces:
        receivers: [otlp]
        processors: [filter/dropMissingK8sAttributes, memory_limiter, resource]
        exporters: [routing/traces]
      traces/spanmetrics:
        receivers: [routing/traces]
        processors: []
        exporters: [spanmetrics]
      traces/sampling:
        receivers: [routing/traces]
        processors: [tail_sampling, batch]
        exporters: [debug, otlp/stackstate]
      metrics:
        receivers: [otlp, spanmetrics, prometheus]
        processors: [memory_limiter, resource, batch]
        exporters: [debug, otlp/stackstate]
```

> [!WARNING]
> This is an example configuration and does not necessarily represent the final configuration.
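
Assuming the values above are saved as `values.yaml`, they could be applied with the upstream opentelemetry-collector chart; a sketch (the release name and namespace are illustrative):

```sh
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
  --namespace open-telemetry --create-namespace \
  -f values.yaml
```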

Given the wide variety of possible OTel collector configurations, I propose allowing users to completely overwrite the current default definition, as this is the easiest solution. It gives users the ability to customize the collector as they see fit, without needing a Helm chart update every time they want to add a new feature, such as a new pipeline, processor, or exporter. I understand that this may make it harder to define what is supported and what is not; that is a decision that can be made during the course of working on this task.
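
A minimal sketch of what such an overwrite could look like at the values level, assuming a hypothetical `telemetry.collectorOverride` key (the key name and shape are illustrative, not existing chart options); when set, the chart would render the user-supplied collector spec verbatim instead of its built-in default:

```yaml
telemetry:
  # Hypothetical switch: when non-empty, this replaces the chart's default
  # OpenTelemetry collector definition entirely.
  collectorOverride:
    mode: deployment
    config:
      receivers:
        otlp:
          protocols:
            grpc: {}
      exporters:
        otlp/stackstate:
          endpoint: <otlp-stackstate-endpoint>:443
      service:
        pipelines:
          traces:
            receivers: [otlp]
            exporters: [otlp/stackstate]
```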

Acceptance Criteria

flavio commented 3 weeks ago

Makes sense, moved to the TODO column

flavio commented 3 weeks ago

@jvanz: while working on this issue you could also move to the new v1beta1 CRDs of OTel (see this issue)
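
For reference, a minimal sketch of the v1beta1 shape: in `opentelemetry.io/v1beta1` the `OpenTelemetryCollector` CRD takes `spec.config` as structured YAML, rather than the single string used in v1alpha1 (the resource name below is illustrative):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: kubewarden  # illustrative name
spec:
  mode: sidecar
  config:  # structured YAML in v1beta1, a plain string in v1alpha1
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```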

jvanz commented 2 weeks ago

I'm moving this to blocked until I restore access to my testing machines

flavio commented 1 week ago

Moving to blocked; we have to discuss how to move forward with this