open-telemetry / opentelemetry-helm-charts

OpenTelemetry Helm Charts
https://opentelemetry.io
Apache License 2.0

BUG Chart enforces incompatible config options. #877

Open herrbpl opened 1 year ago

herrbpl commented 1 year ago

The chart merges its own defaults into the user-provided config, which in turn leaves the collector with a nonfunctioning configuration and prevents it from coming up.

I.e., you specify the following as the config section of values.yaml:

  receivers:
    otlp:
      protocols:
        grpc:
        http:
    jaeger:
      protocols:
        grpc:
        thrift_binary:
        thrift_compact:
        thrift_http:
    zipkin:
  exporters:  
    otlp/anonymous:
      endpoint: tempo:4317
      tls:
        insecure: false
      headers:
        x-scope-orgid: anonymous
  processors:
    k8sattributes:        
      passthrough: false
      pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
      - sources:
        - from: connection
      extract:
        metadata:
          - "k8s.namespace.name"
          - "k8s.deployment.name"
          - "k8s.statefulset.name"
          - "k8s.daemonset.name"
          - "k8s.cronjob.name"
          - "k8s.job.name"
          - "k8s.node.name"
          - "k8s.pod.name"
          - "k8s.pod.uid"
          - "k8s.pod.start_time"
  service:
    pipelines:
      traces:
        receivers: [otlp, jaeger, zipkin]
        processors: [k8sattributes]
        exporters: [otlp/anonymous]

and the chart turns it into:

    exporters:
      logging: {}
      otlp/anonymous:
        endpoint: tempo:4317
        headers:
          x-scope-orgid: anonymous
        tls:
          insecure: false
    extensions:
      health_check: {}
      memory_ballast:
        size_in_percentage: 40
    processors:
      k8sattributes:
        extract:
          metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
        passthrough: false
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: connection
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    receivers:
      jaeger:
        protocols:
          grpc: {}
          thrift_binary: null
          thrift_compact: {}
          thrift_http: {}
      otlp:
        protocols:
          grpc: {}
          http: {}
      prometheus:
        config: {}
      zipkin: {}
    service:
      pipelines:
        logs: {}
        metrics: {}
        traces:
          exporters:
          - otlp/anonymous
          processors:
          - k8sattributes
          - batch
          receivers:
          - otlp
          - jaeger
          - zipkin
      telemetry:
        metrics: {}

This causes the collector not to start because of:

Error: invalid configuration: receivers::prometheus: no Prometheus scrape_configs or target_allocator set
2023-08-26T15:30:15.945420398Z 2023/08/26 15:30:15 collector server run finished with error: invalid configuration: receivers::prometheus: no Prometheus scrape_configs or target_allocator set
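
For context, the chart's default values normally give the prometheus receiver its own scrape configuration (it scrapes the collector's own metrics); once that section ends up empty after the merge, the validation above fails. A minimal sketch of a receiver that would satisfy the check, with an illustrative job name and target rather than the chart's exact defaults:

receivers:
  prometheus:
    config:
      scrape_configs:
        # illustrative self-scrape job; adjust target to your setup
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets:
                - localhost:8888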

When I remove the parts that were missing from my provided config in the first place:

exporters:      
  otlp/anonymous:
    endpoint: tempo:4317
    headers:
      x-scope-orgid: anonymous
    tls:
      insecure: false    
processors:
  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection      
receivers:
  jaeger:
    protocols:
      grpc: {}
      thrift_binary: null
      thrift_compact: {}
      thrift_http: {}
  otlp:
    protocols:
      grpc: {}
      http: {}
  zipkin: {}
service:
  pipelines:                
    traces:
      exporters:
      - otlp/anonymous
      processors:
      - k8sattributes        
      receivers:
      - otlp
      - jaeger
      - zipkin      

the collector starts without issues.

Do not babysit people; we are capable of providing a working config on our own. This is micromanagement (with good intentions, no doubt) and is totally unneeded. If there is a need for a default config template, the Grafana charts give a good example: the chart uses a provided template to render the default config, onto which the end-user config is merged. A sketch of that pattern follows below.
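
For illustration, a minimal sketch of that pattern, with invented helper and value names (this is not the actual chart or Grafana code): the default config is rendered from a template string, and the end user's config is deep-merged on top of it.

{{/* illustrative only: render the chart's default config from a template
     string, then deep-merge the user-supplied .Values.config over it */}}
{{- $default := tpl .Values.defaultConfigTemplate . | fromYaml }}
{{- $merged := mustMergeOverwrite $default (deepCopy .Values.config) }}
config.yaml: |
{{- toYaml $merged | nindent 2 }}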

herrbpl commented 1 year ago

Damn, this chart really messes up the config you wish to provide to the collector. If you remove pipelines you don't want, like logs and metrics, the chart adds them back with an empty config. This fails the collector's config validation and the service won't start. The collector is pretty much unusable unless you go with the default config provided by the chart. You CANNOT remove pipelines configured in the chart.

herrbpl commented 1 year ago

This part of the README does not work as advertised:

The Collector's configuration is set via the config section. Default components can be removed with null. Remember that lists in helm are not merged, so if you want to modify any default list you must specify all items, including any default items you want to keep.

Example: Disable metrics and logging pipelines and non-otlp receivers:

config:
  receivers:
    jaeger: null
    prometheus: null
    zipkin: null
  service:
    pipelines:
      traces:
        receivers:
          - otlp
      metrics: null
      logs: null
Pipelines for logs and metrics will still be generated with an empty config, failing the collector's startup checks, as illustrated below.
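
For clarity, this is roughly the shape the rendered pipelines take despite the nulls above, matching the rendered output shown earlier in this thread:

service:
  pipelines:
    logs: {}
    metrics: {}
    traces:
      receivers:
        - otlp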

herrbpl commented 1 year ago

WORKAROUND:

You can remove unwanted pipelines via a Helm template rendering trick. Since logs and metrics always get added back, for example when provisioning as a Rancher app, you might as well leave them in and have them eliminated from the generated ConfigMap by unsetting their values before the map is turned into YAML:

...
service:
    extensions:
      - health_check
      - memory_ballast
    pipelines:      
      logs:
        exporters:
          - logging
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
      metrics:
        exporters:
          - logging
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - prometheus    
      traces:
        exporters:
          - logging
        processors:
          - k8sattributes
          - memory_limiter
          - batch{{ $_ := unset .Values.config.service.pipelines "logs" }}{{ $_ := unset .Values.config.service.pipelines "metrics" }}
        receivers:
          - otlp
          - jaeger
          - zipkin
...

Too bad one needs to use such tricks to get the desired configuration.

herrbpl commented 1 year ago

Another issue with the chart: it imposes the internalTrafficPolicy setting on the Service whether it is needed or not. It only needs to be set for daemonset mode, not for deployment mode, where the Service spec default would do fine. The catch is that internalTrafficPolicy was only introduced around Kubernetes 1.22, so this prevents using the chart on earlier Kubernetes installations. While it is true that the README says the chart supports Kubernetes from 1.24, nothing in the OpenTelemetry Collector itself makes it incompatible with earlier Kubernetes versions. The chart therefore severely limits its usage on earlier clusters for no good reason, especially when the collector is plugged into them as a stop-gap solution until the clusters can be upgraded. The workaround is to render the Service template separately, disable the Service install in the chart, and deploy the Service definition as an independent manifest (sketched below). Why such babysitting?
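
As a rough illustration of that workaround, a standalone Service along these lines can be applied in place of the chart-managed one; the name, selector, and ports below are assumptions, so take the real values from the chart's rendered output for your release:

apiVersion: v1
kind: Service
metadata:
  name: otel-collector                 # placeholder name
spec:
  type: ClusterIP
  # no internalTrafficPolicy set, so the spec default applies and
  # pre-1.22 API servers accept the manifest
  selector:
    app.kubernetes.io/name: opentelemetry-collector   # placeholder selector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP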

AlissonRS commented 1 year ago

There is a known bug in Helm where null does not remove defaults from dependent charts.

Also discussed here: #223 #752

Helm issue: https://github.com/helm/helm/issues/9027

Expected to be fixed in Helm 3.13.0

Laurens-W commented 11 months ago

Helm 3.13 was released today and I can confirm that, as @AlissonRS said, it solves the setting of null values.

One way to work around it is to explicitly provide a values file (-f values.yaml), but you don't always have control over the helm command that is used, for example when using ArgoCD.

awaizman1 commented 10 months ago

Adding on to @Laurens-W's comment:

Using the latest chart version 0.73.1 fails when trying to deploy with ArgoCD. ArgoCD complains that the application spec is invalid; root cause:

Error: template: .../opentelemetry-collector/templates/NOTES.txt:26:42: executing ".../opentelemetry-collector/templates/NOTES.txt" at <eq .Values.service.internalTrafficPolicy "Cluster">: error calling eq: incompatible types for comparison Use --debug flag to render out invalid YAML

I run it in daemonset mode. The fact that the default values.yaml doesn't include any service.internalTrafficPolicy field, while NOTES.txt includes the line {{- if and (eq .Values.mode "daemonset") (eq .Values.service.internalTrafficPolicy "Cluster") }}, causes it to fail.
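
For reference, a nil-safe variant of that check avoids the comparison error when the field is absent (an illustrative sketch, not the actual chart fix):

{{- if and (eq .Values.mode "daemonset") (eq (.Values.service.internalTrafficPolicy | default "Cluster") "Cluster") }}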

As a workaround I added an override with service.internalTrafficPolicy=Local.
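
In values.yaml form, that override is simply:

service:
  internalTrafficPolicy: Local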

I'd appreciate a fix for this. Thanks.