open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

jmxreceiver won't start in otel/opentelemetry-collector-contrib:0.61.0 docker image #14757

Closed gabrielgiussi closed 1 year ago

gabrielgiussi commented 2 years ago

What happened?

Description

The collector fails while starting the jmxreceiver because it can't create the jmx-config-*.properties file.

Steps to Reproduce

You can just apply the yaml configuration in a k8s cluster and the agent container won't start.

Collector version

otel/opentelemetry-collector-contrib:0.61.0 docker image

Environment information

apiVersion: v1
kind: ConfigMap
metadata:
  name: collector-config
data:
  collector.yaml: |
    receivers:
      otlp:
        protocols: 
          grpc:
    processors:
    exporters:
      logging:
      prometheus:
        endpoint: "0.0.0.0:4318"
        namespace: test-space
        send_timestamps: true
        metric_expiration: 180m
        enable_open_metrics: true
        resource_to_telemetry_conversion:
          enabled: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging]
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [logging]
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-config
data:
  agent.yaml: |
    receivers:
      otlp:
        protocols: 
          grpc:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              metrics_path: /stats/prometheus
              static_configs:
                - targets: ['0.0.0.0:9901']
      jmx:
        jar_path: /data/otel-jmx.jar
        endpoint: localhost:1101
        target_system: jvm
        collection_interval: 10s
        # optional: the same as specifying OTLP receiver endpoint.
        #otlp:
        #  endpoint: mycollectorotlpreceiver:4317
        # username: my_jmx_username
        # determined by the environment variable value
        # password: $MY_JMX_PASSWORD
        #resource_attributes: my.attr=my.value,my.other.attr=my.other.value
        log_level: info
    processors:
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        filter:
          node_from_env_var: KUBE_NODE_NAME
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.cluster.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.pod.start_time
        pod_association:
          - from: resource_attribute
            name: k8s.pod.uid     
      metricstransform:
        transforms:
          - include: ^envoy\_
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: pod_name
                new_value: ${KUBE_POD_NAME}
              - action: add_label
                new_label: test_label
                new_value: hardcoded_value
          - include: envoy_vhost_vcluster_upstream_rq_retry_overflow
            action: update
            operations:
              - action: add_label
                new_label: test_label_two
                new_value: hardcoded_value

    exporters:
      otlp:
        endpoint: "opentelemetrycollector.default.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [otlp]
        metrics:
          receivers: [jmx, prometheus]
          processors: [metricstransform]
          exporters: [otlp]
---
apiVersion: v1
kind: Service
metadata:
  name: opentelemetrycollector
spec:
  ports:
    - name: grpc-otlp
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: prometheus
      port: 4318
      protocol: TCP
      targetPort: 4318
  selector:
    app.kubernetes.io/name: opentelemetrycollector
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opentelemetrycollector
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: opentelemetrycollector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: opentelemetrycollector
    spec:
      containers:
        - name: otelcol
          args:
            - --config=/conf/collector.yaml
          image: otel/opentelemetry-collector:0.61.0
          volumeMounts:
            - mountPath: /conf
              name: collector-config
      volumes:
        - configMap:
            items:
              - key: collector.yaml
                path: collector.yaml
            name: collector-config
          name: collector-config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gabi
  labels:
    app: gabi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gabi
  template:
    metadata:
      labels:
        app: gabi
    spec:
      serviceAccountName: otel-collector-service-account
      initContainers:
        - name: config-data
          image: ubuntu:xenial
          command: [ "/bin/sh","-c" ]
          args: [ "apt update; apt install -y wget tar; wget -O /data/otel-jmx.jar https://repo1.maven.org/maven2/io/opentelemetry/contrib/opentelemetry-jmx-metrics/1.15.0-alpha/opentelemetry-jmx-metrics-1.15.0-alpha.jar" ]
          volumeMounts:
            - mountPath: /data
              name: jmx-jar
      containers:
        - name: app
          image: clojure:temurin-17-lein-2.9.10-bullseye
          command: ["/bin/sh", "-c", "apt-get update && apt-get install -y git && git clone https://github.com/spring-guides/gs-spring-boot.git && cd gs-spring-boot/complete/ && ./gradlew bootRun"]
          env:
            - name: JVM_OPTS
              value: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1101 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
        - name: envoy
          image: envoyproxy/envoy-dev:d00e4536195ec03b305a7411409910c7101aea25
          command: ["/bin/sh", "-c", "apt-get update && apt-get install -y wget && wget -O /tmp/envoy.yaml https://gist.githubusercontent.com/ggiussi/406229725fbd929a353d047fa4a6b513/raw/92462ab816145f5926c3765f6f7923915c53694d/envoy.yaml && envoy -c /tmp/envoy.yaml"]
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
        - name: agent
          image: otel/opentelemetry-collector-contrib:0.61.0
          args:
          - --config=/conf/agent.yaml
          volumeMounts:
          - mountPath: /conf
            name: agent-config
          - mountPath: /data
            name: jmx-jar
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: KUBE_POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: KUBE_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
        - name: sample-app
          image: springcommunity/spring-framework-petclinic:latest
          ports:
            - containerPort: 8080
      volumes:
        - name: agent-config
          configMap:
            items:
            - key: agent.yaml
              path: agent.yaml
            name: agent-config
        - name: jmx-jar
          hostPath:
            path: /mnt/kube-data/sdc-ipxe/
            type: DirectoryOrCreate

OpenTelemetry Collector configuration

    receivers:
      otlp:
        protocols: 
          grpc:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              metrics_path: /stats/prometheus
              static_configs:
                - targets: ['0.0.0.0:9901']
      jmx:
        jar_path: /data/otel-jmx.jar
        endpoint: localhost:1101
        target_system: jvm
        collection_interval: 10s
        # optional: the same as specifying OTLP receiver endpoint.
        #otlp:
        #  endpoint: mycollectorotlpreceiver:4317
        # username: my_jmx_username
        # determined by the environment variable value
        # password: $MY_JMX_PASSWORD
        #resource_attributes: my.attr=my.value,my.other.attr=my.other.value
        log_level: info
    processors:
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        filter:
          node_from_env_var: KUBE_NODE_NAME
        extract:
          metadata:
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.cluster.name
            - k8s.namespace.name
            - k8s.node.name
            - k8s.pod.start_time
        pod_association:
          - from: resource_attribute
            name: k8s.pod.uid     
      metricstransform:
        transforms:
          - include: ^envoy\_
            match_type: regexp
            action: update
            operations:
              - action: add_label
                new_label: pod_name
                new_value: ${KUBE_POD_NAME}
              - action: add_label
                new_label: test_label
                new_value: hardcoded_value
          - include: envoy_vhost_vcluster_upstream_rq_retry_overflow
            action: update
            operations:
              - action: add_label
                new_label: test_label_two
                new_value: hardcoded_value

    exporters:
      otlp:
        endpoint: "opentelemetrycollector.default.svc.cluster.local:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [otlp]
        metrics:
          receivers: [jmx, prometheus]
          processors: [metricstransform]
          exporters: [otlp]

Log output

2022-10-06T17:09:10.055Z        info    pipelines/pipelines.go:102      Receiver is starting... {"kind": "receiver", "name": "jmx", "pipeline": "metrics"}
Error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /tmp/jmx-config-4078735325.properties: no such file or directory
2022-10-06 17:09:10.055686 I | collector server run finished with error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /tmp/jmx-config-4078735325.properties: no such file or directory

Additional context

No response

github-actions[bot] commented 2 years ago

Pinging code owners: @rmfitzpatrick. See Adding Labels via Comments if you do not have permissions to add labels yourself.

gabrielgiussi commented 2 years ago

Is there any other way to export jvm metrics as a workaround until this is fixed (if it is a bug as I think it is)?

I know the agent will export JVM metrics, but how do I configure it to do only that? Is there a way to export JVM metrics using just the SDK, without the agent? Of course I could instantiate some observable instruments that fetch the metrics from the MBeans, but perhaps this already exists somewhere and I can just plug it into the OpenTelemetry instance I'm configuring manually.

gabrielgiussi commented 2 years ago

I guess user 10001 doesn't have write permission on the /tmp folder, but I'm not familiar enough with distroless images to know how to grant that user write permission. Perhaps the solution is to make this part configurable instead of always using os.TempDir(), so that a folder writable by user 10001 can be passed in.

Regarding my earlier question, I found the opentelemetry-runtime-metrics artifact.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling commented 1 year ago

ping @rmfitzpatrick

atoulme commented 1 year ago

This is indeed a permission issue: the collector cannot write to disk. You can mount a writable volume at /tmp to work around it.
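A minimal sketch of the writable-/tmp workaround, applied to the agent container of the "gabi" Deployment from the issue description. It assumes an emptyDir volume is writable by the collector's user in your cluster; the volume name agent-tmp is hypothetical and introduced only for this example. Only the fields relevant to the change are shown.

```yaml
# Fragment of the "gabi" Deployment; unrelated fields are omitted.
spec:
  template:
    spec:
      containers:
        - name: agent
          image: otel/opentelemetry-collector-contrib:0.61.0
          args:
            - --config=/conf/agent.yaml
          volumeMounts:
            - mountPath: /conf
              name: agent-config
            - mountPath: /data
              name: jmx-jar
            # Assumption: mounting an emptyDir here gives the receiver
            # an existing, writable /tmp for jmx-config-*.properties.
            - mountPath: /tmp
              name: agent-tmp
      volumes:
        # The existing agent-config and jmx-jar volumes stay as they are.
        # "agent-tmp" is a hypothetical name used only in this sketch.
        - name: agent-tmp
          emptyDir: {}
```

The emptyDir lives only as long as the pod, which should be enough for a temporary properties file that is regenerated on every start.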

I guess this is turning into a request for enhancement to make the temporary folder configurable. Is that correct?

atoulme commented 1 year ago

Another approach is to point Go at a separate temporary folder by setting the TMPDIR environment variable. This should take care of your immediate issue. Closing the issue as resolved; please reopen if you need more help.
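A sketch of the TMPDIR variant, again for the agent container from the issue description. On Unix systems Go's os.TempDir() returns $TMPDIR when it is set and non-empty, and /tmp otherwise, so the variable only moves the location: the target directory must still exist and be writable by the collector's user. The path /scratch and the volume name agent-scratch are hypothetical, chosen only for illustration.

```yaml
# Fragment of the "gabi" Deployment; unrelated fields are omitted.
spec:
  template:
    spec:
      containers:
        - name: agent
          image: otel/opentelemetry-collector-contrib:0.61.0
          env:
            # os.TempDir() reads this variable on Unix systems.
            - name: TMPDIR
              value: /scratch
          volumeMounts:
            # Assumption: the emptyDir below is writable by the
            # user the collector runs as.
            - mountPath: /scratch
              name: agent-scratch
      volumes:
        # "agent-scratch" is a hypothetical name used only in this sketch.
        - name: agent-scratch
          emptyDir: {}
```

Pointing TMPDIR at a directory that already exists in the image but is owned by another user would only change the error from "no such file or directory" to "permission denied".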

vigneshvdas commented 1 year ago

@atoulme Could you tell me which value to use for the TMPDIR environment variable? I tried multiple values, e.g. /var/run, but I still get permission denied.

Error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /var/run/jmx-config-3089779787.properties: permission denied
2023/08/18 00:23:25 collector server run finished with error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /var/run/jmx-config-3089779787.properties: permission denied

atoulme commented 1 year ago

@vigneshvdas I missed your question - please open a new issue to discuss further, sorry.

CzyerChen commented 8 months ago

Mounting a host directory as a volume at /tmp worked for me in 1.32.0: -v $(pwd)/tmp/:/tmp/

choucavalier commented 2 months ago

sorry to bump this but i can't figure this out. i keep getting the same permission error

collector:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:latest
  command: ["--config=/etc/otel-collector-config.yml"]
  environment:
    TMPDIR: /data/tmp/
  volumes:
    - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    - /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem:/etc/ssl/cert.pem:ro
    - ./opentelemetry-jmx-metrics.jar:/opt/opentelemetry-jmx-metrics.jar
    - ./tmp:/data/tmp
  extra_hosts:
    - host.docker.internal:host-gateway

fig": {"Endpoint":"localhost:55679","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0}}
monitoring-collector-1  | 2024-09-25T15:26:53.513Z      info    extensions/extensions.go:59     Extension started.      {"kind": "extension", "name": "zpages"}                            
monitoring-collector-1  | 2024-09-25T15:26:53.513Z      info    internal/resourcedetection.go:125       began detecting resource information    {"kind": "processor", "name": "resourcedetection/system", "pipeline": "metrics"}
monitoring-collector-1  | 2024-09-25T15:26:53.513Z      info    internal/resourcedetection.go:139       detected resource information   {"kind": "processor", "name": "resourcedetection/system", "pipeline": "metrics", "resource": {"host.name":"551f6d561da7","os.type":"linux"}}
monitoring-collector-1  | 2024-09-25T15:26:53.513Z      error   graph/graph.go:425      Failed to start component       {"error": "failed to get tmp file for jmxreceiver config: open /data/tmp/jmx-config-3010133719.properties: permission denied", "type": "Receiver", "id": "jmx"}
monitoring-collector-1  | 2024-09-25T15:26:53.513Z      info    service@v0.110.0/service.go:270 Starting shutdown...                                                                       
monitoring-collector-1  | 2024-09-25T15:26:54.745Z      info    extensions/extensions.go:66     Stopping extensions...                                                                     
monitoring-collector-1  | 2024-09-25T15:26:54.746Z      info    zpagesextension@v0.110.0/zpagesextension.go:106 Unregistered zPages span processor on tracer provider   {"kind": "extension", "name": "zpages"}
monitoring-collector-1  | 2024-09-25T15:26:54.746Z      info    service@v0.110.0/service.go:284 Shutdown complete.                                                                         
monitoring-collector-1  | Error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /data/tmp/jmx-config-3010133719.properties: permission denied                 
monitoring-collector-1  | 2024/09/25 15:26:54 collector server run finished with error: cannot start pipelines: failed to get tmp file for jmxreceiver config: open /data/tmp/jmx-config-3010133719.properties: permission denied
monitoring-collector-1 exited with code 1

what should i do??
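
One thing that may be worth checking, assuming the image runs as a non-root user (user 10001 was mentioned earlier in this thread): a bind-mounted ./tmp keeps the ownership and mode it has on the host, so the collector's user may not be allowed to write there. A hedged sketch that swaps the bind mount for a tmpfs mount, which Docker creates world-writable by default, could look like the following. The top-level services: key is an assumption about the rest of the Compose file, and this has not been verified against this exact setup.

```yaml
services:
  collector:
    image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yml"]
    environment:
      TMPDIR: /data/tmp/
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
      - /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem:/etc/ssl/cert.pem:ro
      - ./opentelemetry-jmx-metrics.jar:/opt/opentelemetry-jmx-metrics.jar
    # Replaces the "./tmp:/data/tmp" bind mount with an in-memory
    # directory that does not inherit host ownership.
    tmpfs:
      - /data/tmp
    extra_hosts:
      - host.docker.internal:host-gateway
```

If the bind mount has to stay, the alternative is to make ./tmp on the host writable by the UID the container runs as.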