open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

otel collector not able to send metrics to elastic apm #36546

Closed: madhureddy143 closed this issue 3 hours ago

madhureddy143 commented 2 days ago

Component(s)

exporter/elasticsearch

What happened?

Description

We are attempting to use OpenTelemetry Collector with Elastic APM in a Docker Compose-based setup. The installation includes:

The stack is running without any issues.

Current Setup

We are using a model where the OpenTelemetry Java Auto-Instrumentation agent sends logs, metrics, and traces to the OpenTelemetry Collector.

Tried Different Exporter Configurations:

ElasticSearch Exporter:

Tried Transformations:

Verified Version Compatibility:

Request for Help

Could you suggest which exporter configuration we should use to ensure that metrics are exported correctly to Elastic APM? If additional transformations or configurations are required, guidance on that would also be appreciated.

Steps to Reproduce

docker compose file

services:
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        elif [ x${KIBANA_PASSWORD} == x ]; then
          echo "Set the KIBANA_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f config/certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f config/certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: es01\n"\
          "    dns:\n"\
          "      - es01\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch availability";
        until curl --cacert config/certs/ca/ca.crt https://es01:9200/ | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Setting kibana_system password";
        until curl -X POST --cacert config/certs/ca/ca.crt -u elastic:${ELASTIC_PASSWORD} -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
        if [ ! -f config/certs/ca.zip ]; then
            echo "Setting APM Server data retention";
            until curl -s -X PUT --cacert config/certs/ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" https://es01:9200/_ilm/policy/apm-rollover-15-days -d '{
              "policy": {
                "phases": {
                  "hot": {
                    "actions": {
                      "rollover": {
                        "max_age": "15d"
                      }
                    }
                  }
                }
              }
            }' | grep -q "^{}"; do sleep 10; done;
        fi;'
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120
    networks:
      - observability-network

  es01:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es01/es01.key
      - xpack.security.http.ssl.certificate=certs/es01/es01.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es01/es01.key
      - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}

    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - ${ELASTIC_DATA_PATH}:/usr/share/elasticsearch/data
      - ${ELASTIC_LOGS_PATH}:/usr/share/elasticsearch/logs
      - ./config/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "${ES_PORT}:9200"
    networks:
      - observability-network
    healthcheck:
      test: ["CMD-SHELL", "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200/ | grep -q 'missing authentication credentials'"]
      interval: 10s
      timeout: 10s
      retries: 20
    restart: unless-stopped

  kibana:
    image: docker.elastic.co/kibana/kibana:${ELASTIC_VERSION}
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=https://es01:9200/
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - xpack.fleet.isAirGapped=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt

    ports:
      - "${KIBANA_PORT}:5601"
    depends_on:
      es01:
        condition: service_healthy
    volumes:
      - certs:/usr/share/kibana/config/certs
      - ./config/kibana/kibana.yml:/usr/share/kibana/config/kibana.yml
    networks:
      - observability-network
    healthcheck:
      test: ["CMD-SHELL", "curl -s -I http://localhost:5601/ | grep -q 'HTTP/1.1 302 Found'"]
      interval: 20s
      timeout: 10s
      retries: 3
    restart: unless-stopped

  apm-server:
    depends_on:
      kibana:
        condition: service_healthy
    image: docker.elastic.co/apm/apm-server:${ELASTIC_VERSION}
    container_name: apm-server
    cap_add: ["CHOWN", "DAC_OVERRIDE", "SETGID", "SETUID"]
    cap_drop: ["ALL"]
    volumes:
      - certs:/usr/share/apm-server/certs
      - ./config/apmserver/apm-server.docker.yml:/usr/share/apm-server/apm-server.yml:ro
    ports:
      - "8200:8200"
    user: root
    healthcheck:
      interval: 10s
      retries: 12
      test: "! apm-server test output -E output.elasticsearch.username=elastic -E output.elasticsearch.password=${ELASTIC_PASSWORD} | grep -q ERROR"
    networks:
      - observability-network

  otel-collector:
    depends_on:
      apm-server:
        condition: service_healthy
    image: otel/opentelemetry-collector-contrib:${OTEL_VERSION}
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - certs:/certs
      - ./config/otel-collector/otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "55681:55681"
    networks:
      - observability-network
    restart: unless-stopped

volumes:
  certs:
    driver: local

networks:
  observability-network:
    driver: bridge
    name: observability-network
    external: true

env file for docker compose

# Elasticsearch
ELASTIC_VERSION=8.16.0
ELASTIC_PASSWORD=*******
ELASTIC_MEM_LIMIT=1g

# Kibana
KIBANA_PASSWORD=******
NODE_OPTIONS="--max-old-space-size=4096"

# OpenTelemetry
OTEL_VERSION=0.114.0

# Paths
ELASTIC_DATA_PATH=./data/elasticsearch
ELASTIC_LOGS_PATH=./logs/elasticsearch

# Network
ES_PORT=9200
KIBANA_PORT=5601
OTEL_PORT=4317

# Set to 'basic' or 'trial' to automatically start the 30-day trial
LICENSE=basic

application server otel java configuration file


# Exporter endpoint and config
otel.exporter.otlp.endpoint=http://localhost:4318/
#otel.exporter.otlp.protocol=grpc
otel.exporter.otlp.protocol=http/protobuf
otel.exporter.otlp.compression=none
otel.exporter.otlp.retry.enabled=true

# Basic service name
# (application name has been changed for privacy)
otel.service.name=my-app

# All resource attributes combined under one key
otel.resource.attributes=deployment.environment=my-tech,service.version=1.0.0,service.instance.id=my-app-perftech

#Supporting observability
otel.traces.exporter=otlp
otel.metrics.exporter=otlp
otel.logs.exporter=otlp

# Set the logging level
otel.log.level=INFO
otel.javaagent.logging=simple

# Enable JVM metrics collection
otel.exporter.otlp.metrics.enabled=true
otel.exporter.otlp.metrics.temporality.preference=delta
otel.instrumentation.runtime-metrics.enabled=true
otel.jvm.metrics.enabled=true

# Specify JVM metric collection
otel.metric.export.interval=1500
otel.javaagent.extensions.jvm.enabled=true
otel.instrumentation.jvm-metrics.enabled=true
otel.instrumentation.common.enabled=true
otel.instrumentation.runtime-telemetry.enabled=true
otel.instrumentation.system-metrics.enabled=true

Expected Result

Metrics are successfully sent to Elastic APM and then visualized in Kibana.

Actual Result

Metrics are not visualized in Kibana.


Collector version

v0.114.0

Environment information

Environment

OS: Ubuntu. Deployment mode: Docker, image otel/opentelemetry-collector-contrib:v0.114.0. App type: Java app deployed in Tomcat, using the OTel Java auto-instrumentation JAR.

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000

exporters:
  otlp/apm:
    endpoint: "http://apm-server:8200"
    tls:
      insecure: true
      # insecure_skip_verify: true
      # ca_file: certs/ca/ca.crt
    sending_queue:
      enabled: true
      num_consumers: 50
      queue_size: 500

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [otlp/apm]
    metrics:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [otlp/apm]
    logs:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [otlp/apm]

Log output

2024-11-26T09:03:53.516Z        warn    otlpexporter@v0.114.0/otlp.go:116       Partial success response        {"kind": "exporter", "data_type": "metrics", "name": "otlp/apm", "message": "unsupported data points", "dropped_data_points": 1}
2024-11-26T09:03:55.497Z        warn    otlpexporter@v0.114.0/otlp.go:116       Partial success response        {"kind": "exporter", "data_type": "metrics", "name": "otlp/apm", "message": "unsupported data points", "dropped_data_points": 1}

Additional context

No response

github-actions[bot] commented 2 days ago

Pinging code owners:

carsonip commented 2 days ago

APM Server vs elasticsearchexporter

I see that you are using APM Server, which is responsible for receiving OTLP and forwarding it to Elasticsearch. You should not use the elasticsearchexporter when using APM Server, since the elasticsearchexporter sends to Elasticsearch directly; that is also why it "complains that it's not a valid Elastic server" when you pointed the elasticsearchexporter at APM Server.

unsupported data points

Assuming you want to use APM Server: there are data points that APM Server does not currently support, for example exponential histograms (see issue https://github.com/elastic/apm-server/issues/7614). Can you use a debug exporter to see if your data contains any exponential histograms?
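One way to check is to wire the collector's debug exporter into the metrics pipeline alongside the existing otlp/apm exporter, then grep the collector logs for `DataType: ExponentialHistogram`. A minimal sketch against the config in this issue (assuming the contrib image in use ships the debug exporter, which it does in recent releases):

```yaml
# Sketch: add a debug exporter next to otlp/apm to inspect metric data types.
# "detailed" verbosity prints every data point, including its DataType field.
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch, memory_limiter]
      exporters: [otlp/apm, debug]
```

The other exporters and pipelines from the original config stay as they are; only the metrics pipeline gains the extra `debug` entry.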

carsonip commented 2 days ago

/label -needs-triage

madhureddy143 commented 2 days ago

hi @carsonip,

Thank you for looking into this issue. We couldn't find an Elastic APM category and thought the Elasticsearch exporter category was the closest match. As requested, please find the debug log below. Does it provide any clues about the root cause of the issue? We've tried several exporters, but nothing seems to work. Could you suggest whether we need a different exporter, or a specific transformation configuration in the OpenTelemetry Collector, so that we can view metrics along with traces and logs in Elastic APM?

StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 7340032
Metric #7
Descriptor:
     -> Name: process.runtime.jvm.system.cpu.utilization
     -> Description: Recent cpu utilization for the whole system
     -> Unit: 1
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2024-11-26 21:50:43.379033 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 0.355701
Metric #8
Descriptor:
     -> Name: process.runtime.jvm.buffer.count
     -> Description: The number of buffers in the pool
     -> Unit: {buffers}
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(mapped)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 0
NumberDataPoints #1
Data point attributes:
     -> pool: Str(direct)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 14
Metric #9
Descriptor:
     -> Name: process.runtime.jvm.memory.limit
     -> Description: Measure of max obtainable memory
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(Metaspace)
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 536870912
NumberDataPoints #1
Data point attributes:
     -> pool: Str(Compressed Class Space)
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 528482304
NumberDataPoints #2
Data point attributes:
     -> pool: Str(CodeHeap 'profiled nmethods')
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 122912768
NumberDataPoints #3
Data point attributes:
     -> pool: Str(G1 Old Gen)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 1073741824
NumberDataPoints #4
Data point attributes:
     -> pool: Str(CodeHeap 'non-profiled nmethods')
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 122912768
NumberDataPoints #5
Data point attributes:
     -> pool: Str(CodeHeap 'non-nmethods')
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 5832704
Metric #10
Descriptor:
     -> Name: process.runtime.jvm.buffer.usage
     -> Description: Memory that the Java virtual machine is using for this buffer pool
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(mapped)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 0
NumberDataPoints #1
Data point attributes:
     -> pool: Str(direct)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 114688
Metric #11
Descriptor:
     -> Name: process.runtime.jvm.memory.usage_after_last_gc
     -> Description: Measure of memory used after the most recent garbage collection event on this pool
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(G1 Eden Space)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 0
NumberDataPoints #1
Data point attributes:
     -> pool: Str(G1 Old Gen)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 0
NumberDataPoints #2
Data point attributes:
     -> pool: Str(G1 Survivor Space)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 7340032
Metric #12
Descriptor:
     -> Name: process.runtime.jvm.memory.committed
     -> Description: Measure of memory committed
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(Metaspace)
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 320471040
NumberDataPoints #1
Data point attributes:
     -> pool: Str(Compressed Class Space)
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 53346304
NumberDataPoints #2
Data point attributes:
     -> pool: Str(CodeHeap 'profiled nmethods')
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 62455808
NumberDataPoints #3
Data point attributes:
     -> pool: Str(G1 Eden Space)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 572522496
NumberDataPoints #4
Data point attributes:
     -> pool: Str(G1 Old Gen)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 27131904
NumberDataPoints #6
Data point attributes:
     -> pool: Str(CodeHeap 'non-nmethods')
     -> type: Str(non_heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 2555904
NumberDataPoints #7
Data point attributes:
     -> pool: Str(G1 Survivor Space)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 7340032
Metric #13
Descriptor:
     -> Name: process.runtime.jvm.classes.current_loaded
     -> Description: Number of classes currently loaded
     -> Unit: {class}
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 53973
Metric #14
Descriptor:
     -> Name: process.runtime.jvm.threads.count
     -> Description: Number of executing threads
     -> Unit: {thread}
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> daemon: Bool(false)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 20
NumberDataPoints #1
Data point attributes:
     -> daemon: Bool(true)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 21:50:44.879009 +0000 UTC
Value: 173
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}

StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 22:08:01.379019 +0000 UTC
Value: 0
NumberDataPoints #2
Data point attributes:
     -> pool: Str(G1 Survivor Space)
     -> type: Str(heap)
StartTimestamp: 2024-11-26 10:04:25.373018 +0000 UTC
Timestamp: 2024-11-26 22:08:01.379019 +0000 UTC
Value: 13631488
Metric #12
Descriptor:
     -> Name: process.runtime.jvm.memory.committed
     -> Description: Measure of memory committed
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> pool: Str(Metaspace)
     -> type: Str(non_heap)

madhureddy143 commented 1 day ago

Update!

With the latest OTel Java instrumentation package I am able to send metrics to my APM endpoint, but a few errors still persist.

Package used: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/tag/v2.10.0

Error: (screenshot attached)

Please suggest how I can fix it.

carsonip commented 1 day ago

@madhureddy143 are there any more unsupported data points logs after updating the package?

carsonip commented 1 day ago

According to the docs, to emit jvm.system.cpu.utilization, start your Java application with -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true, e.g. java ... -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true ...
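For a properties-file setup like the one in this issue, the same switch can be sketched as a property entry; this assumes the agent is loading the file (e.g. via otel.javaagent.configuration-file), since the properties file and the -D system property use the same key:

```properties
# Sketch: enable experimental runtime telemetry so jvm.system.cpu.utilization is emitted.
# Equivalent to passing -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true
# on the java command line.
otel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true
```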

madhureddy143 commented 21 hours ago

@madhureddy143 are there any more unsupported data points logs after updating the package?

@carsonip, thank you for the help. After the package update, only jvm.system.cpu.utilization is missing. Will try your suggestion below:

According to docs, to emit jvm.system.cpu.utilization, start your java application with -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true. e.g. java ... -Dotel.instrumentation.runtime-telemetry.emit-experimental-telemetry=true ...

madhureddy143 commented 3 hours ago

@carsonip

I re-ran with the flag and I can now see the CPU utilization in my Kibana dashboard. Thank you for your help.
