open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

elasticsearch/log exporter dropping data + context deadline exceeded from aks to elastic cloud es #34564

Closed. iamp3 closed this issue 3 months ago.

iamp3 commented 3 months ago

Component(s)

exporter/elasticsearch

What happened?

Description

My environment is hosted on an AKS 1.28.9 cluster, and Elasticsearch 8.13.4 is on Elastic Cloud. I'm using the elasticsearch exporter to send logs to Elastic Cloud, and I'm getting errors about dropping data and context deadline exceeded. I don't see any other issue.

Steps to Reproduce

1. Set up the AKS + Elastic Cloud integration on Azure.
2. Deploy otel/opentelemetry-collector-contrib:0.106.1 with a ConfigMap containing the configuration from the OpenTelemetry Collector configuration field below (a hypothetical deployment sketch follows this list).
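
A minimal, hypothetical sketch of such a deployment, assuming a DaemonSet that mounts the collector ConfigMap and the host's /var/log/pods directory (all names and the namespace are illustrative, not taken from the original report):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector            # hypothetical name
  namespace: observability        # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.106.1
          args: ["--config=/conf/collector.yaml"]
          volumeMounts:
            - name: otel-collector-conf   # ConfigMap holding the collector config below
              mountPath: /conf
            - name: varlogpods            # pod log files read by the filelog receiver
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: otel-collector-conf
          configMap:
            name: otel-collector-conf     # created from the configuration field below
        - name: varlogpods
          hostPath:
            path: /var/log/pods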

Expected Result

The elasticsearch exporter can send pod/container logs to Elastic Cloud from my AKS cluster.

Actual Result

Exporting failed. Dropping data. (screenshot attached)

Collector version

0.106.1

Environment information

Environment

AKS cluster version: 1.28.9

OpenTelemetry Collector configuration

receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch: {}

exporters:
  debug:
    verbosity: basic

  elasticsearch/log:
    endpoints: ["https://*****.azure.elastic-cloud.com:443"] 
    logs_index: test-logs-index
    timeout: 2m
    api_key: "***"
    tls:
      insecure_skip_verify: true
    discover:
      on_start: true
    flush:
      bytes: 10485760
    retry:
      max_requests: 5
    sending_queue:
      enabled: true  

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, memory_limiter]
      exporters: [debug, elasticsearch/log]

Log output

2024-08-09T13:01:46.334Z        error   elasticsearchexporter@v0.106.1/bulkindexer.go:226       bulk indexer flush error        {"kind": "exporter", "data_type": "logs", "name": "elasticsearch/log", "error": "failed to execute the request: context deadline exceeded"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).flush
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.106.1/bulkindexer.go:226
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.(*asyncBulkIndexerWorker).run
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.106.1/bulkindexer.go:211
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter.newAsyncBulkIndexer.func1
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter@v0.106.1/bulkindexer.go:109
2024-08-09T13:01:48.829Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 7, "log records": 147}
2024-08-09T13:01:49.030Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 65}
2024-08-09T13:01:56.347Z        error   exporterhelper/queue_sender.go:92       Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "logs", "name": "elasticsearch/log", "error": "context deadline exceeded", "dropped_items": 24}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
        go.opentelemetry.io/collector/exporter@v0.106.1/exporterhelper/queue_sender.go:92
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
        go.opentelemetry.io/collector/exporter@v0.106.1/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
        go.opentelemetry.io/collector/exporter@v0.106.1/internal/queue/consumers.go:43


Additional context

I tried with an old image (0.96.1) and the otlp/elastic exporter, but got the same dropping-data issue as well.

github-actions[bot] commented 3 months ago

Pinging code owners:

carsonip commented 3 months ago

  • Do you see any logs in ES from the collector? Did the collector drop 100% of the logs, or just a subset of them?
  • Can you double-check that you have the right ES endpoint configured in the collector? If you browse the endpoint directly in a browser / curl, you should get something like:
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":["Basic realm=\"security\" charset=\"UTF-8\"","Bearer realm=\"security\"","ApiKey"]}},"status":401}
  • Although it isn't related to the root cause, do you mind sending without the sending queue, i.e. sending_queue::enabled=false? The Elasticsearch exporter bulk indexer already sends asynchronously, so a sending queue wouldn't help anyway, and disabling it would reduce noise in the collector logs (see the sketch below).
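
A minimal sketch of that change, based on the exporter config posted above (endpoint and key placeholders kept as-is):

exporters:
  elasticsearch/log:
    endpoints: ["https://*****.azure.elastic-cloud.com:443"]
    logs_index: test-logs-index
    api_key: "***"
    sending_queue:
      enabled: false   # the bulk indexer already sends asynchronously, so the queue adds no benefit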

iamp3 commented 3 months ago

Looks like it was a configuration problem. Here is the final version of the config, which works well for a deployment with 1 replica; on a DaemonSet setup, however, I periodically see dropping-data errors with the context deadline exceeded message:

receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/*/*/filelog-*.log
    operators:
      - id: container-parser
        type: container

exporters:
  debug:
    verbosity: basic
  elasticsearch:
    endpoint: ${env:ELASTICSEARCH_URL}
    logs_index: filelog-logs
    api_key: "${env:FILELOG_SECRET_KEY}"
    tls:
      insecure_skip_verify: false
      ca_file: "/etc/ssl/certs/elk.cer"
    retry:
      enabled: true
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 50000

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [debug, elasticsearch]

@carsonip Could you please tell me how I could configure dynamic creation of indices in the filelog-logs-2024-12 format, for example? I found logs_dynamic_index, but couldn't find how to create indices with a %Y-%m timestamp.

carsonip commented 3 months ago

My recommendation would be to use data streams instead of indices.

If you perform all the steps below:

  1. set
     elasticsearch:
       logs_dynamic_index:
         enabled: true
  2. remove logs_index or set it to something that complies with the data stream naming convention, e.g. logs-my_application-default
  3. (optional) set the mapping mode to ecs to map fields to Elastic Common Schema (ECS)
     elasticsearch:
       mapping:
         mode: ecs
The logs will then be sent to the configured data stream, and since anything under logs-*-* should have a default index template, it will be rolled over as configured. You can configure more specific index templates to control how often backing indices are created for a data stream. A combined sketch follows below.
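
Putting those steps together, a minimal sketch of the exporter section could look like this (the data stream name logs-my_application-default is just the example name from step 2; the endpoint and key placeholders come from the config posted earlier):

exporters:
  elasticsearch:
    endpoint: ${env:ELASTICSEARCH_URL}
    api_key: "${env:FILELOG_SECRET_KEY}"
    logs_index: logs-my_application-default   # follows the data stream naming convention: type-dataset-namespace
    logs_dynamic_index:
      enabled: true
    mapping:
      mode: ecs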

iamp3 commented 3 months ago

The working configuration is written above; the problem was with my earlier one. I also added logs_dynamic_index instead of a dedicated index, for convenience. Thanks for the tips @carsonip