open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Exporting failed. Dropping data for host node logs #9776

Open ba1ajinaidu opened 6 months ago

ba1ajinaidu commented 6 months ago

Describe the bug

I'm trying to export host-level logs from a Kubernetes node to Quickwit using otel-collector. The filelog receiver reads the /var/log/messages file and the otlp exporter sends the records on, but the export fails with the error logs below.

2024-03-15T17:10:37.980Z    error   exporterhelper/queue_sender.go:97   Exporting failed. Dropping data.    {"kind": "exporter", "data_type": "logs", "name": "otlp", "error": "not retryable error: Permanent error: rpc error: code = Internal desc = ", "dropped_items": 2}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
    go.opentelemetry.io/collector/exporter@v0.96.0/exporterhelper/queue_sender.go:97
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
    go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
    go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/consumers.go:43

Steps to reproduce

Install otel-collector with the Helm values given below, and install Quickwit with its default Helm values.

What did you expect to see?

Logs should be exported to Quickwit.

What did you see instead?

Logs are not exported; they are dropped.

What version did you use?

0.96.0

What config did you use?

I used the Helm chart to install the collector.

helm-values.yml

mode: daemonset
presets:
  logsCollection:
    enabled: true
  kubernetesEvents:
    enabled: true

extraVolumes:
  - name: varlog
    hostPath:
      path: /var/log
extraVolumeMounts:
  - name: varlog
    readOnly: true
    mountPath: /var/log
initContainers:
  - name: init-fs
    image: busybox:latest
    command:
      - sh
      - "-c"
      - "chown -R 10001: /var/log"
    volumeMounts:
      - name: varlog
        mountPath: /var/log
config:
  receivers:
    filelog/host:
      include:
        - /var/log/messages
  exporters:
    otlp:
      endpoint: quickwit-indexer.quickwit.svc.cluster.local:7281
      tls:
        insecure: true
  service:
    pipelines:
      logs/system:
        receivers: [filelog/host]
        processors: [batch]
        exporters: [otlp]

TylerHelmuth commented 6 months ago

@ba1ajinaidu it looks like only your exporter is having trouble. Check that your endpoint/port is correct and available

ba1ajinaidu commented 6 months ago

@TylerHelmuth I checked the endpoint and port. Both are correct and work for other logs, but the export still fails for this file.

TylerHelmuth commented 6 months ago

Oh interesting. Can you add a debug exporter with verbosity: detailed to the pipeline and isolate it to only the troubled file?
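
A minimal sketch of that suggestion, assuming the same Helm values layout as the helm-values.yml in the issue. Only the debug exporter and its entry in the pipeline are new; everything else is copied from the reported config. Because the logs/system pipeline reads only from filelog/host, the debug output is already limited to /var/log/messages.

```yaml
config:
  exporters:
    # Prints every log record in full to the collector's own log output.
    debug:
      verbosity: detailed
  service:
    pipelines:
      logs/system:
        receivers: [filelog/host]
        processors: [batch]
        # Keep otlp so the failure still reproduces next to the debug dump.
        exporters: [otlp, debug]
```

With this in place, each failed export should appear in the collector logs alongside the records that were in the batch.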

ba1ajinaidu commented 6 months ago

Flags: 0
LogRecord #1
ObservedTimestamp: 2024-03-19 03:18:16.297599282 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(Mar 19 03:18:16 ip-10-70-255-117 kubelet: E0319 03:18:16.276293    1580 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"aws-eks-nodeagent\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=aws-eks-nodeagent pod=aws-node-qn2wb_kube-system(faba3791-df15-4e0b-8a1c-1089a6f6db10)\"" pod="kube-system/aws-node-qn2wb" podUID="faba3791-df15-4e0b-8a1c-1089a6f6db10")
Attributes:
     -> log.file.name: Str(messages)
Trace ID:
Span ID:
Flags: 0
    {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-03-19T03:18:16.399Z    error   exporterhelper/queue_sender.go:97   Exporting failed. Dropping data.    {"kind": "exporter", "data_type": "logs", "name": "otlp", "error": "not retryable error: Permanent error: rpc error: code = Internal desc = ", "dropped_items": 2}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
    go.opentelemetry.io/collector/exporter@v0.96.0/exporterhelper/queue_sender.go:97
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
    go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/bounded_memory_queue.go:57
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
    go.opentelemetry.io/collector/exporter@v0.96.0/internal/queue/consumers.go:43
2024-03-19T03:18:16.501Z    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 4}
2024-03-19T03:18:16.501Z    info    ResourceLog #0

TylerHelmuth commented 6 months ago

And other logs flow through this pipeline to the same endpoint without issue?

Can you point your OTLP exporter to an otlp receiver in another collector (or another pipeline in this collector)? I want to make sure there isn't something in the data that isn't being handled correctly in OTLP (this is extremely unlikely).
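
One way to set up the "another pipeline in this collector" variant is sketched below, again assuming the Helm values layout from the issue. The otlp/loopback names, the logs/loopback pipeline, and port 14317 are hypothetical, chosen only to avoid clashing with any otlp receiver the chart already configures.

```yaml
config:
  receivers:
    # Hypothetical second OTLP receiver listening on a local-only port.
    otlp/loopback:
      protocols:
        grpc:
          endpoint: 127.0.0.1:14317
  exporters:
    # Hypothetical exporter that sends the host logs back into this collector.
    otlp/loopback:
      endpoint: 127.0.0.1:14317
      tls:
        insecure: true
    debug:
      verbosity: detailed
  service:
    pipelines:
      logs/system:
        receivers: [filelog/host]
        processors: [batch]
        exporters: [otlp/loopback]
      # Receives the looped-back records and prints them.
      logs/loopback:
        receivers: [otlp/loopback]
        exporters: [debug]
```

If the records from /var/log/messages show up in the debug output of logs/loopback, the OTLP encoding round-trips cleanly, which would point toward the receiving backend rather than the exporter.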

ba1ajinaidu commented 6 months ago

> And other logs flow through this pipeline to the same endpoint without issue?

Yes. I tried pointing the exporter to a receiver in a new pipeline and I am still seeing the same error.

bhavin-kotak commented 2 hours ago

Did you find a resolution for this error? I am getting the same error when exporting to Tempo.