open-telemetry / opentelemetry-collector-contrib


fileexporter: recreate files if they are deleted from outside otelcol #33987

Open lukasbindreiter opened 4 months ago

lukasbindreiter commented 4 months ago

Component(s)

exporter/file

What happened?

Description

If a file that fileexporter currently has open is deleted, no new file with the same name is created and all future entries are silently discarded.

Some background on why this is needed: we are running otelcol in an offline environment (simulating a satellite, on which our service will eventually run) to write incoming traces and logs to a jsonl file. This file lives in a special folder whose contents are regularly downloaded ("downlinked" in satellite terms) to another machine and deleted afterwards. From there, we can ingest the traces/logs into a tracing platform such as Jaeger. The actual download/downlink behaviour is outside of our control; we only have access to a folder that we can write files to, and the downlink happens at essentially random times.
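The silent discard described above is consistent with ordinary POSIX unlink semantics, although the issue does not confirm that this is what happens inside fileexporter. A minimal Python sketch of the mechanism, assuming a Linux-like filesystem: deleting a path does not invalidate an already-open handle, so later writes succeed against an inode that no longer has a name. The file name and the function `write_after_unlink` are hypothetical stand-ins, not part of the collector.

```python
import os
import tempfile


def write_after_unlink() -> dict:
    """Write to a file, delete it by path, then keep writing to the open handle."""
    workdir = tempfile.mkdtemp()
    # Hypothetical stand-in for the exporter's output file.
    path = os.path.join(workdir, "traces.jsonl")

    f = open(path, "a", encoding="utf-8")
    f.write('{"entry": 1}\n')
    f.flush()

    # Equivalent of `rm downlink/traces/traces.jsonl` from another process.
    os.remove(path)

    # The handle still refers to the unlinked inode, so this raises no error.
    f.write('{"entry": 2}\n')
    f.flush()

    info = os.fstat(f.fileno())
    result = {
        "path_exists": os.path.exists(path),  # the name is gone
        "link_count": info.st_nlink,          # no directory entry left
        "bytes_behind_fd": info.st_size,      # both entries were "written"
    }
    f.close()  # the data is freed once the last handle closes
    os.rmdir(workdir)
    return result


if __name__ == "__main__":
    print(write_after_unlink())
```

Because the write itself never fails, a writer that only reacts to write errors has no signal that its output has become unreachable. That would explain the missing warnings.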

Steps to Reproduce

  1. Start the collector:

    otelcol --config config.yaml
    # config.yaml see below
  2. Start a service sending traces to the collector

  3. After a while, we can observe the following files:

    downlink/
    ├── logs
    │   └── logs.jsonl
    └── traces
        └── traces.jsonl

  4. Now, delete logs.jsonl and traces.jsonl:

    rm downlink/logs/logs.jsonl
    rm downlink/traces/traces.jsonl

  5. Our service producing traces and logs is still running

  6. But the downlink folders stay empty: no new jsonl file is ever created (and the otelcol binary prints no warning that anything is being discarded)

    downlink/
    ├── logs
    └── traces

Expected Result

Once the currently active export file is deleted and new entries arrive, I expected otelcol to re-create the file and append the new entries to it.
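The expected behaviour can be sketched as a writer that checks, before each write, whether its path still points at the file it has open, and re-creates the file otherwise. This is a minimal illustration in Python, not the fileexporter implementation (which is written in Go). The class `ReopeningWriter` and the function `demo_recreate` are hypothetical names.

```python
import os
import tempfile


class ReopeningWriter:
    """Append-only writer that re-creates its file if the path disappears.

    Hypothetical helper for illustration; not part of fileexporter.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._file = None
        self._open()

    def _open(self) -> None:
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        self._file = open(self.path, "a", encoding="utf-8")

    def _path_points_to_open_file(self) -> bool:
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return False
        ours = os.fstat(self._file.fileno())
        # A replaced file has a different inode than the one we hold open.
        return (on_disk.st_dev, on_disk.st_ino) == (ours.st_dev, ours.st_ino)

    def write_line(self, line: str) -> None:
        if not self._path_points_to_open_file():
            self._file.close()
            self._open()  # re-create the deleted (or replaced) file
        self._file.write(line + "\n")
        self._file.flush()

    def close(self) -> None:
        self._file.close()


def demo_recreate() -> str:
    """Delete the file between two writes and return what ends up on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "logs.jsonl")
        writer = ReopeningWriter(path)
        writer.write_line('{"entry": 1}')
        os.remove(path)  # simulates the downlink deleting the file
        writer.write_line('{"entry": 2}')
        writer.close()
        with open(path, encoding="utf-8") as f:
            return f.read()
```

The check costs one `stat` call per write and still leaves a small window between the check and the write in which a deletion goes unnoticed. A real implementation might batch the check or watch the directory instead.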

Actual Result

No new file is ever created, and instead all newly received entries are silently discarded.

Collector version

v0.104.0

Environment information

Environment

Running on: https://developer.nvidia.com/embedded/jetson-tx2-nx (Ubuntu-based ARM OS, "Jetson Linux")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:4318
processors:
  batch:

# https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/fileexporter
exporters:
  file/traces:
    path: /home/dummy/downlink/traces/traces.jsonl
    rotation:
      max_megabytes: 1
      max_backups: 0 # retain all backups

  file/logs:
    path: /home/dummy/downlink/logs/logs.jsonl
    rotation:
      max_megabytes: 1
      max_backups: 0 # retain all backups

extensions:

service:
  extensions: []
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [file/traces]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [file/logs]

Log output

Jul 09 17:52:21 tbx-em-1 systemd[1]: Stopping OpenTelemetry Collector...
Jul 09 17:52:21 tbx-em-1 otelcol[7712]: 2024-07-09T17:52:21.284+0200        info        otelcol@v0.104.0/collector.go:323        Received signal from OS        {"signal": "terminated"}
Jul 09 17:52:21 tbx-em-1 otelcol[7712]: 2024-07-09T17:52:21.287+0200        info        service@v0.104.0/service.go:256        Starting shutdown...
Jul 09 17:52:21 tbx-em-1 otelcol[7712]: 2024-07-09T17:52:21.288+0200        info        extensions/extensions.go:59        Stopping extensions...
Jul 09 17:52:21 tbx-em-1 otelcol[7712]: 2024-07-09T17:52:21.289+0200        info        service@v0.104.0/service.go:270        Shutdown complete.
Jul 09 17:52:21 tbx-em-1 systemd[1]: Stopped OpenTelemetry Collector.
Jul 09 17:52:21 tbx-em-1 systemd[1]: Started OpenTelemetry Collector.
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.486+0200        info        service@v0.104.0/service.go:115        Setting up own telemetry...
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.486+0200        info        service@v0.104.0/telemetry.go:96        Serving metrics        {"address": ":8888", "level": "Normal"}
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.491+0200        info        service@v0.104.0/service.go:193        Starting otelcol...        {"Version": "0.104.0", "NumCPU": 4}
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.491+0200        info        extensions/extensions.go:34        Starting extensions...
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.491+0200        info        otlpreceiver@v0.104.0/otlp.go:152        Starting HTTP server        {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "localhost:4318"}
Jul 09 17:52:21 tbx-em-1 otelcol[27240]: 2024-07-09T17:52:21.492+0200        info        service@v0.104.0/service.go:219        Everything is ready. Begin running and processing data.

Additional context

No response

github-actions[bot] commented 4 months ago

Pinging code owners:

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme commented 1 month ago

You might want to use logrotate instead of deleting files outright.
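One way to read this suggestion is logrotate's `copytruncate` mode, which copies the live file aside and then empties it in place rather than deleting it, so the writer's open handle stays valid. The Python sketch below illustrates that idea under two assumptions the thread does not confirm: that the writer opened the file in append mode, and that whoever collects the files can be changed to truncate instead of delete. The reporter states the downlink is outside their control. The function names and the `.1` suffix are hypothetical.

```python
import os
import shutil
import tempfile


def copy_then_truncate(live_path: str, snapshot_path: str) -> None:
    """Copy the live file aside, then empty it in place (the inode is kept)."""
    shutil.copyfile(live_path, snapshot_path)
    os.truncate(live_path, 0)


def demo_copytruncate() -> tuple:
    """Return (snapshot contents, live file contents) after one rotation."""
    with tempfile.TemporaryDirectory() as workdir:
        live = os.path.join(workdir, "traces.jsonl")
        snapshot = os.path.join(workdir, "traces.jsonl.1")

        # Append mode is an assumption about the writer: it makes every
        # write land at the current end of file, even after truncation.
        writer = open(live, "a", encoding="utf-8")
        writer.write('{"entry": 1}\n')
        writer.flush()

        copy_then_truncate(live, snapshot)

        # Same handle, same inode: the entry lands in the emptied file.
        writer.write('{"entry": 2}\n')
        writer.flush()
        writer.close()

        with open(snapshot, encoding="utf-8") as f:
            snapshot_text = f.read()
        with open(live, encoding="utf-8") as f:
            live_text = f.read()
        return snapshot_text, live_text
```

Entries written between the copy and the truncate are lost, which is the known trade-off of this approach.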