open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
4.47k stars 1.47k forks source link

[configretry] max_elapsed_time cannot be set to `0` anymore because of new validation logic #9641

Closed rnishtala-sumo closed 8 months ago

rnishtala-sumo commented 8 months ago

Describe the bug According to this change in v0.95.0, max_elapsed_time cannot be set to 0 which was a previously allowed value to attempt retry indefinitely.

Steps to reproduce To prevent the sumologic exporter form ever dropping data that was successfully queued, set retry_on_failure.max_elapsed_time to 0.

What did you expect to see? setting retry_on_failure.max_elapsed_time to 0 should be allowed to retry indefinitely

What did you see instead? errors.New("'max_elapsed_time' must not be less than 'initial_interval'")

What version did you use? v0.95.0

What config did you use?

exporters:
    sumologic:
        retry_on_failure:
            max_elapsed_time: 0
        sending_queue:
            enabled: true
            storage: file_storage
atoulme commented 8 months ago

That behavior is not defined in the docs: https://github.com/open-telemetry/opentelemetry-collector/blame/main/exporter/exporterhelper/README.md#L15 I do see in the godoc: https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configretry/backoff.go#L41

It'd be good to fix the README as part of this fix.

TylerHelmuth commented 8 months ago

@rnishtala-sumo I am confused why the code you linked is the culprit. In your example config aren't MaxElapsedTime and InitialInterval set to 0? Also, for the config to take effect you need to set retry_on_failure.enabled = true.

rnishtala-sumo commented 8 months ago

@TylerHelmuth, all the other params are their default values. retry_on_failure.enabled is true by default and InitialInterval is 5s. Given the defaults, prior to the change in v0.95.0 one could set MaxElapsedTime to 0 to retry indefinitely.

rnishtala-sumo commented 8 months ago

Below is the entire config, that worked in prior releases (< 0.95.0)

exporters:
    sumologic:
        retry_on_failure:
            max_elapsed_time: 0
        sending_queue:
            enabled: true
            storage: file_storage
processors:
    batch:
        send_batch_size: 1024
        timeout: 1s
    memory_limiter:
        check_interval: 5s
        limit_percentage: 75
        spike_limit_percentage: 20
    resource/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        attributes:
            - action: insert
              key: _source
              value: custom-test
            - action: insert
              key: _contentType
              value: OpenTelemetry
            - action: insert
              key: _sourceCategory
              value: otel/localfile
    resourcedetection/system:
        detectors:
            - system
        system:
            hostname_sources:
                - dns
                - os
receivers:
    filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        encoding: utf-8
        include:
            - /tmp/test.log
            - /tmp/test1.log
            - /tmp/test2.log
        include_file_name: false
        include_file_path: true
        operators:
            - from: attributes["log.file.path"]
              to: resource["log.file.path"]
              type: move
            - from: resource["log.file.path"]
              to: resource["_sourceName"]
              type: copy
        start_at: end
        storage: file_storage
service:
    pipelines:
        logs/localfilesource/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
            exporters:
                - sumologic
            processors:
                - memory_limiter
                - resourcedetection/system
                - batch
                - resource/e8c32e52-d626-4ea9-829a-c60d3c2a8800
            receivers:
                - filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800

In 0.95.0, we see the following error

Error: invalid configuration: exporters::sumologic: 'max_elapsed_time' must not be less than 'initial_interval'
2024/02/26 17:57:44 collector server run finished with error: invalid configuration: exporters::sumologic: 'max_elapsed_time' must not be less than 'initial_interval'
atoulme commented 8 months ago

@rnishtala-sumo would you like to propose a patch? Should the issue be assigned to you? Please let us know.

rnishtala-sumo commented 8 months ago

Yes, I can work on the issue and a patch for this would definitely help us upgrade to the next v0.95.x release.

rnishtala-sumo commented 8 months ago

Raised the following PR for review

shashitessell commented 5 months ago

@rnishtala-sumo in your filelog reciever config , you have specified start_at: end and you have given storage as well . Can you please elaborate me in detail . that if otel collector restarts from where it will start reading file again .

receivers:
    filelog/e8c32e52-d626-4ea9-829a-c60d3c2a8800:
        encoding: utf-8
        include:
            - /tmp/test.log
            - /tmp/test1.log
            - /tmp/test2.log
        include_file_name: false
        include_file_path: true
        operators:
            - from: attributes["log.file.path"]
              to: resource["log.file.path"]
              type: move
            - from: resource["log.file.path"]
              to: resource["_sourceName"]
              type: copy
        start_at: end
        storage: file_storage