open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Local configuration is overwritten #32048

Open mike9421 opened 6 months ago

mike9421 commented 6 months ago

Component(s)

cmd/opampsupervisor

What happened?

Description

When I run opampsupervisor (with OpAMP's sample server as the backend), I noticed that EffectiveConfig gets cleared, leaving only the contents of ownMetricsCfg and ExtraLocalConfig.

Steps to Reproduce

  1. Fill in the correct configuration to ensure that the opampsupervisor can connect to the backend normally
  2. Start the backend OpAMP sample server
  3. Write the Collector configuration into the local effective.yaml file
  4. Start opampsupervisor
  5. Check the Collector configuration shown in the OpAMP server UI, or inspect the local effective.yaml file.

Expected Result

The content I wrote into the local effective.yaml file should not be cleared after the connection is established.

Actual Result

The configuration content I wrote was cleared.

Collector version

all

Environment information

Environment

OS: Darwin
Compiler: go 1.21

OpenTelemetry Collector configuration

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 10s
          static_configs:
            - targets:
              - 0.0.0.0:55149
exporters:
  otlphttp:
    endpoint: http://localhost:4318/v1/metrics
service:
  pipelines:
    metrics:
      exporters:
        - otlphttp
      receivers:
        - prometheus

Log output

No response

Additional context

I know the reason is that the backend server returns an empty remote configuration. However, I believe that the opampsupervisor needs to handle such situations to avoid the issue where configurations get overwritten due to logic errors in the backend.

github-actions[bot] commented 6 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

evan-bradley commented 5 months ago

The reason for this behavior is that the effective.yaml file is only intended to be written by the Supervisor. Can you explain more about your use case? The Supervisor specification allows for the possibility of using local Collector config along with remote config, would this work for you?

mike9421 commented 3 months ago

> The reason for this behavior is that the effective.yaml file is only intended to be written by the Supervisor. Can you explain more about your use case? The Supervisor specification allows for the possibility of using local Collector config along with remote config, would this work for you?

Thanks for your reply. I know that the effective.yaml file is meant for the supervisor to write, and the supervisor retrieves remote configurations from the OpAMP server.

The problem described in this issue is that when the remote configuration provided by the OpAMP server is empty (item.body is ""), the final configuration falls back to the default configuration provided by the Supervisor (see https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/5c765a3303696554000c7259a1683c92b41f052b/cmd/opampsupervisor/supervisor/supervisor.go#L716).

The cause of this issue is that the OpAMP server sends an empty configuration when the connection is first established. Has OTel considered adding safeguards for this scenario (for example, falling back to the saved configuration)?
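One way to read the suggestion above is a guard in the Supervisor that treats an empty remote config as "no config offered" and keeps the last saved configuration instead of falling back to defaults. This is a hypothetical sketch, not the Supervisor's actual code: `RemoteConfig`, `isEmptyRemoteConfig`, and `selectConfig` are illustrative names, and config files are modeled as a plain file-name-to-body map.

```go
package main

import (
	"fmt"
	"strings"
)

// RemoteConfig is a hypothetical, simplified stand-in for the config map the
// Supervisor receives from the OpAMP server: config file name -> file body.
type RemoteConfig struct {
	Files map[string][]byte
}

// isEmptyRemoteConfig reports whether a remote config carries no usable
// content: nil, no files at all, or only files whose bodies are blank.
func isEmptyRemoteConfig(rc *RemoteConfig) bool {
	if rc == nil {
		return true
	}
	for _, body := range rc.Files {
		if strings.TrimSpace(string(body)) != "" {
			return false
		}
	}
	return true
}

// selectConfig is a hypothetical guard: it keeps the last known good config
// when the server sends an empty remote config, and otherwise uses the
// remote config as-is.
func selectConfig(rc *RemoteConfig, lastKnownGood map[string][]byte) map[string][]byte {
	if isEmptyRemoteConfig(rc) {
		return lastKnownGood
	}
	return rc.Files
}

func main() {
	saved := map[string][]byte{"collector.yaml": []byte("receivers:\n  prometheus: {}\n")}
	empty := &RemoteConfig{Files: map[string][]byte{"collector.yaml": []byte("")}}

	chosen := selectConfig(empty, saved)
	fmt.Println(string(chosen["collector.yaml"]) != "") // prints "true": saved config kept
}
```

A trade-off of this approach: a server that intentionally sends an empty config to clear the Collector's configuration would look the same as a misbehaving one, so such a guard would change what an empty remote config means.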

tigrannajaryan commented 3 months ago

> The cause of this issue is that the OpAMP server sends an empty configuration when the connection is first established.

Why is the server doing this? I don't think this is compliant with the spec. Which server implementation is this?

mike9421 commented 3 months ago

> > The cause of this issue is that the OpAMP server sends an empty configuration when the connection is first established.

> Why is the server doing this? I don't think this is compliant with the spec. Which server implementation is this?

@tigrannajaryan I'm using the OpAMP-go server example.

When a connection is first established, the OpAMP server does not yet have a RemoteConfig for the Collector, so it returns an empty (but non-nil) configuration.

tigrannajaryan commented 3 months ago

I think the server should send back the config after the agent sends the first message that contains the AgentDescription message. If that is not happening then I think it is a server bug.
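The ordering described above could look roughly like this on the server side: config is attached only after the agent has sent an AgentDescription and a config is actually assigned to it, and is otherwise omitted rather than sent empty. This is a hypothetical sketch; `AgentToServer`, `ServerToAgent`, and `Server` here are simplified stand-ins, not the opamp-go types or the example server's code.

```go
package main

import "fmt"

// Hypothetical, simplified message types; the real OpAMP protobuf messages
// carry many more fields.
type AgentToServer struct {
	InstanceUID      string
	AgentDescription *string // non-nil only when the agent reports its description
}

type ServerToAgent struct {
	RemoteConfig map[string][]byte // nil means "no config offered in this message"
}

// Server holds the per-agent state needed to decide when to offer config.
type Server struct {
	described map[string]bool              // agents that have sent an AgentDescription
	configs   map[string]map[string][]byte // config assigned to each agent, if any
}

// OnMessage attaches RemoteConfig only once the agent has identified itself
// and a non-empty config exists for it; otherwise the field is left nil
// instead of being sent as an empty config.
func (s *Server) OnMessage(msg AgentToServer) ServerToAgent {
	if msg.AgentDescription != nil {
		s.described[msg.InstanceUID] = true
	}
	if !s.described[msg.InstanceUID] {
		return ServerToAgent{}
	}
	cfg, ok := s.configs[msg.InstanceUID]
	if !ok || len(cfg) == 0 {
		return ServerToAgent{}
	}
	return ServerToAgent{RemoteConfig: cfg}
}

func main() {
	s := &Server{
		described: map[string]bool{},
		configs: map[string]map[string][]byte{
			"agent-1": {"collector.yaml": []byte("receivers: {}\n")},
		},
	}
	desc := "otelcol"
	// First message without AgentDescription: no config is offered.
	fmt.Println(s.OnMessage(AgentToServer{InstanceUID: "agent-1"}).RemoteConfig == nil) // true
	// After AgentDescription arrives, the assigned config is sent.
	fmt.Println(s.OnMessage(AgentToServer{InstanceUID: "agent-1", AgentDescription: &desc}).RemoteConfig != nil) // true
}
```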

mike9421 commented 3 months ago

> I think the server should send back the config after the agent sends the first message that contains the AgentDescription message. If that is not happening then I think it is a server bug.

I also think it is a server bug. Should OTel handle this situation anyway? After all, when remote configuration is used, it is very important to ensure that the agent always has a working configuration.

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


mike9421 commented 1 month ago

> > I think the server should send back the config after the agent sends the first message that contains the AgentDescription message. If that is not happening then I think it is a server bug.

> I also think it is a server bug. Should OTel handle this situation anyway? After all, when remote configuration is used, it is very important to ensure that the agent always has a working configuration.

If this happens, could OTel detect it and handle it more gracefully, to keep the Collector stable?

tigrannajaryan commented 1 month ago

> I also think it is a server bug. Should OTel handle this situation anyway? After all, when remote configuration is used, it is very important to ensure that the agent always has a working configuration.

I am not sure what exactly the Supervisor can do in this situation if the server misbehaves. I suggest filing a bug against the server (please include the repro steps).