open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.05k stars 2.36k forks

[cmd/opampsupervisor] RemoteConfigStatus is not populated with failed on invalid config #34785

Open Asarew opened 2 months ago

Asarew commented 2 months ago

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

When the otel controller pushes an "invalid" remote configuration down to the supervisor, the supervisor doesn't report a RemoteConfigStatus with status FAILED. It does report Healthy == false in the ComponentHealth, together with a LastError, but relying on that seems to go against the OpAMP specification, and the LastError doesn't give any details about what is wrong with the config.

What is happening:

  1. Pushed down valid YAML that is an invalid collector config:

     ```go
     &protobufs.AgentRemoteConfig{
         // ConfigHash is a field of AgentRemoteConfig, not of AgentConfigMap.
         ConfigHash: []byte("abc123"),
         Config: &protobufs.AgentConfigMap{
             ConfigMap: map[string]*protobufs.AgentConfigFile{
                 "": {
                     ContentType: "text/yaml",
                     Body: []byte(`
     receivers:
       nop:
     exporters:
       nop:
     service:
       pipelines:
         traces/3:
           receivers: [nop]
           exporters: [nop]
     force_invalid:
       config:
         because: "of unknown fields"
     `),
                 },
             },
         },
     }
     ```
  2. The first message sent by the supervisor has a RemoteConfigStatus with the corresponding LastRemoteConfigHash:

     ```go
     &protobufs.RemoteConfigStatus{
         LastRemoteConfigHash: []byte("abc123"),
         Status:               protobufs.RemoteConfigStatuses_RemoteConfigStatuses_APPLIED,
     }
     ```
  3. After that, ComponentHealth.Healthy == false is received every 5 seconds, with ComponentHealth.LastError:

     ```
     Agent process PID={*} exited unexpectedly, exit code=1. Will restart in a bit...
     ```
  4. The agent.log file gets rewritten every 5 seconds with:

     ```
     Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s): '' has invalid keys: force_invalid
     2024/08/21 13:01:42 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s): '' has invalid keys: force_invalid
     ```

Describe the solution you'd like

Call the collector's validate command before starting the agent. If that fails, report the error message back in RemoteConfigStatus.ErrorMessage, with the status correctly set to FAILED.

Describe alternatives you've considered

"Reuse" the ComponentHealth as the RemoteConfigStatus for now, but in my opinion that's a bad implementation of the OpAMP spec on both the controller and the supervisor side.
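For illustration only, the workaround could look roughly like this on the controller side. The types are simplified local stand-ins rather than the opamp-go `protobufs` types, and `configLikelyRejected` is a hypothetical helper. It can only guess: an unhealthy agent with a LastError does not prove that the config was the cause.

```go
package main

import (
	"bytes"
	"fmt"
)

// Simplified stand-ins for the OpAMP messages the controller receives.
type ComponentHealth struct {
	Healthy   bool
	LastError string
}

type RemoteConfigStatus struct {
	LastRemoteConfigHash []byte
	Applied              bool // true when Status is APPLIED
}

// configLikelyRejected (hypothetical helper) is a heuristic: the supervisor
// claims the config with sentHash was applied, yet the agent is unhealthy and
// reports an error. It cannot tell a bad config from any other crash cause.
func configLikelyRejected(sentHash []byte, st RemoteConfigStatus, h ComponentHealth) bool {
	sameConfig := bytes.Equal(st.LastRemoteConfigHash, sentHash)
	return sameConfig && st.Applied && !h.Healthy && h.LastError != ""
}

func main() {
	st := RemoteConfigStatus{LastRemoteConfigHash: []byte("abc123"), Applied: true}
	h := ComponentHealth{
		Healthy:   false,
		LastError: "Agent process PID={*} exited unexpectedly, exit code=1. Will restart in a bit...",
	}
	fmt.Println(configLikelyRejected([]byte("abc123"), st, h))
}
```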

Additional context

No response

github-actions[bot] commented 2 months ago

Pinging code owners:

BinaryFissionGames commented 2 months ago

Yep, this is absolutely something that's missing right now. It's tracked here: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21079

Looks like there was a PR opened for this but it slipped through the cracks somehow.