Open Asarew opened 2 months ago
Pinging code owners:
cmd/opampsupervisor: @evan-bradley @atoulme @tigrannajaryan @BinaryFissionGames
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Yep, this is absolutely something that's missing right now. It's tracked here: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21079
Looks like there was a PR opened for this but it slipped through the cracks somehow.
Component(s)
cmd/opampsupervisor
Is your feature request related to a problem? Please describe.
When the otel controller passes an "invalid" remote configuration down to the supervisor, the supervisor doesn't report status == FAILED back in the RemoteConfigStatus. It does report Unhealthy in the ComponentHealth with a LastError, but relying on that seems to go against the opamp specification, and the LastError doesn't give any details about what is wrong with the configuration.
What is happening:
Pushed down valid yaml but with invalid collector config:
```go
&protobufs.AgentRemoteConfig{
	// ConfigHash is a field of AgentRemoteConfig, next to Config.
	ConfigHash: []byte("abc123"),
	Config: &protobufs.AgentConfigMap{
		ConfigMap: map[string]*protobufs.AgentConfigFile{
			"": &protobufs.AgentConfigFile{
				ContentType: "text/yaml",
				Body: []byte(`
receivers:
  nop:
exporters:
  nop:
service:
  pipelines:
    traces/3:
      receivers: [nop]
      exporters: [nop]
force_invalid:
  config:
    because: "of unknown fields"
`),
			},
		},
	},
}
```
The first message sent by the supervisor has this RemoteConfigStatus (with the corresponding LastRemoteConfigHash):
```go
&protobufs.RemoteConfigStatus{
	LastRemoteConfigHash: []byte("abc123"),
	Status:               protobufs.RemoteConfigStatuses_RemoteConfigStatuses_APPLIED,
}
```
After that, the controller receives ComponentHealth.Healthy == false every 5 seconds, with this ComponentHealth.LastError:
```
Agent process PID={*} exited unexpectedly, exit code=1. Will restart in a bit...
```
The agent.log file gets rewritten every 5 seconds with:
```
Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s): '' has invalid keys: force_invalid
2024/08/21 13:01:42 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s): '' has invalid keys: force_invalid
```
Describe the solution you'd like
Call the collector's validate command before starting the agent. If that fails, report the error message back in RemoteConfigStatus.ErrorMessage with the correct status of FAILED.
Describe alternatives you've considered
"Reuse" the ComponentHealth as the RemoteConfigStatus for now, but in my opinion that's a bad implementation of the opamp spec from both the controller as the supervisor.
Additional context
No response