grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Implement configuration cache for `module.*` components #332

Open jkroepke opened 11 months ago

jkroepke commented 11 months ago

Request

If the agent is running and the remote endpoint for module.http goes down, the agent continues running with the latest functional configuration.

However, if the agent is restarted, it refuses to start because the external endpoints are not available, which results in lost metrics.

I have my eye on the Remote Agent Configuration: https://github.com/grafana/agent/blob/8186196db6aa3931a0197ecdd163265db30aeac0/pkg/config/agentmanagement.go#L229-L232C1 I'm aware that it is only for internal experimentation; however, the concept of caching a valid configuration is what I'm looking for.
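To make the requested behavior concrete, here is a minimal sketch of the idea in Python. It is not the agent's code and does not reflect how `module.http` is implemented; `load_config`, `fetch`, and `cache_path` are hypothetical names chosen for illustration. It assumes `fetch` raises an `OSError` subclass (for example `ConnectionError`) when the endpoint is unreachable.

```python
import os
import tempfile
from typing import Callable


def load_config(fetch: Callable[[], str], cache_path: str) -> str:
    """Return the remote config, falling back to the last cached copy.

    `fetch` is a hypothetical callable that downloads the module content
    and raises an OSError subclass when the endpoint is unreachable.
    """
    try:
        config = fetch()
    except OSError:
        # Remote endpoint unreachable: reuse the last cached config, if any.
        if not os.path.exists(cache_path):
            raise  # Nothing cached yet, so startup still fails.
        with open(cache_path, encoding="utf-8") as f:
            return f.read()

    # Write to a temp file and rename it, so a crash mid-write
    # cannot leave a truncated cache behind.
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(config)
    os.replace(tmp_path, cache_path)
    return config
```

In this sketch every successful fetch overwrites the cache. A real implementation would probably write the cache only after the fetched configuration has been loaded successfully, so that the cached copy is always a known-good one.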

Use case

This would increase the reliability of the Agent when the configuration endpoint is not available. At the moment, the configuration data plane must be considered a critical component. If the configuration endpoints are gone and agents restart for various reasons, they drop out of monitoring.

ntimo commented 11 months ago

One thing I would like to add to this: if you are using module.git and the git repository is present in the agent's data path, the agent still fails to start when the repository server is not available on startup, even though the repository is present locally and the latest local state could be used to start.

spartan0x117 commented 10 months ago

@ntimo Could you create an issue for that? It sounds like a bug we should fix (but slightly different than this feature request)

@jkroepke This seems like a reasonable idea to me! If this doesn't end up getting picked up, I'll try to have a PR out for this soon.

ntimo commented 10 months ago

@spartan0x117 Done, I created a separate issue for this: https://github.com/grafana/agent/issues/5708