Proposal: Early evaluation of logging configuration.

jkroepke commented 1 month ago

Background

Currently, if Grafana Alloy fails to start, no logs are sent to Loki, even if Loki logging is configured in the main configuration. For example, if an import.http block in the main configuration cannot be loaded, the startup fails. Since no logs are sent, an operator has to log in to the virtual machine to diagnose the issue.

Proposal

Early Evaluation of Loki Logging Configuration

If possible, evaluate the Loki logging configuration early during the startup process to ensure logs are sent even if the startup fails.

CLI Flags for Logging Configuration

Add CLI flags to configure logging settings during startup. This approach allows operators to specify logging configuration directly on the command line, ensuring logs are captured early in the startup process.
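A minimal Go sketch of what such flags could look like, using only the standard `flag` package. The flag names `--early-log.level` and `--early-log.loki-url`, the `EarlyLogging` type, and the example URL are assumptions made for illustration. They are not existing Alloy flags.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// EarlyLogging holds logging settings that are known before the main
// configuration is loaded. All field and flag names are hypothetical.
type EarlyLogging struct {
	Level   string
	LokiURL string
}

// parseEarlyLogging reads the hypothetical logging flags from args.
func parseEarlyLogging(args []string) (EarlyLogging, error) {
	var cfg EarlyLogging
	fs := flag.NewFlagSet("alloy-sketch", flag.ContinueOnError)
	fs.StringVar(&cfg.Level, "early-log.level", "info",
		"log level used before the main configuration is loaded")
	fs.StringVar(&cfg.LokiURL, "early-log.loki-url", "",
		"Loki push URL used before the main configuration is loaded")
	if err := fs.Parse(args); err != nil {
		return EarlyLogging{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseEarlyLogging(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("early logging: level=%s loki=%q\n", cfg.Level, cfg.LokiURL)
}
```

Because the values come from the command line, they are available even when the main configuration cannot be evaluated at all.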

Splitting Core Configuration and Metric Processing Configuration

Split the core configuration from the metric processing configuration. This separation simplifies the configuration and isolates logging settings, making it easier to ensure that logging is properly configured and operational early in the startup process.

ptodev commented 3 weeks ago

Hi, thank you for the proposal! I'm personally not too sure if it's worth the additional complexity if we could alternatively just advise folks to run a separate Alloy cluster that is solely responsible for gathering logs.

If it can be implemented in a simple and clean way then maybe it'd be worth it, but I personally wouldn't want to add too much complexity for it.

ptodev commented 3 weeks ago

I have a couple of ideas for how such a feature might work:

We could sandbox each component DAG so that if one fails, the others keep running. For example, if the logging pipeline is a DAG consisting of a logging block and loki.write, it should be isolated from the other DAGs.
There was a discussion a few months ago about a new clustering feature in which one Alloy instance would be the sole instance in the cluster to run a specific component. The discussion was prompted by the need to run only one Prometheus exporter of a given type per cluster. Maybe we could have a variation of this so that logging is run on an Alloy instance that only does logging, making it more isolated from the others.
