grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

[Feature Request] Alloy container health check #1891

Open gregbrowndev opened 1 month ago

gregbrowndev commented 1 month ago

Request

Hi,

The standard OTel collector has a health check extension that can be used in deployments to restart the container if it fails:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckv2extension/README.md

I understand the v2 extension is still experimental and that v1 is being deprecated due to its limitations. However, it would still be useful to be able to detect that the container has failed. I haven't yet set up the Alloy integration with Grafana Cloud, but I will be doing so soon.

Is there a plan for a native health check and readiness check we can use when deploying Alloy containers?

Thanks!
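For reference, Alloy's built-in HTTP server already exposes a readiness endpoint at /-/ready on its listen address (127.0.0.1:12345 by default, overridable with --server.http.listen-addr), so a coarse container check is possible today. The sketch below, in Go, shows a probe binary that could back a Docker HEALTHCHECK or a Kubernetes exec probe; the ALLOY_ADDR variable and the file name are illustrative, and note that this only confirms the server is up, not that individual components are healthy.

// healthprobe.go: minimal container health probe for Alloy's built-in
// HTTP server. Exits 0 if /-/ready returns 200, 1 otherwise, so it can
// back a Docker HEALTHCHECK or a Kubernetes exec probe.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	// Default Alloy listen address; set ALLOY_ADDR if the container
	// starts Alloy with a custom --server.http.listen-addr.
	addr := os.Getenv("ALLOY_ADDR")
	if addr == "" {
		addr = "127.0.0.1:12345"
	}

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s/-/ready", addr))
	if err != nil {
		fmt.Fprintf(os.Stderr, "probe failed: %v\n", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		fmt.Fprintf(os.Stderr, "probe failed: status %d\n", resp.StatusCode)
		os.Exit(1)
	}
}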

Use case

As a new Grafana Cloud user, I've been setting up an observability solution with metrics, traces, and logs.

I requested that native histogram support be enabled in Grafana, but this seemed to break my Alloy configuration without any error explaining why:

ts=2024-10-15T13:11:36.774541357Z level=info msg="node exited without error" node=prometheus.remote_write.metrics

This seemed to break the entire Alloy server: I wasn't able to get logs or traces either, because of the broken prometheus component.

Reverting send_native_histograms to false fixed the issue:

prometheus.remote_write "metrics" {
    // Exports metrics to Prometheus backend
    // https://grafana.com/docs/alloy/latest/reference/components/prometheus/prometheus.remote_write/
    endpoint {
        url                     = env("PROMETHEUS_SERVER_URL")
        send_native_histograms = false

        tls_config {
            insecure_skip_verify = env("TLS_ENABLED") == "false"
        }

        basic_auth {
            username = env("PROMETHEUS_USERNAME")
            password = env("PROMETHEUS_PASSWORD")
        }
    }
}

However, the deployment succeeded anyway, because the container was never detected as unhealthy.
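This is exactly the gap a server-level readiness endpoint can't cover: the HTTP server keeps answering while a component inside it has died. As a stopgap, a stricter probe could inspect Alloy's own /metrics output and fail when any component reports itself unhealthy or exited. A sketch in the same vein as the probe above, assuming the controller metric alloy_component_controller_running_components with a health_type label is exposed by your build (verify against your version's /metrics output before relying on it):

// componentprobe.go: stricter probe that scrapes Alloy's own /metrics
// and fails when any component counts as unhealthy or exited. The
// metric name and labels are assumptions; verify them on your build.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
	"time"
)

func main() {
	addr := os.Getenv("ALLOY_ADDR")
	if addr == "" {
		addr = "127.0.0.1:12345"
	}

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s/metrics", addr))
	if err != nil {
		fmt.Fprintf(os.Stderr, "probe failed: %v\n", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	bad := 0.0
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// Expected exposition format, e.g.:
		// alloy_component_controller_running_components{health_type="unhealthy"} 0
		if !strings.HasPrefix(line, "alloy_component_controller_running_components") {
			continue
		}
		if !strings.Contains(line, `health_type="unhealthy"`) &&
			!strings.Contains(line, `health_type="exited"`) {
			continue
		}
		fields := strings.Fields(line)
		if v, err := strconv.ParseFloat(fields[len(fields)-1], 64); err == nil {
			bad += v
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "probe failed: %v\n", err)
		os.Exit(1)
	}

	if bad > 0 {
		fmt.Fprintf(os.Stderr, "probe failed: %v components unhealthy or exited\n", bad)
		os.Exit(1)
	}
}

Paired with a restart policy, a check like this would have flagged the broken prometheus.remote_write component instead of letting the rollout go green.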

github-actions[bot] commented 2 days ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!