grafana / grafana-operator

An operator for Grafana that installs and manages Grafana instances, Dashboards and Datasources through Kubernetes/OpenShift CRs
https://grafana.github.io/grafana-operator/
Apache License 2.0

[Bug] Since 5.13.0 - Grafana Operator cannot manage TLS protected internal Grafanas #1675

Closed. diranged closed this issue 2 weeks ago

diranged commented 2 months ago

Describe the bug In our environment, we pass TLS certs to our Grafana service so that it is encrypted end to end. After https://github.com/grafana/grafana-operator/pull/1628 shipped in 5.13.0, the Grafana Operator stopped being able to connect to our Grafana instances. We get the following reconciliation errors:

    "status": {
        "hash": "9250f003846c19a973bd035ce560da23aaad2fdc855a951d63c99d75b7c40a03",
        "lastMessage": "fetching data sources: Get \"https://grafana-app-service.grafana:3000/api/datasources\": tls: failed to verify certificate: x509: certificate signed by unknown authority",
        "lastResync": "2024-09-12T17:50:18Z",
        "uid": "loki"
    }

Logs:

2024-09-13T20:58:43Z    ERROR   GrafanaDatasourceReconciler error reconciling datasource    {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafana-app-root","namespace":"grafana"}, "namespace": "grafana", "name": "grafana-app-root", "reconcileID": "c1671f2b-3f25-436b-a479-7b2fe96edbdf", "datasource": "grafana-app-root", "grafana": "grafana-app", "error": "fetching data sources: Get \"https://grafana-app-service.grafana:3000/api/datasources\": tls: failed to verify certificate: x509: certificate signed by unknown authority"}
github.com/grafana/grafana-operator/v5/controllers.(*GrafanaDatasourceReconciler).Reconcile
    github.com/grafana/grafana-operator/v5/controllers/datasource_controller.go:252
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222

Version v5.13.0

To Reproduce

Create a Grafana CR with a TLS config like the following:

config:
...
  server:
    ca_cert: /certs/ca.crt
    cert_file: /certs/tls.crt
    cert_key: /certs/tls.key
    domain: ....com
    protocol: https
    root_url: https://....com
deployment:
  spec:
    template:
      spec:
        containers:
          - volumeMounts:
              - mountPath: /certs/ca.crt
                name: ca
                readOnly: true
                subPath: ca.crt
              - mountPath: /certs/tls.crt
                name: tls
                readOnly: true
                subPath: tls.crt
              - mountPath: /certs/tls.key
                name: tls
                readOnly: true
                subPath: tls.key
        volumes:
          - name: ca
            secret:
              defaultMode: 420
              optional: false
              secretName: grafana-app-cacert
          - name: tls
            secret:
              defaultMode: 420
              optional: false
              secretName: grafana-app-tls
theSuess commented 2 months ago

Thanks for reporting this. The TLS settings introduced in #1628 should only have affected external instances, but they had the unintended side effect of requiring complete, verifiable certificate chains on all instances.

As a workaround, you can try mounting your ca.crt into the operator manager container at /etc/ssl/certs/ca-certificates.crt until we have a fix ready.
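A sketch of what that mount could look like as a patch on the operator Deployment, following the same volumeMount style as the Grafana CR above. The Deployment name, container name, and the assumption that the `grafana-app-cacert` Secret exists in the operator's namespace are all illustrative, not from the issue:

```yaml
# Illustrative workaround patch: overlay the internal CA onto the
# operator's system trust bundle. Names below are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-operator-controller-manager
spec:
  template:
    spec:
      containers:
        - name: manager
          volumeMounts:
            - mountPath: /etc/ssl/certs/ca-certificates.crt
              name: ca
              readOnly: true
              subPath: ca.crt
      volumes:
        - name: ca
          secret:
            defaultMode: 420
            optional: false
            secretName: grafana-app-cacert
```

Note that this replaces the container's entire system CA bundle with the single internal CA, so the operator would no longer trust public CAs from inside that container until the mount is removed.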

diranged commented 2 months ago

Thanks - for now we just rolled the operator upgrade back...

brogger71 commented 1 month ago

is that issue fixed in v5.14.0 ?

Thanks

diranged commented 1 month ago

> is that issue fixed in v5.14.0 ?
>
> Thanks

No - the fix in https://github.com/grafana/grafana-operator/pull/1690 is still unmerged.