grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Not possible to use private domains with private CA #3050

Open runderwoodcr14 opened 9 months ago

runderwoodcr14 commented 9 months ago

What went wrong?

What happened: I run Grafana OSS and OnCall OSS on private domains rather than public ones. Grafana and OnCall can't connect to each other because the TLS certificates are issued by a private CA rather than a public one (they are not self-signed). There seems to be no way to tell OnCall or Grafana to use the private CA's root certificate to validate the URLs. When I try to add the OnCall configuration in Grafana, the Grafana logs show:

failed to verify certificate: x509: certificate signed by unknown authority

and the OnCall logs show:

logger=apps.grafana_plugin.helpers.client Error connecting to api instance HTTPSConnectionPool(host='grafana.xxxx.xxx', port=443): Max retries exceeded with url: /api/org (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')))

Both were installed using the Helm charts, and Grafana is not part of the OnCall deployment.

What did you expect to happen: Ideally, it would be possible to use private domains with certificates issued by a private CA.

How do we reproduce it?

Grafana OSS version: v10.0.2
OnCall OSS version: v1.3.37

Using the Helm chart, you can set: base_url: grafana-oncall.domain.local

ingress:
  # className: "nginx-internal"
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
    cert-manager.io/issuer-group: "cas-issuer.jetstack.io"
    cert-manager.io/issuer-kind: "GoogleCASClusterIssuer"
    cert-manager.io/issuer: "googlecasclusterissuer"
  tls:
    - hosts:
        - grafana-oncall.domain.local
      secretName: certificate-tls

If you go to Grafana and try to configure the plugin with the FQDN set, you'll get the errors described in this issue.
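To check whether a given CA file actually validates an endpoint, independently of Grafana or OnCall, a small standard-library probe can help. This is only a sketch: the function name `tls_verify_error` and its parameters are hypothetical, and it performs the same kind of chain verification that fails in the logs above without using any OnCall code.

```python
import socket
import ssl
from typing import Optional


def tls_verify_error(host: str, port: int = 443,
                     cafile: Optional[str] = None,
                     timeout: float = 5.0) -> Optional[str]:
    """Return None if the TLS handshake verifies, else a short error string.

    When `cafile` is given, only the CAs in that file are trusted;
    when it is None, the system default trust store is used.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake (and certificate verification) happens here.
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, other TLS errors.
        return f"connection failed: {exc}"
```

Calling it once with `cafile=None` and once with the path to the private CA's root certificate shows whether the root certificate alone is enough to make verification pass.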

Grafana OnCall Version

v1.3.37

Product Area

Auth, Helm

Grafana OnCall Platform?

Kubernetes

User's Browser?

No response

Anything else to add?

No response

magnujoh commented 7 months ago

Did you ever get this fixed, or does anyone have any ideas on how to properly approach this issue? We're having the same issue: we can't seem to get on-call-engine to accept a certificate from a private CA.

Our situation is semi-similar, although we're running completely on-prem, with no access to the internet, and thus use a private CA. GOC does not like that, and we can't seem to find a way to override it. Copying our certificates around doesn't make life any easier.

magnujoh commented 7 months ago

To respond to myself: We solved this by placing our CAs directly where certifi reads them. I'm unsure if this is user error, but we seemingly could not get our internal certificates read properly unless we copied them directly into a subfolder within the Python library hierarchy.

This might have cascading effects down the line, but it solved our immediate problem so testing can continue. The error messages never indicated where certs were read from, so it took trial and error to get there. For anyone who might be in a similar situation (on-prem kubernetes with access to the wider web for resources/charts/public CAs etc), our solution was to place our own .crt files inside certifi's library folders.

I don't expect we'll be running this setup in full production, but at least we got around it.
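As a possible alternative to editing files inside the Python library tree, the private CA can be appended to a copy of the existing bundle. The sketch below is an assumption, not something taken from the OnCall chart or code: `build_ca_bundle` is a hypothetical helper and every path passed to it is up to the caller.

```python
from pathlib import Path
from typing import Sequence

PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle: str, extra_certs: Sequence[str], output: str) -> str:
    """Write `output` as the base PEM bundle followed by the extra PEM certs.

    The base bundle itself is left untouched, so a library upgrade that
    replaces it does not silently drop the private CA.
    """
    parts = []
    for path in [base_bundle, *extra_certs]:
        pem = Path(path).read_text(encoding="utf-8").strip()
        if PEM_MARKER not in pem:
            raise ValueError(f"{path} does not look like a PEM certificate file")
        parts.append(pem)
    Path(output).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return output
```

The base bundle would typically be the file certifi ships, and the combined file can then be handed to requests through the `REQUESTS_CA_BUNDLE` environment variable, which requests honours by default.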

zeeZ commented 6 months ago

You should be able to set the env var REQUESTS_CA_BUNDLE to point to your custom CA bundle. Otherwise, running python3 -m requests.certs inside the container should print the path of the default CA bundle in use.
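For illustration, the lookup order requests applies when certificate verification is left at its default can be approximated as below. This is a simplified sketch, not the library's code: `effective_ca_bundle` is a hypothetical name, and the default path must be supplied by the caller (for example the path printed by `python3 -m requests.certs`).

```python
import os
from typing import Mapping, Optional


def effective_ca_bundle(default_bundle: str,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Approximate which CA bundle requests ends up using.

    With environment trust enabled (the default), requests prefers
    REQUESTS_CA_BUNDLE, then CURL_CA_BUNDLE, then its built-in bundle.
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    return (env.get("REQUESTS_CA_BUNDLE")
            or env.get("CURL_CA_BUNDLE")
            or default_bundle)


if __name__ == "__main__":
    # "/path/to/default/cacert.pem" is a placeholder, not a real location.
    print(effective_ca_bundle("/path/to/default/cacert.pem"))
```

Running this inside the engine container (with the real default path substituted) indicates whether the environment variable is actually reaching the process.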