Closed — @schwesig closed this issue 2 weeks ago
Also, the ACM Observability metrics endpoint on the infra cluster has a different certificate error, one where the validity dates are fine.

Starting with the second issue first:
The certificate presented by https://observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu/api/metrics/v1/default is signed by the `observability-server-ca-certificate`:
```
$ urlcert https://observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu/api/metrics/v1/default | showcert
sha256 Fingerprint=4A:6C:A5:5C:69:D3:7D:3E:B8:EA:12:D1:5C:3B:D3:A2:AF:15:38:1C:43:5A:1C:23:BF:9E:76:86:9A:08:7A:03
subject=C=US, O=Red Hat, Inc., CN=observability-server-certificate
issuer=C=US, O=Red Hat, Inc., CN=observability-server-ca-certificate
notBefore=Aug 20 14:16:50 2024 GMT
notAfter=Aug 20 14:16:50 2025 GMT
X509v3 Subject Alternative Name:
    DNS:observability-server-certificate, DNS:observability-observatorium-api.open-cluster-management-observability.svc.cluster.local, DNS:observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu
```
That CA isn't going to be trusted by anybody, hence the "certificate issuer is unknown" error. The correct fix is probably to change the corresponding route from `passthrough` to `reencrypt` so that the default ingress certificate is exposed to outside clients.
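If we go that way, a minimal sketch of the change might look like the following (a sketch, not a tested fix: the MultiClusterObservability operator owns this Route and may reconcile a manual patch away, and `reencrypt` also requires the router to trust the backend certificate, e.g. via `destinationCACertificate`):

```shell
# Switch the route from passthrough to reencrypt so the router terminates TLS
# and presents the default (publicly trusted) ingress certificate externally.
# NOTE: the operator that owns this Route may revert this manual change.
oc -n open-cluster-management-observability patch route/observatorium-api \
  --type merge \
  -p '{"spec":{"tls":{"termination":"reencrypt"}}}'
```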
Regarding the first problem, which certificate is resulting in the "certificate is expired or not yet valid" error?

The second problem sounds like ACM Observability suddenly broke with its passthrough Route TLS handling.
```yaml
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: observatorium-api
  namespace: open-cluster-management-observability
  uid: a7f4bf8b-eba5-456b-b9b8-71e2e1dc4802
  resourceVersion: '1261594129'
  creationTimestamp: '2023-11-02T13:48:51Z'
  annotations:
    openshift.io/host.generated: 'true'
  ownerReferences:
    - apiVersion: observability.open-cluster-management.io/v1beta2
      kind: MultiClusterObservability
      name: observability
      uid: bcc31c98-3269-4ffc-bcfd-76257a9600d0
      controller: true
      blockOwnerDeletion: true
```
Another possible solution would be to configure Grafana to trust the observability CA certificate.
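As a sanity check on that approach, we could confirm that the server certificate actually chains to the observability CA. A sketch, assuming `oc` access; the `observability-server-certs` secret name is a guess based on the CA secret's naming:

```shell
# Extract the observability CA, then the server certificate, and verify the chain.
# Secret name "observability-server-certs" is an assumption.
oc -n open-cluster-management-observability extract \
  secret/observability-server-ca-certs --keys=ca.crt --to=/tmp
oc -n open-cluster-management-observability get secret/observability-server-certs \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/server.crt
# Prints "/tmp/server.crt: OK" only if the chain verifies.
openssl verify -CAfile /tmp/ca.crt /tmp/server.crt
```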
The first one relates to dex and the OAuth configuration for Grafana in vault at `nerc-ocp-infra/dex/grafanas`, specifically `GF_TLSCLIENTCERT`:
```
Validity
    Not Before: 2023-11-02 13:48:52 +0000 UTC
    Not After : 2024-11-01 13:48:52 +0000 UTC
```
The expired certificate in the `oauth-client-secret` secret (in the `grafana` namespace) looks like it was generated by the observability tools:
```
$ k extract secret/oauth-client-secret
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET
GF_AUTH_GENERIC_TLSCACERT
GF_AUTH_GENERIC_TLSCLIENTCERT
GF_AUTH_GENERIC_TLSCLIENTKEY
$ showcert !$
showcert GF_AUTH_GENERIC_TLSCLIENTCERT
sha256 Fingerprint=51:E6:4F:CC:F6:D7:07:17:75:4B:00:F4:37:A3:74:EE:0D:31:EB:97:57:B8:25:DD:9A:A2:49:4E:AD:70:B8:0B
subject=C=US, O=Red Hat, Inc., CN=grafana
issuer=C=US, O=Red Hat, Inc., CN=observability-client-ca-certificate
notBefore=Nov 2 13:48:52 2023 GMT
notAfter=Nov 1 13:48:52 2024 GMT
X509v3 Subject Alternative Name:
    DNS:grafana
```
Note the `issuer` entry. This suggests there must be some mechanism to regenerate this certificate.
@larsks @schwesig I updated the certs and keys described in this issue (`observability-grafana-certs`, `observability-server-ca-certs`) in the `nerc-ocp-obs/dex/grafanas` vault (`GF_TLSCLIENTCERT`, `GF_TLSCLIENTKEY`, `GF_TLSCACERT`) and restarted the grafana pods to get Grafana working again!
```shell
oc --as system:admin -n open-cluster-management-observability get secret/observability-grafana-certs -o jsonpath='{.data.tls\.crt}' | base64 -d
oc --as system:admin -n open-cluster-management-observability get secret/observability-grafana-certs -o jsonpath='{.data.tls\.key}' | base64 -d
oc --as system:admin -n open-cluster-management-observability get secret/observability-server-ca-certs -o jsonpath='{.data.ca\.crt}' | base64 -d
```
It's still a temporary solution, good only until the new certificates expire:

```
Validity
    Not Before: Aug 20 14:16:50 2024 GMT
    Not After : Aug 20 14:16:50 2025 GMT
```
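To avoid being surprised again next August, a small check like this could be cron'd (a sketch; it assumes the cert has already been pulled to a local `tls.crt`, e.g. via `oc extract`):

```shell
# openssl's -checkend takes seconds; exit status is non-zero if the cert
# expires within that window, so this prints a warning 30 days ahead.
openssl x509 -in tls.crt -noout -checkend $((30*24*3600)) \
  || echo "WARNING: tls.crt expires within 30 days"
```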
@computate @schwesig A neat command for dealing with files embedded in secrets (and configmaps) is `oc extract`; this will extract each key to a file in your local directory:
```
$ oc -n open-cluster-management-observability extract secret/observability-grafana-certs
ca.crt
tls.crt
tls.key
$ ls -l
ca.crt tls.crt tls.key
```
Saves you from the whole `jsonpath`/`base64` dance.
FYI: thanks to @RH-csaggin for recommending, and a shout-out to @dcommisso for writing this great tool: https://github.com/dcommisso/certexplorer
Can we call this issue closed now? I created a follow-up for next year. Do we need an issue for finding a different solution?
You can close this issue @schwesig .
Follow-up: https://github.com/nerc-project/operations/issues/802
**Motivation**

When opening a dashboard in Grafana on obs.nerc, e.g. https://grafana.apps.obs.nerc.mghpcc.org/d/20241028a/ai4dd-v5?orgId=1, there is an error:

**Completion Criteria**

Opening the dashboards in Grafana obs, seeing the data, and getting no cert error.

**Description**

**Completion dates**

Desired - 2024-11-06
Required - 2024-11-08
/CC @schwesig @computate @RH-csaggin @jtriley @larsks