Open computate opened 15 hours ago
@larsks or @jtriley any idea why we are seeing lots of these errors in this pod: https://console-openshift-console.apps.obs.nerc.mghpcc.org/k8s/ns/openshift-operators/pods/cert-manager-7b86568cb8-hdl69/logs
E0930 16:55:51.155876 1 sync.go:190] "propagation check failed" err="DNS record for \"api.obs.nerc.mghpcc.org\" not yet propagated" logger="cert-manager.controller" resource_name="default-api-certificate-5-2133370942-1414786744" resource_namespace="openshift-config" resource_kind="Challenge" resource_version="v1" dnsName="api.obs.nerc.mghpcc.org" type="DNS-01"
@computate no idea, but I'll see if I can figure it out. It looks as if cert-manager is attempting to create a DNS record to respond to the DNS-01 challenge, but that record never becomes resolvable.
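For reference, a minimal sketch of which record to look for. In a DNS-01 challenge the validation TXT record lives at `_acme-challenge.<domain>`. The helper names below are hypothetical, and the `dig` command is only assembled, not run; the optional nameserver argument is a placeholder for whichever authoritative server you want to query.

```python
from typing import List, Optional


def challenge_record_name(dns_name: str) -> str:
    """Return the fully qualified TXT record name for a DNS-01 challenge."""
    name = dns_name.rstrip(".")
    # A wildcard certificate is validated at the base domain.
    if name.startswith("*."):
        name = name[2:]
    return f"_acme-challenge.{name}."


def dig_command(dns_name: str, nameserver: Optional[str] = None) -> List[str]:
    """Build (but do not run) a dig invocation that queries the TXT record."""
    cmd = ["dig", "+short", "TXT", challenge_record_name(dns_name)]
    if nameserver:
        # Hypothetical: pass an authoritative nameserver to bypass caches.
        cmd.append(f"@{nameserver}")
    return cmd


if __name__ == "__main__":
    print(" ".join(dig_command("api.obs.nerc.mghpcc.org")))
```

If that query returns nothing while the Challenge is pending, the record was probably never created, or was created in a zone that is not authoritative for the name.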
@computate Just checked the cert via Firefox and it looks like it updated.
Oh, sorry, that's the ingress controller certificate. The API certificate is indeed expired.
I double-checked the IAM policy for OBS and it looks OK to me. The OBS cluster was configured with two domains (obs.nerc and apps.obs.nerc) instead of just one for obs.nerc.mghpcc.org. I wonder if it's selecting the wrong zone for some reason?
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/Z01584463FARBPKKZ6GLA",
        "arn:aws:route53:::hostedzone/Z01587362S8JQQ08E8LB2"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZonesByName",
      "Resource": "*"
    }
  ]
}
Those two zones are:
Z01587362S8JQQ08E8LB2 apps.obs.nerc.mghpcc.org. 3
Z01584463FARBPKKZ6GLA obs.nerc.mghpcc.org. 5
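To reason about the wrong-zone theory, here is a rough sketch of longest-suffix zone matching against the two zones above. This is an assumption about how a zone would ideally be chosen, not cert-manager's actual selection logic, and `best_zone` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# The two hosted zones listed above, keyed by zone name.
ZONES: Dict[str, str] = {
    "apps.obs.nerc.mghpcc.org.": "Z01587362S8JQQ08E8LB2",
    "obs.nerc.mghpcc.org.": "Z01584463FARBPKKZ6GLA",
}


def best_zone(fqdn: str, zones: Dict[str, str] = ZONES) -> Optional[Tuple[str, str]]:
    """Return (zone name, zone ID) of the most specific zone containing fqdn."""
    fqdn = fqdn.rstrip(".").lower() + "."
    # A zone contains the name if it equals it or is a label-aligned suffix.
    matches = [z for z in zones if fqdn == z or fqdn.endswith("." + z)]
    if not matches:
        return None
    zone = max(matches, key=len)  # the longest suffix is the most specific
    return zone, zones[zone]


if __name__ == "__main__":
    print(best_zone("_acme-challenge.api.obs.nerc.mghpcc.org"))
    print(best_zone("_acme-challenge.apps.obs.nerc.mghpcc.org"))
```

Under this matching, the challenge record for api.obs.nerc.mghpcc.org belongs in Z01584463FARBPKKZ6GLA (obs.nerc.mghpcc.org). If the TXT record is landing in the apps zone instead, that would be consistent with it never becoming resolvable.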
The two-zone setup was configured before we delegated DNS to the Harvard-URC Route 53 instance. I'm configuring a single zone per cluster going forward.
Today the obs cluster API certificate is expired.