vmware-archive / kube-prod-runtime

A standard infrastructure environment for Kubernetes
Apache License 2.0

cert manager not issuing due to missing annotations #432

Closed timuckun closed 5 years ago

timuckun commented 5 years ago

I am getting this error:

Not syncing ingress keycloak-server-development/keycloak-server-ingress as it does not contain necessary annotations

This is what my resource looks like:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ${APP_NAME}-ingress
  namespace: ${KUBE_NAMESPACE}
  labels:
    name: ${APP_NAME}
  annotations:
    kubernetes.io/ingress.class: "nginx"
    external-dns.alpha.kubernetes.io/hostname: ${APP_HOST_NAME}
    kubernetes.io/tls-acme: true
    certmanager.k8s.io/issuer: "letsencrypt-staging"
    certmanager.k8s.io/acme-challenge-type: http01

spec:
  tls:
    - hosts:
        - ${APP_HOST_NAME}
      secretName: ${APP_NAME}-tls
  rules:
    - host:  ${APP_HOST_NAME}
      http:
        paths:
          - path: /
            backend:
              serviceName: ${APP_NAME}-service
              servicePort: 8443

What else do I need?

anguslees commented 5 years ago

Looks correct to me, except your kubernetes.io/tls-acme value is a boolean and not a string. Try changing that true to "true". YAML has some hidden sharp spikes :(
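For anyone hitting the same spike, a side-by-side sketch of the two spellings (the key is the one from the ingress above; the behaviour described is standard YAML 1.1 scalar resolution):

```yaml
# Unquoted: a YAML 1.1 parser resolves a bare true to a boolean, but
# Kubernetes annotation values must be strings, and the controller
# compares against the literal string "true" -- so this never matches.
kubernetes.io/tls-acme: true
---
# Quoted: forces a string value, which is what tls-acme-aware
# controllers look for.
kubernetes.io/tls-acme: "true"
```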

Let me know if that change is sufficient. If not, I'll try to reproduce it locally and go code diving.

timuckun commented 5 years ago

I am still getting the same error:

not syncing ingress keycloak-server-development/keycloak-server-ingress as it does not contain necessary annotations

Here is my ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ${APP_NAME}-ingress
  namespace: ${KUBE_NAMESPACE}
  labels:
    name: ${APP_NAME}
  annotations:
    kubernetes.io/ingress.class: "nginx"
    external-dns.alpha.kubernetes.io/hostname: ${APP_HOST_NAME}
    kubernetes.io/tls-acme: "true"
    certmanager.k8s.io/issuer: "letsencrypt-staging"
    certmanager.k8s.io/acme-challenge-type: http01

spec:
  tls:
    - hosts:
        - ${APP_HOST_NAME}
      secretName: ${APP_NAME}-tls
  rules:
    - host:  ${APP_HOST_NAME}
      http:
        paths:
          - path: /
            backend:
              serviceName: ${APP_NAME}-service
              servicePort: 8080

I also created a Certificate, which I don't think should be necessary, right?

timuckun commented 5 years ago

Correction: the quotes were being swallowed by my templating system. I put them back in, and now it's processing my ingress. The only problem is that it's giving me another error:

Re-queuing item "keycloak-server-development/keycloak-server-ingress" due to error processing: issuer.certmanager.k8s.io "letsencrypt-staging" not found

I can see that letsencrypt-staging does exist in the kubeprod namespace. I tried kubeprod/letsencrypt-staging, and that didn't work either, nor did letsencrypt-production.

This is my jsonnet

falfaro commented 5 years ago

Where's the jsonnet? :-P

timuckun commented 5 years ago

// Cluster-specific configuration
(import "https://releases.kubeprod.io/files/v1.1.0/manifests/platforms/gke.jsonnet") {
  config:: import "kubeprod-autogen.json",
  // Place your overrides here
  cert_manager+: {
    letsencrypt_contact_email: "redacted",
  },
  prometheus+: {
    config+: {
      scrape_configs_+: {
        "app1-metrics": {
          static_configs: [{ targets: ["app1.com"] }],
        },
        "app2-metrics": {
          metrics_path: "/api/v1/metrics/prometheus",
          static_configs: [{ targets: ["app2.com"] }],
        },
      },
    },
  },
}

anguslees commented 5 years ago

Oh duh. Try:

-   certmanager.k8s.io/issuer: "letsencrypt-staging"
+   certmanager.k8s.io/cluster-issuer: "letsencrypt-staging"
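For context (a hedged summary of the old certmanager.k8s.io annotation names used in this thread): the issuer annotation resolves to a namespaced Issuer in the Ingress's own namespace, while cluster-issuer resolves to a cluster-scoped ClusterIssuer, which is how kube-prod-runtime installs letsencrypt-staging:

```yaml
annotations:
  # Wrong here: resolves to an Issuer object in the Ingress's namespace
  # (keycloak-server-development), where no letsencrypt-staging exists.
  # certmanager.k8s.io/issuer: "letsencrypt-staging"

  # Right here: resolves to the cluster-scoped ClusterIssuer deployed by
  # kube-prod-runtime, regardless of the Ingress's namespace.
  certmanager.k8s.io/cluster-issuer: "letsencrypt-staging"
```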
timuckun commented 5 years ago

Thanks, I'll try that. Is my jsonnet OK? Is that where I put my email address?

timuckun commented 5 years ago

Hi.

The cluster-issuer annotation worked, but...

I kept getting "self check failed for domain" errors, and several removals and redeploys didn't help. After some googling, somebody suggested kubectl delete cert-name-tls; that caused cert-manager to recreate the cert, and the error went away. The odd thing is that I had deleted the cert before, so I don't know why it worked the second time, except that maybe there is a timing issue somewhere. I hope that's a one-time thing, but I am going to try deploying kube-prod into another GKE cluster and test it again.

This is the final ingress that works:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ${APP_NAME}-ingress
  namespace: ${KUBE_NAMESPACE}
  labels:
    name: ${APP_NAME}
  annotations:
    kubernetes.io/ingress.class: "nginx"
    deploy_date: \"$(date)\"
    external-dns.alpha.kubernetes.io/hostname: ${APP_HOST_NAME}
    kubernetes.io/tls-acme: "true"
    certmanager.k8s.io/cluster-issuer: "letsencrypt-staging"

spec:
  tls:
    - hosts:
        - ${APP_HOST_NAME}
      secretName: ${APP_NAME}-tls
  rules:
    - host:  ${APP_HOST_NAME}
      http:
        paths:
          - path: /
            backend:
              serviceName: ${APP_NAME}-service
              servicePort: 8080

If you put the external-dns annotation in your deploy, you get the wrong IP address. Only put it on the Ingress.
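A minimal sketch of that advice (app.example.com is a placeholder hostname): keep the hostname annotation on the Ingress alone, so the DNS record external-dns publishes points at the ingress controller's load-balancer address.

```yaml
# On the Ingress only -- not on the Deployment/Service templates:
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com  # placeholder host
```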

BTW, the DNS service is logging errors that are not actually errors:

{
  "insertId": "gv2hyeeipimtqmceq",
  "logName": "projects/project-name/logs/stderr",
  "metadata": {…},
  "receiveTimestamp": "2019-03-17T10:56:33.987591958Z",
  "resource": {…},
  "severity": "ERROR",
  "textPayload": "time=\"2019-03-17T10:56:28Z\" level=info msg=\"All records are already up to date\"\n",
  "timestamp": "2019-03-17T10:56:28.404982379Z"
}
anguslees commented 5 years ago

Thanks for the followup. I'm glad we got things working in the end.

somebody suggested that kubectl delete cert-name-tls and that caused cert manager to recreate the cert and I stopped getting the error

cert-manager won't recreate the certificate if it finds one that already exists. In particular, this means you need to manually delete the Certificate object if you change hostnames, so cert-manager will recreate it and get it signed under the correct new name. (Ditto for anything else that affects the Certificate contents, like changing issuer).

You should get some progress/status/error messages as events on the Certificate object itself (kubectl describe certificate $name), which might help with future debugging.

I agree the whole cert-manager/ingress/loadbalancer stack is deep and therefore difficult to debug when it doesn't Just Work. While the incident is still fresh, I would appreciate you looking over the relevant sections of the troubleshooting guide and suggesting anything that is missing/incorrect that will help the-next-you work through a similar issue.

BTW. The DNS service is throwing errors which are not errors.

Thanks. At the moment we're just collecting external-dns's stderr and tagging it with severity=ERROR, which is sometimes incorrect. I'll file a separate bug to improve our fluentd tagging to use that level=info field correctly.


Closing, since I think this issue has been resolved. Please reopen if I missed something.