tnozicka / openshift-acme

ACME controller for OpenShift and Kubernetes clusters (supports e.g. Let's Encrypt).
Apache License 2.0

Not getting valid certificates #90

Closed: badri closed this issue 5 years ago

badri commented 5 years ago

I'm using OpenShift Origin 3.11.

$ oc version
oc v3.11.0+62803d0-1
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift-master:8443
openshift v3.11.0+d0c29df-98
kubernetes v1.11.0+d4cacc0

I installed the cluster-wide staging configuration.

$ oc create -fhttps://raw.githubusercontent.com/tnozicka/openshift-acme/master/deploy/letsencrypt-staging/cluster-wide/{clusterrole,serviceaccount,imagestream,deployment}.yaml
$ oc adm policy add-cluster-role-to-user openshift-acme -z openshift-acme

And provisioned a new route.

$ oc create route edge dev.example.com --hostname=dev.example.com --service=django-psql-persistent
$ oc patch route dev.example.com -p '{"metadata":{"annotations":{"kubernetes.io/tls-acme":"true"}}}'
$ oc logs deploy/openshift-acme
I0228 13:22:01.316056       1 http.go:78] url = 'dev.example.com/.well-known/acme-challenge/_IT1tp3NR2xxSBdmwiMaqrRMUpYSciEkyMyLkJx-nxU'; found = 'true'
I0228 13:22:01.375358       1 http.go:78] url = 'dev.example.com/.well-known/acme-challenge/_IT1tp3NR2xxSBdmwiMaqrRMUpYSciEkyMyLkJx-nxU'; found = 'true'
I0228 13:22:01.678222       1 http.go:78] url = 'dev.example.com/.well-known/acme-challenge/_IT1tp3NR2xxSBdmwiMaqrRMUpYSciEkyMyLkJx-nxU'; found = 'true'
I0228 13:22:01.805798       1 http.go:78] url = 'dev.example.com/.well-known/acme-challenge/_IT1tp3NR2xxSBdmwiMaqrRMUpYSciEkyMyLkJx-nxU'; found = 'true'
I0228 13:22:05.722412       1 route.go:385] Started syncing Route "django/dev.example.com" (2019-02-28 13:22:05.722387474 +0000 UTC m=+556.127475736)
I0228 13:22:05.794171       1 route.go:483] Route "django/dev.example.com": authorization state is "valid"
I0228 13:22:05.794214       1 route.go:515] Authorization "https://acme-staging.api.letsencrypt.org/acme/authz/bWbezanTPvDk4UhVHQhgMsAd5yw5SqYTafD2aJOaLVg" for Route django/dev.example.com successfully validated
I0228 13:22:40.505679       1 route.go:539] Route "django/dev.example.com" - created certificate available at https://acme-staging.api.letsencrypt.org/acme/cert/fac87b015d80e3e22e891fb14c56b7776838
I0228 13:22:40.521250       1 event.go:218] Event(v1.ObjectReference{Kind:"Route", Namespace:"django", Name:"dev.example.com", UID:"7ab3ba87-3b5b-11e9-accd-3e6860b9ef67", APIVersion:"route.openshift.io", ResourceVersion:"6694", FieldPath:""}): type: 'Normal' reason: 'AcmeCertificateProvisioned' Successfully provided new certificate
I0228 13:22:40.530926       1 route.go:189] Updating Route from django/dev.example.com UID=7ab3ba87-3b5b-11e9-accd-3e6860b9ef67 RV=6603 to django/dev.example.com UID=7ab3ba87-3b5b-11e9-accd-3e6860b9ef67,RV=6694
I0228 13:22:40.569358       1 route.go:387] Finished syncing Route "django/dev.example.com" (34.846954946s)
I0228 13:22:40.569419       1 route.go:385] Started syncing Route "django/dev.example.com" (2019-02-28 13:22:40.569412938 +0000 UTC m=+590.974501173)
I0228 13:22:40.584717       1 route.go:387] Finished syncing Route "django/dev.example.com" (15.290161ms)
I0228 13:22:40.584763       1 route.go:716] Error syncing Route django/dev.example.com: failed to sync secret for Route django/dev.example.com: failed to create Secret django/dev.example.com with TLS data: secrets "dev.example.com" already exists
I0228 13:22:40.590443       1 route.go:385] Started syncing Route "django/dev.example.com" (2019-02-28 13:22:40.590416573 +0000 UTC m=+590.995504846)
I0228 13:22:40.634450       1 route.go:237] Acme Secret django/dev.example.com updated.
I0228 13:22:40.638750       1 route.go:387] Finished syncing Route "django/dev.example.com" (48.322855ms)
I0228 13:22:40.638795       1 route.go:385] Started syncing Route "django/dev.example.com" (2019-02-28 13:22:40.638789079 +0000 UTC m=+591.043877335)
I0228 13:22:40.643022       1 route.go:387] Finished syncing Route "django/dev.example.com" (4.223768ms)
$ oc get events -n django | grep -i "AcmeCertificateProvisioned" 
8m          8m           1         dev.example.com.158789d8b68a6c71               Route                                                             Normal    AcmeCertificateProvisioned   openshift-acme-controller     Successfully provided new certificate

But I still don't get a valid certificate.

$ curl https://dev.example.com
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Am I missing any step?

thibserot commented 5 years ago

I have the same issue with openshift v3.11:

oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift:8443
openshift v3.11.0+92b7c41-132
kubernetes v1.11.0+d4cacc0

The issue seems to be with the issuer:

Signature Algorithm: sha256WithRSAEncryption
    Issuer: CN=openshift-signer@1552691887
    Validity
        Not Before: Mar 15 23:30:37 2019 GMT
        Not After : Mar 14 23:30:38 2021 GMT

The issuer should be Let's Encrypt...

tnozicka commented 5 years ago

@badri certs from the staging endpoint are never trusted by clients; you need to use the live endpoint to get a valid one - https://github.com/tnozicka/openshift-acme#staging

tnozicka commented 5 years ago

@thibserot I'd need more information to help you sort it out, e.g. the URL (or part of it) you are trying to reach, the output of oc get route -o yaml (with keys redacted), and the ACME controller logs.

badri commented 5 years ago

@tnozicka perfect. Thanks for your response man. I'll try with live endpoint and update here.

thibserot commented 5 years ago

@tnozicka I finally managed to get it running by destroying my cluster and re-creating it... Some secrets must have been lying around and prevented the certificate from being updated! Thanks for the reply.

thibserot commented 5 years ago

Actually, I may have spoken a bit too soon, as my domain is still using an invalid certificate... Is there a way to force the certificate to be re-issued from your pod?

thibserot commented 5 years ago

$ oc get route landing-com -o yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/tls-acme-awaiting-authorization-owner: https://acme-v01.api.letsencrypt.org/acme/reg/xxxxxxx
  creationTimestamp: 2019-03-23T23:26:15Z
  labels:
    app: landing
    template: nginx-https
  name: landing-com
  namespace: hex-production
  resourceVersion: "1159358"
  selfLink: /apis/route.openshift.io/v1/namespaces/hexagone-production/routes/landing-com
  uid: xxxxxxxxxxxx
spec:
  host: xxxxi.com
  port:
    targetPort: 8081-tcp
  tls:
    certificate: |
      -----BEGIN CERTIFICATE-----
      xxxxxxxxxxxxxxxxxxxxxxxxx
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      xxxxxxxxxxxxxxxxxxxxxxxxxx
      -----END CERTIFICATE-----
    insecureEdgeTerminationPolicy: Redirect
    key: |
      -----BEGIN RSA PRIVATE KEY-----
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      -----END RSA PRIVATE KEY-----
    termination: edge
  to:
    kind: Service
    name: landing
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: 2019-03-23T23:26:15Z
      status: "True"
      type: Admitted
    host: hexagone-ai.com
    routerName: router
    wildcardPolicy: None

And here are the ACME controller logs:

Started syncing Route "hexagone-production/landing-com" (2019-03-23 23:29:10.254080212 +0000 UTC m=+548219.840523563)
Created authorization "https://acme-v01.api.letsencrypt.org/acme/authz/KKtVlSXcHEu7CF_qkuCNkYPbQqiugXfZrRQ-__4IF_o" for Route hexagone-production/landing-com
Authorization "https://acme-v01.api.letsencrypt.org/acme/authz/KKtVlSXcHEu7CF_qkuCNkYPbQqiugXfZrRQ-__4IF_o" for Route hexagone-production/landing-com is already valid
Finished syncing Route "hexagone-production/landing-com" (491.888711ms)
Updating Route from hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614 RV=1159333 to hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614,RV=1159335
Started syncing Route "hexagone-production/landing-com" (2019-03-23 23:29:10.746044918 +0000 UTC m=+548220.332488291)
Route "hexagone-production/landing-com": authorization state is "valid"
Authorization "https://acme-v01.api.letsencrypt.org/acme/authz/KKtVlSXcHEu7CF_qkuCNkYPbQqiugXfZrRQ-__4IF_o" for Route hexagone-production/landing-com successfully validated
Route "hexagone-production/landing-com" - created certificate available at https://acme-v01.api.letsencrypt.org/acme/cert/03af73ae0d9a07c248a971c3183f029fe98c
Updating Route from hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614 RV=1159335 to hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614,RV=1159358
Event(v1.ObjectReference{Kind:"Route", Namespace:"hexagone-production", Name:"landing-com", UID:"0de0c90f-4dc3-11e9-8528-002590265614", APIVersion:"route.openshift.io", ResourceVersion:"1159358", FieldPath:""}): type: 'Normal' reason: 'AcmeCertificateProvisioned' Successfully provided new certificate
Finished syncing Route "hexagone-production/landing-com" (10.524994928s)
Started syncing Route "hexagone-production/landing-com" (2019-03-23 23:29:21.27108878 +0000 UTC m=+548230.857532136)
Finished syncing Route "hexagone-production/landing-com" (95.837757ms)
86] github.com/tnozicka/openshift-acme/pkg/cmd/cmd.go:257: forcing resync
Updating Route from hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614 RV=1159358 to hexagone-production/landing-com UID=0de0c90f-4dc3-11e9-8528-002590265614,RV=1159358
Started syncing Route "hexagone-production/landing-com" (2019-03-23 23:31:19.098073897 +0000 UTC m=+548348.684517242)
Finished syncing Route "hexagone-production/landing-com" (2.807602ms)

I don't see any errors in the logs... The only issue is that the certificate isn't signed by Let's Encrypt but by OpenShift...

And I believe my mistake was this: the first time I ran the ACME pod, I had already set up the route with the annotation, but I hadn't properly set up permissions on the service account, and I believe that's when the bad certificate was created. Since then I've been able to secure numerous routes without issue. Only this one stays stuck, even after a full re-install of the OpenShift cluster. So I believe forcing a re-issue of the certificate could solve the problem; I just need to figure out how to do it from the ACME pod!

Any hints welcome! Cheers, Thibault

badri commented 5 years ago

@tnozicka I can confirm that this now works with openshift 3.11 cluster.

tnozicka commented 5 years ago

> I don't see any error in the logs...The only issue is that the certificate isn't signed by letsencrypt but by openshift....

If there is no cert in the route's tls section, you get the router's default certificate, which is signed by the OpenShift CA (hence the openshift-signer issuer).
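To see whether a route actually carries its own certificate, something like the following could help. It is a minimal sketch: route_cert_issuer is a hypothetical helper, not part of openshift-acme or oc, and it assumes oc is logged in to the project that owns the route and that openssl is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-acme or oc): show what is stored
# in a Route's spec.tls.certificate, so you can tell whether the router has a
# route-specific certificate to serve at all.
route_cert_issuer() {
  route="$1"
  # With oc's default settings, jsonpath prints an empty string when the
  # field is missing instead of failing.
  cert=$(oc get route "$route" -o jsonpath='{.spec.tls.certificate}') || return 1
  if [ -z "$cert" ]; then
    echo "route $route has no spec.tls.certificate; the router's default certificate will be served"
    return 0
  fi
  # openssl x509 decodes the first PEM block, i.e. the leaf certificate.
  printf '%s\n' "$cert" | openssl x509 -noout -issuer -enddate
}
```

After sourcing it, run e.g. route_cert_issuer landing-com; an empty tls section points at the router default rather than at openshift-acme.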

> And I believe my error was the first time i tried to run the acme pod i already had setup the route with the annotation but i hadn't properly setup permissions on the serviceaccount and I believe that's when the bad certificate was created...Since then I've been able to properly secure numerous route without issue...Just this one that stays stuck even after a full re-install from the openshift cluster...So I believe that force re-issue of the certificate could solve the problem...Just need to figure out how I can do it from the ACME Pod!

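You can check which certificate is actually served using openssl: openssl s_client -connect <domain>:443 -servername <domain> (the -servername flag sends SNI; without it the router may hand you its default certificate).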

I guess it could still be provisioned from staging.
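A small sketch for telling the likely cases apart from the issuer alone. describe_cert is a hypothetical helper, and the issuer patterns are assumptions: openshift-signer is the issuer seen earlier in this thread, while "Fake LE" and "(STAGING)" are names Let's Encrypt staging intermediates have used. The list is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-acme): read one PEM certificate
# on stdin, print its issuer and a rough guess at where it came from.
# The issuer patterns below are assumptions, not an exhaustive list.
describe_cert() {
  # openssl x509 reads from stdin and skips any text before the PEM block.
  issuer=$(openssl x509 -noout -issuer) || return 1
  printf '%s\n' "$issuer"
  case "$issuer" in
    *openshift-signer*)
      printf '%s\n' "-> OpenShift-signed, most likely the router's default certificate" ;;
    *"Fake LE"*|*"(STAGING)"*)
      printf '%s\n' "-> Let's Encrypt staging, not trusted by clients" ;;
    *)
      printf '%s\n' "-> other issuer; check whether it names Let's Encrypt" ;;
  esac
}
```

Usage sketch, with the hostname as a placeholder:
$ echo | openssl s_client -connect dev.example.com:443 -servername dev.example.com 2>/dev/null | describe_cert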

To force a refresh, delete the TLS certificate and key from the route and remove the kubernetes.io/tls-acme-awaiting-authorization-owner annotation.
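The refresh described above could be scripted roughly like this. It is a sketch under assumptions: refresh_route is a hypothetical name, the route lives in your current project, and both spec.tls.certificate and spec.tls.key are set (a JSON Patch "remove" fails on a missing path). termination stays as it is, so the route keeps edge TLS while openshift-acme provisions a new certificate.

```shell
#!/bin/sh
# Hypothetical helper: drop the stored certificate/key and the
# awaiting-authorization-owner annotation from one Route so that
# openshift-acme provisions a fresh certificate.
refresh_route() {
  route="$1"
  # JSON Patch "remove" errors out if a path is absent, so only run this
  # when spec.tls.certificate and spec.tls.key are actually set.
  oc patch route "$route" --type=json \
    -p '[{"op":"remove","path":"/spec/tls/certificate"},{"op":"remove","path":"/spec/tls/key"}]' || return 1
  # A trailing "-" after the annotation key removes that annotation.
  oc annotate route "$route" kubernetes.io/tls-acme-awaiting-authorization-owner-
}
```

For example, refresh_route landing-com, then watch oc logs deploy/openshift-acme for a new AcmeCertificateProvisioned event.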

tnozicka commented 5 years ago

glad it works for you now ;)