jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

Support allow-http: "false" #173

Open pijusn opened 7 years ago

pijusn commented 7 years ago

kube-lego requires port 80 to be open to verify server reachability. However, as far as I understand #164 , it is not required by Let's Encrypt. For instance, you could use a self-signed certificate to kick-off the cluster.

You can see the http scheme hard-coded in the reachability test ( https://github.com/jetstack/kube-lego/blob/master/pkg/acme/cert_request.go#L48 ).

Expected behaviour:

Why it's an issue:

How I found it:
1 - Created a TLS secret with a self-signed certificate.
2 - Created a new Ingress with kubernetes.io/ingress.allow-http: "false".
3 - Created kube-lego deployment.
4 - Noticed errors in the kube-lego logs: authorization failed after 1m0s: reachabily test failed: wrong status code '404'.
5 - Removed kubernetes.io/ingress.allow-http: "false" from the Ingress definition.
6 - Soon errors changed to authorization failed after 1m0s: reachabily test failed: wrong status code '502'.
7 - Eventually it passed and new certificate was issued.

ahmetb commented 7 years ago

Related: #164. Which ingress are you using (nginx or gce)? I am using gce and have been able to deploy an ingress that only has port 443 open and use the HTTP-01 challenge (which doesn't require a valid certificate); kube-lego was able to set up the challenge endpoint at domain:443/.well-known/....

The gotcha is that the gce ingress has a bug: you have to start with allow-http:false, because adding it later on doesn't do anything.

pijusn commented 7 years ago

I used GCE Load balancer. Will later check again with a new cluster.

pijusn commented 7 years ago

Ok, so I finally tested it again. Took me a while. Still have no idea why ingress didn't create HTTPS front-end automatically. Despite that, this is gonna be a long message:

Essentially, I took the GCE example from this repo and added a couple of lines to the ingress. The shell script I ran was the following (also note the comments to better understand what I did manually during the pauses):

set -ex

read -p "Press any key to start..." -sn1

gcloud config set project echo-test-166405
gcloud container clusters create echo-cluster --machine-type="g1-small" --num-nodes=1

echo "Cluster created"

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/C=LT"
kubectl create secret tls echoserver-tls --key /tmp/tls.key --cert /tmp/tls.crt

# Note: At this point I just waited a minute to make sure it's all OK.
echo "Self-generated certificate created"
read -p "Press any key to continue... " -sn1

kubectl apply -f echoserver/00-namespace.yaml
kubectl apply -f echoserver/deployment.yaml
kubectl apply -f echoserver/service.yaml
kubectl apply -f echoserver/ingress-tls.yaml

# I waited for a couple of minutes but the frontend didn't get created, so I created it manually and tested (will post an info snippet below)
echo "Echoserver is deployed (load balancer is expected to be created)"
read -p "Press any key to continue... " -sn1

kubectl apply -f lego/00-namespace.yaml
kubectl apply -f lego/deployment.yaml

echo "kube-lego created"

echo "Will now proxy Kubernetes UI"
kubectl proxy

# I waited a couple of minutes, didn't work then got the kube-lego logs (I will put them below).

The echoserver namespace was almost unchanged, except for the ingress:

apiVersion: v1
kind: Namespace
metadata:
  name: echoserver
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: echoserver
  namespace: echoserver
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
      - image: gcr.io/google_containers/echoserver:1.0
        imagePullPolicy: Always
        name: echoserver
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echoserver
  namespace: echoserver
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: NodePort
  selector:
    app: echoserver
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver
  namespace: echoserver
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - hosts:
    - echo.pijusn.eu
    secretName: echoserver-tls
  rules:
  - host: echo.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 80

Next is the kube-lego namespace; I only inlined the configuration.

apiVersion: v1
kind: Namespace
metadata:
  name: kube-lego
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-lego
  namespace: kube-lego
spec:
  replicas: 1
  template:
    metadata:
      labels:
        # Required for the auto-create kube-lego-nginx service to work.
        app: kube-lego
    spec:
      containers:
      - name: kube-lego
        image: jetstack/kube-lego:0.1.3
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        env:
        - name: LEGO_EMAIL
          value: "pijus.navickas@gmail.com"
        - name: LEGO_URL
          value: "https://acme-v01.api.letsencrypt.org/directory"
        - name: LEGO_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LEGO_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 1

Here is the way I tested whether it's actually reachable from the outside:

➜  pijusn curl https://echo.pijusn.eu
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
➜  pijusn curl https://echo.pijusn.eu -k
CLIENT VALUES:
client_address=('10.0.0.1', 65239) (10.0.0.1)
command=GET
path=/
real path=/
query=
request_version=HTTP/1.1

SERVER VALUES:
server_version=BaseHTTP/0.6
sys_version=Python/3.5.0
protocol_version=HTTP/1.0

HEADERS RECEIVED:
Accept=*/*
Connection=Keep-Alive
Host=echo.pijusn.eu
User-Agent=curl/7.51.0
Via=1.1 google
X-Cloud-Trace-Context=5918318b958ba30183510bd58efe02f5/14783633724735387003
X-Forwarded-For=85.206.179.15, 35.190.0.79
X-Forwarded-Proto=https

Finally, the logs (nothing spectacular, the same message is visible) of kube-lego after a couple of minutes (please note, I checked the logs before posting this and it still was failing):

time="2017-05-02T17:40:48Z" level=info msg="kube-lego 0.1.3-d425b293 starting" context=kubelego 
time="2017-05-02T17:40:48Z" level=info msg="connected to kubernetes api v1.5.6" context=kubelego 
time="2017-05-02T17:40:48Z" level=info msg="server listening on http://:8080/" context=acme 
time="2017-05-02T17:40:48Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:40:48Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:40:48Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:40:48Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:40:48Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:40:48Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace=kube-lego 
time="2017-05-02T17:40:49Z" level=info msg="if you don't accept the TOS (https://letsencrypt.org/documents/LE-SA-v1.1.1-August-1-2016.pdf) please exit the program now" context=acme 
time="2017-05-02T17:40:49Z" level=info msg="created an ACME account (registration url: https://acme-v01.api.letsencrypt.org/acme/reg/13769191)" context=acme 
time="2017-05-02T17:40:49Z" level=info msg="creating new secret" context=secret name=kube-lego-account namespace=kube-lego 
time="2017-05-02T17:41:51Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu 
time="2017-05-02T17:41:51Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
time="2017-05-02T17:41:51Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:41:51Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:41:51Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:41:51Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:41:51Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:43:09Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu 
time="2017-05-02T17:43:09Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
time="2017-05-02T17:43:09Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:43:09Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:43:09Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:43:09Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:43:09Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:44:38Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu 
time="2017-05-02T17:44:38Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
time="2017-05-02T17:44:38Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:44:38Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:44:38Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:44:38Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:44:38Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:45:43Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu 
time="2017-05-02T17:45:43Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
time="2017-05-02T17:45:43Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:45:43Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:45:43Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:45:43Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:45:43Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:46:54Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=echo.pijusn.eu 
time="2017-05-02T17:46:54Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego 
time="2017-05-02T17:46:54Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx 
time="2017-05-02T17:46:54Z" level=info msg="process certificates requests for ingresses" context=kubelego 
time="2017-05-02T17:46:54Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=echoserver 
time="2017-05-02T17:46:54Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=echoserver 
time="2017-05-02T17:46:54Z" level=info msg="requesting certificate for echo.pijusn.eu" context="ingress_tls" name=echoserver namespace=echoserver 

Finally, here is a screenshot of the load balancer right after I tested it with curl (kube-lego not yet deployed): https://drive.google.com/file/d/0B18agqOTmBF5ckRCUmFCcC1kQ0E/view?usp=sharing

ahmetb commented 7 years ago

@pijusn I'm trying to understand why you needed to create a self-signed certificate yourself. kube-lego will obtain a certificate and save it on the secret on the ingress (even though it doesn't exist yet).

In your screenshot I'm seeing that the .well-known/* URL map is not established. It looks like something is going wrong.

pijusn commented 7 years ago

Self-signed certificate is needed to establish TLS in the first place, isn't it? According to #164 it will be ignored by the certificate issuer (meaning it doesn't matter whether you use a valid or an invalid one), but it's still needed. And what I am implying is that kube-lego should also support it, because keeping port 80 open (even if you instantly drop the connection) is very prone to security-related bugs.

About the .well-known/*: the screenshot was taken before kube-lego was deployed. I should have taken another screenshot afterwards, but the rule does exist: echo.pijusn.eu /.well-known/acme-challenge/* k8s-be-30842--586fb62a8db47e63. It also works if port 80 is open (which does not affect routing rules).

All this fuss is about:

The reason it doesn't work (as far as I understand) is that kube-lego only makes an http request (port 80), and that port is closed. In GCE terms, "closed" means HTTP error 404:

➜  ~ curl -i http://echo.pijusn.eu
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer
Content-Length: 1561
Date: Wed, 03 May 2017 04:05:22 GMT

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/</code> was not found on this server.  <ins>That’s all we know.</ins>

What's more, I think there would be yet another issue waiting: the reachability test not ignoring invalid certificates. But that's part of the test-implement-repeat development cycle 😉

Looking at the source code (the one I linked), I don't see how it could fall back to https, because the protocol is simply hard-coded and there is no other "reachability test" around. And if kube-lego doesn't pass its own reachability test, it doesn't go any further and doesn't connect to Let's Encrypt.

Do you understand the issue I am describing? You said that it should work. Could you link the part of the source code (test or main) which is responsible for falling back / switching to https for the reachability test? Maybe then I can figure out why it doesn't behave as expected in my case.

ahmetb commented 7 years ago

Self-signed certificate is needed to establish TLS in the first place, isn't it?

I was able to get it to work without this. Just delete the ingress and deploy with the allow-http annotation. It doesn't require additional configuration and things work out just fine.

pijusn commented 7 years ago

Are you talking about creating ingress without the annotation and then (after certificate is created), deleting the ingress and deploying with the annotation? If so, will it be able to re-new the certificate since it will still be getting 404 during the reachability test if port 80 is not open?

ahmetb commented 7 years ago

@pijusn

Are you talking about creating ingress without the annotation and then (after certificate is created)

Nope, just delete the ingress, and when you recreate it, make sure you deploy with the annotation.

If so, will it be able to re-new the certificate since it will still be getting 404 during the reachability test if port 80 is not open?

If allow-http:false is set, it knows that it should use port 443 instead of 80. Just give it a try.

pijusn commented 7 years ago

What is wrong with this ingress? Especially considering that it was deployed on a fresh cluster, in a fresh project? You saw the script. It's a full list of actions which is just following example in the repo.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: echoserver
  namespace: echoserver
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - hosts:
    - echo.pijusn.eu
    secretName: echoserver-tls
  rules:
  - host: echo.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 80

Or with this one (the actual ingress I originally used for another cluster; it created a GCE load balancer with port 443 only, and it works)?

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: default
  name: public-traffic
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: "public-entry"
spec:
  tls:
  - hosts:
    - bb.pijusn.eu
    secretName: tls-bb-certificate
  rules:
  - host: bb.pijusn.eu
    http:
      paths:
      - backend:
          serviceName: public-gateway
          servicePort: 8000
      - path: /echo
        backend:
          serviceName: echo
          servicePort: 8010

Both of these ingresses worked on the GCE load balancer, but kube-lego was still testing reachability through HTTP and not HTTPS!

With the latter one (used on the actual cluster), I did actually test lots of variations, including deleting and re-creating it many times.

Did you look at the source code I linked in the first comment? Could you comment on why it contains hard-coded http and how it would be making HTTPS requests when needed?

ahmetb commented 7 years ago

No, I haven't looked at the source code; I am not familiar with it. I am sharing my experience of how I got it to work. When I asked in #164 and got my answer, I deleted my ingress and redeployed with the allow-http:false annotation and everything worked smoothly for me. (This was for a domain name that had no certs issued before.)

Your manifests look fine to me. Maybe kube-lego is caching a resolved DNS record about your domain name (#162). Try deleting the kube-lego pod so that it gets rescheduled onto another node perhaps.

pijusn commented 7 years ago

I see. I thought you were familiar with the source code. Sorry for the frustration. This is on me. I will look into the source code more later to figure out how to test and fix it.

I think what happened in your case:

Based on what I found in the source code, kube-lego does not pass its internal test if HTTP is disabled, meaning it will not refresh the certificate when it's about to expire. You should get a warning message from Let's Encrypt if that's the case, though. Keep everyone posted if that happens, will you? :wink:

ahmetb commented 7 years ago

@pijusn Perhaps you're right after all. I just tried this again and it didn’t work. I can’t even get Ingress to get a public IP with that annotation. The moment I drop allow-http annotation, I get an IP. Weird.

camsjams commented 7 years ago

So in about 70 days we'll know what happens? :smile:

Seems like you need HTTP enabled at all times just in case. Perhaps the downstream app needs to check X-Forwarded-Proto in the headers and handle things (as kube-lego will take the acme challenge requests).

munnerz commented 7 years ago

Following on from #218, I think the big issue here is with kube-lego's reachability test, and not with the underlying ACME implementation.

I think the action item from this is to make kube-lego aware of this annotation (and whatever other similar annotations there are for different ingress controllers), and switch the http scheme (https://github.com/jetstack/kube-lego/blob/master/pkg/acme/cert_request.go#L48) to https in this case, as well as allowing invalid certificates on the request.

The way we go about implementing this is non-trivial however, as it means littering the kube-lego codebase with yet another controller-specific hack. This leads us to really needing a policy on which hacks/annotations we will and won't support in kube-lego. There are already a large number of differing controller implementations, each with their own slightly different flavour of ingress specification. Until this situation is improved, I fear kube-lego will continue to bloat with difficult to maintain feature support.
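For illustration only, the scheme switch described in this comment could be approximated from the command line as in the shell sketch below. It is not kube-lego's code: the helper names reachability_scheme, reachability_url and probe_status are hypothetical, the self-test path is an assumption about what the reachability test requests, and treating a missing annotation as "true" is likewise an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the proposed scheme switch, not kube-lego's real code.

# Hypothetical helper: pick the scheme for the reachability test from the
# value of the kubernetes.io/ingress.allow-http annotation.
reachability_scheme() {
  local allow_http="${1:-true}"   # assumption: absent annotation means "true"
  if [ "$allow_http" = "false" ]; then
    echo "https"
  else
    echo "http"
  fi
}

# Hypothetical helper: build the URL to probe. The default path is an
# assumption; pass a third argument to override it.
reachability_url() {
  local domain="$1"
  local allow_http="${2:-true}"
  local path="${3:-/.well-known/acme-challenge/_selftest}"
  echo "$(reachability_scheme "$allow_http")://${domain}${path}"
}

# Probe a URL and print only the HTTP status code. -k skips certificate
# verification, so a self-signed certificate does not fail the check.
probe_status() {
  curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}
```

With these functions loaded, running probe_status "$(reachability_url echo.pijusn.eu false)" would request the https URL and print the status code the load balancer returns, which makes it easy to compare against the 404 seen on port 80.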

bmcustodio commented 6 years ago

@munnerz is there anything that can be done, even as a workaround, to bypass this situation?

robermorales commented 6 years ago

kube-lego could allow for a config option to choose the desired reachability test to try.

lego.test: http
lego.test: https
lego.test: dns
...

I think it is important to support setups that need 'use-proxy-protocol' at IC level.

Morriz commented 6 years ago

FYI: I hacked it by port forwarding port 80 to 8080 on the kube lego pod. It could be scripted into a job, but it sure as hell is hacky and not nice to maintain with regards to rbac and policies.

Why can't it try 443 without testing for cert validity first? And then fall back to 80? And maybe even try dns first? No need to configure anything imo, as the preferred order would be dns,https,http anyway. No?
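As a rough sketch of the ordering suggested here, the shell functions below try a list of schemes in turn and report the first one that answers with 200. Everything in it is hypothetical (first_reachable_scheme, curl_status, the PROBE and SELFTEST_PATH variables), the self-test path is an assumption, and the dns step is left out because it is not an HTTP probe.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates an "https first, then http" fallback order.
# None of these names exist in kube-lego.

# Assumed self-test path; adjust it to whatever your setup serves.
SELFTEST_PATH="${SELFTEST_PATH:-/.well-known/acme-challenge/_selftest}"

# Default probe: print the HTTP status code, skipping certificate validation.
curl_status() {
  curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Usage: first_reachable_scheme DOMAIN SCHEME...
# Prints the first scheme whose probe returns 200; returns 1 if none does.
# PROBE may name another function, which keeps the ordering logic testable
# without network access.
first_reachable_scheme() {
  local domain="$1"
  shift
  local scheme status
  for scheme in "$@"; do
    status="$("${PROBE:-curl_status}" "${scheme}://${domain}${SELFTEST_PATH}")" || status=""
    if [ "$status" = "200" ]; then
      echo "$scheme"
      return 0
    fi
  done
  return 1
}
```

Calling first_reachable_scheme echo.pijusn.eu https http would probe port 443 first and only fall back to port 80 if the https probe does not return 200.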

tcnksm commented 6 years ago

We also set kubernetes.io/ingress.allow-http: "false" and it caused a cert expiration error ... We want to allow only HTTPS for our domain... So it would be better if we could control the acme test protocol.

Jokero commented 6 years ago

Faced the same issue after adding kubernetes.io/ingress.allow-http: "false"