solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0
4.08k stars · 438 forks

ACME HTTP01 validation resp. cert-manager integration #2654

Closed mash-graz closed 4 years ago

mash-graz commented 4 years ago

could you please add a small hint for users in the integration section of the manual on how to set up HTTP01 Let's Encrypt validation with cert-manager and gloo?

it's a little bit disappointing that only DNS validation is documented, and honestly I couldn't figure out a working solution for the HTTP01 variant, nor find any useful gloo-specific description of this topic anywhere on the net.

is it compatible with gloo's gateway mode at all?

mash-graz commented 4 years ago

after spending another day on this particular issue, I think it mostly comes down to the question of whether gloo can really run the traditional kubernetes ingress-proxy and its more advanced gateway-proxy at the same time, because cert-manager unfortunately only utilizes the former mechanism.

the gloo documentation states that both handlers can be installed at the same time (e.g. here: "If you want to take advantage of greater routing capabilities of Gloo, you should look at Gloo in gateway mode, which complements Gloo’s Ingress support, i.e., you can use both modes together in a single cluster."), but in practice this doesn't seem to work on my simple k3s setup, which runs in a docker-compose environment using --network=host and is bootstrapped only with simple helm charts and other static yaml fragments in the manifests folder.

in the real world, one of the two proxies always stays in a <pending> state, and from the outside I can reach either the letsencrypt-related secret handled by the ingress-proxy or the main web content published via the gateway-proxy, but not both.

maybe it's somehow caused by other problems or an insufficient setup on my side, most likely concerning load-balancer related requirements.

because I disabled the default (traefik 1 based) ingress solution of k3s and want to replace it with gloo, k3s doesn't deploy its load balancer either. that's in fact intended behavior, because otherwise I could hardly use the more advanced TCP proxying features of gloo, but maybe it interferes with other expectations and requirements of gloo.

I would be really happy if you could give me some hints on how to overcome these troubles and find a working solution in the context of the described minimalist setup.

if you need any additional information to debug and understand the issue, don't hesitate to ask. I'm happy to cooperate and find a solution that may be useful for others as well.

thanks!

rickducott commented 4 years ago

Thanks @mash-graz for all the detail. Let me ask the team and get back.

kdorosh commented 4 years ago

i believe you want to install gloo once with the following helm values:

gateway:
  enabled: true
ingress:
  enabled: true

Which it sounds like you have already done. Now, to figure out why both aren't routeable at once, run glooctl check for me to track down any erroneous config.

Please share the results of glooctl check with each proxy pending. Additionally, which version are you on?

mash-graz commented 4 years ago

sorry for my late reply -- we are working really hard these days to maintain and expand video services for users in the local art scene, but sometimes I simply need a break and a little bit of sleep ;)

i believe you want to install gloo once with the following helm values:

gateway:
  enabled: true
ingress:
  enabled: true

yes -- in fact I use this yaml fragment for the automated rollout, which only explicitly sets ingress, because gateway is already enabled by default:

---
apiVersion: v1
kind: Namespace
metadata:
  name: gloo-system
  labels:
    name: gloo-system
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: gloo
  namespace: kube-system
spec:
  chart: gloo
  targetNamespace: gloo-system
  repo: https://storage.googleapis.com/solo-public-helm
  set:
    ingress.enabled: "true"

Which it sounds like you have already done. Now, to figure out why both aren't routeable at once, run glooctl check for me to track down any erroneous config.

ms@kaffee:~/ms_kube$ glooctl check
Checking deployments... OK
Checking pods... Pod svclb-gateway-proxy-jmn6t in namespace gloo-system is not yet scheduled! Message: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
Problems detected!

this message from glooctl sounds rather plausible, but on the other hand the conflict is more or less inevitable, because cert-manager has to claim the same port and IP as the main webserver; otherwise it wouldn't be accepted to answer the challenges resp. validate certificate requests for this particular machine. unfortunately, common helpers of this kind often require traditional Ingress support and cannot be modified to utilize the more advanced gloo gateway mechanism.

if you see any chance to e.g. chain both modules in a more compatible manner, or some other way to work around this flaw, I would be really happy!

btw. I also had to disable the ssl section on the gateway side, because gloo doesn't accept a requested but not yet validated letsencrypt certificate. that's another very unpleasant behavior, because this fundamental hindrance is hard to work around on the way to working letsencrypt-based TLS handling in an automated rollout without manual intervention.
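[editor's note: the disabled ssl section is the usual sslConfig stanza on the virtual service. A sketch of the fragment in question, assuming a hypothetical secret name -- the secret is the one cert-manager is supposed to populate once the challenge succeeds:]

```yaml
# Sketch of a VirtualService sslConfig that Gloo rejects while the
# Let's Encrypt secret is still unvalidated/empty.
# The secret name is a hypothetical example.
sslConfig:
  secretRef:
    name: video-server-tls      # hypothetical cert-manager managed secret
    namespace: gloo-system
```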

ms@kaffee:~/ms_kube$ glooctl get proxy
+---------------+-----------+---------------+----------+
|     PROXY     | LISTENERS | VIRTUAL HOSTS |  STATUS  |
+---------------+-----------+---------------+----------+
| gateway-proxy | :::1935   | 1             | Accepted |
|               | :::8080   |               |          |
|               | :::8443   |               |          |
| ingress-proxy | :::80     | 2             | Accepted |
+---------------+-----------+---------------+----------+

ms@kaffee:~/ms_kube$ glooctl get virtualservice
+-----------------+--------------+---------+------+----------+-----------------+-------------------------------------+
| VIRTUAL SERVICE | DISPLAY NAME | DOMAINS | SSL  |  STATUS  | LISTENERPLUGINS |               ROUTES                |
+-----------------+--------------+---------+------+----------+-----------------+-------------------------------------+
| video-server    |              | *       | none | Accepted |                 | /live ->                            |
|                 |              |         |      |          |                 | gloo-system.default-video-server-80 |
|                 |              |         |      |          |                 | (upstream)                          |
+-----------------+--------------+---------+------+----------+-----------------+-------------------------------------+

ms@kaffee:~/ms_kube$ kubectl get all -n gloo-system 
NAME                                 READY   STATUS    RESTARTS   AGE
pod/svclb-gateway-proxy-jmn6t        0/2     Pending   0          2d
pod/svclb-ingress-proxy-zp5fl        2/2     Running   0          2d
pod/gloo-5957f474-6hqrg              1/1     Running   0          2d
pod/ingress-proxy-68b5c957f9-xxdkj   1/1     Running   0          2d
pod/gateway-proxy-5bb4c8f9b7-c2jpl   1/1     Running   0          2d
pod/gateway-5b48d7fc4d-jt2vf         1/1     Running   0          2d
pod/discovery-5bf9b4489f-zfsnf       1/1     Running   0          2d
pod/ingress-66496f667-pl2wn          1/1     Running   0          2d
pod/cm-acme-http-solver-29rlq        1/1     Running   0          2d
pod/cm-acme-http-solver-jvhdm        1/1     Running   0          2d

NAME                                TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                               AGE
service/gateway                     ClusterIP      10.43.204.58    <none>          443/TCP                               2d
service/gloo                        ClusterIP      10.43.24.249    <none>          9977/TCP,9988/TCP,9966/TCP,9979/TCP   2d
service/gateway-proxy               LoadBalancer   10.43.214.135   <pending>       80:31375/TCP,443:32187/TCP            2d
service/ingress-proxy               LoadBalancer   10.43.31.36     xx.xx.xx.xx   80:31554/TCP,443:30811/TCP            2d
service/cm-acme-http-solver-k4z79   NodePort       10.43.184.183   <none>          8089:31323/TCP                        2d
service/cm-acme-http-solver-67m89   NodePort       10.43.55.184    <none>          8089:30480/TCP                        2d

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/svclb-gateway-proxy   1         1         0       1            0           <none>          2d
daemonset.apps/svclb-ingress-proxy   1         1         1       1            1           <none>          2d

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/gloo            1/1     1            1           2d
deployment.apps/ingress-proxy   1/1     1            1           2d
deployment.apps/gateway-proxy   1/1     1            1           2d
deployment.apps/gateway         1/1     1            1           2d
deployment.apps/discovery       1/1     1            1           2d
deployment.apps/ingress         1/1     1            1           2d

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/gloo-5957f474              1         1         1       2d
replicaset.apps/ingress-proxy-68b5c957f9   1         1         1       2d
replicaset.apps/gateway-proxy-5bb4c8f9b7   1         1         1       2d
replicaset.apps/gateway-5b48d7fc4d         1         1         1       2d
replicaset.apps/discovery-5bf9b4489f       1         1         1       2d
replicaset.apps/ingress-66496f667          1         1         1       2d

kdorosh commented 4 years ago

@mash-graz can you drop into our slack to get a quicker turnaround on feedback and debugging with our team?

you'll probably need to resolve the pod port conflict. I'm not sure which port is giving you trouble, but you can change the gateway proxy ports using helm values:

| helm value | type | default | description |
| --- | --- | --- | --- |
| gatewayProxies.NAME.podTemplate.httpPort | int | | HTTP port for the gateway service |
| gatewayProxies.NAME.podTemplate.httpsPort | int | | HTTPS port for the gateway service |

Full list of helm values here: https://docs.solo.io/gloo/latest/reference/helm_chart_values/
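[editor's note: a sketch of a values fragment overriding those ports. `gatewayProxy` is the chart's default proxy name, and the port numbers are arbitrary examples, not recommendations:]

```yaml
# values.yaml fragment: move the gateway proxy pod off its default
# 8080/8443 ports to dodge a host-port conflict. Ports are examples only.
gatewayProxies:
  gatewayProxy:
    podTemplate:
      httpPort: 8081
      httpsPort: 8444
```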

rickducott commented 4 years ago

We just merged in a docs update to clarify the cert-manager docs, it should be live soon.

mash-graz commented 4 years ago

thanks -- that's really helpful!

in the meantime I had to switch my whole setup to traefik 2.2, because I couldn't figure out a solution to this issue, but I will give it a try as soon as I find some spare time to revert this rather complex chain of necessary modifications.

if your advice works, I would definitely prefer to utilize gloo, because it's IMHO the more efficient solution and significantly better integrated into the kubernetes ecosystem (e.g. in traefik 2.2 you need some 'static' entry point configurations for port forwarding, which cannot be changed in a more delegated manner by other service manifests and without a restart...). on the other hand, it's a really seductive kind of comfort how effortlessly the integrated automatic letsencrypt handling and the more open authentication middleware capabilities work in that competing solution. it's really hard to decide which of the two should be seen as the more suitable solution for one's specific demands.

but thanks again for your help!

linecolumn commented 4 years ago

Reviewing the latest doc on cert-manager integration at https://docs.solo.io/gloo/latest/guides/integrations/cert_manager/, it still does not show ACME HTTP01 support.

Is this the page, or did the merge go to some other doc space?

efistokl commented 4 years ago

@linecolumn they've just added it https://github.com/solo-io/gloo/commit/cdd8daac83dfc4a9922169717d8d169c91f77c67

It shows that it is possible, but it doesn't look automatic at all. I'm wondering if there is a way to automate this (like cert-manager works with the Nginx ingress, for example) so that it can be called an "integration".

lgadban commented 4 years ago

As mentioned, the docs that outline how to integrate with the HTTP01 challenge are live.

The process is admittedly a bit manual; we should have a clearer picture of the path forward based on https://github.com/solo-io/gloo/issues/2993.
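[editor's note: the manual step in those docs boils down to routing the ACME challenge path to the solver service that cert-manager spawns. A rough sketch, reusing the solver service name and port visible in the `kubectl get all` output above -- cert-manager generates a fresh solver name per challenge, so this cannot be applied statically:]

```yaml
# Sketch: route HTTP01 challenges through Gloo to cert-manager's solver.
# Solver service name is generated per challenge; 8089 matches the
# NodePort services listed earlier in this thread. Domain is an example.
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: letsencrypt-http01
  namespace: gloo-system
spec:
  virtualHost:
    domains:
      - 'example.com'                           # domain being validated
    routes:
      - matchers:
          - prefix: /.well-known/acme-challenge/
        routeAction:
          single:
            kube:
              ref:
                name: cm-acme-http-solver-k4z79 # generated per challenge
                namespace: gloo-system
              port: 8089
```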

lgadban commented 4 years ago

After some discussion internally, we will use https://github.com/solo-io/gloo/issues/2993 to track adding an automatic integration with ACME HTTP01 and will close this issue for now.