kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

ACME challenge fails with `nginx.ingress.kubernetes.io/permanent-redirect` #11315

Open renepupil opened 6 months ago

renepupil commented 6 months ago

What happened:

Error: Error accepting authorization: acme: authorization error for our-domain.ch: 403 urn:ietf:params:acme:error:unauthorized: During secondary validation: <ip>: Invalid response from http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s: 403

What you expected to happen:

I would expect the ingress controller to configure nginx so that it does not redirect requests to http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s, but handles this path specially so that the ACME challenge succeeds...

This http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s redirects to https://foo.com after the failed acme challenge, but likely also during the acme challenge.

As we are on a managed Kubernetes cluster, I cannot look at the generated nginx config or the cluster-issuer configuration, but I strongly suspect the problem lies in how ingress-nginx handles permanent-redirect, as other certificates in our cluster are created without problems.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

helm.sh/chart=ingress-nginx-4.9.1

Kubernetes version (use kubectl version):

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.15+rke2r1", GitCommit:"da6089da4974a0a180c226c9353e1921fa3c248a", GitTreeState:"clean", BuildDate:"2023-10-18T16:31:24Z", GoVersion:"go1.20.10 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1

Environment:

kubectl --namespace=rest-ingress describe challenge

Name:         ingress-cert-ppzsw-363926471-2456233727
Namespace:    rest-ingress
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2024-04-25T16:29:12Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  ingress-cert-ppzsw-363926471
    UID:                   537efb1f-9b4d-4de8-be46-6e278a6f0a82
  Resource Version:        345155008
  UID:                     ed3f5cf8-e2f8-49d7-a902-a3b2f9d38a64
Spec:
  Authorization URL:  https://acme-v02.api.letsencrypt.org/acme/authz-v3/342936176147
  Dns Name:           our-domain.ch 
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-prod
  Key:      QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M.Y4nXKJHjF4S08EPIIlBf0KSxnwzNB9ws_Oa1FD-00nI
  Solver:
    http01:
      Ingress:
        Pod Template:
          Metadata:
          Spec:
            Tolerations:
              Operator:  Exists
  Token:                 QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M
  Type:                  HTTP-01
  URL:                   https://acme-v02.api.letsencrypt.org/acme/chall-v3/342936176147/oc5jsA
  Wildcard:              false
Status:
  Presented:   false
  Processing:  false
  Reason:      Error accepting authorization: acme: authorization error for ourdomain.ch: 403 urn:ietf:params:acme:error:unauthorized: During secondary validation: 85.131.147.68: Invalid response from http://our-domain.ch /.well-known/acme-challenge/QHVv0l6NXBnzHCNTqmkaI9OyYJ3gIQyEXMmyjGE8U9M: 403
  State:       invalid
Events:        <none>

How to reproduce this issue:

Apply a minimal ingress like this:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rest-ingress
  namespace: our-rest-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    acme.cert-manager.io/http01-edit-in-place: "false" # Error with or without
    cert-manager.io/issue-temporary-certificate: "true" # Error with or without
    nginx.ingress.kubernetes.io/permanent-redirect: https://foo.com
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - our-domain.ch 
    secretName: ingress-cert
  rules:
  - host: our-domain.ch 

Is there something wrong with the configuration?

k8s-ci-robot commented 6 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 6 months ago

/remove-kind bug /kind support /triage needs-information

renepupil commented 6 months ago

cert-manager works for tons of users already so this requires more detail

The question remains whether the redirect is expected behavior. Others in the linked issue expect different behavior, as do I, so I think it's a bug.

the template of a new bug report asks questions that collect info for readers to look at data to base comments on

Yeah, sorry, force of habit to empty the template on issue creation. I added all the info I "could", meaning there are some details, especially regarding the setup, that I cannot provide because our cluster is managed. I am also not sure if I can just expose our IPs.

I think the issue is general and not a specific version bug, but again could be layer 8 here.

look at the template and answer the questions in md format

Done that.

@longwuyuan Can you or some of your team just try the minimal configuration?

I think it could quickly be confirmed whether the given behaviour exists in a newer Kubernetes/ingress configuration...

longwuyuan commented 6 months ago

Thanks for the info. Your thought process is a bit clearer now, but in my opinion, it's not super helpful for all readers. Just my opinion and others will opine differently. Basically the info that helps readers is:

Redacted info also works as there is some insight on the state of the resources and the log of the events. But since it's a redirection issue, it helps if you cognitively replace FQDNs as appropriate with real behaviour, so context is retained without the real hostname being used.

I am assuming that you want to use permanent-redirect annotation, but you want it to NOT do a blanket-redirection of all the requests, that match that specific ingress rule. You want that annotation to do selective intelligent redirection by NOT-redirecting acme traffic, because you want the cert, but you want it to redirect everything else. Is this a fair view of what you expect. And you suspect this is a bug ?

renepupil commented 6 months ago

@longwuyuan Thanks for your feedback.

You want that annotation to do selective intelligent redirection by NOT-redirecting acme traffic, because you want the cert, but you want it to redirect everything else. Is this a fair view of what you expect. And you suspect this is a bug ?

Exactly, this was already requested in https://github.com/kubernetes/ingress-nginx/issues/6853:

Most likely, this could be solved by ensuring that a redirect is not added if the location starts with ".well-known/acme-challenge".

A workaround was suggested by @danton721

I have done it a bit differently if it is still valid for discussion:

nginx.ingress.kubernetes.io/server-snippet: |
  if ($request_uri !~* ^/.well-known) {
    return 301 $scheme://my.site.com$request_uri;
  }

AFAIK, without this, a valid certificate cannot be created for HTTPS, so browsers will not follow the redirect when accessing https://our-domain.ch, but will complain about the missing certificate.

I would consider this a "bug" if the user has to use the .well-known workaround. At least it should be documented for nginx.ingress.kubernetes.io/permanent-redirect that it doesn't work for HTTPS.
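
For reference, the workaround quoted above could be applied to the reproduction ingress roughly as follows. This is a minimal sketch, not a confirmed fix. It assumes the controller permits snippet annotations (the `allow-snippet-annotations` setting may be disabled on some installations), it uses the snippet instead of the permanent-redirect annotation, and it escapes the dot in the regex. The names and the target https://foo.com are taken from the example earlier in this issue.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rest-ingress
  namespace: our-rest-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # Used instead of nginx.ingress.kubernetes.io/permanent-redirect,
    # which redirects every request for this host.
    nginx.ingress.kubernetes.io/server-snippet: |
      # Redirect everything except paths under /.well-known, so the
      # HTTP-01 challenge path can still be answered by the solver.
      if ($request_uri !~* ^/\.well-known) {
        return 301 https://foo.com;
      }
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - our-domain.ch
    secretName: ingress-cert
  rules:
  - host: our-domain.ch
```

Whether this is acceptable depends on the cluster's policy on snippets, which a managed provider may not allow.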

longwuyuan commented 6 months ago

ok. that info is helpful.

Can you explain this paradox: first you create an ingress with a specific hostname, and then, instead of routing traffic to a backend pod behind that FQDN, you want to do seemingly contradictory things simultaneously on that rule being matched. Why generate a certificate for an FQDN when you are going to permanently redirect from there with a 30X response?

renepupil commented 6 months ago

you want to do seemingly contradictory things simultaneously on that rule being matched.

That is a valid question (I will try a solution by DNS later on, but I think this will not work):

  1. We deploy tenant websites like <tenant>.our-domain.ch in a custom namespace for each tenant. Within each tenant namespace there is an ingress routing requests for <tenant>.our-domain.ch to the backend service of that tenant. For each tenant ingress, we create a certificate using cert-manager.
  2. For the "main" domain our-domain.ch we want to redirect the traffic to our company homepage (the target of the permanent redirect) https://foo.com, as currently accessing our-domain.ch without a tenant gives 404. This works when accessing http://our-domain.ch, but when accessing https://our-domain.ch the browser complains about a missing certificate, as we only have automatically issued certificates for each subdomain, not a single wildcard certificate for all subdomains.

Does that clarify our case?

As long as there is no valid certificate for the "main" domain our-domain.ch, we cannot redirect traffic from it when accessing https://our-domain.ch.

But if you ask yourself why we want this, shouldn't you ask yourself why the nginx.ingress.kubernetes.io/permanent-redirect annotation exists in the first place?

It seems like you question the very existence of the annotation you maintain...

longwuyuan commented 6 months ago

Thanks for the information. It's very helpful in understanding your use case.

Please wait for comments from others. Maybe someone has a practical solution to selectively do redirection.

renepupil commented 6 months ago

I think there is some fairness to expect that permanent-redirect annotation should redirect selectively so that it can integrate with cert-manager

Thanks for acknowledgement.

can not practically integrate with every single HTTP/HTTPS related software out there.

Fair point, I wouldn't expect this to have a Prio...

@longwuyuan Can we maybe hint that nginx.ingress.kubernetes.io/permanent-redirect doesn't work out of the box with HTTPS in the docs?

longwuyuan commented 6 months ago

The summary is ;

longwuyuan commented 6 months ago

/remove-triage needs-information

rouke-broersma commented 6 months ago

@renepupil Disabling http01-edit-in-place (default) should work, because cert-manager will generate a new ingress with a more specific path instead of editing your existing ingress. This more specific path takes precedence and does not contain the redirect annotation.
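
To illustrate the point above: with http01-edit-in-place disabled, the temporary solver ingress would look roughly like the sketch below. The resource names, the token placeholder, the path type, and the port are assumptions for illustration, not output captured from this cluster. The actual object is generated by cert-manager and may differ between versions.

```yaml
# Illustrative sketch of a cert-manager HTTP-01 solver ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cm-acme-http-solver-example   # hypothetical generated name
  namespace: our-rest-ingress
  # Note: no permanent-redirect annotation on this object.
spec:
  ingressClassName: nginx
  rules:
  - host: our-domain.ch
    http:
      paths:
      - path: /.well-known/acme-challenge/<token>   # placeholder, not a real token
        pathType: ImplementationSpecific             # assumed; may vary by version
        backend:
          service:
            name: cm-acme-http-solver-example   # hypothetical solver service
            port:
              number: 8089                      # assumed solver port
```

Because this rule names the challenge path explicitly, it is matched in preference to the host-wide redirect for that one URL while the challenge is pending.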

You said

This http://our-domain.ch/.well-known/acme-challenge/NoqmY7zPrO4y7EQB9r6gmlgQUMXFJg9GNtum9FgYn8s redirects to https://foo.com after the failed acme challenge, but likely also during the acme challenge.

This is not a fair assumption because cert-manager may have removed the ingress by the time you tested this after the failed acme challenge. If cert-manager has already removed the acme challenge ingress due to failures, only the redirect ingress remains so of course the acme challenge url would redirect to foo.com.

It's interesting that the error mentions 403 as the response code. While it is entirely possible that your corporate website returns Forbidden for .well-known, I would rather expect 404.

Have you confirmed that the acme challenge actually hits your corporate website?

renepupil commented 6 months ago

Have you confirmed that the acme challenge actually hits your corporate website?

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: redirect-to-another
  namespace: our-redirects
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    acme.cert-manager.io/http01-ingress-class: nginx
    acme.cert-manager.io/http01-edit-in-place: "false"
    nginx.ingress.kubernetes.io/permanent-redirect: https://another-domain.ch
    nginx.ingress.kubernetes.io/from-to-www-redirect: "false"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - our-domain.ch
    secretName: ingress-cert
  rules:
  - host: our-domain.ch

So, we found out that our WAF was blocking the ACME challenge because Let's Encrypt uses Google services, which we now allow. The ingress has an event saying that the certificate was created successfully:

Normal CreateCertificate 21s cert-manager-ingress-shim Successfully created Certificate "ingress-cert"

But when calling https://our-domain.ch in Chrome, I still get NET::ERR_CERT_COMMON_NAME_INVALID. (The redirect works from the non-HTTPS page http://our-domain.ch.)

Subject: *.our-domain.ch
Issuer: R3
Expires on: Jul 8, 2024
Current date: May 8, 2024

Maybe it's the wildcard in the certificate, but I would have expected the certificate creation to create a certificate that works without subdomain...
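
One detail that may help narrow this down: a wildcard name such as *.our-domain.ch matches only names one label below the domain, not the bare domain itself, so a certificate carrying only that name is not valid for our-domain.ch. The Certificate that ingress-shim derives from the ingress above would be expected to look roughly like the sketch below. The field values are inferred from the manifest, not read from the cluster. It requests the bare domain and no wildcard, so the wildcard certificate shown in the browser may be coming from somewhere else.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ingress-cert          # matches the name in the CreateCertificate event
  namespace: our-redirects
spec:
  secretName: ingress-cert
  dnsNames:
  - our-domain.ch             # bare domain only; no *.our-domain.ch entry
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-prod
```

Comparing this with the output of `kubectl describe certificate ingress-cert` in the same namespace would show whether the issued certificate actually contains the bare domain.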

@rouke-broersma Any idea what the problem could be?

github-actions[bot] commented 5 months ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or a request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.