Impossible to change hostname<->ingress mapping without downtime due to admission webhook

airhorns commented 10 months ago

What happened:

I would like to break up one multi-host ingress into many single host ingresses so I can adjust the annotations between them independently.

I have an ingress like this:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  tls:
    - secretName: foo
      hosts:
        - foo.company.com
    - secretName: bar
      hosts:
        - bar.company.com
  rules:
    - host: foo.company.com
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: api
                port:
                  number: 80
    - host: bar.company.com
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: api
                port:
                  number: 80

I am trying to add a second ingress like this:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-read-timeout: "960"
spec:
  tls:
    - secretName: foo
      hosts:
        - foo.company.com
  rules:
    - host: foo.company.com
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: api
                port:
                  number: 80

The second ingress has a different annotation than the main one that I want to apply to foo only, and not to bar. For this reason, I need to create two different ingresses, one for foo and one for bar. I can't currently, as I get a message like this when trying to create the second one:

Error from server (BadRequest): error when creating "deploy/gadget-production/test-custom-ingress.yaml": admission webhook "validate.nginx.ingress.kubernetes.io" denied the request: host "<snip>" and path "/" is already defined in ingress gadget-production/<snip>

What you expected to happen:

I expected that creating a second, new ingress that re-used one of my existing hostnames would succeed, and there would be some precedence rule which selected which ruleset processed requests until I deleted one or the other. It did not -- instead, I can't create the second ingress.

If this was a test cluster, this would be very easy, as I could just delete the one multi-hostame ingress and then create several new, single hostname ingresses. But, it's a production cluster where traffic is flowing through all these hostnames at once. I am looking for a zero-downtime way of splitting up this ingress and I cannot find one.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.9.4
  Build:         846d251814a09d8a5d8d28e2e604bfc7749bcb49
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version (use kubectl version):

Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.24.17-gke.2230000

Environment:

Cloud provider or hardware configuration: GCP
OS (e.g. from /etc/os-release): GKE's default
Kernel (e.g. uname -a): GKE default
How was the ingress-nginx-controller installed: helm chart, can provide censored values on request
Current State of the controller: can provide on request
Current state of ingress object, if applicable: can provide on request

How to reproduce this issue:

deploy controller with admission webhooks enabled
create ingress with multiple hostnames using above config
attempt to create a second ingress re-using an existing hostname

Anything else we need to know:

This issue is very similar to https://github.com/kubernetes/ingress-nginx/issues/8216, but different. That issue concerned re-using hostnames across different ingress classes. I am finding that is working fine. This issue is about re-using hostnames within the same ingress-class on purpose.

Key comments concerning this issue specifically from the previous thread: https://github.com/kubernetes/ingress-nginx/issues/8216#issuecomment-1761449597 https://github.com/kubernetes/ingress-nginx/issues/8216#issuecomment-1870031991

My take:

I'm opening this issue as I believe this issue is not cosmetic. If you start an ingress out with multiple hostnames, you can't ever change your mind about the arrangement of them. If you ever need to adjust the annotations on some but not other hostnames, you have to have picked the arrangement correctly from the get-go.

I suspect a workaround is deploying a new ingress controller with a different class, re-creating all my ingresses on that ingress controller, deleting the one I want to change on that side, creating hostname-split ingresses there, validating that it is working, and then doing a scary big-bang DNS change to move over to that ingress. That is a pretty risky operation and a lot of effort just to be able to split one ingress apart. I'd really like to avoid it if possible.

Another workaround would be to disable the admission controller. But, as captured in that thread, the admission controller does a lot of critical validation that I don't want to lose. I just want to pass through an ambiguous state for a brief time, and return back to the nice normal state. I am ok with saying "when there are duplicate hostnames there is some rule that says one ingress is totally ignored", but allowing the creation of the duplicate ingresses allows for a zero-downtime cutover when the ambiguity is resolved by deleting one or the other.

Also, if I disable the admission webhook and create ambiguous ingresses, and the nginx reload succeeds, that means the ingress controller is a more zealous validator than nginx itself, which to me merits a configuration option, as it's not a strictly necessary validation for continued operation.

k8s-ci-robot commented 10 months ago

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 10 months ago

/remove-kind bug

What you expect is not possible currently because that rules you create transform into nginx.conf directives and you are concurrently expecting 2 directives with the exact same config for server block and location block.

airhorns commented 10 months ago

What you expect is not possible currently because that rules you create transform into nginx.conf directives and you are concurrently expecting 2 directives with the exact same config for server block and location block.

Is that fully invalid nginx configuration that will cause the reload to fail? Or just bad practice?

And, that's how it works now but we could change that, no? Could we not do a first pass and omit the second one? Or add an option that is on by default, but when off, makes the admission controller allow this situation?

Also, I would still call this a bug even if it is a wontfix, as if I were using raw nginx by myself, I could coordinate a 0 downtime reload by preparing the new configuration ahead of time and atomically reloading to create the new one, but because I'm using ingress-nginx, I can't atomically delete and create the new ingress objects all at once.

longwuyuan commented 10 months ago

I can not comment on bug/wontfix type of topics just yet based on the data here.

You could try to attempt this on vanilla nginx, without kubernetes, and present results here. The mass use case of ingress-controller does not include the use-case you described for one thing. And allowing for duplicate host+path is not viable for real-life production use because ingress resource fundamental spec is routing based on hostname and path.

What you expect seems like a desired feature for users who design and alter-design as you described. But my opinion is that the amount of resources required to develop such features is not available.

airhorns commented 10 months ago

You could try to attempt this on vanilla nginx, without kubernetes, and present results here.

I have done this in the past, but requires careful orchestration, though its the same kind you'd have to do whenever reloading nginx at all. Let's say you have one server block that you need to split into two -- you'd edit your config file on disk, get your two server blocks into place, maybe validate, and then trigger a reload with service nginx reload or whatever your OS of choice uses. The reload is atomic and 0 downtime, in that in the moment before there is one config, and in the moment after there is a different config, and no requests see the absence of a config. Because its only one nginx and one config file, you don't need to pass through the state where there are two ambiguous server blocks.

And allowing for duplicate host+path is not viable for real-life production use because ingress resource fundamental spec is routing based on hostname and path.

Well, the k8s Ingress resource supports this just fine, it is the admission webhook that is declaring this configuration invalid. What is not viable about the semantics I described above, where ingress-nginx resolves the ambiguity using some set of rules? I think viability and "we don't want to encourage this as it is confusing" are two different things, and it may very well be that it is the wrong default to allow this situation. But for folks with flowing production traffic, we're kind of sunk without some way around this, so to me this seems like a real bug.

But my opinion is that the amount of resources required to develop such features is not available.

What would be involved in a fix do you think?

airhorns commented 10 months ago

Also, what did versions 1.15 and before do, when the admission webhook didn't prevent this situation? This comment from the previous thread seems to indicate this actually used to work. If they managed to stay up, to me that is evidence this is technically possible, and might just work if we added an option to disable this specific check in the admission webhook.

longwuyuan commented 10 months ago

I don't know what is right when there are 2 identical configurations for routing. Although momentary and maybe even incidental, I really do not understand that scene. My limited understanding is that the uniqueness of hostname+path decides routing.

Unlike vanilla non-K8S nginx reverseproxy, the connection terminates on the controller so even a brief, flashing, momentary config where 2 identical configurations hinder routing, potentially break established connections. This could be the reason that check/validation was put in place. I don't know for sure.

I agree that the change you described can not avoid downtime but please join the community meeting and bring up this feature requirement. I am nowhere near that decision making.

longwuyuan commented 10 months ago

/kind feature

hobti01 commented 10 months ago

What you expect is not possible currently because that rules you create transform into nginx.conf directives and you are concurrently expecting 2 directives with the exact same config for server block and location block.

Ingresses with the same hostname + path are most certainly possible in Kubernetes and have a defined behaviour in ingress-nginx controller: duplicate host/path is permitted and oldest wins https://kubernetes.github.io/ingress-nginx/how-it-works/#building-the-nginx-model
It's the webhook validation that prevents duplicate host/path but users want duplicate host/path since this is not a validation failure, but otherwise want validation on Ingress resources (i.e. we don't want to disable the validating webhook since there are many other validations that are completely valid and aligned with ingress controller functionality)
We'd prefer the validating webhook did not mark duplicate host/path as invalid (since it is valid). Probably an argument to the webhook could make that user-configurable.

longwuyuan commented 10 months ago

Since you state that the first of 2, among duplicates, wins, you can also write about all scenarios that will occur, when duplicates are valid. Then wait for comments on that change to the controller, from others.

hobti01 commented 10 months ago

Hi @longwuyuan , the webhook is not required for operation of the controller. The controller already has defined behaviour for duplicate host+path. So there is no need to re-identify all scenarios, there is only a request for the webhook validator to NOT enforce a check that is unnecessary.

longwuyuan commented 10 months ago

Sorry I was not clear.

In the scene of a transition from clubbed rules to individual rules, temporarily/momentarily/transitionally allowing duplicates is an advantage I see.

But for the entire user community to remove protection from duplication in the validation, is not a improvement, you will agree (because unexpected routing will be a impact).

Hence my comment that you write how you would handle duplicates, in real world. That will be input fpr readers here. It will help dive into the "oldest wins" design.

airhorns commented 10 months ago

I feel like if there is already existing behaviour in the controller itself for handling duplicates, we'd have to come up with a really good reason for making a breaking change to that behaviour. This is subjective but I don't think there is really a resolution to the ambiguity that 90% of people would predict -- it's kind of arbitrary IMO. You could argue the oldest one is best, or the newest one is best, or that there should be some tool for specifying which one is the right one, but IMO because there isn't an obviously-the-best solution that folks would guess is what would happen, anyone dealing with this needs to read the docs to understand what is going to happen anyways. With that being the case, I feel like the existing logic would be fine for my use case, so no need to change anything in the controller at al! Why churn if we don't have to?

The solution I think would be best (and it sounds like @hobti01 could use too) would be to add a new option that governs the behaviour of just the webhook validator. By default, this option would keep the behaviour of today to avoid making a breaking change, and duplicates shouldn't be allowed to catch mistakes. But, for brave folks like us, we could enable duplicate support and have duplicates still pass webhook validation. That seems like a low risk way to add support for this.

hobti01 commented 10 months ago

My understanding is that the controller has long-established behaviour for handling duplicate host+path. The existing behaviour of the controller allows zero-downtime Ingress replacement by creating two Ingress resources with the same host+path but different names. This is a valuable use case.

However, the webhook validation has long prevented this use case and there is no method to disable only the overlap check without disabling the entire admission webhook.

zigmund commented 10 months ago

We're currently trying to integrate admission webhook in our clusters, and the issue blocks integration, since we used nginx ingress controller with allow-duplicates behaviour for years now.

Admission webhook could emit warning instead of deny. I think it will be good to have flag like --allow-duplicates or --warn-on-duplicates with false by default.

jahvon commented 9 months ago

Another alternative/workaround here that my team is using in our prod environment is to disable validations for a subset of ingresses with a particular label. This was pretty easy to do by updating the admission webhook's object selectors. This still guarantees that other ingresses created without our special label are fully validated. This has been working well in our case as we are mostly copying existing, validated ingresses and moving them around / renaming them.

kubernetes / ingress-nginx

Impossible to change hostname<->ingress mapping without downtime due to admission webhook #10820