kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0
17.47k stars 8.25k forks source link

Question: why the case insensitive regular expression location modifier will be enforced on ALL paths If the use-regex is set #11397

Open simonellefsen opened 5 months ago

simonellefsen commented 5 months ago

From the ingress-nginx doc https://kubernetes.github.io/ingress-nginx/user-guide/ingress-path-matching/#warning

IMPORTANT NOTES:

If the use-regex OR rewrite-target annotation is used on any Ingress for a given host, then the case insensitive regular expression location modifier will be enforced on ALL paths for a given host regardless of what Ingress they are defined on.

My question is why? why does it have to be this way? why can't it just be the ingress that has "use-regex=true" or "rewrite-target" defined that uses case insensitive regular expression locations match.

We have a very old website with some not so well designed paths like

/[a-z0-9_.-]{3,40}(\/)?([\?\#]|$)

And when we add that as an ingress and sets the annotation use-regex=true then all the Exact and Prefix ingress are all converted into case insensitive regular expression location modifiers and we loose that ability to have a simple prioritisation.

Like having a Prefix ingress for path /foo that will have a higher priority than the regex above.

And we cannot for the life in us understand why anyone would want this default behaviour, that if one ingress is using regex then all is converted into regex location matches.

We have to patch the nginx.tmpl to get around this

This is our patch and it works fine for us but maybe we are missing something (most likely we are, hence why I'm asking this question)

With this patch an Exact ingress type will not be changed into case insensitive regular expression location modifiers but remain "=" location modifier.

Simon Ellefsen

--- nginx.tmpl  2024-05-17 13:54:19
+++ nginx.tmpl  2024-05-17 13:56:09
@@ -1056,8 +1056,26 @@

         {{ buildMirrorLocations $server.Locations }}

-        {{ $enforceRegex := enforceRegexModifier $server.Locations }}
+        ## {{ $enforceRegex := enforceRegexModifier $server.Locations }}
         {{ range $location := $server.Locations }}
+
+        # These two lines are the custom change. The rest of the file is the same as the original file.
+        # The goal of this change is to change the default behaviour of the ingress-nginx controller that
+        # makes all location blocks for a host to be regex based if at least one of ingresses has `use-regex` annotation set to true.
+        # The way we're changing this behaviour is by limiting $enforceRegex to be true only if the location has `use-regex` in it's Ingress resource set to true not in the whole host.
+        # Also we're preventing "Exact" paths to be regex based EVEN IF the `use-regex` is set to true on it's Ingress by checking the return value of buildLocation with second argument set to false
+        # as it's only going to have "=" if it's an exact path.
+        # The default behaviour while being expected and documented is strange and doesn't fit us as it causes problems with `profile-page` and might cause problems going down the road.
+        # Also see this issue in the ingress-nginx repository: https://github.com/kubernetes/ingress-nginx/issues/10618
+        #
+        # It still leaves us with a problem when using Prefix pathType with `use-regex` as a prefix path like `/foo` will match `/foobar`
+        # as there's no way to figure out if it's a Prefix or a Regex in the template ($location.PathType is a reference to a string which we can't deref here to know for sure).
+        # This can be mitigated by using `/foo/` as a path in the Ingress resource as it will end up into being a regex like `^/foo/` in the final config.
+        # So when using `use-regex` with Prefix pathType we should make sure to add a trailing slash to the path in the Ingress resource.
+        # So if someone wants to match `/foo` and `/foo/bar` they should add both `Exact:/foo` and `Prefix:/foo/` paths in the Ingress resource.
+        {{ $plainPath := buildLocation $location false }}
+        {{ $enforceRegex := and (eq $location.Rewrite.UseRegex true) (not (contains $plainPath "=")) }}
+
         {{ $path := buildLocation $location $enforceRegex }}
         {{ $proxySetHeader := proxySetHeader $location }}
         {{ $authPath := buildAuthLocation $location $all.Cfg.GlobalExternalAuth.URL }}
k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 5 months ago

cc @rikatz @tao12345666333 @strongjz @Gacko

Gacko commented 5 months ago

Because, IIRC, we also have a check for multiple Ingress resources with the same hostname and do not allow them. I'm not 100% sure about this, but if we do have it, then this would mean you can't define another Ingress resource without use-regex anyway.

arkberezkin commented 5 months ago

Maybe I didn't understand correctly, but we do have multiple Ingress resources with the same hostname and it works completely fine.

arkberezkin commented 5 months ago

But even if it's the case. Why turn Exact and Prefix into a regex matcher instead of leaving them as is.

While being "expected" behavior it's definitely confusing

https://github.com/kubernetes/ingress-nginx/issues/10618

simonellefsen commented 5 months ago

A small example to illustrate.

If you define two ingresses using the same host header and the one ingress is ImplementationSpecific and has the annotation nginx.ingress.kubernetes.io/use-regex: "true" and the other is an Exact path type, like

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-exact
  namespace: echoserver
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /foo
        pathType: Exact
        backend:
          service:
            name: echoserver-exact
            port:
              name: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-regex
  namespace: echoserver
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /[a-z0-9_.-]{3,40}
        pathType: ImplementationSpecific
        backend:
          service:
            name: echoserver-regex
            port:
              name: http

Then the default nginx.tmpl rendering will create a config that looks like

        ## start server example.com
        server {
                server_name example.com ;

                location ~* "^/[a-z0-9_.-]{3,40}" {

                        set $namespace      "echoserver";
                        set $ingress_name   "frontend-regex";
                        …
                }

                location ~* "^/foo" {

                        set $namespace      "echoserver";
                        set $ingress_name   "frontend-exact";
                        …
                }
                ….
        }
        ## end server example.com

And both of the location directives have converted into a regex match - this means that the Exact Pathtype /foo does not have higher priority than the regex path "^/[a-z0-9_.-]{3,40}"

But with the patch we are applying to the nginx.tmpl we get

        ## start server example.com
        server {
                server_name example.com ;

                location ~* "^/[a-z0-9_.-]{3,40}" {

                        set $namespace      "echoserver";
                        set $ingress_name   "frontend-regex";
                        …
                }

                location = /foo {

                        set $namespace      "echoserver";
                        set $ingress_name   "frontend-exact";
                        …
                }
                ….
        }
        ## end server example.com

Notice that location = /foo is no longer a regex match and hence will have higher priority than the location ~* "^/[a-z0-9_.-]{3,40}"

And we/I just cannot understand why this would not be the default behaviour (or maybe have an option for it)

longwuyuan commented 5 months ago

ANY annotation from an ingress, applies to the entire ingress and the impact of the annotation is not limited to just one single path, configured in the ingress. Or so I had assumed AFAIK. I did not think that only the use-regex annotation worked like that. I think annotations are implemented to apply to all paths in the ingress. I am not a developer so it will take a long time to read and understand the code.

The server-block that all that annotations apply to are the server from the ingress.spec.rules.host field.

The location blocks inside the server block are from the ingress.spec.rules.http.paths.path field.

When multiple ingress resources have the same value for the ingress.spec.rules.host field, they are merged in the nginx.conf file, under the same server block.

Following above 3 assumptions, what you are experiencing seems consistent with expected behavior.

I guess you are expecting that this use-regex annotation should do something exceptional and not apply one annotation to ALL the paths, and not apply to all the locations blocks of the server. Because it is messing with the priority for matching the rule to the incoming request headers.

Your expectation seems fair. If I am not wrong some other people have also reported this, in other issues.

And yet there are lots of users who have not reported this problem so I guess they do not have the same use case and/or same conflict of interest with prioritization of the location blocks.

I too will wait for comments and thoughts on this. It will be interesting to learn, how one annotation should NOT apply to all the paths.

Gacko commented 5 months ago

/assign

arkberezkin commented 5 months ago

While we use a template to work-around this behavior it'd be better to change the actual go code for it to work in a "more expected way". #10618 mentions the specific place where the search for "any use-regex for host" is happening to make all the paths in the server declaration regex based.

If you want I can try to prepare a PR that would make it behave the way that would make more sense (IMO).

There're two possibilities:

  1. Change all paths declared in an ingress to regex if it has the use-regex annotation which IMO would still be strange as it would change Exact paths to use regex based location block. (I'm not an expert in nginx but maybe it makes sense if there's a rewrite)
  2. Change only ImplementationSpecific paths to regex based location block which would be the most sane behavior. Especially looking at this table (it doesn't have ImplementationSpecific but the way Exact and Prefix work makes me expect that ImplementationSpecific is more like Prefix rather than having priority over Exact).

Edit:

Also one point in favor of implementing 2 is that it would make the most sense for ImplementationSpecific routes to deviate from the behavior described in the Ingress API documentation. Currently this annotation turns Exact routes into Prefix routes on the whole host which is technically a deviation from the API specification.

longwuyuan commented 5 months ago

Without being a developer, I can just opine that it will not be easy to change the go code behavior, if that go code is generic code path for all annotations. If You want the generic go code to behave differently, just for the use-regex annotation, then its not going to be an improvement.

But all that is assumption so we just wait for other comments.

arkberezkin commented 5 months ago

it will not be easy to change the go code behavior, if that go code is generic code path for all annotations

It's actually not. It's a special case only for the use-regex and rewrite-target annotations which can be seen in the enforceRegexModifier function.

longwuyuan commented 5 months ago

@arkberezkin thank you for that info. It helps a lot I think. We have to wiat for comments from others.

github-actions[bot] commented 4 months ago

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.