Closed philomory closed 1 year ago
@philomory this seems very reasonable since we can easily save a lot of work and moving parts for operators. would you be interested in implementing this feature? no problem if not, just want to check.
By the way, thanks for the detailed explanation. that was very helpful for my understanding
I would love to help implement this, but I don't think I have the required time or the required skillset. I can sort of read Go, a little bit, but I'm not going to be able to write Go to an acceptable level of quality for a project like this.
@philomory no problem. We can implement it. Before that, though, I'd like to get clarity on how this feature should interact with #281. Do you have any thoughts?
@arschles That's a good question! I don't have a strong opinion, since we wouldn't be using the wildcard support, ourselves. That said, I think it probably should be supported to use both at once, even if I wouldn't recommend anyone actually do so. Ingress
supports both at once, so HTTPScaledObject
probably should as well.
There's no (explicit) documentation that I can see on the behavior of Ingress
in situations like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-specific-host
spec:
rules:
- host: "foo.bar.com"
http:
paths:
- pathType: Prefix
path: "/"
backend:
service:
name: service1
port:
number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-wildcard-host
spec:
rules:
- host: "*.foo.com"
http:
paths:
- pathType: Prefix
path: "/api"
backend:
service:
name: service2
port:
number: 80
That is, one of those has an exact match of the host but a less specific path prefix, and the other has a more specific path prefix but only a wildcard host match (assuming a request for https://foo.bar.com/api
). Whatever the behavior of Ingress
is in this situation, though, HTTPScaledObject
should probably match it. Although it's possible that the behavior in this situation is left up to the individual Ingress Controller, which would be unfortunate, if understandable.
@philomory perhaps this would be a start to a unified spec for path-based routing and wildcard support:
*
character anywhere in the hostname or path definition*.foo.com/a/*
and foo.com/*
will match the latter. This is because the former requires a subdomain to come before the foo.com
*foo.com/a/*
and foo.com/*
will match the former because the /a/*
after foo.com
is more specific in the former than the /*
in the latterHTTPScaledObject
s, the one submitted last will have an error reported on its status
field, and will not take effect.What are your thoughts?
cc/ @comron and @andresrsanchez because you are both involved with #281
I'd ultimately like to come up with an over-arching issue that covers all use-cases in here and #281, before we proceed with implementing anything.
I think there are significant advantages to matching the specified behavior of ingress.networking.k8s.io/v1
where possible; although if there really is no specification for how multiple potential wildcard hostname matches are resolved, then presumably we'd have to choose something for HTTPScaledObject. That said, I could probably be convinced by a more detailed proposal for the unified spec (and I'm not the one implementing it, so in the end I'll be pretty happy as long as we end up with something workable overall).
Three up-front concerns come to mind:
foo.*.bar.com
isn't valid in DNS, it's not valid in TLS, and it's not valid in Ingress (though people have asked for it to be made valid)foo.com/a/b/c
would match foo.com/a/b/*
with higher priority than foo.com/a/*
, but what about if we had both foo.com/a/b/*
, foo.com/a/*/c
and foo.com/*/b/c
? Which one would be matched? Or are those "potentially conflicting patterns"?foo.com/a/*
and foo.com/*
couldn't both be defined on the same server, which would certainly be problematic.In general I don't think it's a good idea to define rules that say resource b is invalid because resource a already exists (with the obvious exception of multiple resources of the same type in the same namespace with the same name); I think it's better to simply define precedence rules that clearly specify which option will be chosen when multiple valid options exist (something that the Ingress documentation spells out for paths but oddly omits for hosts)
@philomory you're right, there are indeed holes in my spec. perhaps you're right and it's best to just duplicate the ingress rules, along with a rule for overlapping wildcard hosts, as you said. I don't know enough of the specifics of those rules and I have to go read up. Do you have any preliminary ideas on what do with overlapping wildcard hosts?
@arschles So, I did some digging, and I finally found some at least some information on the mandated behavior of Ingress, in the API reference (Emphasis mine):
Host is the fully qualified domain name of a network host, as defined by RFC 3986. Note the following deviations from the "host" part of the URI as defined in RFC 3986: 1. IPs are not allowed. Currently an IngressRuleValue can only apply to the IP in the Spec of the parent Ingress. 2. The
:
delimiter is not respected because ports are not allowed. Currently the port of an Ingress is implicitly :80 for http and :443 for https. Both these may change in the future. Incoming requests are matched against the host before the IngressRuleValue. If the host is unspecified, the Ingress routes all traffic based on the specified IngressRuleValue.
The immediately relevant portion (the part I bolded) is fairly terse, and honestly could be interpreted multiple ways. I decided to verify the behavior of the specific controller we use, ingress-nginx
:
ingress-nginx
uses Ingress objects to construct an nginx.conf
file.host
field becomes a server
block, and each path
field becomes a location
block within that server
block.host
, then the respective location
blocks are all merged into a single shared server
block in the final nginx.conf
file.server
block to use to handle the request before it ever starts looking at the request path.server
block to use, Nginx will prefer an exact match to a wildcard match.The practical upshot of this is that ingress-nginx
always chooses a server
block based on the best (most specific) match of the hostname, before it even starts looking at the path. Indeed, if you have one ingress with host: foo.example.com
and path: /bar
, and another ingress with host: *.example.com
and path: /baz
, a request for foo.example.com/baz
will simply result in a 404 being returned, because the more specific match with the foo.example.com
ingress means that the *.example.com
ingress is not checked at all.
This behavior may be specific to ingress-nginx
, but, I think it's fairly reasonable to assume that this is what "Incoming requests are matched against the host before the IngressRuleValue" is meant to mean.
All that said, I did a bit more digging around, because a full-on replacement for the entirety of Ingress is already in development by sig-networking
, the Gateway API. The specification for the Gateway API gets a lot more specific about how ambiguity/multiple-matches is meant to be handled:
Proxy or Load Balancer routing configuration generated from HTTPRoutes MUST prioritize rules based on the following criteria, continuing on ties. Precedence must be given to the the Rule with the largest number of:
- Characters in a matching non-wildcard hostname.
- Characters in a matching hostname.
- Characters in a matching path.
- Header matches.
- Query param matches.
(It's worth looking at the actual specification, there's a lot of interesting detail in there).
There's also a more general section on resolving conflicts.
Obviously, the Gateway API is still in alpha and isn't widely supported (and also, may change), but the way the rules are currently written seems to indicate that the behavior of the ingress-nginx
is probably the way to go.
@philomory thanks for this detail. This gives me an idea - I wonder if we should simply reference an Ingress
from the HTTPScaledObject
. I would hate to re-implement all of these rules and keep them current with changes going forward.
With this scheme, HTTPScaledObject
s would keep the information on the backing Deployment
, but the addon would read the referenced Ingress
for all the routing information. This way, the interceptor would still need to understand and implement most of the rules, but would not need to duplicate them inside the HTTPScaledObject
.
WDYT?
@arschles For the project, that does sound like a great idea (although it may become more complicated once the Gateway API is GA, since, then what do you do? I guess you could reference by kind
, in the same way that KEDA itself can have scaleTargetRef
s with other kinds
than Deployment
s).
The only problem I see is that, currently, HTTPScaledObject
is like a much-simplified version of Ingress. For example, the following is an entirely valid Ingress object, that means essentially the same thing as the two ingresses in my previous example combined:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-specific-host
spec:
rules:
- host: "foo.bar.com"
http:
paths:
- pathType: Prefix
path: "/"
backend:
service:
name: service1
port:
number: 80
- host: "*.foo.com"
http:
paths:
- pathType: Prefix
path: "/api"
backend:
service:
name: service2
port:
number: 80
spec.rules
and spec.rules.*.paths
are both arrays, and while I tend never to include more than one of each per Ingress object, I'm sure other users do, especially depending on what Ingress Controller they use (for example the old aws-alb-ingress-controller
- since replaced by the aws-load-balancer-controller
- used to create one load balancer per Ingress object, so with that controller you were definitely incentivized to combine all your rules into a single Ingress object). I'm not sure if there's any other Ingress Controllers out there that still act that way, but, I'm sure there are people who prefer grouping all their rules into one Ingress anyway.
In any case, when presented with an Ingress object with multiple rules and multiple paths, I'm not sure how HTTPScaledObject
should indicate which one it is meant to match; I see only three obvious options:
HTTPScaledObject
matches all requests that a given Ingress object matches; if a user wants to scale different deployments depending on the host or path, they'll need to refactor their "mono-Ingress" into multiple separate Ingresses.HTTPScaledObject
allows indicating array indices to match, e.g.
spec:
routingRef:
name: example
kind: Ingress
rule: 1
path: 0
This option feels super brittle and confusing.
HTTPScaledObject
allows you to reference individual routing elements within the Ingress
by name, e.g.
spec:
routingRef:
name: example
kind: Ingress
host: foo.bar.com
path: /
This option is less confusing than option 2, but at this point, you don't really need the Ingress
anymore, since you're already specifying the path
and host
right there in the HTTPScaledObject
.
Another (less pressing) problem is that, for example, we have an older application which consists of multiple microservices behind an API gateway, and there's only one Ingress, which points to the API gateway (and the API gateway, in turn, routes traffic to different backend services depending on the request path; honestly the API Gateway's functionality became mostly redundant after we moved the application to Kubernetes, but it's not entirely redundant). Ultimately my goal for that particular application is to have HTTPScaledObject
scale the appropriate backend microservice deployment based on the request path, even though all the traffic ends up getting routed to the API Gateway. If HTTPScaledObject
references an Ingress
to match traffic to scaling targets, we'd have to create a bunch of dummy Ingress objects that all route to the same place.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
I don't think this is stale, exactly, although personally I'm investigating at this point using a service mesh (linkerd) to scale off of traffic metrics as an alternative to using the KEDA http-add-on (there will probably be cases that the http-add-on adresses that service mesh metrics do not, and vice versa, but I don't think any of them apply to my situation specifically).
This issue has been automatically closed due to inactivity.
This is a very crucial feature so that Keda http-add-on and ingress objects work in tandem. This issue should be re-opened
This is a necessary feature for using http-add-on with ingress.
Please, reopen this issue. Routing path's definitely should be supported - any complex app scenario usually have it.
I vote this feature too. I understand team not have much capacity, but we can put it in backlog items first. @tomkerkhove
@arschles is this project maintained ? we are considering to add a solution for this but we are not sure it is maintained...
This project is definitely still maintained but community driven. We definitely welcome contributions!
@tomkerkhove thanks for the response... The reason I have asked is that this issue seems to be really important if one is willing to work with the Keda interceptor and path-based routing ... this use case is pretty standard in our scenarios ...
I started working on this I'll appreciate some help integrating with the current setup, as we use this ONLY with nginx-ingress passing requests into the cluster, so for use-cases without path in the route table etc. it'll take more work
I think this is related to @t0rr3sp3dr0's proposal in #605 as well
The interceptor should be able to route requests for the same host to different services based on the request path.
Use-Case
When using an ingress controller, you can create Ingress objects that route different requests for the same host to different backend services, depending on the path portion of the request. This can be done by within a single Ingress object or across multiple of them.
When using the HTTP add-on, however, this won't work; you can, of course, still define multiple Ingress objects with different paths (or one Ingress object with multiple path entries), but they all need to point to the
keda-add-ons-http-interceptor-proxy
as their backend. Unfortunately, once the interceptor has intercepted them, it's not able to route them or scale them appropriately. The routing portion of this problem could be cumbersomely worked around by having a second ingress controller that the proxy could just route all traffic too (actually you could just make the proxy a service of type LoadBalancer and only have an ingress controller behind the proxy), but that still means that the scaling won't be handled as intended.Consider the following setup (pre-KEDA):
This will route a request for e.g.
https://api.example.com/media/foo
to themedia-provider
service, and a request tohttps://api.example.com/customer/bar?baz=1
to thecustomer-provider
service.Once you put the KEDA HTTP Add-on in place, things begin to break down:
Both
HTTPScaledObject
specs have the samehost
field, and there's nopath
field, so there's no way for the scaler to know where to route traffic to. Even if you put the Ingress Controller "behind" the interceptor (once a cluster-global installation is possible) to handle the routing, there's still no way for the scaler to know that when a bunch of requests come in for a/customer
endpoint, it should scale thecustomer-provider
Deployment rather than themedia-provider
Deployment.To resolve this problem, the
HTTPScaledObject
field should have an optionalspec.path
field, and possibly also an optionalspec.pathType
field. For example:Specification
For simplicity, I think you could get away with supporting only either 0 or 1 paths per
HTTPScaledObject
resource. Resolution would work as follows:HTTPScaledObject
which omitsspec.path
is equivalent to one which specifiesspec.path: /
spec.pathType
is implemented at all, it should have the same basic semantics asspec.rules[*].http.paths[*].pathType
inIngress
, except that the only supported values would bePrefix
andExact
(no need to include historical baggage likeImplementationSpecific
since there's only one implementation and there's also no history to derive baggage from)spec.pathType
is implemented at all, omittingspec.pathType
will be equivalent tospec.pathType: Prefix
HTTPScaledObject
s with a matchinghost
and evaluate them as follows:spec.pathType
is implemented, check all matchedHTTPScaledObject
resources withspec.pathType: Exact
and see if thespec.path
listed in theHTTPScaledObject
exactly matches the path part of the request URI. If so, the first match found is selected.spec.pathType
isn't implemented, or if none of theExact
typeHTTPScaledObject
s matched the path, take the remainingHTTPScaledObject
s matching thehost
field and sort them by the length of thespec.path
field. Check each resource'sspec.path
field to see if it is an "element-wise prefix" of the path portion of the request URI, given a separator of "/"; the first match found is selected.HTTPScaledObject
resource was selected in step 4, route the request to the service specified in thatHTTPScaledObject
, and scale the deployment specified in thatHTTPScaledObject
accordingly.This procedure defined in step 4 is essentially the same as that defined for matching requests to Ingress objects by the
networking.k8s.io
API, except for the absence of theImplementationSpecific
path type.With Ingress, users have the option to specify multiple
path
s for the same host (or even multiple hosts) in a single Ingress resource, but, I think that would be unnecessary complexity forHTTPScaledObject
; simply creating a distinctHTTPScaledObject
per path/prefix should suffice.