dprotaso opened this issue 3 years ago
/assign @dprotaso
I've been working on 2.1
agree 2.1 feels like the most promising approach
2.1 feels fairly promising. I wanted to make one comment on the implementation:
I think I prefer a single IP vs. N IPs - this is to avoid cascading herds. E.g. Contour uses a DaemonSet, so in a large cluster many IPs would be propagated (and re-propagated whenever deployments are rolled), vs. a single stable Service IP.
It's possible to use subsetting here to reduce the number of IPs handed out for each Service, but (on further reflection), I think the single-IP approach is probably acceptable.
I agree that we should also engage with the Gateway API to enable this sort of rewriting within a single Gateway; our current network data path looks like:
[internet] -> [lb] -> [external kingress] -> [k8s service lb] -> [internal kingress] -> rest of knative

And it would be nice to remove the [k8s service lb] and [internal kingress] from the traffic path, particularly for high-traffic services.
If we plan on migrating to EndpointSlices with CNAME records in the future, we should confirm that ingresses support them. I would hope it's transparent, but we should confirm - e.g. https://github.com/projectcontour/contour/issues/3950#issuecomment-904967484
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
/lifecycle frozen
+1 for option 2.1, as this would also help with creating a multi-tenant setup via NetworkPolicies, which is currently not possible because NetworkPolicies don't work with DNS-resolved ExternalNames either.
BTW, is there documentation on how to create a multi-tenant setup (i.e. to prevent KServices in ns A from accessing KServices in ns B)? I don't want to sidetrack this issue, but before opening a dedicated issue for this I want to be sure there isn't already something.
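For illustration, the kind of namespace isolation mentioned above is usually expressed as a NetworkPolicy that only admits traffic from pods in the same namespace. Below is a minimal sketch, written as a Python dict mirroring the YAML manifest; the policy name and namespace are hypothetical. Note the caveat raised in this thread: NetworkPolicies match pod/cluster IPs, so traffic routed via a DNS-resolved ExternalName is not covered.

```python
# Sketch of a NetworkPolicy admitting ingress only from pods in the same
# namespace. Names are hypothetical; the dict mirrors the YAML manifest.
def same_namespace_only_policy(namespace: str) -> dict:
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [
                # an empty podSelector in "from" matches all pods in the
                # policy's own namespace, and nothing outside it
                {"from": [{"podSelector": {}}]}
            ],
        },
    }

policy = same_namespace_only_policy("ns-a")
```

Because the policy is enforced on pod IPs, it only becomes effective for Knative traffic once the Route's Service resolves to in-cluster endpoints (options 2.1/2.2) rather than an ExternalName.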
Long term I think we want to use 2.2 - FQDN EndpointSlices - so we're not managing endpoints and IPs. The K8s services mentioned in this issue front the L7 proxies.
the multi-tenant stuff is probably a separate issue
/unassign dprosato
Since this is in "Next"
/unassign @dprotaso
That will teach me to manage issues with my phone.
@dprotaso you mention we should use 2.2 for the long term, but IIUC that is only possible for K8s >= 1.22 (which is when DNS support for EndpointSlices landed). Since Knative will support K8s < 1.22 for quite some time to come, and we can't use EndpointSlices until we raise the Knative minimum requirement to K8s 1.22, shouldn't we target the (intermediate) solution 2.1 first? We'd switch to a K8s 1.22 minimum when K8s 1.25 gets released (according to https://kubernetes.io/releases/version-skew-policy/ and https://github.com/knative/community/blob/main/mechanics/RELEASE-VERSIONING-PRINCIPLES.md), and 1.25 will not be around within the next 6 months (assuming 1.23 will be out anytime soon).
That last problem doesn't seem to be a problem now as this issue hasn't moved since Dec 2021 :-)
@richardvflux unfortunately upstream wants to deprecate the FQDN address type for EndpointSlices - see https://github.com/kubernetes/kubernetes/issues/114210
I don't think there is an alternative at the moment
Yes, we ran smack into that issue with Cilium when trying to create consistent database service redirection. So, is there an alternative under discussion by any chance?
I'm not aware of an alternate proposal in the k8s upstream.
FYI - I consider this blocked on the result of this discussion - so far there doesn't seem to be an alternative https://github.com/kubernetes/kubernetes/issues/114210
How does Knative use `Service.ExternalName`?

We currently create an `ExternalName` Service for each Knative Route. This results in a CNAME DNS record making the Route accessible within the cluster. This path supports all the desired traffic-splitting features etc.

The `ExternalName` value is typically sourced from the `KIngress.Status.(Public|Private)LoadBalancer` property. Current ingress implementations set a `DomainInternal` value for the LoadBalancer status, which typically points to an internal K8s Service fronting the ingress's L7 proxy. The other statuses that can be propagated back are: `IP`, `Domain`, and `MeshOnly (bool)`.
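The per-Route `ExternalName` Service described above can be sketched as follows (a Python dict mirroring the YAML manifest; the Route name and internal domain are hypothetical examples, not values from this issue):

```python
# Sketch of the ExternalName Service Knative creates per Route.
# Kube DNS answers lookups for this Service with a CNAME record
# pointing at externalName, so no cluster IP or endpoints exist.
def external_name_service(route: str, namespace: str, domain_internal: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": route, "namespace": namespace},
        "spec": {
            "type": "ExternalName",
            # hypothetical DomainInternal value from the KIngress status
            "externalName": domain_internal,
        },
    }

svc = external_name_service(
    "my-route", "default",
    "envoy-internal.contour-system.svc.cluster.local")
```

The key property for the rest of this issue is that resolution happens purely in DNS: the ingress only ever sees the rewritten hostname, which is what the advisory below takes aim at.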
The current DomainMapping implementation will create a KIngress object pointing to the `Status.Address.URL` property of the referenced object. For a Knative Service/Route this ends up being the K8s `ExternalName` Service.

Problem
There's an advisory (https://github.com/kubernetes/kubernetes/issues/103675) suggesting ingress implementations should not forward traffic to a Service of type `ExternalName`, and/or should allow disabling this functionality. This is exactly what Contour has done because of https://github.com/projectcontour/contour/security/advisories/GHSA-5ph6-qq5x-7jwc, and I'm assuming that long term more implementations will follow suit and provide options to disable it.

Thus, long term we should avoid the use of `ExternalName` Services in order for DomainMapping to function correctly.

Potential Approaches
1) Changing DomainMapping implementation
1.1) Resolving ExternalName to the target Namespace
PoC PR: https://github.com/knative/serving/pull/11747
The approach here is to resolve the `Status.Address.URL` and create the KIngress object in the target namespace. This would require moving related resources across namespaces and keeping them in sync, because certain ingress implementations don't allow you to reach across namespaces to access resources - e.g. Contour & K8s Secrets.

1.2) Don't use KIngress
Since DomainMapping is essentially a Host/URI rewrite this can be done with a filter approach instead of using a KIngress. The filter would rewrite the host & path etc prior to forwarding requests upstream.
We don't have abstractions for this type of functionality, and it's unclear whether it's supported by the different ingresses. Given that, maybe this is a feature request to bring to the upstream Gateway API folks.
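For what it's worth, the Gateway API does define an HTTPRoute `URLRewrite` filter that covers this kind of host rewrite. A minimal sketch of how DomainMapping could be expressed with it, as a Python dict mirroring the manifest (the hostnames, route name, and backend name are hypothetical):

```python
# Sketch of a Gateway API HTTPRoute implementing a DomainMapping-style
# host rewrite: requests for the mapped custom domain are rewritten to
# the Route's internal hostname before being forwarded upstream.
def domain_mapping_route(mapped_host: str, target_host: str, backend: str) -> dict:
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": "domain-mapping"},
        "spec": {
            "hostnames": [mapped_host],  # the user-facing custom domain
            "rules": [{
                "filters": [{
                    "type": "URLRewrite",
                    # rewrite the Host header to the Route's internal name
                    "urlRewrite": {"hostname": target_host},
                }],
                "backendRefs": [{"name": backend, "port": 80}],
            }],
        },
    }

route = domain_mapping_route(
    "app.example.com",
    "my-route.default.svc.cluster.local",
    "envoy")
```

This sidesteps KIngress entirely, at the cost of tying DomainMapping to Gateway API support in the ingress layer.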
2) Changing Route Service to not use ExternalName
2.1) Use Endpoints
If an Ingress implementation disables `ExternalName` forwarding, it could populate the KIngress status with an `IP`, and the Route reconciler can opt to use a headless Service with our own managed Endpoints. Endpoints don't support CNAME addresses, so we'll still require an `ExternalName` Service if the `IP` is not set.

Some observations/considerations: going headless means switching the Service type to `ClusterIP` and setting the `clusterIP` property to `None`. These properties are immutable (context: https://groups.google.com/g/kubernetes-sig-network/c/8opCQZhxpxM/m/CynHrF_cAQAJ). Also, switching the type from `ClusterIP` to `ExternalName` seems broken - see the issue: https://github.com/kubernetes/kubernetes/issues/104329

So if an ingress implementation toggles the `IP`, we would have to delete the Service/Endpoints and recreate them when that happens. That's probably ok.

I think I prefer a single IP vs. N IPs - this is to avoid cascading herds. E.g. Contour uses a DaemonSet, so a large cluster would result in many IPs being propagated (and re-propagated when deployments are rolled), vs. a single stable Service IP.
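The 2.1 resource pair described above can be sketched as follows (Python dicts mirroring the manifests; the names and the ingress IP are hypothetical):

```python
# Sketch of option 2.1: a headless ClusterIP Service plus a manually
# managed Endpoints object carrying the single ingress IP.
def headless_service(name: str, namespace: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            "clusterIP": "None",  # headless; immutable once set
            "ports": [{"port": 80}],
        },
    }

def managed_endpoints(name: str, namespace: str, ingress_ip: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # the Endpoints name must match the Service name to be associated
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": ingress_ip}],  # single stable L7 proxy IP
            "ports": [{"port": 80}],
        }],
    }

svc = headless_service("my-route", "default")
eps = managed_endpoints("my-route", "default", "10.0.0.1")
```

Because `type` and `clusterIP` are immutable, toggling between this shape and an `ExternalName` Service requires delete-and-recreate, as noted above.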
2.2) Use EndpointSlices
EndpointSlices support A and CNAME records, so we could create a headless Service and a single EndpointSlice.

This solves the `Service.ExternalName` problems, since the resource lifecycle can be limited by RBAC (related: https://github.com/kubernetes/kubernetes/issues/103675). Unfortunately, whether this is supported depends on the DNS implementation.

EndpointSlice support was added in CoreDNS 1.8.1 (https://coredns.io/2021/01/20/coredns-1.8.1-release/), which means K8s/kubeadm didn't pick up this change until K8s 1.22. We won't adopt K8s 1.22 as a minimum until next year, so switching now would be a breaking change.
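The 2.2 shape can be sketched as a single EndpointSlice with `addressType: FQDN`, associated to a headless Service via the `kubernetes.io/service-name` label (a Python dict mirroring the manifest; names and the FQDN are hypothetical, and note the upstream discussion above about deprecating the FQDN address type):

```python
# Sketch of option 2.2: one EndpointSlice carrying an FQDN endpoint,
# owned by us instead of the endpoints controller.
def fqdn_endpoint_slice(service: str, namespace: str, fqdn: str) -> dict:
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": f"{service}-fqdn",  # hypothetical naming convention
            "namespace": namespace,
            # associates the slice with the headless Service
            "labels": {"kubernetes.io/service-name": service},
        },
        "addressType": "FQDN",  # a DNS-aware implementation serves this as a CNAME
        "endpoints": [{"addresses": [fqdn]}],
        "ports": [{"port": 80}],
    }

slice_ = fqdn_endpoint_slice(
    "my-route", "default",
    "envoy-internal.contour-system.svc.cluster.local")
```

Unlike an `ExternalName` Service, this object's lifecycle can be constrained by RBAC on EndpointSlices, which is the property that addresses the advisory.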
Current Approach
I think we should use Endpoints (2.1), even with the caveats mentioned, and next year switch over to EndpointSlices (2.2) once 1.22 becomes our minimum K8s version. Then we can assume the CNAME record will work as expected.

Long term we should start the discussion with the Gateway API folks to expose a filter-type concept. Then DomainMapping could potentially bypass the Route's K8s Service and hit the Revision's K8s Service directly. It's currently unclear to me if that's an actual win, since you're going through kube-proxy anyway. Probably worth an exploration.