knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Migrating Route off Service.ExternalName #11821

Open dprotaso opened 3 years ago

dprotaso commented 3 years ago

How does Knative use Service.ExternalName?

We currently create an ExternalName service for each Knative Route. This results in a CNAME DNS record that makes the Route accessible within the cluster. This path supports all the desired traffic-splitting features, etc.

The ExternalName value is typically sourced from the KIngress.Status.(Public|Private)LoadBalancer property. Current ingress implementations set a DomainInternal value for the LoadBalancer status, which typically points to an internal K8s Service fronting the ingress's L7 proxy. The other status values that can be propagated back are IP, Domain, and MeshOnly (bool).
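For illustration, this is roughly the shape of the ExternalName Service that gets created per Route - the Route name, namespace, and proxy domain below are placeholders, not values from a real cluster:

```yaml
# Sketch of the per-Route ExternalName Service (all names illustrative).
apiVersion: v1
kind: Service
metadata:
  name: hello            # matches the Route name
  namespace: default
spec:
  type: ExternalName
  # Sourced from KIngress.Status.(Public|Private)LoadBalancer's DomainInternal,
  # i.e. an internal K8s Service fronting the ingress's L7 proxy.
  externalName: envoy-internal.projectcontour.svc.cluster.local
  ports:
    - port: 80
```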

The current DomainMapping implementation will create a KIngress object pointing to the Status.Address.URL property of the referenced object. For a Knative Service/Route, this ends up being the K8s ExternalName Service.

Problem

There's an advisory (https://github.com/kubernetes/kubernetes/issues/103675) suggesting that ingress implementations should not forward traffic to a Service of type ExternalName, or should at least allow disabling this functionality.

This is exactly what Contour has done because of https://github.com/projectcontour/contour/security/advisories/GHSA-5ph6-qq5x-7jwc, and I'm assuming that, long term, more implementations will follow suit by providing options to disable it.

Thus, long term, we should avoid the use of ExternalName services in order for DomainMapping to function correctly.

Potential Approaches

1) Changing DomainMapping implementation

1.1) Resolving ExternalName to the target Namespace

PoC PR: https://github.com/knative/serving/pull/11747

The approach here is to resolve the Status.Address.URL and create the KIngress object in the target namespace. This would require moving related resources across namespaces and keeping them in sync, because certain ingress implementations don't allow you to reach across namespaces to access resources - e.g. Contour and K8s Secrets.

1.2) Don't use KIngress

Since DomainMapping is essentially a Host/URI rewrite, this can be done with a filter approach instead of a KIngress. The filter would rewrite the host, path, etc. prior to forwarding requests upstream.

We don't have abstractions for this type of functionality, and it's unclear whether different ingresses support it. Given that, maybe this is a feature request to bring to the upstream Gateway API folks.
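To give a feel for the kind of filter meant here, a sketch in Gateway API HTTPRoute terms using its URLRewrite filter - the Gateway reference, names, and hostnames are all assumptions, and Knative has no abstraction for this today:

```yaml
# Hypothetical per-DomainMapping route that rewrites the Host header
# before forwarding upstream (all names assumed).
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-example-com
  namespace: default
spec:
  parentRefs:
    - name: knative-gateway        # assumed Gateway
  hostnames:
    - app.example.com              # the mapped custom domain
  rules:
    - filters:
        - type: URLRewrite
          urlRewrite:
            hostname: hello.default.svc.cluster.local  # the Route's in-cluster host
      backendRefs:
        - name: hello
          port: 80
```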

2) Changing Route Service to not use ExternalName

2.1) Use Endpoints

If an ingress implementation disables ExternalName forwarding, it could populate the KIngress status with an IP, and the Route reconciler can opt to use a headless Service with our own managed Endpoints. Endpoints don't support CNAME addresses, so we'll still require a Service of type ExternalName if the IP is not set.
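A minimal sketch of that shape, with the Service name, namespace, and IP as placeholder assumptions:

```yaml
# Selector-less headless Service plus an Endpoints object we manage ourselves;
# with no selector, the endpoints controller won't overwrite it.
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  clusterIP: None          # headless; type is ClusterIP
  ports:
    - port: 80
---
apiVersion: v1
kind: Endpoints
metadata:
  name: hello              # must match the Service name
  namespace: default
subsets:
  - addresses:
      - ip: 10.96.0.42     # single stable IP from the KIngress LoadBalancer status
    ports:
      - port: 80
```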

Some observations/considerations: headless means switching the type to ClusterIP and setting the clusterIP property to None. These properties are immutable (context: https://groups.google.com/g/kubernetes-sig-network/c/8opCQZhxpxM/m/CynHrF_cAQAJ). Also, switching the type from ClusterIP to ExternalName seems broken; see the issue: https://github.com/kubernetes/kubernetes/issues/104329

So if an ingress implementation toggles between reporting an IP and a domain, we would have to delete the Service/Endpoints and recreate them when that happens. That's probably OK.

I think I prefer a single IP vs. N IPs - this is to avoid cascading herds. E.g. Contour uses a DaemonSet, so a large cluster would result in many IPs being propagated (and churning whenever deployments are rolled) vs. a stable Service IP.

2.2) Use EndpointSlices

EndpointSlices support A and CNAME records, so we could create a headless Service and a single EndpointSlice.
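A minimal sketch, again with placeholder names - a single manually managed EndpointSlice carrying an FQDN address for a headless Service named "hello":

```yaml
# Manually managed EndpointSlice with a CNAME-style (FQDN) address
# (all names illustrative).
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: hello-fqdn
  namespace: default
  labels:
    kubernetes.io/service-name: hello   # associates the slice with the Service
addressType: FQDN
endpoints:
  - addresses:
      - envoy-internal.projectcontour.svc.cluster.local
ports:
  - port: 80
    protocol: TCP
```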

This solves the Service.ExternalName problems since the resource lifecycle can be limited by RBAC (related: https://github.com/kubernetes/kubernetes/issues/103675)

Unfortunately, whether this is supported depends on the DNS implementation.

EndpointSlice support was added in CoreDNS 1.8.1 (https://coredns.io/2021/01/20/coredns-1.8.1-release/), which means K8s/kubeadm didn't pick up this change until K8s 1.22. We won't adopt K8s 1.22 as a minimum until next year, so switching now would be a breaking change.

Current Approach

I think we should use Endpoints (2.1), even with the caveats mentioned, and switch over to EndpointSlices (2.2) next year once 1.22 becomes our minimum K8s version. Then we can assume the CNAME record will work as expected.

Long term, we should start the discussion with the Gateway API folks about exposing a filter-type concept. Then DomainMapping could potentially bypass the Route's K8s Service and hit the Revision's K8s Service directly. It's currently unclear to me if that's an actual win, since you're going through kube-proxy anyway. Probably worth an exploration.

dprotaso commented 3 years ago

/assign @dprotaso

I've been working on 2.1

julz commented 3 years ago

agree 2.1 feels like the most promising approach

evankanderson commented 3 years ago

2.1 feels fairly promising. I wanted to make one comment on the implementation:

I think I prefer a single IP vs. N IPs - this is to avoid cascading herds. E.g. Contour uses a DaemonSet, so a large cluster would result in many IPs being propagated (and churning whenever deployments are rolled) vs. a stable Service IP.

It's possible to use subsetting here to reduce the number of IPs handed out for each Service, but (on further reflection), I think the single-IP approach is probably acceptable.

I agree that we should also engage with the Gateway API to enable this sort of rewriting within a single Gateway; our current network data path looks like:

[internet] -> [lb] -> [external kingress] -> [k8s service lb] -> [internal kingress] --> rest of knative

And it would be nice to remove the [k8s service lb] and [internal kingress] from the traffic path, particularly for high-traffic services.

dprotaso commented 3 years ago

If we plan on migrating to EndpointSlices with CNAME records in the future, we should confirm that ingresses support them. I would hope it's transparent, but we should verify.

ie. https://github.com/projectcontour/contour/issues/3950#issuecomment-904967484

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

dprotaso commented 2 years ago

/lifecycle frozen

rhuss commented 2 years ago

+1 for option 2.1, as this would also help with creating a multi-tenant setup via NetworkPolicies, which is currently not possible because NetworkPolicies don't work with DNS-resolved external names either.
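For context, NetworkPolicy peers can only be expressed as podSelector, namespaceSelector, or ipBlock - there is no peer type for a DNS name, so traffic routed via an ExternalName CNAME can't be selected. A minimal sketch of the kind of per-namespace policy meant here (namespace names assumed):

```yaml
# Allow pods in ns-b to receive traffic only from ns-a; nothing here can
# reference the DNS name an ExternalName Service resolves to.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-only-ns-a
  namespace: ns-b
spec:
  podSelector: {}          # applies to every pod in ns-b
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ns-a
```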

BTW, is there documentation on how to create a multi-tenant setup (i.e. to prevent KServices in ns A from accessing KServices in ns B)? (I don't want to sidetrack this issue, but before opening a dedicated issue for this I want to be sure there isn't already something.)

dprotaso commented 2 years ago

+1 for option 2.1, as this would also help with creating a multi-tenant setup via NetworkPolicies, which is currently not possible because NetworkPolicies don't work with DNS-resolved external names either.

Long term, I think we want to use 2.2 - FQDN EndpointSlices - so we're not managing endpoints and IPs. The K8s Services mentioned in this issue front the L7 proxies.

the multi-tenant stuff is probably a separate issue

evankanderson commented 2 years ago

/unassign dprosato

Since this is in "Next"

evankanderson commented 2 years ago

/unassign @dprotaso

That will teach me to manage issues with my phone.

rhuss commented 2 years ago

@dprotaso you mention we should use 2.2 for the long term, but IIUC that is only possible on K8s >= 1.22 (whose bundled CoreDNS supports EndpointSlices). Since Knative will support K8s < 1.22 for quite some time to come, and we can't use EndpointSlices until we raise Knative's minimum requirement to K8s 1.22, shouldn't we target the (intermediate) solution 2.1 first? We switch to K8s 1.22 when K8s 1.25 gets released (according to https://kubernetes.io/releases/version-skew-policy/ and https://github.com/knative/community/blob/main/mechanics/RELEASE-VERSIONING-PRINCIPLES.md), and 1.25 will not be around within the next 6 months (assuming 1.23 will be out anytime soon).

richardvflux commented 1 year ago

That last problem doesn't seem to be a problem now as this issue hasn't moved since Dec 2021 :-)

dprotaso commented 1 year ago

@richardvflux unfortunately upstream wants to deprecate FQDN for endpoint slices - see https://github.com/kubernetes/kubernetes/issues/114210

I don't think there is an alternative at the moment

rvowles commented 1 year ago

Yes, we ran smack into that issue with Cilium when trying to create consistent database service redirection. So is there an alternative under discussion, by any chance?

dprotaso commented 1 year ago

So is there an alternative under discussion by any chance?

I'm not aware of an alternate proposal in the k8s upstream.

dprotaso commented 7 months ago

FYI - I consider this blocked on the outcome of this discussion; so far there doesn't seem to be an alternative: https://github.com/kubernetes/kubernetes/issues/114210