tssurya opened this issue 4 months ago
/cc @joestringer
There are two things we are trying to clarify here: (1) how a policy behaves when its subject or peer is also a backend of a Service, and (2) how the Networks peer interacts with Service VIPs. For clusterIPs and externalIPs we know that policy is applied after the packet is DNATed, a.k.a. rewritten, so having a Service in front of the backends is effectively the same as not having one. The tricky part is loadBalancers, and that is what we are trying to address in this issue.
The text we have for case1:
// Note that presence of a Service object with this policy subject as its backend
// has no impact on the behavior of the policy applied to the peer
// trying to talk to the Service. It will work in the same way as if the
// Service didn't exist since policy is applied after ServiceVIP (clusterIP,
// externalIP, loadBalancerIngressIP) is rewritten to the backendIPs.
//
AND
//
// Note that presence of a Service object with this peer as its backend
// has no impact on the behavior of the policy applied to the subject
// trying to talk to the Service. It will work in the same way as if the
// Service didn't exist since policy is applied after ServiceVIP (clusterIP,
// externalIP, loadBalancerIngressIP) is rewritten to the backendIPs.
//
Text we have for case2:
Note that because policies are applied after Service VIPs (clusterIPs, externalIPs,
load balancer IPs) are rewritten to endpoint IPs, a Networks selector cannot match
such a VIP. For example, a Networks selector that denies traffic to the entire
service CIDR will not actually block any service traffic.
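To make case 2 concrete, here is a rough sketch of the kind of rule a user might expect to block service traffic but which, per the text above, would not. The names, priority, and the 10.96.0.0/12 service CIDR are made up, and the networks peer shape is the one proposed in #185:

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: deny-service-cidr          # hypothetical example
spec:
  priority: 10
  subject:
    namespaces:
      matchLabels:
        team: restricted
  egress:
  - name: "deny-egress-to-service-cidr"
    action: Deny
    to:
    - networks:
      - 10.96.0.0/12               # assumed cluster service CIDR
      # By the time this rule is evaluated, the clusterIP/externalIP has
      # already been DNATed to a backend pod IP, so this CIDR never matches
      # and no service traffic is actually denied.
```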
In all these cases the wording is wrong for load balancers: on AWS, where the load balancer is exposed via a hostname and traffic leaves the cluster, what happens with the policy probably cannot be predicted at all, whereas on GCP/Azure, where we can hairpin the DNAT within the cluster without sending the traffic out, things will work differently.
I'd like to capture what was said here https://github.com/kubernetes-sigs/network-policy-api/pull/185#discussion_r1506114425 by @danwinship
Unfortunately, this isn't quite right; ClusterIPs and ExternalIPs are always rewritten before NP enforcement, but load balancer IPs sometimes aren't; in some cases, pod-to-LoadBalancer traffic must be sent to the load balancer rather than being processed locally. (This would happen either if the CloudProvider sets hostname rather than ip in the service's LoadBalancerStatus (like AWS), or if it sets ip but also uses the new ipMode: Proxy.)
And once a packet leaves the node and goes to an external load balancer, then all bets are off. When the packet comes back into the cluster, it's not even theoretically possible for the ANP implementation to reliably figure out its original source IP (since the LB may have transformed the connection in arbitrary ways).
(And of course, if the Service has endpoint IPs that are cluster-external, then a pod-to-LoadBalancer connection that goes to the LB would presumably then go directly to the cluster-external IPs, without coming back into the cluster at all, so there's no way the ANP implementation could enforce a Networks selector that blocked cluster-external IPs at that point.)
So I can think of four options:
FTR, note that in the cluster-ingress case (which is even worse), I feel like there's a chance the answer will end up being "delegate policy enforcement to the Gateway rather than trying to do it in ANP". I'm not sure if that has any implications for trying to decide what to do here.
(People may also want to consider how they expect ANPs to interact with other implementation-specific egress functionality (eg, EgressIPs), and whether that implies anything about the best choice for LB behavior.)
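To illustrate the distinction Dan draws above, these are roughly the three LoadBalancer status shapes in play (addresses and hostnames are made up; ipMode is the field added by KEP-1860):

```yaml
# AWS-style: only a hostname is published, so pod-to-LB traffic has to
# leave the cluster and go through the load balancer itself.
status:
  loadBalancer:
    ingress:
    - hostname: a1b2c3d4.elb.amazonaws.com

# ip with ipMode: VIP (the default): kube-proxy / the CNI may DNAT the LB IP
# to backend IPs inside the cluster, so policy sees the rewritten packet.
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10
      ipMode: VIP

# ip with ipMode: Proxy: the packet must actually be delivered to the load
# balancer, so it leaves the cluster before any rewrite the ANP
# implementation could observe.
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10
      ipMode: Proxy
```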
@joestringer and Nathan Sweet also shared how this works in the Cilium world via https://github.com/kubernetes-sigs/network-policy-api/pull/185#discussion_r1506883310 and https://github.com/kubernetes-sigs/network-policy-api/pull/185#discussion_r1506848868
One small note I'd make is that I think that for policy ingress towards a Pod this is simple: I would never expect a service IP to be the peer for the subject of an ingress policy. For the traffic to arrive at the subject of the policy, it must be routed there, and typically that routing decision must be made using the actual IP of the subject. So it's mainly the egress policy case that is ambiguous.
So I had the exact same thoughts at first, but really it's the flip case for ingress, where the peer pod tries to talk to the serviceVIP that has the subject pod as its backend, right? In that case, if we have a policy in place between the peer pod and the subject pod, then it won't really respect this serviceVIP....
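A rough sketch of that flip case (all names and labels are hypothetical): a client pod connects to a ClusterIP whose backend is the subject pod. Because the VIP is rewritten before delivery, the ingress rule below is evaluated against the client pod itself, so a pods-based peer still matches even though the client addressed the Service, and a rule written against the VIP would never match anything:

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: ingress-via-service-vip    # hypothetical example
spec:
  priority: 20
  subject:
    pods:                          # the Service's backend pods
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: backend
      podSelector:
        matchLabels:
          app: web
  ingress:
  - name: "allow-from-clients"
    action: Allow
    from:
    - pods:                        # matched by the client's real pod IP/labels,
        namespaceSelector:         # regardless of whether it dialed the pod IP
          matchLabels:             # or the Service's clusterIP
            kubernetes.io/metadata.name: clients
        podSelector:
          matchLabels:
            role: client
```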
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
What happened: Meeting notes from 27th Feb 2024:
RE: policies and services: the existing NetworkPolicy documentation says:
Cluster ingress and egress mechanisms often require rewriting the source or destination IP of packets. In cases where this happens, it is not defined whether this happens before or after NetworkPolicy processing, and the behavior may be different for different combinations of network plugin, cloud provider, Service implementation, etc.
For egress, this means that connections from pods to Service IPs that get rewritten to cluster-external IPs may or may not be subject to ipBlock-based policies.
- We should certainly make pod-to-pod work even if it's through a Service (clusterIP).
- We should get rid of "undefined" and "unspecified" behaviour for ANP; we shouldn't have the same ambiguity as we have for NetPol.
- Svc rewrite happens before netpol application => a rule that matches service IPs has no effect, since it will never actually match anything!
- Use case: block users from connecting to service x => block the endpoints instead? (see the sketch after these notes)
- CONSENSUS: write it down: don't do it for service VIPs! => clarify which IPs: clusterIPs, externalIPs, loadBalancerVIPs.
- Let's also update the NPEP and ensure we call this out!
- Cilium's implementation ignores the CIDR block entirely for in-cluster traffic: pods cannot be selected by IP, only by labels.
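Following on from the "block users from connecting to service x" note above, a minimal sketch of that workaround (select the backing pods rather than the VIP; all names, labels, and the priority are hypothetical):

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: deny-to-x-backends         # hypothetical example
spec:
  priority: 15
  subject:
    namespaces:
      matchLabels:
        team: restricted-clients
  egress:
  - name: "deny-to-service-x-backends"
    action: Deny
    to:
    - pods:                        # deny the endpoints of service x directly,
        namespaceSelector:         # since a CIDR rule on the clusterIP would
          matchLabels:             # never match post-DNAT traffic
            kubernetes.io/metadata.name: x
        podSelector:
          matchLabels:
            app: x
```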
What you expected to happen:
We are trying to clarify at least the egress bits in the API change here: https://github.com/kubernetes-sigs/network-policy-api/pull/185
See ideas from @danwinship 's comment here: https://github.com/kubernetes-sigs/network-policy-api/pull/185#discussion_r1506114425
We need to zero in on an agreement and get that done in a separate PR so that the original PR can move forward first.