npinaeva opened 10 months ago
Additionally, a user may want this extra support to be available to people who only have namespace-level write access.
Making "protocols" top-level instead of "ports" seems a little less convenient/understandable for the common case.
But while we're thinking about how to rearrange things, I've always found `ports` really weird in NP because its behavior is asymmetric; in an ingress rule, `ports` effectively becomes part of the subject, while in an egress rule, it becomes part of the peer. In particular, when named ports are used, the `ports` section can affect which pods are part of the subject/peer set (because pods without any matching named port are considered non-matching for the rule).
I don't think there's any easy way to fix that as long as a single policy can have all of (a) a subject, (b) ingress peers, (c) egress peers. If we said that policies had to be either ingress-only or egress-only, then we could move the ports to be part of the subject section in ingress policies, and leave them with the peers in egress policies. But that might be annoying.
OTOH, the whole idea of having separate "ingress" and "egress" rules (rather than "transgress" ("transit"?) rules) doesn't make as much sense in ANP as in plain NP. With plain NP, you can only write rules with reference to your own pods, so the owner of namespace A can say "pods in namespace A don't accept most traffic, but they accept traffic from pods in namespace B", and the owner of namespace B can say "pods in namespace B can't send most traffic, but they can send traffic to pods in namespace A", but neither can specify the other. With ANP OTOH, the admin ought to be able to just say "traffic is allowed from pods in namespace B to pods in namespace A" without having to think about ingress vs egress...
So... instead of having "subject", "ingress", and "egress", an ANP could just have a list of rules, where each rule has a Source (which can be a Pods or Namespaces), and a Destination (which can be Pods, Namespaces, Nodes, or Networks). And then the Ports/Protocols obviously are associated with the Destination section only.
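A hypothetical sketch of what such a restructured policy might look like (the `rules`/`from`/`to` shape here is invented purely for illustration of the idea above; it is not part of the current v1alpha1 API):

```yaml
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: allow-b-to-a
spec:
  priority: 10
  rules:                 # hypothetical: no subject/ingress/egress split
  - action: Allow
    from:                # Source: Pods or Namespaces
      namespaces:
        matchLabels:
          kubernetes.io/metadata.name: ns-b
    to:                  # Destination: Pods, Namespaces, Nodes, or Networks
      namespaces:
        matchLabels:
          kubernetes.io/metadata.name: ns-a
      ports:             # ports/protocols live with the destination only
      - portNumber:
          protocol: TCP
          port: 80
```

With this shape the admin writes "traffic from B to A is allowed" once, instead of deciding whether to phrase it as ingress on A or egress from B.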
there are more protocols (ICMP, ICMPv6 are the most popular requests) that may be useful. An example use case is "I want to only allow ICMP connections to implement health monitoring and deny everything else."
We need to be careful here: the Kubernetes Network Model does not specify anything about ICMP, and thus network plugins are allowed to implement the pod network in ways that don't give them full control over ICMP traffic. In particular, if pod-to-pod traffic goes unencapsulated over an underlying cloud network, there may not be any API that allows the network plugin to Deny ICMP traffic, and there may be no way to prevent the cloud network from denying ICMP traffic that the admin wants to Allow.
I guess that means we really do need "Extended" or whatever...
The network plugin usually has some lever to block packets, eg nftables locally to the node.
If the challenge is allowing traffic, that's different. But… not much different to using a NetworkPolicy to allow a Pod to make an egress connection on tcp/25, and then finding that a separate policy blocks that traffic. “Allow” doesn't mean “guaranteed to reach the destination” after all.
> The network plugin usually has some lever to block packets, eg nftables locally to the node.
When using something like amazon-vpc-cni-k8s, packets go directly from pods to the cloud network without passing through the host network namespace of the node. In that case, NP is implemented using AWS APIs rather than Linux kernel APIs. (Of course, AWS does let you specify ICMP rules, so this isn't a perfect example, but you get the idea.)
Anyway what I was saying about "Extended" is that there's this idea that there might be some parts of the ANP API that we say not all plugins will support, and I had been arguing before that we should get rid of that idea (which was copied from Gateway where it's more necessary), but if we're going to add ICMP rules, we might need to define it in a way that implementations aren't required to support it (though then we also need a way for users to figure out if their implementation supports it).
I'm thinking admission-time because that's when people would actually want to get feedback (i.e., putting it in `.status` a few milliseconds later doesn't help nearly as much; nor does firing an Event). Also, you can get the warning or rejection from a dry-run, whereas the other two mechanisms wouldn't show up.
For now we don't have any fields in ANP that are "extended", but I think the protocols use case is a good example of why we may need this mechanism (together with a mechanism to report unsupported configs).
> So... instead of having "subject", "ingress", and "egress", an ANP could just have a list of rules, where each rule has a Source (which can be a Pods or Namespaces), and a Destination (which can be Pods, Namespaces, Nodes, or Networks).
I like this idea, but that will be a very big change to the existing API (that we should do before going beta). Changing only the way ports are expressed is a smaller change, but the deadline is the same. So @danwinship do you think we should fully reconsider our ingress/egress design? I will bring it up on the next meeting.
> Making "protocols" top-level instead of "ports" seems a little less convenient/understandable for the common case.
Suggested in the meeting this week: we could keep `ports` as it is, but also add `protocols` for other stuff, which would currently be just ICMP. We'd need to figure out the semantics though... the simplest thing would be to say that the `ports` and `protocols` sections are mutually exclusive.
Why do you think they should be mutually exclusive? (btw do we say somewhere in ANP docs which protocols it applies to when no protocols are specified?)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After lifecycle/stale was applied, lifecycle/rotten is applied
- After lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
> Why do you think they should be mutually exclusive?
Hm... I guess it doesn't have to be
> (btw do we say somewhere in ANP docs which protocols it applies to when no protocols are specified?)
no, I don't think so
Ok, what are the expectations today for a network policy implementation? Should it only care about the protocols that can be defined (SCTP, TCP, UDP), or should it block everything if no port/protocol is specified?
NetworkPolicy only describes the behavior at L4 (and only for unicast, and only for the pod network's supported IP families). It's undefined what happens at lower levels, other than that the NetworkPolicy can't block traffic that is needed to make higher levels work. So if you have a policy that says "Pod A can't talk to Pod B at all", the implementation could choose to block all L2, L3, and L4 traffic between them, but if you have "Pod A can talk to Pod B only on port 80", then you implicitly need to also allow ARP traffic between them, and probably should allow ICMP traffic, etc.
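For concreteness, the "Pod A can talk to Pod B only on port 80" case above could be expressed with a standard `networking.k8s.io/v1` NetworkPolicy like this (the `demo` namespace and `app` labels are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-a-to-b-port-80
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: pod-b          # subject: the pods this policy protects
  policyTypes:
  - Ingress               # selected pods now deny all other ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: pod-a
    ports:
    - protocol: TCP
      port: 80
```

An implementation enforcing this must still permit the lower-level traffic (e.g. ARP) needed for the allowed TCP/80 connections to actually work, which is exactly the implicit-allow behavior described above.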
Ok, then I think that "allow what you don't know about by default" on the implementation is the less surprising behavior for users
> Ok, then I think that "allow what you don't know about by default" on the implementation is the less surprising behavior for users.
This will surprise people, even if it's allowed by the undefined-ness. For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used and nothing grants it.
The surprises will happen. If we can warn users about those surprises (eg a message from `kubectl`), great. If we can't, then we really do want even better documentation and some cautionary notes about making sure to read it.
> This will surprise people, even if it's allowed by the undefined-ness. For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used and nothing grants it.
I see, good counterexample... We may need to discuss this with a wider audience? @thockin @danwinship @shaneutt @khenidak some thoughts on this?
> It's undefined what happens at lower levels, other than that the NetworkPolicy can't block traffic that is needed to make higher levels work.
Can we define this for ANP and other objects moving forward @danwinship? I.e., if you're blocking L4 traffic, also explicitly block L1-L3, and if you're allowing L4 traffic, explicitly also allow L1-L3. That would at least make sure we fail closed + would still allow us to select lower protocols (L3 etc.) in the future, with the "default" always being deny.
Also, could we define it differently moving forward for NPv1? Because I would imagine many implementations actually do implement it like so ^^
let's get some real feedback, what are Calico and Cilium doing today @nathanjsweet @caseydavenport ?
btw we did update the docs somewhat on that matter https://kubernetes.io/docs/concepts/services-networking/network-policies/#network-traffic-filtering
yeah, but if "undefined" happens to be the same behavior in Antrea, Cilium, Calico, OVN... 😄 have we done that analysis?
> This will surprise people
I'm describing existing behavior; if it's surprising, it's already surprising. (And as Nadia pointed out, we document this now.)
> For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used
The pod network implementation absolutely can implement that behavior if it wants to. It can also rate-limit (or completely prohibit) pings even when there aren't NPs. It can also provide third-party APIs to allow configuring this. At the present time, Kubernetes does not impose any requirements around ICMP, so implementations can do anything at all, so long as they don't break L4 traffic that was supposed to work.
> Can we define this for ANP and other objects moving forward
As with #230, this would expand the set of "powers" that all ANP implementations are required to have and so would need to be considered carefully (https://github.com/kubernetes-sigs/network-policy-api/issues/187#issuecomment-1904272046).
> i.e if you're blocking L4 traffic also explicitly block L1-L3
(The details are more complicated than that.) (Also, L1 is the physical wires or radio waves. You don't need to block that. :joy:)
> Also could we define it differently moving forward for NPV1, because I would imagine many implementations actually do implement it like so ^^
We can certainly recommend a particular implementation (for NP or ANP). We definitely can't require anything specific for NPv1 at this point.
Cilium has an entirely separate CRD and policy system for ICMP filtration. It has support for IP family selection and ICMP message type (an OR'd list). The Cilium code base could accommodate K8s putting ICMP enforcement somewhere in the NetworkPolicy resource though.
> So if you have a policy that says "Pod A can't talk to Pod B at all", the implementation could choose to block all L2, L3, and L4 traffic between them
This is the approach Calico takes. We consider a "default deny" state to apply to all L3 and above traffic from the pod (we don't block L2 / ARP, most Calico networks are point-to-point so pods aren't communicating on L2 other than to their hosts). If it's not explicitly allowed, then it is denied.
Calico does provide its own APIs to allow more precise selection of other types of traffic not supported in k8s NetworkPolicy (e.g., ICMP).
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After lifecycle/stale was applied, lifecycle/rotten is applied
- After lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten
Is your enhancement request related to a problem? Please describe.
Currently only TCP, UDP, and SCTP protocols are supported, but there are more protocols (ICMP, ICMPv6 are the most popular requests) that may be useful. An example use case is "I want to only allow ICMP connections to implement health monitoring and deny everything else."

Describe the solution you'd like
To potentially implement it in the future, we may need to re-consider the `AdminNetworkPolicyPort` design (https://github.com/kubernetes-sigs/network-policy-api/blob/main/apis/v1alpha1/shared_types.go#L52), which puts the protocol inside the port definition; since some protocols don't have ports, it may be difficult to expand.

Describe alternatives you've considered
We could add an extra `protocols` field at the same level as `ports` (https://github.com/kubernetes-sigs/network-policy-api/blob/main/apis/v1alpha1/adminnetworkpolicy_types.go#L151), but that may be confusing.

Example solution: then the current ports spec may look like
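The example itself did not survive in this copy of the issue. A hedged sketch of what a sibling `protocols` field could look like for the ICMP health-monitoring use case (the `protocols`, `icmp`, and `icmpv6` field names are invented for illustration; this is not an accepted API shape):

```yaml
egress:
- name: "allow-icmp-monitoring"
  action: Allow
  to:
  - namespaces:
      matchLabels:
        kubernetes.io/metadata.name: monitored
  protocols:            # hypothetical sibling of the existing ports field
  - icmp:
      type: 8           # echo request
  - icmpv6:
      type: 128         # echo request
```

Under the mutually-exclusive semantics discussed above, a rule would carry either `ports` or `protocols` but not both; relaxing that would instead mean defining how the two lists combine.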