kubernetes-sigs / network-policy-api

This repo addresses further work involving Kubernetes network security beyond the initial NetworkPolicy resource
Apache License 2.0
50 stars 28 forks source link

[ENHANCEMENT] More protocols support #187

Open npinaeva opened 6 months ago

npinaeva commented 6 months ago

Is your enhancement request related to a problem? Please describe. Currently only TCP, UDP, and SCTP protocols are supported, but there are more protocols (ICMP, ICMPv6 are the most popular requests) that may be useful. An example use case is "I want to only allow ICMP connections to implement health monitoring and deny everything else."

Describe the solution you'd like To potentially implement it in the future, we may need to re-consider AdminNetworkPolicyPort https://github.com/kubernetes-sigs/network-policy-api/blob/main/apis/v1alpha1/shared_types.go#L52 design, which puts protocol inside the port definition, while some protocols don't have ports, it may be difficult to expand.

Describe alternatives you've considered We could add an extra protocols field at the same level as ports https://github.com/kubernetes-sigs/network-policy-api/blob/main/apis/v1alpha1/adminnetworkpolicy_types.go#L151, but that will be confusing.

Example solution:

type AdminNetworkPolicyProtocol struct {
    NamedPort *string `json:"namedPort,omitempty"`

    TCP *PortProtocol `json:"TCP,omitempty"`
    UDP *PortProtocol `json:"UDP,omitempty"`
    SCTP *PortProtocol `json:"SCTP,omitempty"`
        // may be added in the future as
    ICMP *SimpleProtocol `json:"ICMP,omitempty"`
}

type SimpleProtocol struct {}

type PortProtocol struct {
    Ports *[]int32 `json:"ports,omitempty"`
    PortRanges *[]PortRange `json:"portRanges,omitempty"`
}

type PortRange struct {
    Start int32 `json:"start"`
    End int32 `json:"end"`
}

then the current ports spec

ports:
  - namedPort: containerPort
  - portNumber:
      protocol: TCP
      port: 1111
  - portNumber:
      protocol: TCP
      port: 2222
  - portRange:
      protocol: UDP
      start: 1
      end: 9999
  - portRange:
      protocol: SCTP
      start: 1
      end: 65535

may look like

protocols:
  - namedPort: containerPort
  - TCP:
      ports: [1111, 2222]
  - UDP:
      portRanges:
        - start: 1
          end: 9999
  - SCTP: {}
sftim commented 5 months ago

As a user, someone may want this extra support to be available for people who only have namespace-level write access.

danwinship commented 5 months ago

Making "protocols" top-level instead of "ports" seems a little less convenient/understandable for the common case.

But while we're thinking about how to rearrange things, I've always found ports really weird in NP because its behavior is asymmetric; in an ingress rule, ports effectively becomes part of the subject, while in an egress rule, it becomes part of the peer. Especially, when named ports are used, the ports section can affect which pods are part of the subject/peer set (because pods without any matching named port are considered non-matching for the rule).

I don't think there's any easy way to fix that as long as a single policy can have all of (a) a subject, (b) ingress peers, (c) egress peers. If we said that policies had to be either ingress-only or egress-only, then we could move the ports to be part of the subject section in ingress policies, and leave them with the peers in egress policies. But that might be annoying.

OTOH, the whole idea of having separate "ingress" and "egress" rules (rather than "transgress" ("transit"?) rules) doesn't make as much sense in ANP as in plain NP. With plain NP, you can only write rules with reference to your own pods, so the owner of namespace A can say "pods in namespace A don't accept most traffic, but they accept traffic from pods in namespace B", and the owner of namespace B can say "pods in namespace B can't send most traffic, but they can send traffic to pods in namespace A", but neither can specify the other. With ANP OTOH, the admin ought to be able to just say "traffic is allowed from pods in namespace B to pods in namespace A" without having to think about ingress vs egress...

So... instead of having "subject", "ingress", and "egress", an ANP could just have a list of rules, where each rule has a Source (which can be a Pods or Namespaces), and a Destination (which can be Pods, Namespaces, Nodes, or Networks). And then the Ports/Protocols obviously are associated with the Destination section only.

danwinship commented 5 months ago

there are more protocols (ICMP, ICMPv6 are the most popular requests) that may be useful. An example use case is "I want to only allow ICMP connections to implement health monitoring and deny everything else."

We need to be careful here: the Kubernetes Network Model does not specify anything about ICMP, and thus network plugins are allowed to implement the pod network in ways that don't give them full control over ICMP traffic. In particular, if pod-to-pod traffic goes unencapsulated over an underlying cloud network, there may not be any API that allows the network plugin to Deny ICMP traffic, and there may be no way to prevent the cloud network from denying ICMP traffic that the admin wants to Allow.

I guess that means we really do need "Extended" or whatever...

sftim commented 5 months ago

The network plugin usually has some lever to block packets, eg nftables locally to the node.

If the challenge is allowing traffic, that's different. But… not much different to using a NetworkPolicy to allow a Pod to make an egress connection on tcp/25, and then finding that a separate policy blocks that traffic. “Allow” doesn't mean “guaranteed to reach the destination” after all.

danwinship commented 5 months ago

The network plugin usually has some lever to block packets, eg nftables locally to the node.

When using something like amazon-vpc-cni-k8s, packets go directly from pods to the cloud network without passing through the host network namespace of the node. In that case, NP is implemented using AWS APIs rather than Linux kernel APIs. (Of course, AWS does let you specify ICMP rules, so this isn't a perfect example, but you get the idea.)

Anyway what I was saying about "Extended" is that there's this idea that there might be some parts of the ANP API that we say not all plugins will support, and I had been arguing before that we should get rid of that idea (which was copied from Gateway where it's more necessary), but if we're going to add ICMP rules, we might need to define it in a way that implementations aren't required to support it (though then we also need a way for users to figure out if their implementation supports it).

sftim commented 5 months ago

I'm thinking admission-time because that's when people would actually want to get feedback (ie, putting it in .status a few milliseconds later doesn't help nearly as much; nor does firing an Event). Also, you can get the warning or rejection from a dry-run, whereas the other two mechanisms wouldn't show up.

npinaeva commented 5 months ago

For now we don't have any fields in ANP that are "extended", but I think the protocols use case is a good example of why we may need this mechanism (together with a mechanism to report unsupported configs).

So... instead of having "subject", "ingress", and "egress", an ANP could just have a list of rules, where each rule has a Source (which can be a Pods or Namespaces), and a Destination (which can be Pods, Namespaces, Nodes, or Networks).

I like this idea, but that will be a very big change to the existing API (that we should do before going beta). Changing only the way ports are expressed is a smaller change, but the deadline is the same. So @danwinship do you think we should fully reconsider our ingress/egress design? I will bring it up on the next meeting.

danwinship commented 5 months ago

Making "protocols" top-level instead of "ports" seems a little less convenient/understandable for the common case.

Suggested in the meeting this week: we could keep ports as it is, but also add protocols for other stuff, which would currently be just ICMP.

We'd need to figure out the semantics though... the simplest thing would be to say that the ports and protocols sections are mutually exclusive.

npinaeva commented 5 months ago

Why do you think they should be mutually exclusive? (btw do we say somewhere in ANP docs to which protocols it applies when no protocols are specificed?)

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

astoycos commented 1 month ago

/remove-lifecycle stale

danwinship commented 1 month ago

Why do you think they should be mutually exclusive?

Hm... I guess it doesn't have to be

(btw do we say somewhere in ANP docs to which protocols it applies when no protocols are specificed?)

no, I don't think so

aojea commented 6 days ago

Ok, what are the expectations today for network policy implementation, should they only care about the possible protocols that can be defined SCTP, TCP, UDP or should block everything if no port/protocol is specified ?

danwinship commented 6 days ago

NetworkPolicy only describes the behavior at L4 (and only for unicast, and only for the pod network's supported IP families). It's undefined what happens at lower levels, other than that the NetworkPolicy can't block traffic that is needed to make higher levels work. So if you have a policy that says "Pod A can't talk to Pod B at all", the implementation could choose to block all L2, L3, and L4 traffic between them, but if you have "Pod A can talk to Pod B only on port 80", then you implicitly need to also allow ARP traffic between them, and probably should allow ICMP traffic, etc.

aojea commented 6 days ago

Ok, then I think that "allow what you don't know about by default" on the implementation is the less surprising behavior for users

sftim commented 6 days ago

Ok, then I think that "allow what you don't know about by default" on the implementation is the less surprising behavior for users.

This will surprise people, even if it's allowed by the undefined-ness. For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used and nothing grants it.

The surprises will happen. If we can warn users about those surprises (eg a message from kubectl, great. If we can't, then we really do want even better documentation and some cautionary notes about making sure to read it.

aojea commented 6 days ago

This will surprise people, even if it's allowed by the undefined-ness. For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used and nothing grants it.

I see, good counter example ... We may need to discuss this in a widely audience? @thockin @danwinship @shaneutt @khenidak some thoughts on this?

astoycos commented 6 days ago

It's undefined what happens at lower levels, other than that the NetworkPolicy can't block traffic that is needed to make higher levels work.

Can we define this for ANP and other objects moving forward @danwinship? i.e if you're blocking L4 traffic also explicitly block L1-L3 and if you're allowing L4 traffic explicitly also allow L1-L3. That would at least make sure we fail closed + would still allow us to select lower protocols (L3 etc) in the future with the "default" always being deny

Also could we define it differently moving forward for NPV1, because I would imagine many implementations actually do implement it like so ^^

aojea commented 6 days ago

let's get some real feedback, what are Calico and Cilium doing today @nathanjsweet @caseydavenport ?

npinaeva commented 5 days ago

btw we did update the docs somewhat on that matter https://kubernetes.io/docs/concepts/services-networking/network-policies/#network-traffic-filtering

aojea commented 5 days ago

yeah, but if undefined happen to be the same behavior in antrea, cilium, calico , ovn .... 😄 have we done that analysis?

danwinship commented 14 hours ago

This will surprise people

I'm describing existing behavior; if it's surprising, it's already surprising. (And as Nadia pointed out, we document this now.)

For example, a cluster operator might expect that an untrusted Pod should not be allowed to flood-ping a Pod when NetworkPolicy is used

The pod network implementation absolutely can implement that behavior if it wants to. It can also rate-limit (or completely prohibit) pings even when there aren't NPs. It can also provide third-party APIs to allow configuring this. At the present time, Kubernetes does not impose any requirements around ICMP, so implementations can do anything at all, so long as they don't break L4 traffic that was supposed to work.

Can we define this for ANP and other objects moving forward

As with #230, this would expand the set of "powers" that all ANP implementations are required to have and so would need to be considered carefully (https://github.com/kubernetes-sigs/network-policy-api/issues/187#issuecomment-1904272046).

i.e if you're blocking L4 traffic also explicitly block L1-L3

(The details are more complicated than that.) (Also, L1 is the physical wires or radio waves. You don't need to block that. :joy:)

Also could we define it differently moving forward for NPV1, because I would imagine many implementations actually do implement it like so ^^

We can certainly recommend a particular implementation (for NP or ANP). We definitely can't require anything specific for NPv1 at this point.

nathanjsweet commented 13 hours ago

Cilium has an entirely separate CRD and policy system for ICMP filtration. It has support for IP family selection and ICMP message type (an OR'd list). The Cilium code base could accommodate K8s putting ICMP enforcement somewhere in the NetworkPolicy resource though.

caseydavenport commented 9 hours ago

So if you have a policy that says "Pod A can't talk to Pod B at all", the implementation could choose to block all L2, L3, and L4 traffic between them

This is the approach Calico takes. We consider a "default deny" state to apply to all L3 and above traffic from the pod (we don't block L2 / ARP, most Calico networks are point-to-point so pods aren't communicating on L2 other than to their hosts). If it's not explicitly allowed, then it is denied.

Calico does provide its own APIs to allow more precise selection of other types of traffic not supported in k8s NetworkPolicy (e.g., ICMP).