kubernetes-sigs / network-policy-api

This repo addresses further work involving Kubernetes network security beyond the initial NetworkPolicy resource
Apache License 2.0
54 stars 29 forks source link

[BUG] ANP description disables KNP denies on upgrade #234

Open matthewdupre opened 3 months ago

matthewdupre commented 3 months ago

I created this PR (https://github.com/kubernetes/enhancements/pull/4688) against the KEP repo that's apparently not used and there was some discussion so I figured I'd open this here to clarify what the intended specification is before I raise a PR against this repo.

Example To make this more concrete, let's imagine a scenario with the following three pods, which I will abbreviate later as R, G and B:

kind: Pod
metadata:
  name: red-pod
  labels:
    color: red
kind: Pod
metadata:
  name: green-pod
  labels:
    color: green
kind: Pod
metadata:
  name: blue-pod
  labels:
    color: blue

And the following Kubernetes NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: red-only-from-green
spec:
  podSelector:
    matchLabels:
      color: red
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          color: green

Current (pre-ANP/BANP) behavior: The current behavior of this policy is that G->R traffic is allowed and B->R traffic is denied. Traffic to the B or G pods is allowed. G->R Allowed explicitly by KNP B->R Denied because R has a KNP that polices ingress traffic and no KNP allowed the traffic ->B Allowed by absence of KNP and BANP ->G Allowed by absence of KNP and BANP

What should happen when upgrading to a BANP capable implementation but not creating any BANPs: Surely the intention here is that the behavior doesn't change, even if the reason may change to account for the new capabilities. G->R Allowed B->R Denied ->B Allowed ->G Allowed

What does the spec actually say?: The only User Story in the site for BaselineAdminNetworkPolicy is number 5. This story fully defines what happens when there's a Deny BANP, but doesn't say anything about when there are no BANPs (or only BANPs that select pods other than R, G and B).

The BANP Resource section (https://network-policy-api.sigs.k8s.io/api-overview/#the-baselineadminnetworkpolicy-resource) implies that the behavior shouldn't change.

However, the only concrete description of what should happen to a specific connection after the KNP stage is found in the Pass section of the ANP actions (https://network-policy-api.sigs.k8s.io/api-overview/#adminnetworkpolicy-actions). It states that "If there is no K8s NetworkPolicy of BaselineAdminNetworkPolicy rule match traffic will be governed by the implementation".

Governed by the implementation typically means allowed, so this line states that in our situation all traffic that does not have a "rule match" will be allowed. In this example, that means that the behavior changes to: G->R Allowed explicitly by KNP B->R Allowed because this traffic does not match any rule in the KNP ->B Allowed by absence of KNP and BANP ->G Allowed by absence of KNP and BANP

Since KNPs only have Allow rules, the distinction between a "rule match" and "the pod is selected by at least one KNP that polices the specific ingress/egress direction in question" becomes important. I don't really expect anyone to implement this as a complete bypass of KNP, but it may confuse users later on.

I'd like to change this wording so it doesn't talk about rule matches. Assuming the purpose of a BANP is to force developers to create KNPs covering every pod, I think it makes sense to continue allowing KNPs to deny traffic (in this example B->R), and the way to do that would be to have traffic only pass to the BANPs if the source (egress) or destination (ingress) pod is not selected by any (egress / ingress respectively) KNP and not Allowed or Denied by an ANP.

Dyanngg commented 3 months ago

@matthewdupre Thanks for bringing up the issue and providing the exhaustive writeup. My thoughts:

  1. Agreed that the wording needs to be improved. On the other hand, I would like to clarify that the intention of this API is during upgrade, the original behavior of KNPs should NOT change, meaning that B->R should still be denied given that it falls under the "implicit isolation" umbrella, which means, an implicit rule created by the original KNP that says, "for any traffic coming to R, if it's not G, deny it". I would think this is what we originally meant by "rule match", which is totally not ideal.
  2. We've long been trying to add "deny" mechanism to the KNP as a subgroup, however it is already a v1 API and we're not supposed to make such disruptive changes. This is why we have a NetworkPolicy v2 initiative, and you can find more info here.
  3. Back to your original proposal(https://github.com/kubernetes/enhancements/pull/4688),
    If there is no K8s NetworkPolicy rule match, and no BaselineAdminNetworkPolicy rule match
    (more on this in the [priority section](#priority)), traffic will be governed by the implementation.

    I’d say s/rule match/selecting the pod/g is probably not enough for complete clarification. In KNP, pod selected in spec.podSelector and selected as an ingress/egress peer have different implications. As you pointed out in the writeup, the former puts some “isolation effect” on the pod, while the later does not. So in essence, if we consider a connection from G -> R, it is regulated by a KNP if at least one of the two following applies: 1) a KNP selects R in spec.podSelector and it has ingress rules 2) a KNP selects G in spec.podSelector and it has egress rules. If any of these KNPs are there, they already dictates if the traffic will be allowed/dropped. BANP won’t even be evaluated unless none of the above two scenarios apply. I believe we need to somehow convey this in the doc.

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale