hashicorp / vault-helm

Helm chart to install Vault and other associated components.
Mozilla Public License 2.0
1.08k stars 873 forks source link

Vault Injector NetworkPolicy requires additional `from` rules in OCP 4.13 #953

Closed komish closed 1 year ago

komish commented 1 year ago

This is a copy of the issue created https://github.com/openshift-helm-charts/charts/issues/934. That issue will be closed in favor of this one.

Describe the bug

We had issues in our testing with vault's MutatingWebhookConfiguration not being accessible by kube-apiserver in OCP 4.13. Per my testing, this is just because the NetworkPolicy that's created needs an extra match criteria.

The policy looks like this in your latest chart: https://github.com/openshift-helm-charts/charts/blob/main/charts/partners/hashicorp/vault/0.24.0/src/templates/injector-network-policy.yaml

The from items needs to include an additional entry allowing host network workloads to reach it.

...
spec:
  podSelector:
    ... removed for brevity ...
  ingress:
    - from:
        - namespaceSelector: {}
        - namespaceSelector:      # add this and the following lines
            matchLabels:
              policy-group.network.openshift.io/host-network: ""
...

This is documented here: https://docs.openshift.com/container-platform/4.13/networking/network_policy/about-network-policy.html#nw-networkpolicy-allow-from-hostnetwork_about-network-policy

We discovered it when chart-testing would run. The webhook would be live and the chart-test would immediately fail because the pod it creates would fail at admission (due to the webhook being inaccessible from the kube-apiserver).

To Reproduce Steps to reproduce the behavior:

  1. Run chart-testing against the existing chart on an OCP 4.13 cluster.

Expected behavior chart-testing would pass

Environment OCP 4.13

Additional context Add any other context about the problem here.

komish commented 1 year ago

One additional note, @tvoran currently has a PR to certify a new version that appears to be failing because of this issue.

https://github.com/openshift-helm-charts/charts/pull/1036

tvoran commented 1 year ago

Hi @komish, thanks for looking into this! So far I'm unable to reproduce this on an OpenShift 4.13 cluster (using openshift-installer on GCP) with chart-verifier or a manual injector example, so it would be great to get more details on the setup that's failing.

It looks to me like our existing NetworkPolicy should allow incoming traffic from all namespaces (like the example in the these docs)? So I'm a little puzzled why the extra rule is required.

komish commented 1 year ago

Interesting @tvoran. I'm running an openshift-installer provisioned cluster on AWS on 4.13, which is where the issue presented. Let me see if I can identify any cluster-specific features, and I'll report back.

komish commented 1 year ago

Just a quick note that I was able to replicate a successful chart-test execution without the host-network related NetworkPolicy addition on 4.13/GCP.

I didn't observe any CNI specific differences in the cluster I provisioned vs. the AWS cluster I used for testing. I also confirmed 4.13/AWS still has the problem.

I have open questions internally that I'm pursuing to get insights on if there are any logical platform differences at play here, and will report back when I have more on it.

I did get explicit confirmation that this is how you allow access from the APIServer to the admission webhook. The only caveat would be colocated webhook workloads with the requesting source (APIServer), in which case the network policy may not apply, but I don't think that's the case you are I encounter in our testing.

More to come as I have it.

komish commented 1 year ago

My apologies.

It seems like there's some sort of bug that exists in 4.13.4 that must have been resolved in newer Z-stream version of OCP. Testing against 4.13.12, I saw successful installations on both GCP and AWS, which prompted me to go try GCP on 4.13.4 (matches our certification cluster). I confirmed things failed there. I haven't identified an exact bug that's in scope here, but I'll peruse the bug reports at a later time.

We'll roll out updated infrastructure, test the chart to make sure it installs with this exact network policy, and then if all is well, I'll go ahead and close this out. I'll also follow up with your PR to certify the next version of your chart.

Thanks for taking a look.

komish commented 1 year ago

I rolled out the newer infrastructure and your PR cleared up https://github.com/openshift-helm-charts/charts/pull/1036. Thanks for checking this out, and apologies for the confusion.