prometheus-community / helm-charts

[kube-prometheus-stack] Calling admission webhook fails with network policy enabled #3810

Open Jeroen0494 opened 1 year ago

Jeroen0494 commented 1 year ago

Describe the bug

When TLS is enabled for the Prometheus operator, the operator's network policy does not allow port 443. As a result, upgrades fail because the admission controller cannot reach the operator's webhook endpoint.

Error: UPGRADE FAILED: cannot patch "prometheus-kubernetes-system-apiserver" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": EOF
&& cannot patch "prometheus-kubernetes-system-controller-manager" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-kubernetes-system-kube-proxy" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-kubernetes-system-kubelet" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-kubernetes-system" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-node-exporter.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-node-network" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-node.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-prometheus-operator" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-prometheus" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused

Error: UPGRADE FAILED: cannot patch "prometheus-node-exporter.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-node-network" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-node.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused
&& cannot patch "prometheus-prometheus-operator" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator.monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": dial tcp 10.233.60.97:443: connect: connection refused

What's your helm version?

version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.3"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:58:30Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"} Kustomize Version: v4.5.7

Which chart?

kube-prometheus-stack

What's the chart version?

51.0.3

What happened?

The admission controller tries to connect to the operator, but the connection is refused. The network policy currently allows only the container port (tls.internalPort), not port 443 as specified in the Service.
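
To illustrate the mismatch described here, the following is a minimal sketch of how the operator Service might map its ports. The Service name and namespace are taken from the webhook URL in the error output (prometheus-operator.monitoring.svc:443); the port name and the numeric targetPort are assumptions based on tls.internalPort: 10250, and the chart's actual Service manifest may differ.

```yaml
# Sketch only, not the chart's actual manifest.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-operator   # from the webhook URL in the error output
  namespace: monitoring       # from the webhook URL in the error output
spec:
  ports:
    - name: https             # assumed port name
      port: 443               # port the webhook URL in the error output targets
      targetPort: 10250       # assumed to equal prometheusOperator.tls.internalPort
```

The network policy in question lists only the second of these two ports.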

What you expected to happen?

The admission controller should be able to connect to the operator.

How to reproduce it?

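Deploy the chart with Helm using the values listed under the changed values, which enable TLS, the admission webhooks, and the network policy for the operator.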

How to fix it

Add port 443 to the network policy template: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/networkpolicy.yaml

  ingress:
    - ports:
      {{- if .Values.prometheusOperator.tls.enabled }}
      - port: {{ .Values.prometheusOperator.tls.internalPort }}
      - port: 443  # <-- add this value
      {{- else }}
      - port: 8080
      {{- end }}
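
For reference, with tls.enabled: true and tls.internalPort: 10250 as in the values in this report, the proposed template change would presumably render roughly as follows. This is a sketch under assumptions: the metadata name and the pod selector label are placeholders that do not come from this report, and only the ingress ports reflect the snippet above.

```yaml
# Sketch only: approximate rendered output of the proposed change.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-operator      # placeholder name (assumption)
  namespace: monitoring          # namespace from the webhook URL in the error output
spec:
  podSelector:
    matchLabels:
      app: kube-prometheus-stack-operator   # assumed label; check the deployed pods
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - port: 10250            # .Values.prometheusOperator.tls.internalPort
        - port: 443              # the port this report proposes to add
```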

Enter the changed values of values.yaml?

prometheusOperator:
  enabled: true

  ## Prometheus-Operator v0.39.0 and later support TLS natively.
  ##
  tls:
    enabled: true
    # Value must match version names from https://golang.org/pkg/crypto/tls/#pkg-constants
    tlsMinVersion: VersionTLS13
    # The default webhook port is 10250 in order to work out-of-the-box in GKE private clusters and avoid adding firewall rules.
    internalPort: 10250

  ## Admission webhook support for PrometheusRules resources added in Prometheus Operator 0.30 can be enabled to prevent incorrectly formatted
  ## rules from making their way into prometheus and potentially preventing the container from starting
  admissionWebhooks:
    failurePolicy: Fail
    ## The default timeoutSeconds is 10 and the maximum value is 30.
    timeoutSeconds: 10
    enabled: true
    ## A PEM encoded CA bundle which will be used to validate the webhook's server certificate.
    ## If unspecified, system trust roots on the apiserver are used.
    caBundle: ""
    ## If enabled, generate a self-signed certificate, then patch the webhook configurations with the generated data.
    ## On chart upgrades (or if the secret exists) the cert will not be re-generated. You can use this to provide your own
    ## certs ahead of time if you wish.
    ##
    annotations: {}
    #   argocd.argoproj.io/hook: PreSync
    #   argocd.argoproj.io/hook-delete-policy: HookSucceeded
    patch:
      enabled: true
      image:
        registry: k8s.gcr.io
        repository: ingress-nginx/kube-webhook-certgen
        tag: v1.3.0
        sha: ""
        pullPolicy: IfNotPresent
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 1m
          memory: 190Mi

      ## SecurityContext holds pod-level security attributes and common container settings.
      ## This defaults to non root user with uid 2000 and gid 2000. *v1.PodSecurityContext  false
      ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
      ##
      securityContext:
        runAsGroup: 2000
        runAsNonRoot: true
        runAsUser: 2000

    # Security context for create job container
    createSecretJob:
      securityContext:
        allowPrivilegeEscalation: false

    # Security context for patch job container
    patchWebhookJob:
      securityContext:
        allowPrivilegeEscalation: false

  networkPolicy:
    ## Enable creation of NetworkPolicy resources.
    ##
    enabled: true

Enter the command that you execute and failing/misfunctioning.

/usr/local/bin/helm --version=51.0.3 upgrade -i --reset-values --wait --create-namespace -f=/tmp/tmpdxyid3ou.yml kube-prometheus-stack prometheus-community/kube-prometheus-stack

Anything else we need to know?

No response

Jeroen0494 commented 11 months ago

Kind reminder, please review this bug report.