prometheus-community / helm-charts

Prometheus community Helm charts

[prometheus-kube-stack] Failed to call webhook #1762

Closed · fdarif closed this issue 2 years ago

fdarif commented 2 years ago

Describe the bug (a clear and concise description of what the bug is)

I am trying to install kube-prometheus-stack from the official prometheus-community Helm repo, using the command below:

helm install <release-name> prometheus-community/kube-prometheus-stack -n <namespace> -f values.yaml

It fails with the following error:

Error: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://kube-prometheus-kube-prome-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": service "kube-prometheus-kube-prome-operator" not found
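The error means the Kubernetes API server is trying to call the chart's admission webhook through a Service that does not exist. A quick way to check both sides, using the names from the error message above (adjust the namespace if yours differs from monitoring):

# does the Service the webhook points at exist?
kubectl get svc -n monitoring kube-prometheus-kube-prome-operator

# which webhook configurations reference the operator?
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep kube-prome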

What's your helm version?

Version:"v3.6.2"

What's your kubectl version?

Client Version: v1.21.1
Server Version: v1.22.4

Which chart?

prometheus-community/kube-prometheus-stack

What's the chart version?

kube-prometheus-stack-30.2.0

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

operator:
  affinity: {}
  configReloaderResources: {}
  containerSecurityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    enabled: true
    readOnlyRootFilesystem: false
    runAsNonRoot: true
  enabled: true
  hostAliases: []
  image:
    pullPolicy: IfNotPresent
    pullSecrets: []
    registry: docker.io
    repository: bitnami/prometheus-operator
    tag: 0.50.0-debian-10-r8
  kubeletService:
    enabled: true
    namespace: kube-system
  livenessProbe:
    enabled: true
    failureThreshold: 6
    initialDelaySeconds: 120
    path: /metrics
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  logFormat: logfmt
  logLevel: info
  nodeAffinityPreset:
    key: ""
    type: ""
    values: []
  nodeSelector: {}
  podAffinityPreset: ""
  podAntiAffinityPreset: soft
  podSecurityContext:
    enabled: true
    fsGroup: 1002
    runAsUser: 1002
  priorityClassName: ""
  prometheusConfigReloader:
    containerSecurityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      enabled: true
      readOnlyRootFilesystem: false
      runAsNonRoot: true
    image: {}
    livenessProbe:
      enabled: true
      failureThreshold: 6
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    readinessProbe:
      enabled: true
      failureThreshold: 6
      initialDelaySeconds: 15
      periodSeconds: 20
      successThreshold: 1
      timeoutSeconds: 5
  readinessProbe:
    enabled: true
    failureThreshold: 6
    initialDelaySeconds: 30
    path: /metrics
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  resources: {}
  schedulerName: ""
  service:
    annotations: {}
    clusterIP: ""
    externalTrafficPolicy: Cluster
    healthCheckNodePort: ""
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    nodePort: ""
    port: 8080
    type: ClusterIP
  serviceAccount:
    create: false
    name: ""
  serviceMonitor:
    enabled: true
    interval: ""
    metricRelabelings: []
    relabelings: []
  tolerations: []

Enter the command that you execute that is failing/misfunctioning.

None

Anything else we need to know?

None

obvionaoe commented 2 years ago

I also get this error when I disable TLS:

prometheusOperator:
  tls:
    enabled: false
  admissionWebhooks:
    enabled: false

Chart version: 32.2.1
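For anyone reproducing this, the same overrides can be passed as --set flags instead of a values file; a sketch using the value paths shown above, with placeholder release and namespace names:

helm install <release-name> prometheus-community/kube-prometheus-stack \
  -n <namespace> \
  --set prometheusOperator.tls.enabled=false \
  --set prometheusOperator.admissionWebhooks.enabled=false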

fdarif commented 2 years ago

Hey @obvionaoe, thanks for your comment. I will check that and try to deploy it again.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

obvionaoe commented 2 years ago

Not stale, still happens

hostalp commented 2 years ago

Did you perform a clean install? I encountered the same issue when installing the kube-prometheus-stack chart into an environment where it had previously been installed (multiple times) and then deleted. After digging into it I found that there were still some old resources left over from the previous installs, including mutatingwebhookconfigurations and validatingwebhookconfigurations that are directly related to this.

After cleaning up as much as possible (the namespace into which kube-prometheus-stack was installed, all kube-prometheus-* services, all *.monitoring.coreos.com CRDs, and all kube-prometheus-* mutatingwebhookconfigurations and validatingwebhookconfigurations), I was able to install it successfully.
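A sketch of that cleanup, assuming the release was named kube-prometheus and installed into the monitoring namespace (adjust names to your installation):

# delete the namespace the stack was installed into
kubectl delete namespace monitoring

# delete leftover cluster-scoped webhook configurations from the old release
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name \
  | grep kube-prometheus | xargs -r kubectl delete

# delete the CRDs the chart leaves behind (Helm does not remove CRDs on uninstall)
kubectl get crd -o name | grep monitoring.coreos.com | xargs -r kubectl delete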

scratchmex commented 2 years ago

I want to second @hostalp and say that it took me a very long time to debug this. In the end I tried searching for all resources matching the release name, following https://github.com/kubernetes/kubectl/issues/151#issuecomment-1095735493. I used this command:

# for every listable resource type, find instances in the release's namespace
# whose name contains the release name, then delete them
kubectl api-resources --verbs=list -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n kube-system -o name \
  | grep <helm-release-name> \
  | xargs -n 1 kubectl delete -n kube-system

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

hostalp commented 2 years ago

keep-alive 👍

stale[bot] commented 2 years ago

This issue is being automatically closed due to inactivity.

obvionaoe commented 2 years ago

please reopen, it still hasn't been fixed

hostalp commented 2 years ago

Right, this aggressive closing of open issues isn't really useful here; the issue didn't magically disappear in the meantime.

nrekretep commented 1 year ago

Any updates on this one? We are currently running into the same issue.

gplessis commented 1 year ago

I faced the same issue today while deploying a fresh kube-prometheus-stack Helm release to a cluster I inherited.

Someone had installed a prior version of the chart on this cluster (in a different namespace and under another name) and cleaned it up manually, as in "not using helm uninstall", leaving a few non-namespaced resources behind, such as the mutating and validating admission webhook configurations. Since the underlying services no longer existed, my installation failed.

Manually deleting the old mutating and validating webhook configurations from the previous installation fixed my problem.
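To spot the stale ones, you can list which Service each webhook configuration points at and delete those whose Service no longer exists; a sketch with standard kubectl, where the webhook names are placeholders:

# show the target Service of every webhook configuration
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations \
  -o custom-columns=NAME:.metadata.name,SERVICE:.webhooks[*].clientConfig.service.name

# delete the ones left over from the old release
kubectl delete mutatingwebhookconfiguration <old-webhook-name>
kubectl delete validatingwebhookconfiguration <old-webhook-name>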