Describe the bug
We are trying to upgrade the kube-prometheus-stack chart from 39.13.1 to 43.0.0 and are getting errors while doing so.
When we try to upgrade, the release gets stuck in an uninstalling state with this error: "helm.go:84: [debug] failed to delete release: prometheus".
If we remove the release manually and then try a fresh install of 43.0.0, we get this output:
client.go:735: [debug] Add/Modify event for prometheus-kube-prometheus-admission-create: MODIFIED
client.go:774: [debug] prometheus-kube-prometheus-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
[debug] Re-evaluate condition on job cancellation for step: 'Deploy Platform'
After this, the deployment was cancelled after some time.
AKS version: 1.25
Helm version: 3.11.3
What's your helm version?
3.11.3
What's your kubectl version?
1.25.0
Which chart?
kube-prometheus-stack
What's the chart version?
43.0.0
What happened?
We are trying to upgrade the kube-prometheus-stack chart from 39.13.1 to 43.0.0 and are getting errors while doing so.
When we try to upgrade, the release gets stuck in an uninstalling state with this error: "helm.go:84: [debug] failed to delete release: prometheus".
If we remove the release manually and then try a fresh install of 43.0.0, we get this output:
client.go:735: [debug] Add/Modify event for prometheus-kube-prometheus-admission-create: MODIFIED
client.go:774: [debug] prometheus-kube-prometheus-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
[debug] Re-evaluate condition on job cancellation for step: 'Deploy Platform'
After this, the deployment was cancelled after some time.
AKS version: 1.25
Helm version: 3.11.3
What you expected to happen?
That the old release would uninstall successfully and the new version would install, or that a fresh install of the new version would succeed when no release was present.
## ListenLocal makes the Alertmanager server listen on loopback, so that it does not bind against the Pod IP.
## Note this is only for the Alertmanager UI, not the gossip communication.
listenLocal: false
## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to an Alertmanager pod.
containers: []
## Priority class assigned to the Pods
priorityClassName: ""
## AdditionalPeers allows injecting a set of additional Alertmanagers to peer with to form a highly available cluster.
additionalPeers: []
## PortName to use for Alert Manager.
portName: "http-web"
## ClusterAdvertiseAddress is the explicit address to advertise in cluster. Needs to be provided for non RFC1918 [1] (public) addresses. [1] RFC1918: https://tools.ietf.org/html/rfc1918
## If your API endpoint address is not reachable (as in AKS) you can replace it with the kubernetes service
relabelings: []
serviceMonitor:
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
jobLabel: component
selector:
matchLabels:
component: apiserver
provider: kubernetes
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
kubelet:
enabled: false
namespace: kube-system
serviceMonitor:
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## Enable scraping the kubelet over https. For requirements to enable this see
## https://github.com/prometheus-operator/prometheus-operator/issues/926
##
https: true
## Enable scraping /metrics/cadvisor from kubelet's service
##
cAdvisor: true
## Enable scraping /metrics/probes from kubelet's service
##
probes: true
## Enable scraping /metrics/resource from kubelet's service
## This is disabled by default because container metrics are already exposed by cAdvisor
##
resource: false
# From kubernetes 1.18, /metrics/resource/v1alpha1 renamed to /metrics/resource
resourcePath: "/metrics/resource"
## Metric relabellings to apply to samples before ingestion
##
cAdvisorMetricRelabelings: []
probesMetricRelabelings: []
cAdvisorRelabelings:
- sourceLabels: [__metrics_path__]
targetLabel: metrics_path
probesRelabelings:
- sourceLabels: [__metrics_path__]
targetLabel: metrics_path
resourceRelabelings:
- sourceLabels: [__metrics_path__]
targetLabel: metrics_path
metricRelabelings: []
relabelings:
- sourceLabels: [__metrics_path__]
targetLabel: metrics_path
kubeControllerManager:
enabled: false
## If your kube controller manager is not deployed as a pod, specify IPs it can be found on
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## Enable scraping kube-controller-manager over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks
##
https: false
# Skip TLS certificate validation when scraping
insecureSkipVerify: null
# Name of the server to use when validating TLS certificate
serverName: null
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## Enable scraping kube-scheduler over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks
##
https: false
## Skip TLS certificate validation when scraping
insecureSkipVerify: null
## Name of the server to use when validating TLS certificate
serverName: null
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
kubeProxy:
enabled: false
## If your kube proxy is not deployed as a pod, specify IPs it can be found on
endpoints: []
service:
port: 10249
targetPort: 10249
selector:
# k8s-app: kube-proxy
serviceMonitor:
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## Enable scraping kube-proxy over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks
##
https: false
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
kubeStateMetrics:
enabled: true
serviceMonitor:
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
## Use the value configured in prometheus-node-exporter.podLabels
jobLabel: jobLabel
serviceMonitor:
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## How long until a scrape request times out. If not set, the Prometheus default scrape timeout is used.
##
scrapeTimeout: ""
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
## If enabled, generate a self-signed certificate, then patch the webhook configurations with the generated data.
## On chart upgrades (or if the secret exists) the cert will not be re-generated. You can use this to provide your own
## certs ahead of time if you wish.
##
patch:
enabled: true
image:
repository: registry.k8s.io/ingress-nginx/kube-webhook-certgen
tag: v1.1.1
sha: ""
pullPolicy: IfNotPresent
resources:
limits:
cpu: 200m
memory: 100Mi
requests:
cpu: 100m
memory: 50Mi
## Provide a priority class name to the webhook patching job
##
priorityClassName: ""
podAnnotations: {}
nodeSelector: {}
affinity: {}
tolerations: []
## Namespaces to scope the interaction of the Prometheus Operator and the apiserver (allow list).
## This is mutually exclusive with denyNamespaces. Setting this to an empty object will disable the configuration
namespaces:
{}
releaseNamespace: true
# additional:
# - kube-system
## Namespaces not to scope the interaction of the Prometheus Operator (deny list).
denyNamespaces: []
## Filter namespaces to look for prometheus-operator custom resources
## Port to expose on each node
## Only used if service.type is 'NodePort'
##
nodePort: Provided Values
nodePortTls: Provided Values
## Additional ports to open for Prometheus service
## ref: https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services
##
additionalPorts: []
## Loadbalancer IP
## Only use if service.type is "LoadBalancer"
##
loadBalancerIP: ""
loadBalancerSourceRanges: []
## Service type
## NodePort, ClusterIP, LoadBalancer
##
type: ClusterIP
## List of IP addresses at which the Prometheus server service is available
## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
##
externalIPs: []
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
## Scrape timeout. If not set, the Prometheus default scrape timeout is used.
scrapeTimeout: ""
selfMonitor: true
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
relabelings: []
## Port for Prometheus Service to listen on
##
port: Provided Values
## To be used with a proxy extraContainer port
targetPort: Provided Values
## List of IP addresses at which the Prometheus server service is available
## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
##
externalIPs: []
## Port to expose on each node
## Only used if service.type is 'NodePort'
##
nodePort: Provided Values
## Loadbalancer IP
## Only use if service.type is "LoadBalancer"
loadBalancerIP: ""
loadBalancerSourceRanges: []
## Service type
##
type: ClusterIP
sessionAffinity: ""
## Configuration for creating a separate Service for each statefulset Prometheus replica
servicePerReplica:
enabled: false
annotations: {}
## Port for Prometheus Service per replica to listen on
##
port: Provided Values
## To be used with a proxy extraContainer port
targetPort: Provided Values
## Port to expose on each node
## Only used if servicePerReplica.type is 'NodePort'
##
nodePort: Provided Values
## Loadbalancer source IP ranges
## Only used if servicePerReplica.type is "LoadBalancer"
loadBalancerSourceRanges: []
## Service type
##
type: ClusterIP
## Ingress exposes thanos sidecar outside the cluster
thanosIngress:
enabled: false
annotations: {}
labels: {}
servicePort: Provided Values
## Hosts must be provided if Ingress is enabled.
##
hosts: []
# - thanos-gateway.domain.com
## Paths to use for ingress rules
##
paths: []
tls: []
# - secretName: thanos-gateway-tls
# hosts:
# - thanos-gateway.domain.com
ingress:
enabled: false
# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx
annotations: {}
labels: {}
## Hostnames.
## Must be provided if Ingress is enabled.
##
# hosts:
# - prometheus.domain.com
hosts: []
## Paths to use for ingress rules - one path should match the prometheusSpec.routePrefix
##
paths: []
# - /
## TLS configuration for Prometheus Ingress
## Secret must be manually created in the namespace
##
tls: []
# - secretName: prometheus-general-tls
# hosts:
# - prometheus.example.com
## Configuration for creating an Ingress that will map to each Prometheus replica service
## prometheus.servicePerReplica must be enabled
ingressPerReplica:
enabled: false
# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx
annotations: {}
labels: {}
hostPrefix: ""
## Domain that will be used for the per replica ingress
hostDomain: ""
## Paths to use for ingress rules
##
paths: []
# - /
## Secret name containing the TLS certificate for Prometheus per replica ingress
## Secret must be manually created in the namespace
tlsSecretName: ""
## Separated secret for each per replica Ingress. Can be used together with cert-manager
##
tlsSecretPerReplica:
enabled: false
## Final form of the secret for each per replica ingress is
## {{ tlsSecretPerReplica.prefix }}-{{ $replicaNumber }}
##
prefix: "prometheus"
## Configure additional options for default pod security policy for Prometheus
## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
selfMonitor: true
## scheme: HTTP scheme to use for scraping. Can be used with `tlsConfig` for example if using istio mTLS.
scheme: ""
## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
## Of type: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
tlsConfig: {}
bearerTokenFile:
## metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
# - action: keep
# regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
# sourceLabels: [__name__]
# relabel configs to apply to samples before ingestion.
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
# separator: ;
# regex: ^(.*)$
# targetLabel: nodename
# replacement: $1
# action: replace
## If true, pass --storage.tsdb.max-block-duration=2h to prometheus. This is already done if using Thanos
##
disableCompaction: false
## APIServerConfig
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#apiserverconfig
##
apiserverConfig: {}
## Interval between consecutive scrapes.
##
scrapeInterval: "1m"
## Interval between consecutive evaluations.
##
evaluationInterval: "1m"
## ListenLocal makes the Prometheus server listen on loopback, so that it does not bind against the Pod IP.
##
listenLocal: false
## EnableAdminAPI enables the Prometheus administrative HTTP API which includes functionality such as deleting time series.
## This is disabled by default.
## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
##
enableAdminAPI: false
## Image of Prometheus.
##
image:
repository: quay.io/prometheus/prometheus
tag: v2.40.5
sha: ""
## Tolerations for use with node taints
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
## Alertmanagers to which alerts will be sent
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#alertmanagerendpoints
##
## Default configuration will connect to the alertmanager deployed as part of this release
##
alertingEndpoints: []
# - name: ""
# namespace: ""
# port: http
# scheme: http
# pathPrefix: ""
# tlsConfig: {}
# bearerTokenFile: ""
# apiVersion: v2
## External labels to add to any time series or alerts when communicating with external systems
##
externalLabels: {}
## Name of the external label used to denote replica name
##
replicaExternalLabelName: ""
## If true, the Operator won't add the external label used to denote replica name
##
replicaExternalLabelNameClear: false
## Name of the external label used to denote Prometheus instance name
##
prometheusExternalLabelName: ""
## If true, the Operator won't add the external label used to denote Prometheus instance name
##
prometheusExternalLabelNameClear: false
## External URL at which Prometheus will be reachable.
##
externalUrl: ""
## Define which Nodes the Pods are scheduled on.
## ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}
## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not
## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated
## with the new list of secrets.
##
secrets: []
## ConfigMaps is a list of ConfigMaps in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The ConfigMaps are mounted into /etc/prometheus/configmaps/.
##
configMaps: []
## QuerySpec defines the query command line flags when starting Prometheus.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#queryspec
##
query: {}
## Namespaces to be selected for PrometheusRules discovery.
## If nil, select own namespace.
## See https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#namespaceselector for usage
##
ruleNamespaceSelector: {}
## If true, a nil or {} value for prometheus.prometheusSpec.ruleSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the PrometheusRule resources created
##
ruleSelectorNilUsesHelmValues: true
## PrometheusRules to be selected for target discovery.
## If {}, select all PrometheusRules
##
ruleSelector: {}
serviceMonitorSelectorNilUsesHelmValues: true
## ServiceMonitors to be selected for target discovery.
## If {}, select all ServiceMonitors
##
serviceMonitorSelector: {}
serviceMonitorNamespaceSelector: {}
podMonitorSelectorNilUsesHelmValues: true
podMonitorSelector: {}
podMonitorNamespaceSelector: {}
## If true, a nil or {} value for prometheus.prometheusSpec.probeSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the probes created
##
probeSelectorNilUsesHelmValues: true
## Probes to be selected for target discovery.
## If {}, select all Probes
##
probeSelector: {}
probeNamespaceSelector: {}
## How long to retain metrics
##
retention: 7d
## Maximum size of metrics
##
retentionSize: "150GB"
## Enable compression of the write-ahead log using Snappy.
##
walCompression: false
## If true, the Operator won't process any Prometheus configuration changes
##
paused: false
## Number of Prometheus replicas desired
##
replicas: 1
## Log level for Prometheus to be configured with
##
logLevel: info
## Log format for Prometheus to be configured with
##
logFormat: logfmt
## Prefix used to register routes, overriding externalUrl route.
## Useful for proxies that rewrite URLs.
##
routePrefix: /
## Standard object’s metadata. More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata
## Metadata Labels and Annotations get propagated to the prometheus pods.
##
podMetadata: {}
podAntiAffinity: ""
## If anti-affinity is enabled sets the topologyKey to use for anti-affinity.
## This can be changed to, for example, failure-domain.beta.kubernetes.io/zone
##
podAntiAffinityTopologyKey: kubernetes.io/hostname
## Assign custom affinity rules to the prometheus instance
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
##
affinity: {}
remoteRead: []
# - url: http://remote1/read
## The remote_write spec configuration for Prometheus.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#remotewritespec
remoteWrite: []
# - url: http://remote1/push
## Enable/Disable Grafana dashboards provisioning for prometheus remote write feature
remoteWriteDashboards: false
## Resource limits & requests
##
resources:
requests:
memory: 6Gi
cpu: 1000m
limits:
memory: 12Gi
cpu: 2000m
## Prometheus StorageSpec for persistent data
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/storage.md
##
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: "managed-premium"
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 300Gi
# selector: {}
# Additional volumes on the output StatefulSet definition.
volumes: []
# Additional VolumeMounts on the output StatefulSet definition.
volumeMounts: []
additionalScrapeConfigs:
- job_name: "kubernetes-pods"
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels:
[__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels:
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
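## For illustration only: a pod annotated like this (hypothetical pod metadata) would be kept by the
## "kubernetes-pods" job above and scraped at the given path and port.
# annotations:
#   prometheus.io/scrape: "true"
#   prometheus.io/path: "/metrics"
#   prometheus.io/port: "8080"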
## If additional scrape configurations are already deployed in a single secret file you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalScrapeConfigs
additionalScrapeConfigsSecret: {}
# enabled: false
# name:
# key:
## additionalPrometheusSecretsAnnotations allows adding annotations to the kubernetes secret. This can be useful
## when deploying via Spinnaker to disable versioning on the secret, e.g. strategy.spinnaker.io/versioned: 'false'
additionalPrometheusSecretsAnnotations: {}
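# For example (hypothetical), to disable Spinnaker's secret versioning as described above:
# additionalPrometheusSecretsAnnotations:
#   strategy.spinnaker.io/versioned: 'false'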
additionalAlertManagerConfigs: []
additionalAlertRelabelConfigs: []
securityContext:
runAsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
## Priority class assigned to the Pods
##
priorityClassName: ""
thanos: {}
## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to a Prometheus pod.
## if using proxy extraContainer update targetPort with proxy container port
containers: []
## InitContainers allows injecting additional initContainers. This is meant to allow doing some changes
## (permissions, dir tree) on mounted volumes before starting prometheus
initContainers: []
## PortName to use for Prometheus.
##
portName: "http-web"
How to reproduce it?
For a fresh installation:
helm upgrade prometheus prometheus-community/kube-prometheus-stack --install --version 43.0.0 --debug --wait --reset-values --timeout 24000s --create-namespace --namespace namespace_name -f <(envsubst < $(dirname $BASH_SOURCE)/thirdParty/prometheusConfig.yaml)
For uninstalling first and then upgrading:
1. Delete the CRDs.
2. helm uninstall --debug prometheus -n namespace_name
3. Delete all cluster roles.
4. helm upgrade prometheus prometheus-community/kube-prometheus-stack --version 43.0.0 --debug --wait --reset-values --timeout 24000s --create-namespace --namespace namespace_name -f <(envsubst < $(dirname $BASH_SOURCE)/thirdParty/prometheusConfig.yaml)
Enter the changed values of values.yaml?
No response
Enter the command that you execute that is failing/misfunctioning.
helm upgrade prometheus prometheus-community/kube-prometheus-stack --install --version 43.0.0 --debug --wait --reset-values --timeout 24000s --create-namespace --namespace namespace_name -f <(envsubst < $(dirname $BASH_SOURCE)/thirdParty/prometheusConfig.yaml)
helm uninstall --debug prometheus -n namespace_name
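To see why the admission job hangs before the deployment is cancelled, the Job created by the chart can be inspected directly (a minimal sketch; the job name is taken from the debug output above, the namespace is a placeholder):
kubectl get jobs -n namespace_name
kubectl describe job prometheus-kube-prometheus-admission-create -n namespace_name
kubectl logs -n namespace_name job/prometheus-kube-prometheus-admission-create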
Anything else we need to know?
prometheusConfig.yaml
nameOverride: ""
namespaceOverride: ""
kubeTargetVersionOverride: ""
fullnameOverride: ""
commonLabels: {}

defaultRules:
  create: true
  rules:
    alertmanager: false
    etcd: false
    general: true
    k8s: false
    kubeApiserver: false
    kubeApiserverAvailability: false
    kubeApiserverError: false
    kubeApiserverSlos: false
    kubelet: false
    kubePrometheusGeneral: false
    kubePrometheusNodeAlerting: false
    kubePrometheusNodeRecording: false
    kubernetesAbsent: false
    kubernetesApps: false
    kubernetesResources: false
    kubernetesStorage: false
    kubernetesSystem: false
    kubeScheduler: false
    kubeStateMetrics: false
    network: false
    node: false
    prometheus: false
    prometheusOperator: false
    time: true
  ## Runbook url prefix for default rules
  runbookUrl: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#
  ## Reduce app namespace alert scope
  appNamespacesTarget: ".*"
  ## Labels for default rules
  labels: {}
  ## Annotations for default rules
  annotations: {}
  ## Additional labels for PrometheusRule alerts
  additionalRuleLabels: {}

additionalPrometheusRulesMap: {}

global:
  rbac:
    create: true
    pspEnabled: false
    pspAnnotations: {}
  imagePullSecrets: []

alertmanager:
  enabled: false
  apiVersion: v2
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ["job"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: "null"
      routes:
        - match:
            alertname: Watchdog
          receiver: "null"
    receivers:
      - name: "null"  # assumed: the route above references receiver "null"; the original dump truncated this list
  tplConfig: false
  templateFiles: {}
  ingress:
    enabled: false
    annotations: {}
    labels: {}
    ## Hosts must be provided if Ingress is enabled.
    hosts: []
    paths: []
    tls: []
  secret:
    annotations: {}
  ingressPerReplica:
    enabled: false
    annotations: {}
    labels: {}
    hostPrefix: ""
    ## Domain that will be used for the per replica ingress
    hostDomain: ""
    ## Paths to use for ingress rules
    paths: []
    tlsSecretName: ""
    ## Separated secret for each per replica Ingress. Can be used together with cert-manager
    tlsSecretPerReplica:
      enabled: false
      ## Final form of the secret for each per replica ingress is
      ## {{ tlsSecretPerReplica.prefix }}-{{ $replicaNumber }}
      prefix: "alertmanager"
  ## Configuration for Alertmanager service
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    ## Port for Alertmanager Service to listen on
    port: 9093
    targetPort: 9093
    nodePort: 30903
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    ## Service type
    type: ClusterIP
  ## Configuration for creating a separate Service for each statefulset Alertmanager replica
  servicePerReplica:
    enabled: false
    annotations: {}
    ## Port for Alertmanager Service per replica to listen on
    port: Provided Values
    ## To be used with a proxy extraContainer port
    targetPort: Provided Values
    ## Port to expose on each node
    ## Only used if servicePerReplica.type is 'NodePort'
    nodePort: Provided Values
    ## Loadbalancer source IP ranges
    ## Only used if servicePerReplica.type is "LoadBalancer"
    loadBalancerSourceRanges: []
    ## Service type
    type: ClusterIP
  ## If true, create a serviceMonitor for alertmanager
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    interval: ""
    selfMonitor: true
    ## scheme: HTTP scheme to use for scraping. Can be used with `tlsConfig` for example if using istio mTLS.
    scheme: ""
    ## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
    ## Of type: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
    tlsConfig: {}
    bearerTokenFile:
    ## metric relabel configs to apply to samples before ingestion.
    metricRelabelings: []
    # - action: keep
    #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
    #   sourceLabels: [__name__]
    ## relabel configs to apply to samples before ingestion.
    relabelings: []
  alertmanagerSpec:
    ## Standard object’s metadata. More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata
    ## Metadata Labels and Annotations get propagated to the Alertmanager pods.
    podMetadata: {}
    ## Image of Alertmanager
    image:
      repository: quay.io/prometheus/alertmanager
      tag: v0.24.0
      sha: ""
    ## If true then the user will be responsible to provide a secret with alertmanager configuration
    ## So when true the config part will be ignored (including templateFiles) and the one in the secret will be used
    useExistingSecret: false
    ## Secrets is a list of Secrets in the same namespace as the Alertmanager object, which shall be mounted into the
    ## Alertmanager Pods. The Secrets are mounted into /etc/alertmanager/secrets/.
    secrets: []
    ## ConfigMaps is a list of ConfigMaps in the same namespace as the Alertmanager object, which shall be mounted into the Alertmanager Pods.
    ## The ConfigMaps are mounted into /etc/alertmanager/configmaps/.
    configMaps: []
    alertmanagerConfigSelector: {}
    alertmanagerConfigNamespaceSelector: {}
    logFormat: logfmt
    ## Log level for Alertmanager to be configured with.
    logLevel: info
    ## Size is the expected size of the alertmanager cluster. The controller will eventually make the size of the
    ## running cluster equal to the expected size.
    replicas: 1
    ## Time duration Alertmanager shall retain data for. Default is '120h', and must match the regular expression
    ## [0-9]+(ms|s|m|h) (milliseconds seconds minutes hours).
    retention: 120h
    ## Storage is the definition of how storage will be used by the Alertmanager instances.
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/storage.md
    storage: {}
    externalUrl:
    routePrefix: /
    ## If set to true all actions on the underlying managed objects are not going to be performed, except for delete actions.
    paused: false
    ## Define which Nodes the Pods are scheduled on.
    ## ref: https://kubernetes.io/docs/user-guide/node-selection/
    nodeSelector: {}
    ## Define resources requests and limits for single Pods.
    ## ref: https://kubernetes.io/docs/user-guide/compute-resources/
    resources: {}
    podAntiAffinity: ""
    ## If anti-affinity is enabled sets the topologyKey to use for anti-affinity.
    ## This can be changed to, for example, failure-domain.beta.kubernetes.io/zone
    podAntiAffinityTopologyKey: kubernetes.io/hostname
    ## Assign custom affinity rules to the alertmanager instance
    ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
    affinity: {}
    tolerations: []
    securityContext:
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000
    ## ListenLocal makes the Alertmanager server listen on loopback, so that it does not bind against the Pod IP.
    ## Note this is only for the Alertmanager UI, not the gossip communication.
    listenLocal: false
    ## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to an Alertmanager pod.
    containers: []
    ## Priority class assigned to the Pods
    priorityClassName: ""
    ## AdditionalPeers allows injecting a set of additional Alertmanagers to peer with to form a highly available cluster.
    additionalPeers: []
    ## PortName to use for Alert Manager.
    portName: "http-web"
    ## ClusterAdvertiseAddress is the explicit address to advertise in cluster. Needs to be provided for non RFC1918 [1] (public) addresses. [1] RFC1918: https://tools.ietf.org/html/rfc1918
    clusterAdvertiseAddress: false
## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
grafana:
  enabled: true
  priorityClassName: ""
  podLabels:
    app: prometheus-grafana
  grafana.ini:
    users:
      viewers_can_edit: false
      auto_assign_org_role: Editor
      auto_assign_org: true
    auth:
      disable_login_form: true
      disable_signout_menu: true
    auth.anonymous:
      enabled: true
      org_role: Viewer
    auth.basic:
      enabled: false
    auth.proxy:
      enabled: true
      header_name: X-GRAFANA-USER
      header_property: username
      auto_sign_up: true
      sync_ttl: 60
    server:
      domain: "${}"
      root_url: "https://"
      serve_from_sub_path: true
    security:
      allow_embedding: true
    live:
      max_connections: 0
  namespaceOverride: ""
  ## Deploy default dashboards.
  defaultDashboardsEnabled: false
  adminPassword: prom-operator
  persistence:
    type: pvc
    enabled: true
    accessModes:
      - kubernetes.io/pvc-protection
  resources:
    limits:
      cpu: 100m
      memory: 512Mi
    requests:
      cpu: 10m
      memory: 256Mi
  ingress:
    ## If true, Grafana Ingress will be created
    enabled: false
    ## Annotations for Grafana Ingress
    annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
    ## Labels to be added to the Ingress
    labels: {}
    ## Hostnames.
    ## Must be provided if Ingress is enabled.
    # hosts:
    #   - grafana.domain.com
    hosts: []
    ## Path for grafana ingress
    path: /
    ## TLS configuration for grafana Ingress
    ## Secret must be manually created in the namespace
    tls: []
    # - secretName: grafana-general-tls
    #   hosts:
    #     - grafana.example.com
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      ## Annotations for Grafana dashboard configmaps
      annotations: {}
    datasources:
      enabled: true
      defaultDatasourceEnabled: true
      ## Annotations for Grafana datasource configmaps
      annotations: {}
      createPrometheusReplicasDatasources: false
      label: grafana_datasource
  extraConfigmapMounts: []
  additionalDataSources: []
  service:
    portName: http-web
  ## If true, create a serviceMonitor for grafana
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    interval: ""
    selfMonitor: true
    ## metric relabel configs to apply to samples before ingestion.
    metricRelabelings: []
    relabelings: []
kubeApiServer:
  enabled: false
  tlsConfig:
    serverName: kubernetes
    insecureSkipVerify: false
  ## If your API endpoint address is not reachable (as in AKS) you can replace it with the kubernetes service
  relabelings: []
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubelet:
  enabled: false
  namespace: kube-system
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeControllerManager:
  enabled: false
  ## If your kube controller manager is not deployed as a pod, specify IPs it can be found on
  endpoints: []
  service:
    port: Provided Values
    targetPort: Provided Values
  selector:
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

coreDns:
  enabled: false
  service:
    port: Provided Values
    targetPort: Provided Values
  selector:
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeDns:
  enabled: false
  service:
    dnsmasq:
      port: Provided Values
      targetPort: Provided Values
    skydns:
      port: Provided Values
      targetPort: Provided Values
  selector:
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeEtcd:
  enabled: false
  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24
  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  service:
    port: Provided Values
    targetPort: Provided Values
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeScheduler:
  enabled: false
  endpoints: []
  service:
    port: Provided Values
    targetPort: Provided Values
  selector:
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeProxy:
  enabled: false
  ## If your kube proxy is not deployed as a pod, specify IPs it can be found on
  endpoints: []
  service:
    port: 10249
    targetPort: 10249
  selector:
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

kubeStateMetrics:
  enabled: true
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
kube-state-metrics:
  priorityClassName: ""
  resources:
    limits:
      cpu: 100m
      memory: 512Mi
    requests:
      cpu: 10m
      memory: 128Mi
  namespaceOverride: ""
  rbac:
    create: true
  podSecurityPolicy:
    enabled: false

## Deploy node exporter as a daemonset to all nodes
nodeExporter:
  enabled: true
  ## Use the value configured in prometheus-node-exporter.podLabels
  jobLabel: jobLabel
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.

prometheus-node-exporter:
  resources:
    limits:
      cpu: 100m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 128Mi
  namespaceOverride: ""
  podLabels:
    ## Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards
  extraArgs:
    - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
  ## Assign custom affinity rules to the prometheus operator
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
  ## Tolerations for use with node taints
  ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
## Manages Prometheus and Alertmanager components
prometheusOperator:
  enabled: true
  requests:
    cpu: "100m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "5Gi"
  ## Prometheus-Operator v0.39.0 and later support TLS natively.
  tls:
    enabled: true
  ## Admission webhook support for PrometheusRules resources added in Prometheus Operator 0.30 can be enabled to prevent incorrectly formatted
  ## rules from making their way into prometheus and potentially preventing the container from starting
  admissionWebhooks:
    failurePolicy: Fail
    enabled: true
    ## If enabled, generate a self-signed certificate, then patch the webhook configurations with the generated data.
  ## Namespaces to scope the interaction of the Prometheus Operator and the apiserver (allow list).
  ## This is mutually exclusive with denyNamespaces. Setting this to an empty object will disable the configuration
  namespaces: {}
  releaseNamespace: true
  ## Namespaces not to scope the interaction of the Prometheus Operator (deny list).
  denyNamespaces: []
  ## Filter namespaces to look for prometheus-operator custom resources
  alertmanagerInstanceNamespaces: []
  prometheusInstanceNamespaces: []
  thanosInstanceNamespaces: []
  ## Service account for Alertmanager to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
  serviceAccount:
    create: true
    name: ""
  ## Configuration for Prometheus operator service
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
  ## Labels to add to the operator pod
  podLabels: {}
  ## Annotations to add to the operator pod
  podAnnotations: {}
  ## Assign a PriorityClassName to pods if set
  priorityClassName: ""
  kubeletService:
    enabled: false
    namespace: kube-system
  ## Create a servicemonitor for the operator
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
  resources:
    requests:
      memory: 6Gi
      cpu: 1000m
    limits:
      memory: 12Gi
      cpu: 2000m
  ## Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),
  ## because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working
  hostNetwork: false
  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  nodeSelector: {}
  ## Tolerations for use with node taints
  ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations: []
  affinity: {}
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  ## Prometheus-operator image
  image:
    repository: quay.io/prometheus-operator/prometheus-operator
    tag: v0.61.1
    sha: ""
    pullPolicy: IfNotPresent
  ## Configmap-reload image to use for reloading configmaps
  configmapReloadImage:
    repository: docker.io/jimmidyson/configmap-reload
    tag: v0.4.0
    sha: ""
  ## Prometheus-config-reloader image to use for config and rule reloading
  prometheusConfigReloaderImage:
  ## Thanos side-car image when configured
  thanosImage:
    repository: quay.io/thanos/thanos
    tag: v0.29.0
    sha: ""
  ## Set a Field Selector to filter watched secrets
  secretFieldSelector: ""
## Deploy a Prometheus instance
prometheus:
  enabled: true
  ## Annotations for Prometheus
  annotations: {}
  ## Service account for Prometheuses to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
  serviceAccount:
    create: true
    name: ""
  ## Configuration for Prometheus service
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
  ## Configuration for creating a separate Service for each statefulset Prometheus replica
  servicePerReplica:
    enabled: false
    annotations: {}
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""
  ## Ingress exposes thanos sidecar outside the cluster
  thanosIngress:
    enabled: false
  ingress:
    enabled: false
  ## Configuration for creating an Ingress that will map to each Prometheus replica service
  ## prometheus.servicePerReplica must be enabled
  ingressPerReplica:
    enabled: false
  ## Configure additional options for default pod security policy for Prometheus
  ## ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
  podSecurityPolicy:
    allowedCapabilities: []
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
  ## Settings affecting prometheusSpec
  ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
  prometheusSpec:
    ## If true, pass --storage.tsdb.max-block-duration=2h to prometheus. This is already done if using Thanos
  additionalServiceMonitors: []
  additionalPodMonitors: []