grafana / helm-charts


Datasource reloading failed when Grafana is configured to use HTTPS #1024

Open d7volker opened 2 years ago

d7volker commented 2 years ago

Describe the bug:

I got the following error message, and Grafana ended up with no datasources.

[2022-02-08 09:53:09] Starting collector
[2022-02-08 09:53:09] No folder annotation was provided, defaulting to k8s-sidecar-target-directory
[2022-02-08 09:53:09] Selected resource type: ('secret', 'configmap')
[2022-02-08 09:53:09] Loading incluster config ...
[2022-02-08 09:53:09] Config for cluster api at 'https://198.18.128.1:443' loaded...
[2022-02-08 09:53:09] Unique filenames will not be enforced.
[2022-02-08 09:53:09] 5xx response content will not be enabled.
/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py:1043: InsecureRequestWarning: Unverified HTTPS request is being made to host '198.18.128.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py:1043: InsecureRequestWarning: Unverified HTTPS request is being made to host '198.18.128.1'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
[2022-02-08 09:53:14] Working on ADDED configmap grafana/prometheus-kube-prometheus-grafana-datasource
[2022-02-08 09:53:47] Received unknown exception: HTTPSConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/datasources/reload (Caused by SSLError(CertificateError("hostname 'localhost' doesn't match either of '*.example.com'")))

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 469, in connect
    _match_hostname(cert, self.assert_hostname or server_hostname)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 542, in _match_hostname
    match_hostname(cert, asserted_hostname)
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/ssl_match_hostname.py", line 152, in match_hostname
    raise CertificateError(
urllib3.util.ssl_match_hostname.CertificateError: hostname 'localhost' doesn't match either of '*.example.com'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 813, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/datasources/reload (Caused by SSLError(CertificateError("hostname 'localhost' doesn't match either of '*.example.com'")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/resources.py", line 239, in _watch_resource_loop
    _watch_resource_iterator(*args)
  File "/app/resources.py", line 227, in _watch_resource_iterator
    request(request_url, request_method, enable_5xx, request_payload)
  File "/app/helpers.py", line 123, in request
    res = r.post("%s" % url, auth=auth, json=payload, timeout=REQ_TIMEOUT)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)

I ensured the setting is set:

skipTlsVerify: true

And I verified it in the sidecar container:

/app $ env | grep SKIP
SKIP_TLS_VERIFY=true

How to reproduce it?

reloadURL: "https://localhost:3000/api/admin/provisioning/datasources/reload"

Enter the changed values of values.yaml?

Under sidecar:

skipTlsVerify: true
reloadURL: "https://localhost:3000/api/admin/provisioning/datasources/reload" 

grafana.ini:
  paths:
    data: /var/lib/grafana/
    logs: /var/log/grafana
    plugins: /var/lib/grafana/plugins
    provisioning: /etc/grafana/provisioning
  analytics:
    check_for_updates: true
  log:
    mode: console
  grafana_net:
    url: https://grafana.net
  server:
    protocol: https
    cert_file: /etc/grafana/tls/tls.crt
    cert_key: /etc/grafana/tls/tls.key
d7volker commented 2 years ago

That wasn't the root cause of my problem. The skipTlsVerify: true setting works perfectly. Closing this issue.

d7volker commented 2 years ago

I need to reopen this issue, because it turned out that the reloading still fails.

theadzik commented 2 years ago

@d7volker Did you figure this out?

kiyyer1 commented 2 years ago

We are also facing this issue. The skipTlsVerify setting works for the k8s API, but not for the reloadURL.

kiyyer1 commented 2 years ago

@d7volker @theadzik How can we make the code owners/authors aware of this issue?

d7volker commented 2 years ago

Here is how I solved it for me:

[image: screenshot of the values change]

Be aware that this setting puts the whole sidecar out of operation. Another option would be to deactivate the sidecar's datasource auto-loading entirely in your values.yaml. If you only have a single datasource, or you manage datasources via a static config, that would work.

The recommended setup is to expose Grafana through an Ingress, which is probably why nobody is interested in this sort of problem.
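
For reference, a minimal sketch of that static alternative, assuming the grafana chart's sidecar.datasources.enabled and datasources values (the datasource entry itself is only illustrative):

sidecar:
  datasources:
    enabled: false

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      # illustrative static datasource; adjust the name/url for your cluster
      - name: Prometheus
        type: prometheus
        url: http://prometheus-operated:9090
        access: proxy
        isDefault: true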

kifeo commented 1 year ago

In my case, I was using both the sidecar and the dashboardProviders settings.

BUT, from the Helm values:

Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards

So I removed the sidecar and it worked. Hope this helps.
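
For anyone taking the same route, a rough sketch of a static setup that replaces the dashboards sidecar, using the chart's dashboardProviders and dashboards values (the provider, dashboard entry and gnetId are only examples):

sidecar:
  dashboards:
    enabled: false

dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default

dashboards:
  default:
    # example dashboard pulled from grafana.com; gnetId/revision are illustrative
    node-exporter:
      gnetId: 1860
      revision: 31
      datasource: Prometheus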

pankdhnd commented 1 year ago

I have a similar issue. Grafana is on HTTPS. I have two sidecars: one for datasources, another for dashboards.

Reload URLs for both sidecars are on http, like http://localhost:3000/...

My dashboard sidecar container works fine. It is able to load dashboards from configmaps, but the datasource container doesn't work.

Datasources are loaded from secrets and the yaml file is added to the Grafana shared volume, but Grafana won't pick it up because the reload URL for datasources doesn't work. I am not sure what exactly the issue is.

SKIP_TLS_VERIFY is set to "true" in both containers. Even if I change the reload URL from http to https, things don't work out.

This is confusing, as both containers use the same image and the same admin credentials. If the reload URL works from one container, I don't know what creates the problem in the other one.

When Grafana is on HTTP, everything works fine.

grubyhs commented 12 months ago
{"time": "2023-07-12T09:31:03.000748+00:00", "msg": "Writing /etc/grafana/provisioning/alerting/alerts.yaml (ascii)", "level": "INFO"}
{"time": "2023-07-12T09:31:40.140993+00:00", "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/alerting/reload (Caused by ResponseError('too many 500 error responses'))\n", "level": "ERROR"}
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 879, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)

Same problem in my situation :/

pankdhnd commented 12 months ago

I managed to fix this problem by making a change in _pod.tpl.

With these changes, the sidecar container is able to use ca bundle for communication over HTTPS, and the reload request succeeds, which results in immediate loading of datasources.
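
To make the change easier to picture, here is a rough, illustrative fragment of what the rendered grafana-sc-datasources container might look like after such a _pod.tpl edit. The secret name, mount path and image tag are assumptions; REQUESTS_CA_BUNDLE is the standard CA-bundle variable of the Python requests library used by k8s-sidecar, and REQ_URL is the reload endpoint the sidecar posts to:

- name: grafana-sc-datasources
  image: quay.io/kiwigrid/k8s-sidecar:1.24.6   # image tag is only an example
  env:
    # requests (and therefore k8s-sidecar) trusts this CA bundle for the reload call
    - name: REQUESTS_CA_BUNDLE
      value: /etc/grafana/certs/ca.crt
    # reload endpoint switched to https, matching grafana.ini server.protocol
    - name: REQ_URL
      value: https://localhost:3000/api/admin/provisioning/datasources/reload
  volumeMounts:
    # volume backed by the secret that holds tls.crt / tls.key / ca.crt
    - name: grafana-tls
      mountPath: /etc/grafana/certs
      readOnly: true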

grubyhs commented 12 months ago

I managed to fix this problem by making a change in _pod.tpl.

  • I keep my TLS certificates inside a secret, so I basically mounted a volume from that secret with the TLS certs into the grafana-sc-datasources container.
  • Then, for the grafana-sc-datasources container, I added an environment variable called REQUESTS_CA_BUNDLE. This environment variable accepts the path to a CA certificate file.
  • I provided the path to the CA certificate file from the volume I mounted from the secret. Here is how it looks:
    • name: REQUESTS_CA_BUNDLE
      value: "/etc/grafana/certs/ca.crt"
  • Now in the grafana values.yaml file, I changed the reloadUrl endpoint to https.
  • Done

With these changes, the sidecar container is able to use ca bundle for communication over HTTPS, and the reload request succeeds, which results in immediate loading of datasources.

Do you maybe know how to do that in kube-prometheus-stack? :D

pankdhnd commented 12 months ago

@grubyhs, these changes need to be integrated, but the thing is that the secret mount, setting up the environment variable, and changing the URL to HTTPS are all only applicable in the case of TLS connectivity. I feel there should be a flag in the Helm chart, like tlsEnabled: when this flag is set to true, all of the required parameters are set automatically, and all you need to provide is the certificates from a secret.

Just a thought, suggestions are welcome :-)
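
Purely to illustrate that idea, such a flag might look roughly like this in values.yaml (none of these keys exist in the chart today; tlsEnabled and tlsCASecret are a hypothetical shape for the proposal):

sidecar:
  datasources:
    enabled: true
    reloadURL: "https://localhost:3000/api/admin/provisioning/datasources/reload"
    # hypothetical keys that would wire up the secret mount and REQUESTS_CA_BUNDLE automatically
    tlsEnabled: true
    tlsCASecret:
      name: grafana-tls
      key: ca.crt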

grubyhs commented 12 months ago

Okay, I mounted it but it didn't help. Thanks for your help <3

pankdhnd commented 12 months ago

@grubyhs, can you please share your configuration from _pod.tpl?

grubyhs commented 12 months ago

@grubyhs, can you please share your configuration from _pod.tpl?

# Default values for kube-prometheus-stack.

This is a YAML-formatted file.

Declare variables to be passed into your templates.

Provide a name in place of kube-prometheus-stack for app: labels

nameOverride: ""

Override the deployment namespace

namespaceOverride: "monitoring"

Provide a k8s version to auto dashboard import script example: kubeTargetVersionOverride: 1.16.6

kubeTargetVersionOverride: ""

Allow kubeVersion to be overridden while creating the ingress

kubeVersionOverride: ""

Provide a name to substitute for the full names of resources

fullnameOverride: ""

Labels to apply to all resources

commonLabels: {}

scmhash: abc123

myLabel: aakkmd

Create default rules for monitoring the cluster

defaultRules:
  create: true
  rules:
    alertmanager: false
    etcd: true
    configReloaders: true
    general: true
    k8s: true
    kubeApiserverAvailability: true
    kubeApiserverBurnrate: true
    kubeApiserverHistogram: true
    kubeApiserverSlos: true
    kubeControllerManager: true
    kubelet: true
    kubeProxy: true
    kubePrometheusGeneral: true
    kubePrometheusNodeRecording: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeSchedulerAlerting: true
    kubeSchedulerRecording: true
    kubeStateMetrics: true
    network: true
    node: true
    nodeExporterAlerting: true
    nodeExporterRecording: true
    prometheus: true
    prometheusOperator: true
    windows: false

Reduce app namespace alert scope

appNamespacesTarget: ".*"

Labels for default rules

labels: {}

Annotations for default rules

annotations: {}

Additional labels for PrometheusRule alerts

additionalRuleLabels: {}

Additional annotations for PrometheusRule alerts

additionalRuleAnnotations: {}

Additional labels for specific PrometheusRule alert groups

additionalRuleGroupLabels: alertmanager: {} etcd: {} configReloaders: {} general: {} k8s: {} kubeApiserverAvailability: {} kubeApiserverBurnrate: {} kubeApiserverHistogram: {} kubeApiserverSlos: {} kubeControllerManager: {} kubelet: {} kubeProxy: {} kubePrometheusGeneral: {} kubePrometheusNodeRecording: {} kubernetesApps: {} kubernetesResources: {} kubernetesStorage: {} kubernetesSystem: {} kubeSchedulerAlerting: {} kubeSchedulerRecording: {} kubeStateMetrics: {} network: {} node: {} nodeExporterAlerting: {} nodeExporterRecording: {} prometheus: {} prometheusOperator: {}

Additional annotations for specific PrometheusRule alerts groups

additionalRuleGroupAnnotations: alertmanager: {} etcd: {} configReloaders: {} general: {} k8s: {} kubeApiserverAvailability: {} kubeApiserverBurnrate: {} kubeApiserverHistogram: {} kubeApiserverSlos: {} kubeControllerManager: {} kubelet: {} kubeProxy: {} kubePrometheusGeneral: {} kubePrometheusNodeRecording: {} kubernetesApps: {} kubernetesResources: {} kubernetesStorage: {} kubernetesSystem: {} kubeSchedulerAlerting: {} kubeSchedulerRecording: {} kubeStateMetrics: {} network: {} node: {} nodeExporterAlerting: {} nodeExporterRecording: {} prometheus: {} prometheusOperator: {}

Prefix for runbook URLs. Use this to override the first part of the runbookURLs that is common to all rules.

runbookUrl: "https://runbooks.prometheus-operator.dev/runbooks"

Disabled PrometheusRule alerts

disabled: {}

KubeAPIDown: true

NodeRAIDDegraded: true

Deprecated way to provide custom recording or alerting rules to be deployed into the cluster.

additionalPrometheusRules: []

- name: my-rule-file

groups:

- name: my_group

rules:

- record: my_record

expr: 100 * my_record

Provide custom recording or alerting rules to be deployed into the cluster.

additionalPrometheusRulesMap: {}

rule-name:

groups:

- name: my_group

rules:

- record: my_record

expr: 100 * my_record

global: rbac: create: true

## Create ClusterRoles that extend the existing view, edit and admin ClusterRoles to interact with prometheus-operator CRDs
## Ref: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles
createAggregateClusterRoles: false
pspEnabled: false
pspAnnotations: {}
  ## Specify pod annotations
  ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#apparmor
  ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#seccomp
  ## Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#sysctl
  ##
  # seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  # seccomp.security.alpha.kubernetes.io/defaultProfileName: 'docker/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'

Global image registry to use if it needs to be overriden for some specific use cases (e.g local registries, custom images, ...)

imageRegistry: ""

Reference to one or more secrets to be used when pulling images

ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

imagePullSecrets: []

- name: "image-pull-secret"

or

- "image-pull-secret"

Configuration for alertmanager

ref: https://prometheus.io/docs/alerting/alertmanager/

alertmanager:
  enabled: false

grafana:
  enabled: true
  namespaceOverride: ""
  env:
    GF_INSTALL_PLUGINS: flant-statusmap-panel
    GF_SERVER_ROOT_URL: there_is_url_from_script
  persistence:
    type: pvc
    enabled: true
    storageClassName: rook-ceph-block
    accessModes: ["ReadWriteOnce"]
    size: 2Gi
    mountPath: /var/lib/grafana

ForceDeployDatasources Create datasource configmap even if grafana deployment has been disabled

forceDeployDatasources: false

deploymentStrategy:
  type: Recreate

alerting: contactpoints.yaml: file: grafana/contact-points/contact-points-${CLUSTER_STAGE_OR_PROD}.yaml notifiers:
notification-policies.yaml: file: grafana/notification-policies/notification-policies-${CLUSTER_STAGE_OR_PROD}.yaml extraSecretMounts:

Flag to disable all the kubernetes component scrapers

kubernetesServiceMonitors: enabled: true

Component scraping the kube api server

kubeApiServer: enabled: true tlsConfig: serverName: kubernetes insecureSkipVerify: false serviceMonitor:

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

jobLabel: component
selector:
  matchLabels:
    component: apiserver
    provider: kubernetes

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings:
  # Drop excessively noisy apiserver buckets.
  - action: drop
    regex: apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2|3|3.5|4|4.5|6|7|8|9|15|25|40|50)
    sourceLabels:
      - __name__
      - le
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels:
#     - __meta_kubernetes_namespace
#     - __meta_kubernetes_service_name
#     - __meta_kubernetes_endpoint_port_name
#   action: keep
#   regex: default;kubernetes;https
# - targetLabel: __address__
#   replacement: kubernetes.default.svc:443

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping the kubelet and kubelet-hosted cAdvisor

kubelet: enabled: true namespace: kube-system

serviceMonitor:

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

## Enable scraping the kubelet over https. For requirements to enable this see
## https://github.com/prometheus-operator/prometheus-operator/issues/926
##
https: true

## Enable scraping /metrics/cadvisor from kubelet's service
##
cAdvisor: true

## Enable scraping /metrics/probes from kubelet's service
##
probes: true

## Enable scraping /metrics/resource from kubelet's service
## This is disabled by default because container metrics are already exposed by cAdvisor
##
resource: false
# From kubernetes 1.18, /metrics/resource/v1alpha1 renamed to /metrics/resource
resourcePath: "/metrics/resource/v1alpha1"

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
cAdvisorMetricRelabelings:
  # Drop less useful container CPU metrics.
  - sourceLabels: [__name__]
    action: drop
    regex: 'container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)'
  # Drop less useful container / always zero filesystem metrics.
  - sourceLabels: [__name__]
    action: drop
    regex: 'container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)'
  # Drop less useful / always zero container memory metrics.
  - sourceLabels: [__name__]
    action: drop
    regex: 'container_memory_(mapped_file|swap)'
  # Drop less useful container process metrics.
  - sourceLabels: [__name__]
    action: drop
    regex: 'container_(file_descriptors|tasks_state|threads_max)'
  # Drop container spec metrics that overlap with kube-state-metrics.
  - sourceLabels: [__name__]
    action: drop
    regex: 'container_spec.*'
  # Drop cgroup metrics with no pod.
  - sourceLabels: [id, pod]
    action: drop
    regex: '.+;'
# - sourceLabels: [__name__, image]
#   separator: ;
#   regex: container_([a-z_]+);
#   replacement: $1
#   action: drop
# - sourceLabels: [__name__]
#   separator: ;
#   regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
#   replacement: $1
#   action: drop

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
probesMetricRelabelings: []
# - sourceLabels: [__name__, image]
#   separator: ;
#   regex: container_([a-z_]+);
#   replacement: $1
#   action: drop
# - sourceLabels: [__name__]
#   separator: ;
#   regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
#   replacement: $1
#   action: drop

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
## metrics_path is required to match upstream rules and charts
cAdvisorRelabelings:
  - action: replace
    sourceLabels: [__metrics_path__]
    targetLabel: metrics_path
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
probesRelabelings:
  - action: replace
    sourceLabels: [__metrics_path__]
    targetLabel: metrics_path
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
resourceRelabelings:
  - action: replace
    sourceLabels: [__metrics_path__]
    targetLabel: metrics_path
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - sourceLabels: [__name__, image]
#   separator: ;
#   regex: container_([a-z_]+);
#   replacement: $1
#   action: drop
# - sourceLabels: [__name__]
#   separator: ;
#   regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
#   replacement: $1
#   action: drop

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
## metrics_path is required to match upstream rules and charts
relabelings:
  - action: replace
    sourceLabels: [__metrics_path__]
    targetLabel: metrics_path
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping the kube controller manager

kubeControllerManager: enabled: true

If your kube controller manager is not deployed as a pod, specify IPs it can be found on

endpoints: []

- 10.141.4.22

- 10.141.4.23

- 10.141.4.24

If using kubeControllerManager.endpoints only the port and targetPort are used

service: enabled: true

If null or unset, the value is determined dynamically based on target Kubernetes version due to change

## of default port in Kubernetes 1.22.
##
port: null
targetPort: null
# selector:
#   component: kube-controller-manager

serviceMonitor: enabled: true

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

## Enable scraping kube-controller-manager over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks.
## If null or unset, the value is determined dynamically based on target Kubernetes version.
##
https: null

# Skip TLS certificate validation when scraping
insecureSkipVerify: null

# Name of the server to use when validating TLS certificate
serverName: null

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping coreDns. Use either this or kubeDns

coreDns: enabled: true service: port: 9153 targetPort: 9153

selector:

#   k8s-app: kube-dns

serviceMonitor:

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping kubeDns. Use either this or coreDns

kubeDns: enabled: false service: dnsmasq: port: 10054 targetPort: 10054 skydns: port: 10055 targetPort: 10055

selector:

#   k8s-app: kube-dns

serviceMonitor:

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
dnsmasqMetricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
dnsmasqRelabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping etcd

kubeEtcd: enabled: true

If your etcd is not deployed as a pod, specify IPs it can be found on

endpoints: []

- 10.141.4.22

- 10.141.4.23

- 10.141.4.24

Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used

service: enabled: true port: 2381 targetPort: 2381

selector:

#   component: etcd

Configure secure access to the etcd cluster by loading a secret into prometheus and

specifying security configuration below. For example, with a secret named etcd-client-cert

serviceMonitor:

scheme: https

insecureSkipVerify: false

serverName: localhost

caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca

certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client

keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key

serviceMonitor: enabled: true

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""
scheme: http
insecureSkipVerify: false
serverName: ""
caFile: ""
certFile: ""
keyFile: ""

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping kube scheduler

kubeScheduler: enabled: true

If your kube scheduler is not deployed as a pod, specify IPs it can be found on

endpoints: []

- 10.141.4.22

- 10.141.4.23

- 10.141.4.24

If using kubeScheduler.endpoints only the port and targetPort are used

service: enabled: true

If null or unset, the value is determined dynamically based on target Kubernetes version due to change

## of default port in Kubernetes 1.23.
##
port: null
targetPort: null
# selector:
#   component: kube-scheduler

serviceMonitor: enabled: true

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""
## Enable scraping kube-scheduler over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks.
## If null or unset, the value is determined dynamically based on target Kubernetes version.
##
https: null

## Skip TLS certificate validation when scraping
insecureSkipVerify: null

## Name of the server to use when validating TLS certificate
serverName: null

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping kube proxy

kubeProxy: enabled: true

If your kube proxy is not deployed as a pod, specify IPs it can be found on

endpoints: []

- 10.141.4.22

- 10.141.4.23

- 10.141.4.24

service: enabled: true port: 10249 targetPort: 10249

selector:

#   k8s-app: kube-proxy

serviceMonitor: enabled: true

Scrape interval. If not set, the Prometheus default scrape interval is used.

##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## proxyUrl: URL of a proxy that should be used for scraping.
##
proxyUrl: ""

## Enable scraping kube-proxy over https.
## Requires proper certs (not self-signed) and delegated authentication/authorization checks
##
https: false

## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
relabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## Additional labels
##
additionalLabels: {}
#  foo: bar

Component scraping kube state metrics

kubeStateMetrics: enabled: true

Configuration for kube-state-metrics subchart

kube-state-metrics: namespaceOverride: "" rbac: create: true releaseLabel: true prometheus: monitor: enabled: true

  ## Scrape interval. If not set, the Prometheus default scrape interval is used.
  ##
  interval: ""

  ## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
  ##
  sampleLimit: 0

  ## TargetLimit defines a limit on the number of scraped targets that will be accepted.
  ##
  targetLimit: 0

  ## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
  ##
  labelLimit: 0

  ## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
  ##
  labelNameLengthLimit: 0

  ## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
  ##
  labelValueLengthLimit: 0

  ## Scrape Timeout. If not set, the Prometheus default scrape timeout is used.
  ##
  scrapeTimeout: ""

  ## proxyUrl: URL of a proxy that should be used for scraping.
  ##
  proxyUrl: ""

  # Keep labels from scraped data, overriding server-side labels
  ##
  honorLabels: true

  ## MetricRelabelConfigs to apply to samples after scraping, but before ingestion.
  ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  ##
  metricRelabelings: []
  # - action: keep
  #   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
  #   sourceLabels: [__name__]

  ## RelabelConfigs to apply to samples before scraping
  ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
  ##
  relabelings: []
  # - sourceLabels: [__meta_kubernetes_pod_node_name]
  #   separator: ;
  #   regex: ^(.*)$
  #   targetLabel: nodename
  #   replacement: $1
  #   action: replace

selfMonitor: enabled: false

Deploy node exporter as a daemonset to all nodes

nodeExporter: enabled: true

Configuration for prometheus-node-exporter subchart

prometheus-node-exporter: namespaceOverride: "" podLabels:

Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards

##
jobLabel: node-exporter

releaseLabel: true extraArgs:

Manages Prometheus and Alertmanager components

prometheusOperator: enabled: true

Prometheus-Operator v0.39.0 and later support TLS natively.

tls: enabled: true

Value must match version names from https://golang.org/pkg/crypto/tls/#pkg-constants

tlsMinVersion: VersionTLS13
# The default webhook port is 10250 in order to work out-of-the-box in GKE private clusters and avoid adding firewall rules.
internalPort: 10250

Admission webhook support for PrometheusRules resources added in Prometheus Operator 0.30 can be enabled to prevent incorrectly formatted

rules from making their way into prometheus and potentially preventing the container from starting

admissionWebhooks:

Valid values: Fail, Ignore, IgnoreOnInstallOnly

## IgnoreOnInstallOnly - If Release.IsInstall returns "true", set "Ignore" otherwise "Fail"
failurePolicy: ""
## The default timeoutSeconds is 10 and the maximum value is 30.
timeoutSeconds: 10
enabled: true
## A PEM encoded CA bundle which will be used to validate the webhook's server certificate.
## If unspecified, system trust roots on the apiserver are used.
caBundle: ""
## If enabled, generate a self-signed certificate, then patch the webhook configurations with the generated data.
## On chart upgrades (or if the secret exists) the cert will not be re-generated. You can use this to provide your own
## certs ahead of time if you wish.
##
annotations: {}
#   argocd.argoproj.io/hook: PreSync
#   argocd.argoproj.io/hook-delete-policy: HookSucceeded
patch:
  enabled: true
  resources: {}
  ## Provide a priority class name to the webhook patching job
  ##
  priorityClassName: ""
  annotations: {}
  #   argocd.argoproj.io/hook: PreSync
  #   argocd.argoproj.io/hook-delete-policy: HookSucceeded
  podAnnotations: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []

  ## SecurityContext holds pod-level security attributes and common container settings.
  ## This defaults to non root user with uid 2000 and gid 2000. *v1.PodSecurityContext  false
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
  ##
  securityContext:
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 2000
    seccompProfile:
      type: RuntimeDefault

# Security context for create job container
createSecretJob:
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
      - ALL

  # Security context for patch job container
patchWebhookJob:
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
      - ALL

# Use certmanager to generate webhook certs
certManager:
  enabled: false
  # self-signed root certificate
  rootCert:
    duration: ""  # default to be 5y
  admissionCert:
    duration: ""  # default to be 1y
  # issuerRef:
  #   name: "issuer"
  #   kind: "ClusterIssuer"

Namespaces to scope the interaction of the Prometheus Operator and the apiserver (allow list).

This is mutually exclusive with denyNamespaces. Setting this to an empty object will disable the configuration

namespaces: {}

releaseNamespace: true

# additional:
# - kube-system

Namespaces not to scope the interaction of the Prometheus Operator (deny list).

denyNamespaces: []

Filter namespaces to look for prometheus-operator custom resources

alertmanagerInstanceNamespaces: [] alertmanagerConfigNamespaces: [] prometheusInstanceNamespaces: [] thanosRulerInstanceNamespaces: []

The clusterDomain value will be added to the cluster.peer option of the alertmanager.

Without this specified option cluster.peer will have value alertmanager-monitoring-alertmanager-0.alertmanager-operated:9094 (default value)

With this specified option cluster.peer will have value alertmanager-monitoring-alertmanager-0.alertmanager-operated.namespace.svc.cluster-domain:9094

clusterDomain: "cluster.local"

networkPolicy:

Enable creation of NetworkPolicy resources.

##
enabled: false

## Flavor of the network policy to use.
#  Can be:
#  * kubernetes for networking.k8s.io/v1/NetworkPolicy
#  * cilium     for cilium.io/v2/CiliumNetworkPolicy
flavor: kubernetes

# cilium:
#   egress:

Service account for Alertmanager to use.

ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

serviceAccount: create: true name: ""

Configuration for Prometheus operator service

service: annotations: {} labels: {} clusterIP: ""

Port to expose on each node

Only used if service.type is 'NodePort'

nodePort: 30080

nodePortTls: 30443

Additional ports to open for Prometheus service

ref: https://kubernetes.io/docs/concepts/services-networking/service/#multi-port-services

additionalPorts: []

Loadbalancer IP

Only use if service.type is "LoadBalancer"

loadBalancerIP: ""
loadBalancerSourceRanges: []

## Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
##
externalTrafficPolicy: Cluster

Service type

NodePort, ClusterIP, LoadBalancer

type: ClusterIP

## List of IP addresses at which the Prometheus server service is available
## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
##
externalIPs: []

Labels to add to the operator deployment

labels: {}

Annotations to add to the operator deployment

annotations: {}

Labels to add to the operator pod

podLabels: {}

Annotations to add to the operator pod

podAnnotations: {}

Assign a PriorityClassName to pods if set

priorityClassName: ""

Define Log Format

Use logfmt (default) or json logging

logFormat: logfmt

Decrease log verbosity to errors only

logLevel: error

If true, the operator will create and maintain a service for scraping kubelets

ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/helm/prometheus-operator/README.md

kubeletService: enabled: true namespace: kube-system

Use '{{ template "kube-prometheus-stack.fullname" . }}-kubelet' by default

name: ""

Create a servicemonitor for the operator

serviceMonitor:

Labels for ServiceMonitor

additionalLabels: {}

## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## Scrape timeout. If not set, the Prometheus default scrape timeout is used.
scrapeTimeout: ""
selfMonitor: true

## Metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

#   relabel configs to apply to samples before ingestion.
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

Resource limits & requests

resources: {}

limits:

cpu: 200m

memory: 200Mi

requests:

cpu: 100m

memory: 100Mi

Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),

because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working

hostNetwork: false

Define which Nodes the Pods are scheduled on.

ref: https://kubernetes.io/docs/user-guide/node-selection/

nodeSelector: {}

Tolerations for use with node taints

ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/

tolerations: []

- key: "key"

operator: "Equal"

value: "value"

effect: "NoSchedule"

Assign custom affinity rules to the prometheus operator

ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

affinity: {}

nodeAffinity:

#   requiredDuringSchedulingIgnoredDuringExecution:
#     nodeSelectorTerms:
#     - matchExpressions:
#       - key: kubernetes.io/e2e-az-name
#         operator: In
#         values:
#         - e2e-az1
#         - e2e-az2

dnsConfig: {}

nameservers:

#   - 1.2.3.4
# searches:
#   - ns1.svc.cluster-domain.example
#   - my.dns.search.suffix
# options:
#   - name: ndots
#     value: "2"

- name: edns0

securityContext: fsGroup: 65534 runAsGroup: 65534 runAsNonRoot: true runAsUser: 65534 seccompProfile: type: RuntimeDefault

Container-specific security context configuration

ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

containerSecurityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true capabilities: drop:

Deploy a Prometheus instance

prometheus: enabled: true

Annotations for Prometheus

annotations: {}

Configure network policy for the prometheus

networkPolicy: enabled: false

## Flavor of the network policy to use.
#  Can be:
#  * kubernetes for networking.k8s.io/v1/NetworkPolicy
#  * cilium     for cilium.io/v2/CiliumNetworkPolicy
flavor: kubernetes

# cilium:
#   endpointSelector:
#   egress:
#   ingress:

# egress:
# - {}
# ingress:
# - {}
# podSelector:
#   matchLabels:
#     app: prometheus

Service account for Prometheuses to use.

ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/

serviceAccount: create: true name: "" annotations: {}

Service for thanos service discovery on sidecar

Enable this can make Thanos Query can use

--store=dnssrv+_grpc._tcp.${kube-prometheus-stack.fullname}-thanos-discovery.${namespace}.svc.cluster.local to discovery

Thanos sidecar on prometheus nodes

(Please remember to change ${kube-prometheus-stack.fullname} and ${namespace}. Not just copy and paste!)

thanosService: enabled: false annotations: {} labels: {}

## Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
##
externalTrafficPolicy: Cluster

## Service type
##
type: ClusterIP

## gRPC port config
portName: grpc
port: 10901
targetPort: "grpc"

## HTTP port config (for metrics)
httpPortName: http
httpPort: 10902
targetHttpPort: "http"

## ClusterIP to assign
# Default is to make this a headless service ("None")
clusterIP: "None"

## Port to expose on each node, if service type is NodePort
##
nodePort: 30901
httpNodePort: 30902

ServiceMonitor to scrape Sidecar metrics

Needs thanosService to be enabled as well

thanosServiceMonitor: enabled: false interval: ""

## Additional labels
##
additionalLabels: {}

## scheme: HTTP scheme to use for scraping. Can be used with `tlsConfig` for example if using istio mTLS.
scheme: ""

## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
## Of type: https://github.com/coreos/prometheus-operator/blob/main/Documentation/api.md#tlsconfig
tlsConfig: {}

bearerTokenFile:

## Metric relabel configs to apply to samples before ingestion.
metricRelabelings: []

## relabel configs to apply to samples before ingestion.
relabelings: []

Service for external access to sidecar

Enabling this creates a service to expose thanos-sidecar outside the cluster.

thanosServiceExternal: enabled: false annotations: {} labels: {} loadBalancerIP: "" loadBalancerSourceRanges: []

## gRPC port config
portName: grpc
port: 10901
targetPort: "grpc"

## HTTP port config (for metrics)
httpPortName: http
httpPort: 10902
targetHttpPort: "http"

## Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
##
externalTrafficPolicy: Cluster

## Service type
##
type: LoadBalancer

## Port to expose on each node
##
nodePort: 30901
httpNodePort: 30902

Configuration for Prometheus service

service: annotations: {} labels: {} clusterIP: ""

## Port for Prometheus Service to listen on
##
port: 9090

## To be used with a proxy extraContainer port
targetPort: 9090

## List of IP addresses at which the Prometheus server service is available
## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
##
externalIPs: []

## Port to expose on each node
## Only used if service.type is 'NodePort'
##
nodePort: 30090

## Loadbalancer IP
## Only use if service.type is "LoadBalancer"
loadBalancerIP: ""
loadBalancerSourceRanges: []

## Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
##
externalTrafficPolicy: Cluster

## Service type
##
type: ClusterIP

## Additional port to define in the Service
additionalPorts: []
# additionalPorts:
# - name: authenticated
#   port: 8081
#   targetPort: 8081

## Consider that all endpoints are considered "ready" even if the Pods themselves are not
## Ref: https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/#ServiceSpec
publishNotReadyAddresses: false

sessionAffinity: ""

Configuration for creating a separate Service for each statefulset Prometheus replica

servicePerReplica: enabled: false annotations: {}

## Port for Prometheus Service per replica to listen on
##
port: 9090

## To be used with a proxy extraContainer port
targetPort: 9090

## Port to expose on each node
## Only used if servicePerReplica.type is 'NodePort'
##
nodePort: 30091

## Loadbalancer source IP ranges
## Only used if servicePerReplica.type is "LoadBalancer"
loadBalancerSourceRanges: []

## Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints
##
externalTrafficPolicy: Cluster

## Service type
##
type: ClusterIP

Configure pod disruption budgets for Prometheus

ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget

This configuration is immutable once created and will require the PDB to be deleted to be changed

https://github.com/kubernetes/kubernetes/issues/45398

podDisruptionBudget: enabled: false minAvailable: 1 maxUnavailable: ""

Ingress exposes thanos sidecar outside the cluster

thanosIngress: enabled: false

# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx

annotations: {}
labels: {}
servicePort: 10901

## Port to expose on each node
## Only used if service.type is 'NodePort'
##
nodePort: 30901

## Hosts must be provided if Ingress is enabled.
##
hosts: []
  # - thanos-gateway.domain.com

## Paths to use for ingress rules
##
paths: []
# - /

## For Kubernetes >= 1.18 you should specify the pathType (determines how Ingress paths should be matched)
## See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#better-path-matching-with-path-types
# pathType: ImplementationSpecific

## TLS configuration for Thanos Ingress
## Secret must be manually created in the namespace
##
tls: []
# - secretName: thanos-gateway-tls
#   hosts:
#   - thanos-gateway.domain.com
#

## ExtraSecret can be used to store various data in an extra secret
## (use it for example to store hashed basic auth credentials)
extraSecret:
  ## if not set, name will be auto generated
  # name: ""
  annotations: {}
  data: {}
  #   auth: |
  #     foo:$apr1$OFG3Xybp$ckL0FHDAkoXYIlH9.cysT0
  #     someoneelse:$apr1$DMZX2Z4q$6SbQIfyuLQd.xmo/P0m2c.

ingress:
  enabled: false

# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx

annotations: {}
labels: {}

## Redirect ingress to an additional defined port on the service
# servicePort: 8081

## Hostnames.
## Must be provided if Ingress is enabled.
##
# hosts:
#   - prometheus.domain.com
hosts: []

## Paths to use for ingress rules - one path should match the prometheusSpec.routePrefix
##
paths: []
# - /

## For Kubernetes >= 1.18 you should specify the pathType (determines how Ingress paths should be matched)
## See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#better-path-matching-with-path-types
# pathType: ImplementationSpecific

## TLS configuration for Prometheus Ingress
## Secret must be manually created in the namespace
##
tls: []
  # - secretName: prometheus-general-tls
  #   hosts:
  #     - prometheus.example.com

## Configuration for creating an Ingress that will map to each Prometheus replica service
## prometheus.servicePerReplica must be enabled
##
ingressPerReplica:
  enabled: false

# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
# ingressClassName: nginx

annotations: {}
labels: {}

## Final form of the hostname for each per replica ingress is
## {{ ingressPerReplica.hostPrefix }}-{{ $replicaNumber }}.{{ ingressPerReplica.hostDomain }}
##
## Prefix for the per replica ingress that will have `-$replicaNumber`
## appended to the end
hostPrefix: ""
## Domain that will be used for the per replica ingress
hostDomain: ""

## Paths to use for ingress rules
##
paths: []
# - /

## For Kubernetes >= 1.18 you should specify the pathType (determines how Ingress paths should be matched)
## See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#better-path-matching-with-path-types
# pathType: ImplementationSpecific

## Secret name containing the TLS certificate for Prometheus per replica ingress
## Secret must be manually created in the namespace
tlsSecretName: ""

## Separated secret for each per replica Ingress. Can be used together with cert-manager
##
tlsSecretPerReplica:
  enabled: false
  ## Final form of the secret for each per replica ingress is
  ## {{ tlsSecretPerReplica.prefix }}-{{ $replicaNumber }}
  ##
  prefix: "prometheus"

## Configure additional options for default pod security policy for Prometheus
## ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
podSecurityPolicy:
  allowedCapabilities: []
  allowedHostPaths: []
  volumes: []

serviceMonitor:

## Scrape interval. If not set, the Prometheus default scrape interval is used.
##
interval: ""
selfMonitor: true

## Additional labels
##
additionalLabels: {}

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
sampleLimit: 0

## TargetLimit defines a limit on the number of scraped targets that will be accepted.
##
targetLimit: 0

## Per-scrape limit on number of labels that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelLimit: 0

## Per-scrape limit on length of labels name that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelNameLengthLimit: 0

## Per-scrape limit on length of labels value that will be accepted for a sample. Only valid in Prometheus versions 2.27.0 and newer.
##
labelValueLengthLimit: 0

## scheme: HTTP scheme to use for scraping. Can be used with `tlsConfig` for example if using istio mTLS.
scheme: ""

## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
## Of type: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#tlsconfig
tlsConfig: {}

bearerTokenFile:

## Metric relabel configs to apply to samples before ingestion.
##
metricRelabelings: []
# - action: keep
#   regex: 'kube_(daemonset|deployment|pod|namespace|node|statefulset).+'
#   sourceLabels: [__name__]

## Relabel configs to apply to samples before ingestion.
##
relabelings: []
# - sourceLabels: [__meta_kubernetes_pod_node_name]
#   separator: ;
#   regex: ^(.*)$
#   targetLabel: nodename
#   replacement: $1
#   action: replace

## Settings affecting prometheusSpec
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#prometheusspec
##
prometheusSpec:
## If true, pass --storage.tsdb.max-block-duration=2h to prometheus. This is already done if using Thanos
##
disableCompaction: false
## APIServerConfig
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#apiserverconfig
##
apiserverConfig: {}

## Allows setting additional arguments for the Prometheus container
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.Prometheus
additionalArgs: []

## Interval between consecutive scrapes.
## Defaults to 30s.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/release-0.44/pkg/prometheus/promcfg.go#L180-L183
##
scrapeInterval: ""

## Number of seconds to wait for target to respond before erroring
##
scrapeTimeout: ""

## Interval between consecutive evaluations.
##
evaluationInterval: ""

## ListenLocal makes the Prometheus server listen on loopback, so that it does not bind against the Pod IP.
##
listenLocal: false

## EnableAdminAPI enables the Prometheus administrative HTTP API, which includes functionality such as deleting time series.
## This is disabled by default.
## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
##
enableAdminAPI: false

## Sets version of Prometheus overriding the Prometheus version as derived
## from the image tag. Useful in cases where the tag does not follow semver v2.
version: ""

## WebTLSConfig defines the TLS parameters for HTTPS
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#webtlsconfig
web: {}

## Exemplars related settings that are runtime reloadable.
## The exemplar-storage feature must be enabled for these settings to take effect.
exemplars: ""
  ## Maximum number of exemplars stored in memory for all series.
  ## If not set, Prometheus uses its default value.
  ## A value of zero or less than zero disables the storage.
  # maxSize: 100000

# EnableFeatures API enables access to Prometheus disabled features.
# ref: https://prometheus.io/docs/prometheus/latest/disabled_features/
enableFeatures: []
# - exemplar-storage

## Image of Prometheus.
##

## Tolerations for use with node taints
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
#  - key: "key"
#    operator: "Equal"
#    value: "value"
#    effect: "NoSchedule"

## If specified, the pod's topology spread constraints.
## ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
##
topologySpreadConstraints: []
# - maxSkew: 1
#   topologyKey: topology.kubernetes.io/zone
#   whenUnsatisfiable: DoNotSchedule
#   labelSelector:
#     matchLabels:
#       app: prometheus

## Alertmanagers to which alerts will be sent
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#alertmanagerendpoints
##
## Default configuration will connect to the alertmanager deployed as part of this release
##
alertingEndpoints: []
# - name: ""
#   namespace: ""
#   port: http
#   scheme: http
#   pathPrefix: ""
#   tlsConfig: {}
#   bearerTokenFile: ""
#   apiVersion: v2

## External labels to add to any time series or alerts when communicating with external systems
##
externalLabels: {}

## enable --web.enable-remote-write-receiver flag on prometheus-server
##
enableRemoteWriteReceiver: false

## Name of the external label used to denote replica name
##
replicaExternalLabelName: ""

## If true, the Operator won't add the external label used to denote replica name
##
replicaExternalLabelNameClear: false

## Name of the external label used to denote Prometheus instance name
##
prometheusExternalLabelName: ""

## If true, the Operator won't add the external label used to denote Prometheus instance name
##
prometheusExternalLabelNameClear: false

## External URL at which Prometheus will be reachable.
##
externalUrl: ""

## Define which Nodes the Pods are scheduled on.
## ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}

## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not
## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated
## with the new list of secrets.
##
secrets: []

## ConfigMaps is a list of ConfigMaps in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The ConfigMaps are mounted into /etc/prometheus/configmaps/.
##
configMaps: []

## QuerySpec defines the query command line flags when starting Prometheus.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#queryspec
##
query: {}

## If nil, select own namespace. Namespaces to be selected for PrometheusRules discovery.
ruleNamespaceSelector: {}
## Example which selects PrometheusRules in namespaces with label "prometheus" set to "somelabel"
# ruleNamespaceSelector:
#   matchLabels:
#     prometheus: somelabel

## If true, a nil or {} value for prometheus.prometheusSpec.ruleSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the PrometheusRule resources created
##
ruleSelectorNilUsesHelmValues: false

## PrometheusRules to be selected for target discovery.
## If {}, select all PrometheusRules
##
ruleSelector: {}
## Example which select all PrometheusRules resources
## with label "prometheus" with values any of "example-rules" or "example-rules-2"
# ruleSelector:
#   matchExpressions:
#     - key: prometheus
#       operator: In
#       values:
#         - example-rules
#         - example-rules-2
#
## Example which select all PrometheusRules resources with label "role" set to "example-rules"
# ruleSelector:
#   matchLabels:
#     role: example-rules

## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the servicemonitors created
##
serviceMonitorSelectorNilUsesHelmValues: false

## ServiceMonitors to be selected for target discovery.
## If {}, select all ServiceMonitors
##
serviceMonitorSelector: {}
## Example which selects ServiceMonitors with label "prometheus" set to "somelabel"
# serviceMonitorSelector:
#   matchLabels:
#     prometheus: somelabel

## Namespaces to be selected for ServiceMonitor discovery.
##
serviceMonitorNamespaceSelector: {}
## Example which selects ServiceMonitors in namespaces with label "prometheus" set to "somelabel"
# serviceMonitorNamespaceSelector:
#   matchLabels:
#     prometheus: somelabel

## If true, a nil or {} value for prometheus.prometheusSpec.podMonitorSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the podmonitors created
##
podMonitorSelectorNilUsesHelmValues: false

## PodMonitors to be selected for target discovery.
## If {}, select all PodMonitors
##
podMonitorSelector: {}
## Example which selects PodMonitors with label "prometheus" set to "somelabel"
# podMonitorSelector:
#   matchLabels:
#     prometheus: somelabel

## If nil, select own namespace. Namespaces to be selected for PodMonitor discovery.
podMonitorNamespaceSelector: {}
## Example which selects PodMonitor in namespaces with label "prometheus" set to "somelabel"
# podMonitorNamespaceSelector:
#   matchLabels:
#     prometheus: somelabel

## If true, a nil or {} value for prometheus.prometheusSpec.probeSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the probes created
##
probeSelectorNilUsesHelmValues: false

## Probes to be selected for target discovery.
## If {}, select all Probes
##
probeSelector: {}
## Example which selects Probes with label "prometheus" set to "somelabel"
# probeSelector:
#   matchLabels:
#     prometheus: somelabel

## If nil, select own namespace. Namespaces to be selected for Probe discovery.
probeNamespaceSelector: {}
## Example which selects Probe in namespaces with label "prometheus" set to "somelabel"
# probeNamespaceSelector:
#   matchLabels:
#     prometheus: somelabel

## If true, a nil or {} value for prometheus.prometheusSpec.scrapeConfigSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the scrapeConfigs created
##
scrapeConfigSelectorNilUsesHelmValues: true

## scrapeConfigs to be selected for target discovery.
## If {}, select all scrapeConfigs
##
scrapeConfigSelector: {}
## Example which selects scrapeConfigs with label "prometheus" set to "somelabel"
# scrapeConfig:
#   matchLabels:
#     prometheus: somelabel

## If nil, select own namespace. Namespaces to be selected for scrapeConfig discovery.
scrapeConfigNamespaceSelector: {}
## Example which selects scrapeConfig in namespaces with label "prometheus" set to "somelabel"
# scrapeConfigsNamespaceSelector:
#   matchLabels:
#     prometheus: somelabel

## How long to retain metrics
##
retention: 30d

## Maximum size of metrics
##
retentionSize: ""

## Allow out-of-order/out-of-bounds samples ingested into Prometheus for a specified duration
## See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tsdb
tsdb:
  outOfOrderTimeWindow: 0s

## Enable compression of the write-ahead log using Snappy.
##
walCompression: true

## If true, the Operator won't process any Prometheus configuration changes
##
paused: false

## Number of replicas of each shard to deploy for a Prometheus deployment.
## Number of replicas multiplied by shards is the total number of Pods created.
##
replicas: 1

## EXPERIMENTAL: Number of shards to distribute targets onto.
## Number of replicas multiplied by shards is the total number of Pods created.
## Note that scaling down shards will not reshard data onto remaining instances, it must be manually moved.
## Increasing shards will not reshard data either but it will continue to be available from the same instances.
## To query globally use Thanos sidecar and Thanos querier or remote write data to a central location.
## Sharding is done on the content of the `__address__` target meta-label.
##
shards: 1

## Log level to configure for Prometheus
##
logLevel: info

## Log format to configure for Prometheus
##
logFormat: logfmt

## Prefix used to register routes, overriding externalUrl route.
## Useful for proxies that rewrite URLs.
##
routePrefix: /

## Standard object's metadata. More info: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#metadata
## Metadata Labels and Annotations gets propagated to the prometheus pods.
##
podMetadata: {}
# labels:
#   app: prometheus
#   k8s-app: prometheus

## Pod anti-affinity can prevent the scheduler from placing Prometheus replicas on the same node.
## The default value "soft" means that the scheduler should *prefer* to not schedule two replica pods onto the same node but no guarantee is provided.
## The value "hard" means that the scheduler is *required* to not schedule two replica pods onto the same node.
## The value "" will disable pod anti-affinity so that no anti-affinity rules will be configured.
podAntiAffinity: ""

## If anti-affinity is enabled sets the topologyKey to use for anti-affinity.
## This can be changed to, for example, failure-domain.beta.kubernetes.io/zone
##
podAntiAffinityTopologyKey: kubernetes.io/hostname

## Assign custom affinity rules to the prometheus instance
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
##
affinity: {}
# nodeAffinity:
#   requiredDuringSchedulingIgnoredDuringExecution:
#     nodeSelectorTerms:
#     - matchExpressions:
#       - key: kubernetes.io/e2e-az-name
#         operator: In
#         values:
#         - e2e-az1
#         - e2e-az2

## The remote_read spec configuration for Prometheus.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#remotereadspec
remoteRead: []
# - url: http://remote1/read
## additionalRemoteRead is appended to remoteRead
additionalRemoteRead: []

## The remote_write spec configuration for Prometheus.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#remotewritespec
remoteWrite: []
# - url: http://remote1/push
## additionalRemoteWrite is appended to remoteWrite
additionalRemoteWrite: []

## Enable/Disable Grafana dashboards provisioning for prometheus remote write feature
remoteWriteDashboards: false

## Resource limits & requests
##
resources:
  limits:
    cpu: 4
    memory: 12000Mi
  requests:
    cpu: 2
    memory: 4000Mi

## Prometheus StorageSpec for persistent data
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
##
storageSpec:
  ## Using PersistentVolumeClaim
  ##
  volumeClaimTemplate:
    spec:
      storageClassName: rook-ceph-block
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

## Using tmpfs volume
##
#  emptyDir:
#    medium: Memory

# Additional volumes on the output StatefulSet definition.
volumes:
  - name: config-volume
    configMap:
      name: prometheus-config

# Additional VolumeMounts on the output StatefulSet definition.
volumeMounts: 
- name: config-volume
  mountPath: /etc/config

## AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
## are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
## as specified in the official Prometheus documentation:
## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config. As scrape configs are
## appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
## to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
## scrape configs are going to break Prometheus after the upgrade.
## AdditionalScrapeConfigs can be defined as a list or as a templated string.
##
## The scrape configuration example below will find master nodes, provided they have the name .*mst.*, relabel the
## port to 2379 and allow etcd scraping provided it is running on all Kubernetes master nodes
##
additionalScrapeConfigs: []
# - job_name: kube-etcd
#   kubernetes_sd_configs:
#     - role: node
#   scheme: https
#   tls_config:
#     ca_file:   /etc/prometheus/secrets/etcd-client-cert/etcd-ca
#     cert_file: /etc/prometheus/secrets/etcd-client-cert/etcd-client
#     key_file:  /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
#   relabel_configs:
#   - action: labelmap
#     regex: __meta_kubernetes_node_label_(.+)
#   - source_labels: [__address__]
#     action: replace
#     targetLabel: __address__
#     regex: ([^:;]+):(\d+)
#     replacement: ${1}:2379
#   - source_labels: [__meta_kubernetes_node_name]
#     action: keep
#     regex: .*mst.*
#   - source_labels: [__meta_kubernetes_node_name]
#     action: replace
#     targetLabel: node
#     regex: (.*)
#     replacement: ${1}
#   metric_relabel_configs:
#   - regex: (kubernetes_io_hostname|failure_domain_beta_kubernetes_io_region|beta_kubernetes_io_os|beta_kubernetes_io_arch|beta_kubernetes_io_instance_type|failure_domain_beta_kubernetes_io_zone)
#     action: labeldrop
#
## If scrape config contains a repetitive section, you may want to use a template.
## In the following example, you can see how to define `gce_sd_configs` for multiple zones
# additionalScrapeConfigs: |
#  - job_name: "node-exporter"
#    gce_sd_configs:
#    {{range $zone := .Values.gcp_zones}}
#    - project: "project1"
#      zone: "{{$zone}}"
#      port: 9100
#    {{end}}
#    relabel_configs:
#    ...

## If additional scrape configurations are already deployed in a single secret file you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalScrapeConfigs
additionalScrapeConfigsSecret: {}
  # enabled: false
  # name:
  # key:

## additionalPrometheusSecretsAnnotations allows to add annotations to the kubernetes secret. This can be useful
## when deploying via spinnaker to disable versioning on the secret, strategy.spinnaker.io/versioned: 'false'
additionalPrometheusSecretsAnnotations: {}

## AdditionalAlertManagerConfigs allows for manual configuration of alertmanager jobs in the form as specified
## in the official Prometheus documentation https://prometheus.io/docs/prometheus/latest/configuration/configuration/#<alertmanager_config>.
## AlertManager configurations specified are appended to the configurations generated by the Prometheus Operator.
## As AlertManager configs are appended, the user is responsible to make sure it is valid. Note that using this
## feature may expose the possibility to break upgrades of Prometheus. It is advised to review Prometheus release
## notes to ensure that no incompatible AlertManager configs are going to break Prometheus after the upgrade.
##
additionalAlertManagerConfigs: []
# - consul_sd_configs:
#   - server: consul.dev.test:8500
#     scheme: http
#     datacenter: dev
#     tag_separator: ','
#     services:
#       - metrics-prometheus-alertmanager

## If additional alertmanager configurations are already deployed in a single secret, or you want to manage
## them separately from the helm deployment, you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalAlertManagerConfigs
additionalAlertManagerConfigsSecret: {}
  # name:
  # key:
  # optional: false

## AdditionalAlertRelabelConfigs allows specifying Prometheus alert relabel configurations. Alert relabel configurations specified are appended
## to the configurations generated by the Prometheus Operator. Alert relabel configurations specified must have the form as specified in the
## official Prometheus documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs.
## As alert relabel configs are appended, the user is responsible to make sure it is valid. Note that using this feature may expose the
## possibility to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible alert relabel
## configs are going to break Prometheus after the upgrade.
##
additionalAlertRelabelConfigs: []
# - separator: ;
#   regex: prometheus_replica
#   replacement: $1
#   action: labeldrop

## If additional alert relabel configurations are already deployed in a single secret, or you want to manage
## them separately from the helm deployment, you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalAlertRelabelConfigs
additionalAlertRelabelConfigsSecret: {}
  # name:
  # key:

## SecurityContext holds pod-level security attributes and common container settings.
## This defaults to non root user with uid 1000 and gid 2000.
## https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md
##
securityContext:
  runAsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 2000
  seccompProfile:
    type: RuntimeDefault

## Priority class assigned to the Pods
##
priorityClassName: ""

## Thanos configuration allows configuring various aspects of a Prometheus server in a Thanos environment.
## This section is experimental; it may change significantly without deprecation notice or backward compatibility in any release.
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#thanosspec
##
thanos: {}
  # secretProviderClass:
  #   provider: gcp
  #   parameters:
  #     secrets: |
  #       - resourceName: "projects/$PROJECT_ID/secrets/testsecret/versions/latest"
  #         fileName: "objstore.yaml"
  # objectStorageConfigFile: /var/secrets/object-store.yaml

## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to a Prometheus pod.
## if using proxy extraContainer update targetPort with proxy container port
containers: []

## InitContainers allows injecting additional initContainers. This is meant to allow doing some changes
## (permissions, dir tree) on mounted volumes before starting prometheus
initContainers: []

## PortName to use for Prometheus.
##
portName: "http-web"

## ArbitraryFSAccessThroughSMs configures whether configuration based on a service monitor can access arbitrary files
## on the file system of the Prometheus container e.g. bearer token files.
arbitraryFSAccessThroughSMs: false

## OverrideHonorLabels if set to true overrides all user configured honor_labels. If HonorLabels is set in ServiceMonitor
## or PodMonitor to true, this overrides honor_labels to false.
overrideHonorLabels: false

## OverrideHonorTimestamps allows to globally enforce honoring timestamps in all scrape configs.
overrideHonorTimestamps: false

## IgnoreNamespaceSelectors if set to true will ignore NamespaceSelector settings from the podmonitor and servicemonitor
## configs, and they will only discover endpoints within their current namespace. Defaults to false.
ignoreNamespaceSelectors: false

## EnforcedNamespaceLabel enforces adding a namespace label of origin for each alert and metric that is user created.
## The label value will always be the namespace of the object that is being created.
## Disabled by default
enforcedNamespaceLabel: ""

## PrometheusRulesExcludedFromEnforce - list of prometheus rules to be excluded from enforcing of adding namespace labels.
## Works only if enforcedNamespaceLabel set to true. Make sure both ruleNamespace and ruleName are set for each pair
## Deprecated, use `excludedFromEnforcement` instead
prometheusRulesExcludedFromEnforce: []

## ExcludedFromEnforcement - list of object references to PodMonitor, ServiceMonitor, Probe and PrometheusRule objects
## to be excluded from enforcing a namespace label of origin.
## Works only if enforcedNamespaceLabel set to true.
## See https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#objectreference
excludedFromEnforcement: []

## QueryLogFile specifies the file to which PromQL queries are logged. Note that this location must be writable,
## and can be persisted using an attached volume. Alternatively, the location can be set to a stdout location such
## as /dev/stdout to log query information to the default Prometheus log stream. This is only available in versions
## of Prometheus >= 2.16.0. For more details, see the Prometheus docs (https://prometheus.io/docs/guides/query-log/)
queryLogFile: false

## EnforcedSampleLimit defines global limit on number of scraped samples that will be accepted. This overrides any SampleLimit
## set per ServiceMonitor or/and PodMonitor. It is meant to be used by admins to enforce the SampleLimit to keep overall
## number of samples/series under the desired limit. Note that if SampleLimit is lower, that value will be taken instead.
enforcedSampleLimit: false

## EnforcedTargetLimit defines a global limit on the number of scraped targets. This overrides any TargetLimit set
## per ServiceMonitor or/and PodMonitor. It is meant to be used by admins to enforce the TargetLimit to keep the overall
## number of targets under the desired limit. Note that if TargetLimit is lower, that value will be taken instead, except
## if either value is zero, in which case the non-zero value will be used. If both values are zero, no limit is enforced.
enforcedTargetLimit: false

## Per-scrape limit on number of labels that will be accepted for a sample. If more than this number of labels are present
## post metric-relabeling, the entire scrape will be treated as failed. 0 means no limit. Only valid in Prometheus versions
## 2.27.0 and newer.
enforcedLabelLimit: false

## Per-scrape limit on length of labels name that will be accepted for a sample. If a label name is longer than this number
## post metric-relabeling, the entire scrape will be treated as failed. 0 means no limit. Only valid in Prometheus versions
## 2.27.0 and newer.
enforcedLabelNameLengthLimit: false

## Per-scrape limit on length of labels value that will be accepted for a sample. If a label value is longer than this
## number post metric-relabeling, the entire scrape will be treated as failed. 0 means no limit. Only valid in Prometheus
## versions 2.27.0 and newer.
enforcedLabelValueLengthLimit: false

## AllowOverlappingBlocks enables vertical compaction and vertical query merge in Prometheus. This is still experimental
## in Prometheus so it may change in any upcoming release.
allowOverlappingBlocks: false

## Minimum number of seconds for which a newly created pod should be ready without any of its container crashing for it to
## be considered available. Defaults to 0 (pod will be considered available as soon as it is ready).
minReadySeconds: 0

# Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),
# because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working
# Use the host's network namespace if true. Make sure to understand the security implications if you want to enable it.
# When hostNetwork is enabled, this will set dnsPolicy to ClusterFirstWithHostNet automatically.
hostNetwork: false

# HostAlias holds the mapping between IP and hostnames that will be injected
# as an entry in the pod’s hosts file.
hostAliases: []
#  - ip: 10.10.0.100
#    hostnames:
#      - a1.app.local
#      - b1.app.local

additionalRulesForClusterRole: []
#  - apiGroups: [ "" ]
#    resources:
#      - nodes/proxy
#    verbs: [ "get", "list", "watch" ]

additionalServiceMonitors: []
## Name of the ServiceMonitor to create
##
# - name: ""

## Additional labels to set used for the ServiceMonitorSelector. Together with standard labels from
## the chart
##
# additionalLabels: {}

## Service label for use in assembling a job name of the form <label value>-<port>
## If no label is specified, the service name is used.
##
# jobLabel: ""

## labels to transfer from the kubernetes service to the target
##
# targetLabels: []

## labels to transfer from the kubernetes pods to the target
##
# podTargetLabels: []

## Label selector for services to which this ServiceMonitor applies
##
# selector: {}

## Namespaces from which services are selected
##
# namespaceSelector:
  ## Match any namespace
  ##
  # any: false

  ## Explicit list of namespace names to select
  ##
  # matchNames: []

## Endpoints of the selected service to be monitored
##
# endpoints: []
  ## Name of the endpoint's service port
  ## Mutually exclusive with targetPort
  # - port: ""

  ## Name or number of the endpoint's target port
  ## Mutually exclusive with port
  # - targetPort: ""

  ## File containing bearer token to be used when scraping targets
  ##
  #   bearerTokenFile: ""

  ## Interval at which metrics should be scraped
  ##
  #   interval: 30s

  ## HTTP path to scrape for metrics
  ##
  #   path: /metrics

  ## HTTP scheme to use for scraping
  ##
  #   scheme: http

  ## TLS configuration to use when scraping the endpoint
  ##
  #   tlsConfig:

      ## Path to the CA file
      ##
      # caFile: ""

      ## Path to client certificate file
      ##
      # certFile: ""

      ## Skip certificate verification
      ##
      # insecureSkipVerify: false

      ## Path to client key file
      ##
      # keyFile: ""

      ## Server name used to verify host name
      ##
      # serverName: ""

additionalPodMonitors: []
## Name of the PodMonitor to create
##
# - name: ""

## Additional labels to set used for the PodMonitorSelector. Together with standard labels from
## the chart
##
# additionalLabels: {}

## Pod label for use in assembling a job name of the form <label value>-<port>
## If no label is specified, the pod endpoint name is used.
##
# jobLabel: ""

## Label selector for pods to which this PodMonitor applies
##
# selector: {}

## PodTargetLabels transfers labels on the Kubernetes Pod onto the target.
##
# podTargetLabels: {}

## SampleLimit defines per-scrape limit on number of scraped samples that will be accepted.
##
# sampleLimit: 0

## Namespaces from which pods are selected
##
# namespaceSelector:
  ## Match any namespace
  ##
  # any: false

  ## Explicit list of namespace names to select
  ##
  # matchNames: []

## Endpoints of the selected pods to be monitored
## https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#podmetricsendpoint
##
# podMetricsEndpoints: []

## Configuration for thanosRuler
## ref: https://thanos.io/tip/components/rule.md/

cleanPrometheusOperatorObjectNames: false

## Extra manifests to deploy as an array
extraManifests: []
# - apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     labels:
#       name: prometheus-extra
#   data:
#     extra-data: "value"


mlucas99 commented 3 months ago
{"time": "2023-07-12T09:31:03.000748+00:00", "msg": "Writing /etc/grafana/provisioning/alerting/alerts.yaml (ascii)", "level": "INFO"}
{"time": "2023-07-12T09:31:40.140993+00:00", "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/alerting/reload (Caused by ResponseError('too many 500 error responses'))\n", "level": "ERROR"}
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 879, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)

Same problem in my situation :/

Sorry if this is not directly related to the original issue, but if I can help someone out, I'd like to try.

So I was getting the same 'too many 500 error responses' when my sidecar was reloading my datasources, and subsequently I wasn't seeing the datasources I expected in Grafana. I'm guessing it's something similar in your case with alerts.

After a couple of days of debugging, it of course turned out to be something pretty obvious: I was also using the loki-stack helm chart, which creates a configmap for "loki-stack-datasource.yaml" that specifies its own datasource. The problem was that it set its "isDefault" parameter to true, and only one datasource can be set as the default.

The 500 error I was receiving is actually expected, since I had multiple datasources configured as the default. As soon as I fixed the underlying issue by setting the loki.isDefault parameter on the helm chart to false, the error went away and the datasources reloaded as expected.
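For reference, the fix described above is just a values override on the loki-stack chart. A minimal sketch (the exact key path, loki.isDefault, should be verified against your chart version):

loki:
  # Assumption: this key controls the generated "loki-stack-datasource.yaml" configmap.
  # Don't mark the Loki datasource as the default, so it no longer conflicts with the
  # default datasource provisioned alongside Grafana.
  isDefault: false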

Perhaps you have a similar error in one of your alerting configs, which is then causing the reload to return a 500 as well and thus never actually reloading.

romosa commented 3 months ago
{"time": "2023-07-12T09:31:03.000748+00:00", "msg": "Writing /etc/grafana/provisioning/alerting/alerts.yaml (ascii)", "level": "INFO"}
{"time": "2023-07-12T09:31:40.140993+00:00", "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/alerting/reload (Caused by ResponseError('too many 500 error responses'))\n", "level": "ERROR"}
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 879, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)

Same problem in my situation :/

Sorry if this is not directly related to the original issue but if I can help someone out, I'd like to try.

So I was getting the same 'too many 500 error responses' when my sidecar was reloading my datasources. Subsequently I wasn't seeing the datasources I expected in Grafana. I'm guessing it's somethin similar in your case with alerts.

So after a couple days of debugging it of course was something pretty obvious, I was also using the loki-stack helm chart which creates a configmap for "loki-stack-datasource.yaml" which specifies it's own datasource. The problem being it was setting it's "isDefault" parameter to true and only 1 datasource can be set to default.

The 500 error I was receiving is actually expected, as I had multiple datasources configured as default. As soon as I fixed the underlying issue by, setting the loki.isDefault parameter on the helm chart to false, the error went away and the datasources reloaded as expected.

Perhaps you have a similar error in one of your alerting configs, which is then causing the reload to return a 500 as well and thus never actually reloading.

This was the issue for me as well. I'm using the Loki helm chart too; after setting the Loki datasource's isDefault to false, the error was gone. Thanks!!

k4d1sm0 commented 3 months ago
{"time": "2023-07-12T09:31:03.000748+00:00", "msg": "Writing /etc/grafana/provisioning/alerting/alerts.yaml (ascii)", "level": "INFO"}
{"time": "2023-07-12T09:31:40.140993+00:00", "msg": "Received unknown exception: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/alerting/reload (Caused by ResponseError('too many 500 error responses'))\n", "level": "ERROR"}
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 889, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 879, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)

Same problem in my situation :/

Sorry if this is not directly related to the original issue but if I can help someone out, I'd like to try.

So I was getting the same 'too many 500 error responses' when my sidecar was reloading my datasources. Subsequently I wasn't seeing the datasources I expected in Grafana. I'm guessing it's somethin similar in your case with alerts.

So after a couple days of debugging it of course was something pretty obvious, I was also using the loki-stack helm chart which creates a configmap for "loki-stack-datasource.yaml" which specifies it's own datasource. The problem being it was setting it's "isDefault" parameter to true and only 1 datasource can be set to default.

The 500 error I was receiving is actually expected, as I had multiple datasources configured as default. As soon as I fixed the underlying issue by, setting the loki.isDefault parameter on the helm chart to false, the error went away and the datasources reloaded as expected.

Perhaps you have a similar error in one of your alerting configs, which is then causing the reload to return a 500 as well and thus never actually reloading.

This is the problem for me too; I'm using Loki and Prometheus as data sources. Sorry if this is a basic question, but why does the Loki data source interfere with a separately installed Grafana? In my opinion, Grafana data sources should only be created when installing the Grafana helm chart, not when installing the Loki one; and if we choose to install Grafana through the Loki helm chart, only then should that chart add its Loki datasource configmap to the Grafana it installs.