
[kube-prometheus-stack] Prometheus not created if additionalArgs are set #4266

Open · fniko opened 8 months ago

fniko commented 8 months ago

Describe the bug

When trying to set storage.tsdb.min-block-duration via additionalArgs while a Thanos objectStorageConfig is present, the Prometheus StatefulSet is not created.

~After a clean install using Helm, I am observing two strange warnings - they might be related~ (fixed by removing the old CRD)

W0219 00:16:32.123096   89043 warnings.go:70] unknown field "spec.scrapeConfigNamespaceSelector"
W0219 00:16:32.123672   89043 warnings.go:70] unknown field "spec.scrapeConfigSelector"
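
(A quick way to check which operator version the installed CRDs were generated for - a sketch; the annotation name matches the kubectl describe crd output later in this thread:)

kubectl get crd prometheuses.monitoring.coreos.com \
  -o jsonpath='{.metadata.annotations.operator\.prometheus\.io/version}'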

What's your helm version?

3.14.1

What's your kubectl version?

1.24.2

Which chart?

kube-prometheus-stack

What's the chart version?

56.7.0

What happened?

After using custom values to increase the Thanos sync frequency to remote storage, Prometheus did not reflect those changes. On a clean install, Prometheus did not show up at all; it seems the StatefulSet is not created. The issue appears to be with objectStorageConfig under the thanos configuration block: when it is removed (see values.yml below), Prometheus starts behaving as expected.

Helm output

Release "kube-prometheus-stack" does not exist. Installing it now.
W0219 00:16:32.123096   89043 warnings.go:70] unknown field "spec.scrapeConfigNamespaceSelector"
W0219 00:16:32.123672   89043 warnings.go:70] unknown field "spec.scrapeConfigSelector"
NAME: kube-prometheus-stack
LAST DEPLOYED: Mon Feb 19 00:16:21 2024
NAMESPACE: kube-prometheus-stack
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace kube-prometheus-stack get pods -l "release=kube-prometheus-stack"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

What you expected to happen?

How to reproduce it?

  1. Create a values.yml file with the provided values
  2. Run the helm command as provided below

Enter the changed values of values.yml?

prometheus:
  prometheusSpec:
    # Increase Thanos sync period - used to DEBUG
    disableCompaction: false
    additionalArgs:
      - name: storage.tsdb.max-block-duration
        value: "30s"

    # Configure Thanos
    thanos:
      objectStorageConfig:
        secret:
          type: S3
          config:
            bucket: "thanos"
            endpoint: "region.provider.com"
            access_key: "xxx"
            secret_key: "xxx"

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 56.7.0 \
  --values values.yml

Anything else we need to know?

This values.yml configuration works as expected - max-block-duration is set and the sidecar is live:

prometheus:
  prometheusSpec:
    # Increase Thanos sync period - used to DEBUG
    disableCompaction: false
    additionalArgs:
      - name: storage.tsdb.max-block-duration
        value: "30s"

    # Configure Thanos
    thanos:
      image: quay.io/thanos/thanos:v0.28.1

Full outputs

helm ls

NAME                    NAMESPACE               REVISION    UPDATED                                 STATUS      CHART                           APP VERSION
kube-prometheus-stack   kube-prometheus-stack   1           2024-02-19 00:54:45.810489 +0000 UTC    deployed    kube-prometheus-stack-56.7.0    v0.71.2

kubectl get pod

alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          5m24s
kube-prometheus-stack-grafana-585d96b575-dl4tp              3/3     Running   0          5m25s
kube-prometheus-stack-kube-state-metrics-5744bb9db6-62ng2   1/1     Running   0          5m25s
kube-prometheus-stack-operator-6f97fc84f6-fcpb6             1/1     Running   0          5m25s
kube-prometheus-stack-prometheus-node-exporter-2bdqn        1/1     Running   0          5m25s
...
kube-prometheus-stack-prometheus-node-exporter-tffbd        1/1     Running   0          5m25s

kubectl get deploy

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
kube-prometheus-stack-grafana              1/1     1            1           6m28s
kube-prometheus-stack-kube-state-metrics   1/1     1            1           6m28s
kube-prometheus-stack-operator             1/1     1            1           6m28s

kubectl get statefulset

NAME                                              READY   AGE
alertmanager-kube-prometheus-stack-alertmanager   1/1     6m45s
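
Although the StatefulSet is missing, the Prometheus custom resource itself exists, and its status conditions explain why reconciliation failed (a sketch, using the CR name rendered by the chart):

kubectl -n kube-prometheus-stack get prometheus kube-prometheus-stack-prometheus \
  -o jsonpath='{.status.conditions}'
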
fniko commented 8 months ago

~I have discovered a typo within my values.yml file which caused this error. Closing, sorry.~

fniko commented 8 months ago

I thought that the issue was caused by some typo; however, it seems there is a deeper relation between the configuration blocks. I am reopening this issue with an updated description.

fniko commented 8 months ago

Also passing the output from helm template; I am not including it in the original post to keep it clearer.

# Source: kube-prometheus-stack/templates/prometheus/prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus
  namespace: kube-prometheus-stack
  labels:
    app: kube-prometheus-stack-prometheus

    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/version: "56.7.0"
    app.kubernetes.io/part-of: kube-prometheus-stack
    chart: kube-prometheus-stack-56.7.0
    release: "kube-prometheus-stack"
    heritage: "Helm"
spec:
  alerting:
    alertmanagers:
      - namespace: kube-prometheus-stack
        name: kube-prometheus-stack-alertmanager
        port: http-web
        pathPrefix: "/"
        apiVersion: v2
  image: "quay.io/prometheus/prometheus:v2.49.1"
  version: v2.49.1
  additionalArgs:
    - name: storage.tsdb.max-block-duration
      value: 30s
  externalUrl: http://kube-prometheus-stack-prometheus.kube-prometheus-stack:9090
  paused: false
  replicas: 1
  shards: 1
  logLevel:  info
  logFormat:  logfmt
  listenLocal: false
  enableAdminAPI: false
  retention: "10d"
  tsdb:
    outOfOrderTimeWindow: 0s
  walCompression: true
  routePrefix: "/"
  serviceAccountName: kube-prometheus-stack-prometheus
  serviceMonitorSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  serviceMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  podMonitorNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  probeNamespaceSelector: {}
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  scrapeConfigSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  scrapeConfigNamespaceSelector: {}
  thanos:
    image: quay.io/thanos/thanos:v0.28.1
    objectStorageConfig:
      key: object-storage-configs.yaml
      name: kube-prometheus-stack-prometheus
  portName: http-web
  hostNetwork: false

When trying to just apply this (for debug purposes) with kubectl apply -f above-config.yml:

Error from server (BadRequest): error when creating "above-config.yml": Prometheus in version "v1" cannot be handled as a Prometheus: strict decoding error: unknown field "spec.scrapeConfigNamespaceSelector", unknown field "spec.scrapeConfigSelector"

EDIT: The above error was fixed by removing the old CRD (see the chart's Uninstall Helm Chart instructions). Current version from the CRD, via kubectl describe crd prometheuses.monitoring.coreos.com:

Annotations:  controller-gen.kubebuilder.io/version: v0.13.0
              operator.prometheus.io/version: 0.71.2
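
As an alternative to deleting the CRDs, they can be updated in place. A sketch, assuming the prometheus-operator repository layout and that v0.71.2 matches the operator bundled with chart 56.7.0 (see the annotation above):

kubectl apply --server-side --force-conflicts -f \
  https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.71.2/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
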
fniko commented 8 months ago

OK, I did more debugging, and after manually applying the above prometheus.yml, the output of kubectl describe prometheus kube-prometheus-stack-prometheus is:

making statefulset failed: make StatefulSet spec: can't set arguments which are already managed by the operator: storage.tsdb.max-block-duration,storage.tsdb.min-block-duration

Wider output (less readable, though):

    Message:               shard 0: statefulset kube-prometheus-stack/prometheus-kube-prometheus-stack-prometheus not found
    Observed Generation:   1
    Reason:                StatefulSetNotFound
    Status:                False
    Type:                  Available
    Last Transition Time:  2024-02-19T01:49:52Z
    Message:               making statefulset failed: make StatefulSet spec: can't set arguments which are already managed by the operator: storage.tsdb.max-block-duration
    Observed Generation:   1
    Reason:                ReconciliationFailed
    Status:                False
    Type:                  Reconciled
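
The same reconciliation error also appears in the operator logs (a sketch; the deployment name is taken from the kubectl get deploy output above):

kubectl -n kube-prometheus-stack logs deploy/kube-prometheus-stack-operator \
  | grep -i "make StatefulSet spec"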

How should this be handled?

zeritti commented 7 months ago

The TSDB block duration arguments can be set through additionalArgs only if disableCompaction is not set to true (the default is false), i.e. only while compaction is enabled. If it is set to true, the operator does not allow overriding those arguments.

Furthermore, if spec.thanos is set in the Prometheus CR with objectStorageConfig defined, i.e. uploads are active, the operator disables compaction by setting the two block duration arguments equal to each other. Under these conditions, you may wish to have a look at blockSize in the Thanos spec. The field is not present in the chart's prometheus.prometheusSpec.thanos values, but it will be passed through to the CR once inserted.
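
To illustrate the behaviour described above (a sketch of the effective flags, not literal operator output; 2h is the operator's default block duration):

# With thanos.objectStorageConfig set, the operator itself pins:
#   --storage.tsdb.min-block-duration=2h
#   --storage.tsdb.max-block-duration=2h   # min == max effectively disables compaction
# thanos.blockSize changes the value used for both flags,
# which is why additionalArgs cannot override them in this mode.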

fniko commented 7 months ago

Oh, OK. Thank you for your help. I think I will close this issue, because it is not a bug but rather a configuration mismatch. Or do you think it makes sense to improve some docs or any other aspect of the Helm chart? If not, I will close the issue immediately.

For others, a configuration that works:

prometheus:
  prometheusSpec:
    # Configure Thanos
    thanos:
      image: quay.io/thanos/thanos:v0.28.1
      blockSize: "30s"
      objectStorageConfig:
        secret:
          type: S3
          config:
            bucket: "thanos"
            endpoint: "region.provider.com"
            access_key: "xxx"
            secret_key: "xxx"
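
To verify the result, inspect the flags on the running Prometheus container (a sketch; the pod name follows the StatefulSet naming seen in the operator message above, and "prometheus" is the container name used by the operator):

kubectl -n kube-prometheus-stack get pod prometheus-kube-prometheus-stack-prometheus-0 \
  -o jsonpath='{.spec.containers[?(@.name=="prometheus")].args}'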