kiwigrid / k8s-sidecar

This is a docker container intended to run inside a kubernetes cluster to collect config maps with a specified label and store the included files in a local folder.
MIT License

Sync stops working on k8s API errors, liveness check needed? #338

Open fctb opened 5 months ago

fctb commented 5 months ago

When we get k8s API errors, the sync silently stops working. One example is this error when calling kubernetes:

  (410) Reason: Expired: The resourceVersion for the provided watch is too old.

Afterwards, at debug log level, you only see this message:

  Performing watch-based sync on secret resources: {'label_selector': 'grafana_dashboard_v10=1', 'timeout_seconds': '300', '_request_timeout': '330'}

The corresponding message for configmaps no longer appears:

  Performing watch-based sync on configmap resources: {'label_selector': 'grafana_dashboard_v10=1', 'timeout_seconds': '300', '_request_timeout': '330'}

The other debug messages related to configmaps stop as well. We only have matching configmaps in this cluster.

It looks like the configmap sync is dead, although the process itself is still there.

Would it make sense to introduce a liveness check (something like a dead man's switch), so that the whole container gets restarted when such problems occur?
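To make the dead man's switch idea concrete, here is a minimal Python sketch. It is not the sidecar's actual implementation. Each sync loop would touch a heartbeat file after every watch cycle, and a liveness check fails once any expected file is missing or too old. The function names, the heartbeat directory, the worker names and the 400 second threshold (picked to sit just above the 330 second client timeout from the config) are all assumptions for illustration.

```python
import time
from pathlib import Path


def beat(heartbeat_dir, worker):
    """Record that `worker` (hypothetical name, e.g. "configmap") is still looping."""
    # touch() creates the file or updates its modification time to now.
    (Path(heartbeat_dir) / worker).touch()


def stale_workers(heartbeat_dir, expected_workers, max_age_seconds, now=None):
    """Return the expected workers whose heartbeat file is missing or too old."""
    now = time.time() if now is None else now
    stale = []
    for worker in expected_workers:
        path = Path(heartbeat_dir) / worker
        try:
            age = now - path.stat().st_mtime
        except FileNotFoundError:
            # A worker that never reported counts as dead.
            stale.append(worker)
            continue
        if age > max_age_seconds:
            stale.append(worker)
    return stale


def liveness_exit_code(heartbeat_dir, expected_workers, max_age_seconds=400):
    """Return 0 if every worker reported recently, 1 otherwise.

    The 400 second default is an assumption, not a value from the project.
    """
    return 1 if stale_workers(heartbeat_dir, expected_workers, max_age_seconds) else 0
```

An exec-style liveness probe could run a small script that exits with the result of `liveness_exit_code`, so the kubelet restarts the container once one of the loops stops reporting, even though the main process is still alive.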

Container YAML:

  - env:
    - name: REQ_TIMEOUT
      value: "60"
    - name: IGNORE_ALREADY_PROCESSED
      value: "true"
    - name: METHOD
      value: WATCH
    - name: LABEL
      value: grafana_dashboard_v10
    - name: LABEL_VALUE
      value: "1"
    - name: LOG_LEVEL
      value: debug
    - name: FOLDER
      value: /tmp/dashboards
    - name: RESOURCE
      value: both
    - name: NAMESPACE
      value: ALL
    - name: REQ_USERNAME
      valueFrom:
        secretKeyRef:
          key: admin-user
          name: grafana-admin-password
    - name: REQ_PASSWORD
      valueFrom:
        secretKeyRef:
          key: admin-password
          name: grafana-admin-password
    - name: REQ_URL
      value: http://localhost:3000/api/admin/provisioning/dashboards/reload
    - name: REQ_METHOD
      value: POST
    - name: WATCH_SERVER_TIMEOUT
      value: "300"
    - name: WATCH_CLIENT_TIMEOUT
      value: "330"
    image: quay.io/kiwigrid/k8s-sidecar:1.26.1
    imagePullPolicy: IfNotPresent
    name: grafana-sc-dashboard
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp/dashboards
      name: sc-dashboard-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-6sd9s
      readOnly: true