Closed disi closed 11 months ago
Here is the CUE bit for timoni that I use:
"flux": {
    module: {
        url:     "oci://ghcr.io/stefanprodan/modules/flux-aio"
        version: "2.1.2"
    }
    namespace: "flux-system"
    values: {
        hostNetwork:     true
        securityProfile: "privileged"
        controllers: notification: enabled: false
    }
}
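Aside: the single-line field `controllers: notification: enabled: false` is CUE's shorthand for a nested struct; it is equivalent to writing:

```cue
controllers: {
	notification: {
		enabled: false
	}
}
```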
When the node running Flux goes down, the controllers are not rescheduled to other nodes.
Is the Kubernetes control plane still working? I would expect it to reschedule the pod on a different node. Maybe the toleration we set in Flux is too broad; I set it like this: https://github.com/stefanprodan/flux-aio/blob/aedf966e28a5f7170e3e737d7d52fa5815c8cfad/modules/flux-aio/templates/config.cue#L132
It may well be that since etcd has no quorum, the control plane will no longer schedule pods anywhere. I suggest creating a cluster with 2 worker nodes, deploying Flux on one of the workers, making that node fail, and seeing whether it gets rescheduled to the healthy node.
I can still schedule pods. Weave Dashboard, AWX, and the Kubernetes Dashboard are all rescheduled to other nodes. Only Flux is not, and it still shows "Running".
If you describe the Flux pod, is there any hint in the events about some blocker to rescheduling?
The events show this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 39m kubelet Created container kustomize-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 39m kubelet Created container source-controller
Normal Started 39m kubelet Started container source-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal SandboxChanged 39m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 39m kubelet Started container kustomize-controller
Normal Started 39m kubelet Started container helm-controller
Normal Created 39m kubelet Created container helm-controller
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9794/healthz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m (x5 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9794/readyz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9792/healthz": dial tcp 10.0.2.22:9792: connect: connection refused
Warning Unhealthy 38m (x8 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9790/": dial tcp 10.0.2.22:9790: connect: connection refused
Warning NodeNotReady 16m (x3 over 93m) node-controller Node is not ready
The full description:
[disi@vmalmakw1s ~]$ kubectl describe pod flux-57bd866b6d-zbrfc -n flux-system
Name: flux-57bd866b6d-zbrfc
Namespace: flux-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: flux
Node: vmalmakms.home/10.0.2.22
Start Time: Sat, 25 Nov 2023 22:11:15 +0000
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Status: Running
IP: 10.0.2.22
IPs:
IP: 10.0.2.22
Controlled By: ReplicaSet/flux-57bd866b6d
Containers:
source-controller:
Container ID: containerd://f95a5ff962bb8b4697cc3b3b933b2c45499f594b65802f113aebb587ff822b61
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Image ID: ghcr.io/fluxcd/source-controller@sha256:b776e085ac079bf22ed23afe2874aebd10efcfaa740ec25748774608bbc79932
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 9790/TCP, 9791/TCP, 9792/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
State: Running
Started: Sun, 26 Nov 2023 08:33:56 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 26 Nov 2023 08:33:25 +0000
Finished: Sun, 26 Nov 2023 08:33:56 +0000
Ready: True
Restart Count: 15
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
kustomize-controller:
Container ID: containerd://997b139a3e85abd2da13c1d95fbb585bf6cfe29967bcd241d95a885493213971
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Image ID: ghcr.io/fluxcd/kustomize-controller@sha256:e2b3c9e1292564bbfaa513f3cc6fa1df1194fae8ba9483fbe581099d0c585d94
Ports: 9793/TCP, 9794/TCP
Host Ports: 9793/TCP, 9794/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
State: Running
Started: Sun, 26 Nov 2023 08:33:56 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 26 Nov 2023 08:33:25 +0000
Finished: Sun, 26 Nov 2023 08:33:56 +0000
Ready: True
Restart Count: 15
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
helm-controller:
Container ID: containerd://c6a9a4ec46740520acc2f70a763259469830add59b7904a4ee0b00e8e97d2dd1
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Image ID: ghcr.io/fluxcd/helm-controller@sha256:6ee7e590e57350ac91cfdeee4587d0e9e6f52e723c56d4b7878c59279bd36f00
Ports: 9795/TCP, 9796/TCP
Host Ports: 9795/TCP, 9796/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
State: Running
Started: Sun, 26 Nov 2023 08:33:25 +0000
Last State: Terminated
Reason: Unknown
Exit Code: 255
Started: Sun, 26 Nov 2023 07:49:37 +0000
Finished: Sun, 26 Nov 2023 08:33:09 +0000
Ready: True
Restart Count: 13
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nznjt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady True
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-nznjt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 39m kubelet Created container kustomize-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 39m kubelet Created container source-controller
Normal Started 39m kubelet Started container source-controller
Normal Pulled 39m kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal SandboxChanged 39m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 39m kubelet Started container kustomize-controller
Normal Started 39m kubelet Started container helm-controller
Normal Created 39m kubelet Created container helm-controller
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9794/healthz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m (x5 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9794/readyz": dial tcp 10.0.2.22:9794: connect: connection refused
Warning Unhealthy 38m kubelet Liveness probe failed: Get "http://10.0.2.22:9792/healthz": dial tcp 10.0.2.22:9792: connect: connection refused
Warning Unhealthy 38m (x8 over 39m) kubelet Readiness probe failed: Get "http://10.0.2.22:9790/": dial tcp 10.0.2.22:9790: connect: connection refused
Warning NodeNotReady 16m (x3 over 93m) node-controller Node is not ready
Hmm, why is the status Running if the liveness probe fails? Can you also describe the ReplicaSet/flux-57bd866b6d and the flux Deployment, please?
Replicaset
Name: flux-57bd866b6d
Namespace: flux-system
Selector: app.kubernetes.io/name=flux,pod-template-hash=57bd866b6d
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: app.kubernetes.io/role: cluster-admin
deployment.kubernetes.io/desired-replicas: 1
deployment.kubernetes.io/max-replicas: 1
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/flux
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app.kubernetes.io/name=flux
pod-template-hash=57bd866b6d
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Service Account: flux
Containers:
source-controller:
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
kustomize-controller:
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Ports: 9793/TCP, 9794/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
helm-controller:
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Ports: 9795/TCP, 9796/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
Events: <none>
Deployment
Name: flux
Namespace: flux-system
CreationTimestamp: Mon, 20 Nov 2023 15:16:57 +0000
Labels: app.kubernetes.io/managed-by=timoni
app.kubernetes.io/name=flux
app.kubernetes.io/part-of=flux
app.kubernetes.io/version=v2.1.2
instance.timoni.sh/name=flux
instance.timoni.sh/namespace=flux-system
Annotations: app.kubernetes.io/role: cluster-admin
deployment.kubernetes.io/revision: 1
Selector: app.kubernetes.io/name=flux
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app.kubernetes.io/name=flux
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: true
prometheus.io/scrape: true
Service Account: flux
Containers:
source-controller:
Image: ghcr.io/fluxcd/source-controller:v1.1.2
Ports: 9790/TCP, 9791/TCP, 9792/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9791
--health-addr=:9792
--storage-addr=:9790
--storage-path=/data
--storage-adv-addr=flux.$(RUNTIME_NAMESPACE).svc.cluster.local.
--concurrent=5
--requeue-dependency=30s
--watch-label-selector=!sharding.fluxcd.io/key
--helm-cache-max-size=10
--helm-cache-ttl=60m
--helm-cache-purge-interval=5m
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-sc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-sc/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/data from data (rw)
/tmp from tmp (rw)
kustomize-controller:
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.1
Ports: 9793/TCP, 9794/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9793
--health-addr=:9794
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-kc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-kc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
helm-controller:
Image: ghcr.io/fluxcd/helm-controller:v0.36.2
Ports: 9795/TCP, 9796/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces
--log-level=info
--log-encoding=json
--enable-leader-election=false
--metrics-addr=:9795
--health-addr=:9796
--watch-label-selector=!sharding.fluxcd.io/key
--concurrent=5
--requeue-dependency=30s
Limits:
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz-hc/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz-hc/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOURCE_CONTROLLER_LOCALHOST: localhost:9790
RUNTIME_NAMESPACE: (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
NO_PROXY: .cluster.local.,.cluster.local,.svc
Mounts:
/tmp from tmp (rw)
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: flux-57bd866b6d (1/1 replicas created)
Events: <none>
Really odd: the ReplicaSet says Pods Status: 1 Running, which is strange, while the Deployment says Replicas: 1 unavailable, yet it doesn't create a new ReplicaSet.
Later today I will start over and redeploy the entire cluster, then test again and see whether it shows the same behaviour.
I guess if you delete the pod it will get rescheduled; this looks like either a race condition in the Kubernetes scheduler or the toleration tripping it up.
Correct :) I did delete the pod.
flux-system flux-57bd866b6d-j6z7x 3/3 Running 0 56s 10.0.2.24 vmalmakw2s.home <none> <none>
flux-system flux-57bd866b6d-zbrfc 3/3 Terminating 43 (75m ago) 11h 10.0.2.22 vmalmakms.home <none> <none>
A new pod was created and it's working fine, but it did not happen automatically.
Hmm, so it looks like it got stuck in Terminating, but why wasn't this status reflected in the ReplicaSet, and why didn't it time out? I wonder if this is some bug in Kubernetes.
The Terminating status only appeared after I ran:
$ kubectl delete pod flux-57bd866b6d-zbrfc -n flux-system
FWIW, the Kubernetes Dashboard and other pods also linger in Terminating for some time. Is the default around 15 minutes before Kubernetes removes them? Edit: this removes it immediately:
$ kubectl delete pod --force flux-57bd866b6d-zbrfc -n flux-system
If you manage to reproduce this, it would be good to take snapshots of the Deployment and ReplicaSet and see what events are issued for them; I guess the earlier events expired, which is why none are listed now.
If you can reproduce this, please add a toleration like the one below with kubectl edit, and retest:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 30
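For reference, this toleration goes under `spec.template.spec.tolerations` in the flux Deployment; a minimal sketch of the placement (field paths per the standard Kubernetes Pod spec):

```yaml
spec:
  template:
    spec:
      tolerations:
        # Evict the pod 30s after the node becomes unreachable,
        # instead of tolerating the taint forever.
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 30
```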
Here are some attempts to log events: Flux is now running on vmalmakw2s, and if I shut that node down, there is no event on the ReplicaSet or Deployment after ~15 minutes. I ran this:
$ watch "kubectl describe replicasets.apps -n flux-system flux-57bd866b6d | grep Events"
There were no events on the Deployment either. I then started the node again and still no events. Then I shut down the node running the awx-operator deployment and monitored. The only event on the new awx-operator ReplicaSet, which was deployed after ~6 minutes:
Normal SuccessfulCreate 89s replicaset-controller Created pod: awx-operator-controller-manager-5cd65bb78d-7wn64
I hope this helps.
Now, I'll edit Flux as you stated above and test again.
[disi@vmalmakw1s ~]$ kubectl edit deployments.apps -n flux-system flux
deployment.apps/flux edited
Still running fine. Tested sync with git and events. Deployment log:
Normal ScalingReplicaSet 3m12s deployment-controller Scaled down replica set flux-57bd866b6d to 0 from 1
Normal ScalingReplicaSet 3m12s deployment-controller Scaled up replica set flux-5c4dd674fc to 1
ReplicaSet log:
Normal SuccessfulCreate 6m11s replicaset-controller Created pod: flux-57bd866b6d-qj947
Normal SuccessfulDelete 3m49s replicaset-controller Deleted pod: flux-57bd866b6d-qj947
Running on node "vmalmakms". Monitoring:
$ watch "kubectl describe replicasets.apps -n flux-system flux-57bd866b6d | grep -A 6 Events"
$ watch "kubectl describe deployments.apps -n flux-system flux | grep -A 6 Events"
Now shutting down "vmalmakms"... It's working :) New pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 62s default-scheduler Successfully assigned flux-system/flux-5c4dd674fc-dfpqk to vmalmakw2s.home
Normal Pulled 62s kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 62s kubelet Created container source-controller
Normal Started 61s kubelet Started container source-controller
Normal Pulled 61s kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal Created 61s kubelet Created container kustomize-controller
Normal Started 61s kubelet Started container kustomize-controller
Normal Pulled 61s kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Created 61s kubelet Created container helm-controller
Normal Started 61s kubelet Started container helm-controller
Old pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m36s default-scheduler Successfully assigned flux-system/flux-5c4dd674fc-bxx65 to vmalmakms.home
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.2" already present on machine
Normal Created 9m36s kubelet Created container source-controller
Normal Started 9m36s kubelet Started container source-controller
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.1" already present on machine
Normal Created 9m36s kubelet Created container kustomize-controller
Normal Started 9m36s kubelet Started container kustomize-controller
Normal Pulled 9m36s kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.2" already present on machine
Normal Created 9m36s kubelet Created container helm-controller
Normal Started 9m35s kubelet Started container helm-controller
Warning NodeNotReady 64s node-controller Node is not ready
No events on the Deployment; a new ReplicaSet was created:
flux-system flux-57bd866b6d 0 0 0 5d20h
flux-system flux-5c4dd674fc 1 1 1 15m
OK, so the tolerationSeconds: 30 made it reschedule? And without it, it stays dead on the failing node?
Hi, yes, without this parameter it just stays there forever in the Running state. I would probably change it to ~5 minutes, matching the Kubernetes default; right now it reschedules way ahead of other pods.
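For comparison, the DefaultTolerationSeconds admission plugin normally injects these tolerations (300 s, i.e. the ~5 minutes mentioned above) into pods that don't set their own:

```yaml
tolerations:
  # Added automatically by the API server unless the pod
  # already tolerates these taints itself.
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```

The blanket `op=Exists` toleration shown in the pod description above tolerates every taint with no time limit, which is why the pod was never evicted.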
Thanks @disi for all the tests. I have published the fix; rerunning timoni bundle apply should set the right tolerations now.
In my testing, I created a cluster of three master nodes, all untainted and able to schedule normal pods. Flux only ever runs on the node it was originally deployed to via timoni. When that node goes down, the controllers are not rescheduled to other nodes.
flux events shows logs only up to when the node went down. The pods still show Running on the node that is down:
stream logs failed Get "https://10.0.2.22:10250/containerLogs/flux-system/flux-57bd866b6d-zbrfc/helm-controller?follow=true&sinceSeconds=300&tailLines=100&timestamps=true": dial tcp 10.0.2.22:10250: connect: