To benefit from the BOSH canary / max_in_flight mechanism, the BOSH release should check all k8s node `status.conditions` in the BOSH post-start script.
The expected state is Status=False (for every condition other than Ready).
Status=True should result in a post-start failure, thus preventing further impact on the following instance groups.

Node Condition | Description
-- | --
Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds)

For example:
```shell
kubectl wait --for=condition=Ready node/agents-concourse-r1-z1-0 --timeout=10s
```
Note that Ready has an inverted Status: Ready=True is the expected state.
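
Both expectations can be expressed with `kubectl wait`, which accepts an optional value after the condition name (`--for=condition=<type>=<value>`, defaulting to true when omitted). The following is only a minimal sketch: the helper names `expected_status` and `wait_for_condition` are hypothetical, and the 10s timeout is simply carried over from the example above.

```shell
# Expected status per condition type: Ready must be true,
# every other condition is expected to be false.
expected_status() {
  if [ "$1" = "Ready" ]; then echo "true"; else echo "false"; fi
}

# Wait up to 10s for one condition of a node to reach its expected status.
# kubectl wait exits non-zero if the timeout expires first.
# (Hypothetical helper, not part of any existing release.)
wait_for_condition() {
  local node="$1" type="$2"
  kubectl wait --for="condition=${type}=$(expected_status "$type")" \
    "node/${node}" --timeout=10s
}
```

For instance, `wait_for_condition agents-concourse-r1-z1-0 DiskPressure` would wait for DiskPressure=false on that node.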
Standard node conditions are documented at https://kubernetes.io/docs/reference/node/node-status/#condition. Additional node conditions can be set by third-party components such as node-problem-detector (see https://kubernetes.io/docs/tasks/debug/debug-cluster/monitor-node-health/#exporter), for example:
https://github.com/kubernetes/node-problem-detector/blob/ed94dff2cd827764dc43a9c90b0b3af773457dbd/config/kernel-monitor.json#L67-L70
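
Since third-party components can add their own conditions, a post-start check probably should not rely on a fixed list of condition names; it can instead evaluate whatever conditions the node reports. A minimal sketch, assuming `kubectl` is available to the post-start script and that every condition other than Ready indicates a problem when it is not False. The function names `check_conditions` and `check_node` are hypothetical.

```shell
# Reads "Type=Status" lines on stdin. Returns non-zero if any condition is
# not in its expected state: Ready=True, every other condition =False.
# A status of Unknown is therefore treated as a failure as well.
check_conditions() {
  local failed=0 type status expected
  while IFS='=' read -r type status || [ -n "$type" ]; do
    if [ -z "$type" ]; then continue; fi
    if [ "$type" = "Ready" ]; then expected="True"; else expected="False"; fi
    if [ "$status" != "$expected" ]; then
      echo "node condition ${type}=${status}, expected ${expected}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Lists all conditions of the given node and checks them.
# (Hypothetical helper; how the node name is obtained is left open.)
check_node() {
  local node="$1" conditions
  conditions="$(kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}')" || return 1
  # An empty result means the query did not return conditions; fail closed.
  if [ -z "$conditions" ]; then return 1; fi
  printf '%s\n' "$conditions" | check_conditions
}
```

A post-start script could then end with something like `check_node "$NODE_NAME" || exit 1`, where `NODE_NAME` is a placeholder for however the job resolves its own node name.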