This PR introduces a mop.priority value to the reference kustodian mop chart for running shell scripts. We follow the systemd convention in which a lower priority value means a "higher" priority (in terms of importance and scheduling precedence).
How this works:
A release of the mop chart includes a mop.priority value (defaults to "99")
After we have been cleared for maintenance (i.e., the node whose host OS this script is running on has been cordoned and drained), we check locally for other maintenance scripts that are running or queued with a lower or equivalent priority value. If we find any, we block forward progress and retry in a loop until no other concurrent script with a lower or equivalent priority value remains.
Once we detect that ours is the lowest priority value present (read: the script with the "highest" priority in terms of significance), we mark ourselves as active on the host OS by touching the /var/<priority>.maintenance file. This marker tells other concurrent scripts how to serialize themselves when more scripts are waiting, or are about to arrive on the same host OS via another mop helm release.
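The wait-then-mark steps above can be sketched roughly as follows. This is an illustrative sketch only, not the script shipped in the chart: the names is_blocked, claim_turn, MARKER_DIR and RETRY_SECONDS are hypothetical, and it assumes that the only signal consulted is the set of <priority>.maintenance marker files. How a marker is removed once a script finishes is not described here, so it is left out.

```shell
#!/bin/sh
# Hypothetical sketch of the priority gate; all names below are assumptions.
MARKER_DIR="${MARKER_DIR:-/var}"

# Returns 0 (blocked) when a marker exists whose priority value is lower
# than or equal to ours, i.e. a script of higher or equal significance.
is_blocked() {
    my_priority="$1"
    for marker in "$MARKER_DIR"/*.maintenance; do
        [ -e "$marker" ] || continue          # glob matched nothing
        other="$(basename "$marker" .maintenance)"
        case "$other" in
            ''|*[!0-9]*) continue ;;          # skip non-numeric marker names
        esac
        if [ "$other" -le "$my_priority" ]; then
            return 0
        fi
    done
    return 1
}

# Retries until nothing blocks us, then marks this priority as active.
claim_turn() {
    my_priority="$1"
    while is_blocked "$my_priority"; do
        sleep "${RETRY_SECONDS:-5}"
    done
    touch "$MARKER_DIR/$my_priority.maintenance"
}
```

With a 99.maintenance marker already present, claim_turn "00" returns without waiting, while claim_turn "99" would keep retrying until that marker is gone.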
This handles the scenario in which several concurrent mop scripts are waiting in line to execute on the same node. For example:
1) A mop script arrives with a default priority of "99"
2) A 2nd mop script arrives with a priority of "00"
What will generally occur is the following:
1) A particular node will win exclusive maintenance access
2) If the 2nd mop script hasn't yet arrived on the cluster, the first script (priority value "99") will begin execution immediately
3) Suppose the 1st script is still in progress on this node when the 2nd mop script arrives on the cluster. Because this node is already under active maintenance, and because the 2nd script has a lower priority value ("higher" priority in terms of significance), it will be cleared for immediate execution.
If the 2nd script in the above scenario had a higher value ("lower" priority in terms of significance), then execution of that script would be blocked until the first script successfully finishes.
Scripts scheduled concurrently with an equivalent priority value will be serialized in the order that they arrive (via a mop release) on the same node.