
[JENKINS-52500] cloud agents are not used if labels overlap with permanent agent #9966

Open timja opened 6 years ago

timja commented 6 years ago

I've done my testing with the Kubernetes plugin; however, this may apply to other cloud providers as well.

When a job is executed with a specific node label (e.g. linux) and a permanent agent shares that label, the configured cloud providers are ignored.

Example scripted pipeline:

def test_builds = [:]

// 36 parallel branches, each holding a 'linux' executor for roughly two minutes
1.upto(36) {
  test_builds["${it}"] = {
    stage("${it}") {
      node('linux') {
        sh 'echo it worked'
        sh 'sleep 120'
      }
    }
  }
}

parallel(test_builds)

The following log message is logged (after executing the pipeline a few times to get the queue length up):

Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@10a347a8 provisioning strategy with state StrategyState{label=linux, snapshot=LoadStatisticsSnapshot{definedExecutors=16, onlineExecutors=14, connectingExecutors=0, busyExecutors=14, idleExecutors=0, availableExecutors=0, queueLength=130}, plannedCapacitySnapshot=0, additionalPlannedCapacity=0}

Jul 11, 2018 5:59:25 PM FINER hudson.slaves.NodeProvisioner

Queue length 0 is less than the available capacity 0. No provisioning strategy required

Jul 11, 2018 5:59:25 PM FINER hudson.slaves.NodeProvisioner

Queue length 0 is less than the available capacity 0. No provisioning strategy required

Jul 11, 2018 5:59:25 PM FINER hudson.slaves.NodeProvisioner

Queue length 0 is less than the available capacity 0. No provisioning strategy required

Jul 11, 2018 5:59:25 PM FINER hudson.slaves.NodeProvisioner

Queue length 0 is less than the available capacity 0. No provisioning strategy required

Similar log messages repeat, even though the snapshot above shows queueLength=130.

Running the same scripted pipeline but changing the node label to something unique to the cloud provider, things work exactly as expected (the changed fragment is sketched below).
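
The only change to the pipeline above is the label in the node step; a minimal sketch ('kubectl' is the pod template label visible in the logs that follow):

      node('kubectl') {   // label provided only by the Kubernetes cloud
        sh 'echo it worked'
        sh 'sleep 120'
      }

With that change, the provisioner detects the excess workload and starts pods: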

Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@10a347a8 provisioning strategy with state StrategyState{label=kubectl, snapshot=LoadStatisticsSnapshot{definedExecutors=9, onlineExecutors=4, connectingExecutors=5, busyExecutors=4, idleExecutors=0, availableExecutors=0, queueLength=32}, plannedCapacitySnapshot=0, additionalPlannedCapacity=6}

Jul 11, 2018 4:38:45 PM FINE hudson.slaves.NodeProvisioner

Excess workload 11.42 detected. (planned capacity=0,connecting capacity=5,Qlen=12.08,available=0.006&0,online=4,m=0.125)

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 10.42

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 9.42

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 8.42

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 7.42

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 6.42

Jul 11, 2018 4:38:47 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply

Started provisioning Kubernetes Pod Template from open-kc-k8s-corp-build-0 with 1 executors. Remaining excess workload: 5.42

I couldn't find an existing issue describing this specific problem, so I logged this one. I'm hoping it's something simple, but nothing in the comments here or in the code underneath looks obviously responsible for this behavior to me.

Java runtime version: 1.8.0_162-b12
Jenkins version: 2.129
Kubernetes plugin version: 1.9.2


Originally reported by kyle_mcgovern, imported from: cloud agents are not used if labels overlap with permanent agent
  • status: Open
  • priority: Minor
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 6 years ago

csanchez:

How many executors do you have on the linux node?
I would expect Jenkins to use the permanent ones up to the available executors, then start with cloud executors.
Are you setting the over-provisioning flags for faster provisioning of cloud agents? https://github.com/jenkinsci/kubernetes-plugin#over-provisioning-flags

timja commented 6 years ago

kyle_mcgovern:

Between two nodes there are 14 executors: 12 on one node (a physical machine) and 2 on a VM. A third node is down for maintenance, but it has 2 executors (which lines up with definedExecutors=16 / onlineExecutors=14 in the snapshot above).

I did see those flags after digging through the code of the standard provisioning strategy. With 100+ items in the queue, would you have expected it to try to provision? While testing I got the build queue up to 540 without a Kubernetes agent spinning up before I decided to log this JIRA.

timja commented 6 years ago

kyle_mcgovern:

I added the suggested settings, with no change in behavior:

-Dhudson.slaves.NodeProvisioner.initialDelay=0 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85
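
A quick way to confirm those -D properties actually reached the controller JVM is to print them from the Jenkins script console; a minimal Groovy sketch (it only shows that the JVM saw the flags, not how NodeProvisioner uses them):

    // Manage Jenkins -> Script Console
    ['hudson.slaves.NodeProvisioner.initialDelay',
     'hudson.slaves.NodeProvisioner.MARGIN',
     'hudson.slaves.NodeProvisioner.MARGIN0'].each { name ->
        println "${name} = ${System.getProperty(name)}"
    }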