mkrzywanski closed this issue 2 years ago.
You've set the limit to 90 in `maxVirtualMachinesLimit`, and the log says you have 90 running?
`maximumDeploymentSize` is the number of VMs that will be deployed at a time.
If you check the deployment logs in Azure, you should see how many were deployed in each deployment.
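To make the distinction concrete, here is a minimal Java sketch, assuming `maximumDeploymentSize` caps only how many VMs go into a single Azure deployment rather than how many exist in total. The `DeploymentBatches` class and its `split` method are hypothetical names for illustration, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split one provisioning request into Azure deployments
 * of at most maximumDeploymentSize VMs each. Not the plugin's actual code.
 */
final class DeploymentBatches {
    private DeploymentBatches() {}

    /** Returns the size of each deployment needed to create {@code requested} VMs. */
    static List<Integer> split(int requested, int maximumDeploymentSize) {
        if (maximumDeploymentSize <= 0) {
            throw new IllegalArgumentException("maximumDeploymentSize must be positive");
        }
        List<Integer> batches = new ArrayList<>();
        int remaining = Math.max(0, requested);
        while (remaining > 0) {
            // Each deployment takes at most maximumDeploymentSize VMs.
            int batch = Math.min(remaining, maximumDeploymentSize);
            batches.add(batch);
            remaining -= batch;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 25 VMs requested with maximumDeploymentSize = 10 -> three deployments.
        System.out.println(split(25, 10)); // [10, 10, 5]
    }
}
```

Under this reading, the setting shapes how VMs are deployed but never limits how many VMs exist at once; that is the job of `maxVirtualMachinesLimit`.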
@timja Actually, I thought that `maximumDeploymentSize` was used to control the maximum number of template instances at a given time. Now I see that you have actually implemented something to limit template instances, called `maxVirtualMachinesLimit`. Am I right? If so, I will update my plugin version and give it a try.
Yes, correct.
I have tried to use `maxVirtualMachinesLimit` to define per-template constraints. I think there is a problem: I set `maxVirtualMachinesLimit` to 90 at the cloud definition level and to 10 for each template. Now, regardless of the template type, only 10 machines can be provisioned at a time across the entire cloud.
That's the expected behaviour.
The cloud definition sets a global limit; the template definition sets a per-template limit.
If you set 10 on each template, then you won't get more than 10.
You can use this feature to make sure you don't get 90 of one template and then can't spawn the other one.
So, to make it clear, for such a setup:
In this situation, only 10 machines can be spawned at a time, regardless of the template type? If I want to run multiple jobs with different templates at the same time, only 10 machines overall can be spawned?
Yes
To be honest, I don't see when I would use such a feature. I thought I could set the max agents limit to, for example, 30, and then say that I want at most 10 instances of one template, without preventing other templates from spawning up to the limit of 30. So in the setup I have just shown, I would have 30 machines at a time, 10 instances of each template, but it does not seem to work like that.
> If I want to run multiple jobs with different templates at the same time, only 10 machines overall can be spawned?
Apologies, I misunderstood. You should definitely end up with more than 10 agents if you're using multiple templates. Can you share your config, please?
Jenkins version and plugins:
Jenkins: 2.338
OS: Linux - 5.4.0-1062-azure
---
ace-editor:1.1
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.13-1.0
azure-credentials:216.ve0b_4a_485ffc2
azure-keyvault:131.v867845ef6ae9
azure-sdk:106.v552de1e64d56
azure-vm-agents:808.v9d1999587120
bootstrap4-api:4.6.0-3
bootstrap5-api:5.1.3-6
bouncycastle-api:2.25
branch-api:2.7.0
build-timestamp:1.0.3
build-user-vars-plugin:1.8
caffeine-api:2.9.2-29.v717aac953ff3
checks-api:1.7.2
cloud-stats:0.27
cloudbees-folder:6.708.ve61636eb_65a_5
command-launcher:1.6
configuration-as-code:1414.v878271fc496f
credentials:1074.v60e6c29b_b_44b_
credentials-binding:1.27.1
display-url-api:2.3.5
durable-task:493.v195aefbb0ff2
echarts-api:5.3.0-2
extended-read-permission:3.2
font-awesome-api:6.0.0-1
git:4.10.3
git-client:3.11.0
git-server:1.10
handlebars:3.0.8
hidden-parameter:0.0.4
jackson2-api:2.13.2-260.v43d711474c77
javax-activation-api:1.2.0-2
javax-mail-api:1.6.2-5
jaxb:2.3.0.1
jdk-tool:1.5
jquery-detached:1.2.1
jquery3-api:3.6.0-2
jsch:0.1.55.2
junit:1.56
ldap:2.8
lockable-resources:2.14
mailer:408.vd726a_1130320
matrix-auth:2.6.8
matrix-project:758.v7a_ea_491852f3
momentjs:1.1.1
pipeline-build-step:2.16
pipeline-graph-analysis:188.v3a01e7973f2c
pipeline-input-step:446.vf27b_0b_83500e
pipeline-milestone-step:100.v60a_03cd446e1
pipeline-model-api:2.2064.v5eef7d0982b_e
pipeline-model-declarative-agent:1.1.1
pipeline-model-definition:2.2064.v5eef7d0982b_e
pipeline-model-extensions:2.2064.v5eef7d0982b_e
pipeline-rest-api:2.23
pipeline-stage-step:291.vf0a8a7aeeb50
pipeline-stage-tags-metadata:2.2064.v5eef7d0982b_e
pipeline-stage-view:2.23
plain-credentials:1.8
plugin-util-api:2.14.0
popper-api:1.16.1-2
popper2-api:2.11.2-1
rebuild:1.33
resource-disposer:0.17
role-strategy:3.2.0
scm-api:595.vd5a_df5eb_0e39
script-security:1140.vf967fb_efa_55a_
snakeyaml-api:1.29.1
ssh-credentials:1.19
sshd:3.1.0
structs:308.v852b473a2b8c
throttle-concurrents:2.6
timestamper:1.17
trilead-api:1.0.13
uno-choice:2.6.0
windows-slaves:1.8
workflow-aggregator:2.7
workflow-api:1143.v2d42f1e9dea_5
workflow-basic-steps:941.vdfe1b_a_132c64
workflow-cps:2660.vb_c0412dc4e6d
workflow-cps-global-lib:564.ve62a_4eb_b_e039
workflow-durable-task-step:1121.va_65b_d2701486
workflow-job:1174.vdcb_d054cf74a_
workflow-multibranch:711.vdfef37cda_816
workflow-scm-step:2.13
workflow-step-api:622.vb_8e7c15b_c95a_
workflow-support:815.vd60466279fc8
ws-cleanup:0.40
Cloud config:
```yaml
clouds:
  - azureVM:
      azureCredentialsId: "credentials"
      cloudName: "azure"
      cloudTags:
        - name: "deployer"
          value: "test"
      configurationStatus: "pass"
      deploymentTimeout: 1200
      existingResourceGroupName: "rg"
      maxVirtualMachinesLimit: 30
      resourceGroupReferenceType: "existing"
      vmTemplates:
        - agentLaunchMethod: "SSH"
          agentWorkspace: "/var/jenkins"
          builtInImage: "Ubuntu 20.04 LTS"
          credentialsId: "jenkins"
          diskType: "managed"
          doNotUseMachineIfInitFails: true
          executeInitScriptAsRoot: true
          existingStorageAccountName: "storage"
          imageReference:
            id: "xx"
          imageTopLevelType: "advanced"
          javaPath: "java"
          labels: "template1"
          location: "West Europe"
          maxVirtualMachinesLimit: 10
          maximumDeploymentSize: 10
          noOfParallelJobs: 1
          osType: "Linux"
          retentionStrategy:
            azureVMCloudRetentionStrategy:
              idleTerminationMinutes: 10
          storageAccountNameReferenceType: "existing"
          storageAccountType: "Standard_LRS"
          subnetName: "xxx"
          tags:
            - name: "application"
              value: "xxx"
          templateDesc: "template1"
          templateName: "template1"
          usageMode: EXCLUSIVE
          usePrivateIP: true
          virtualMachineSize: "Standard_E2s_v3"
          virtualNetworkName: "xxx"
          virtualNetworkResourceGroupName: "xxx"
        - agentLaunchMethod: "SSH"
          agentWorkspace: "/var/jenkins"
          builtInImage: "Ubuntu 20.04 LTS"
          credentialsId: "jenkins"
          diskType: "managed"
          doNotUseMachineIfInitFails: true
          executeInitScriptAsRoot: true
          existingStorageAccountName: "storage"
          imageReference:
            id: "xxx"
          imageTopLevelType: "advanced"
          javaPath: "java"
          labels: "template2"
          location: "West Europe"
          maxVirtualMachinesLimit: 10
          maximumDeploymentSize: 10
          noOfParallelJobs: 2
          osType: "Linux"
          retentionStrategy:
            azureVMCloudRetentionStrategy:
              idleTerminationMinutes: 10
          storageAccountNameReferenceType: "existing"
          storageAccountType: "Standard_LRS"
          subnetName: "xxx"
          tags:
            - name: "application"
              value: "xxx"
          templateDesc: "template2"
          templateName: "template2"
          usageMode: EXCLUSIVE
          usePrivateIP: true
          virtualMachineSize: "Standard_B8ms"
          virtualNetworkName: "xxx"
          virtualNetworkResourceGroupName: "xxx"
        - agentLaunchMethod: "SSH"
          agentWorkspace: "/var/jenkins"
          builtInImage: "Ubuntu 20.04 LTS"
          credentialsId: "jenkins"
          diskType: "managed"
          doNotUseMachineIfInitFails: true
          executeInitScriptAsRoot: true
          existingStorageAccountName: "storage"
          imageReference:
            id: "xxx"
          imageTopLevelType: "advanced"
          javaPath: "java"
          labels: "template3"
          location: "West Europe"
          maxVirtualMachinesLimit: 10
          maximumDeploymentSize: 10
          noOfParallelJobs: 1
          osType: "Linux"
          retentionStrategy:
            azureVMCloudRetentionStrategy:
              idleTerminationMinutes: 10
          storageAccountNameReferenceType: "existing"
          storageAccountType: "Standard_LRS"
          subnetName: "xxx"
          tags:
            - name: "application"
              value: "xxx"
          templateDesc: "template3"
          templateName: "template3"
          usageMode: EXCLUSIVE
          usePrivateIP: true
          virtualMachineSize: "Standard_DS1_v2"
          virtualNetworkName: "xxx"
          virtualNetworkResourceGroupName: "xxx"
```
In this situation I get at most 10 machines overall:
A maximum of 10 machines are provisioned, not 30. In the logs I can see:
```text
Mar 16, 2022 10:22:10 AM FINE com.microsoft.azure.vmagent.AzureVMCloud
Current estimated VM count: 10, quantity desired 2
Mar 16, 2022 10:22:10 AM INFO com.microsoft.azure.vmagent.AzureVMCloud provision
Not able to create 2 nodes, at or above maximum VM count of 10 and already 10 VM(s)
```
@timja, could you take a look at this, since you implemented it recently? We really need this feature, and right now we have to work around it with the throttle-concurrents plugin.
It looks like the `adjustVirtualMachineCount` method does not take the current per-template VM count into account, only the 'max agent count' and the 'template limit'.
If other templates have used up the template limit, even though the total is still below the max limit, no more VMs will spawn.
I don't have time right now to write and test a fix, but I should be able to in the next couple of days, hopefully.
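The diagnosis above can be illustrated with a small Java sketch. It is a hypothetical reconstruction of the behaviour described in this thread, not the plugin's actual `adjustVirtualMachineCount`: the class name, the signature, and the use of `Math.min` to combine the two limits are assumptions; only the message text is copied from the log in this issue.

```java
/**
 * Hypothetical reconstruction (NOT the plugin's source) of the reported bug:
 * the template limit is compared against the cloud-wide VM count instead of
 * the number of VMs created from that template.
 */
final class BuggyLimitCheck {
    private BuggyLimitCheck() {}

    static int adjust(int desired, int cloudTotal, int cloudLimit, int templateLimit) {
        // Assumption: the effective ceiling is the smaller of the two limits...
        int effectiveMax = Math.min(cloudLimit, templateLimit);
        // ...but it is checked against the total for the whole cloud.
        if (cloudTotal >= effectiveMax) {
            System.out.printf(
                    "Not able to create %d nodes, at or above maximum VM count of %d and already %d VM(s)%n",
                    desired, effectiveMax, cloudTotal);
            return 0;
        }
        return Math.min(desired, effectiveMax - cloudTotal);
    }

    public static void main(String[] args) {
        // 10 VMs from template1 are running; template2 asks for 2 more.
        int granted = adjust(2, 10, 30, 10);
        System.out.println("granted = " + granted); // granted = 0
    }
}
```

With a cloud limit of 30 and a template limit of 10, this reproduces the "maximum VM count of 10 and already 10 VM(s)" symptom even though template2 has no VMs yet.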
@timja We came to exactly the same conclusion just this morning: the `adjustVirtualMachineCount` method compares the `templateLimit` against the "total of all VMs in the Azure resource group".
For example, we have many (over 10) templates, each with a small `templateLimit` (e.g. 5), and as soon as the total number of Azure VMs in play exceeds that limit, nothing new is provisioned and our developers start complaining that their builds aren't running.
What should happen is that the template's limit should be compared against the number of VMs made from that template rather than the total number in the cloud.
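A minimal Java sketch of that rule, assuming the caller can supply both the cloud-wide VM count and the count for the requesting template. The class and parameter names are hypothetical; the point is that each limit is compared against its own count and the stricter headroom wins.

```java
/**
 * Hypothetical sketch of the intended two-level limit: a template may create
 * new VMs only while BOTH its own count is under its per-template limit AND
 * the cloud-wide count is under the global limit.
 */
final class TwoLevelLimit {
    private TwoLevelLimit() {}

    /**
     * How many of the {@code desired} VMs may be created now.
     *
     * @param cloudTotal    VMs currently running across all templates
     * @param cloudLimit    cloud-level maxVirtualMachinesLimit
     * @param templateCount VMs currently running from this template only
     * @param templateLimit template-level maxVirtualMachinesLimit
     */
    static int allowed(int desired, int cloudTotal, int cloudLimit,
                       int templateCount, int templateLimit) {
        int cloudHeadroom = Math.max(0, cloudLimit - cloudTotal);
        int templateHeadroom = Math.max(0, templateLimit - templateCount);
        return Math.max(0, Math.min(desired, Math.min(cloudHeadroom, templateHeadroom)));
    }

    public static void main(String[] args) {
        // Template already at its own limit of 10: blocked.
        System.out.println(allowed(2, 10, 30, 10, 10)); // 0
        // Another template with 0 VMs while the cloud has 10: allowed.
        System.out.println(allowed(2, 10, 30, 0, 10));  // 2
        // Cloud already at its global limit of 30: blocked.
        System.out.println(allowed(2, 30, 30, 0, 10));  // 0
    }
}
```

With the configuration from this issue (cloud limit 30, three templates limited to 10 each), this rule allows up to 30 VMs in total and at most 10 per template.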
FYI, several years ago I encountered much the same issue with the docker-plugin, and I solved it using labels: I made the plugin label every Docker container it created with both an "it came from
What I'd suggest is:
- Replace `int approximateVirtualMachineCount` with `Map<String, Integer> approximateVirtualMachineCountsByTemplate`.
- Change the `getApproximateVirtualMachineCount()` method to sum all those Integers.
- Add `getApproximateVirtualMachineCountForTemplate(String templateName)`.
- Split `getVirtualMachineCount` into two, with a new method that returns a `Map<String, Integer>` where the index string is the `Constants.AZURE_TEMPLATE_TAG_NAME` tag, and make the old method call the new one and sum all the Integers (unless there's nothing else that calls the old method).
- Update `approximateVirtualMachineCountsByTemplate` periodically.
- Make `adjustVirtualMachineCount` take into account BOTH the per-template limit (compared against `getApproximateVirtualMachineCountForTemplate(templateName)`) AND the cloud max limit (compared against `getApproximateVirtualMachineCount()`).

...and feel free to steal from, or be inspired by, code in the docker-plugin; the license is permissive.
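The bookkeeping suggested above could look roughly like this Java sketch. It is an illustration under assumptions, not the plugin's API: `Vm` is a hypothetical stand-in for an Azure VM and its tags, and `TEMPLATE_TAG` is a placeholder for whatever key `Constants.AZURE_TEMPLATE_TAG_NAME` actually holds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: keep one approximate VM count per template (keyed by
 * the template tag) instead of a single int, and derive the cloud total by summing.
 */
final class VmCountsByTemplate {
    /** Hypothetical stand-in for an Azure VM: only its tags matter here. */
    record Vm(Map<String, String> tags) {}

    /** Placeholder tag key (assumption); the plugin uses Constants.AZURE_TEMPLATE_TAG_NAME. */
    static final String TEMPLATE_TAG = "template-tag-placeholder";

    // Immutable snapshot replaced wholesale on each refresh; volatile so readers see the latest one.
    private volatile Map<String, Integer> approximateVirtualMachineCountsByTemplate = Map.of();

    /** Rebuild the per-template counts from the VMs currently present (meant to run periodically). */
    void refresh(List<Vm> vms) {
        Map<String, Integer> counts = new HashMap<>();
        for (Vm vm : vms) {
            String template = vm.tags().get(TEMPLATE_TAG);
            if (template != null) { // untagged VMs are not attributed to any template in this sketch
                counts.merge(template, 1, Integer::sum);
            }
        }
        approximateVirtualMachineCountsByTemplate = Map.copyOf(counts);
    }

    /** Cloud-wide count: the sum over all templates. */
    int getApproximateVirtualMachineCount() {
        return approximateVirtualMachineCountsByTemplate.values().stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    /** Count for a single template; 0 if none of its VMs are present. */
    int getApproximateVirtualMachineCountForTemplate(String templateName) {
        return approximateVirtualMachineCountsByTemplate.getOrDefault(templateName, 0);
    }

    public static void main(String[] args) {
        VmCountsByTemplate counts = new VmCountsByTemplate();
        counts.refresh(List.of(
                new Vm(Map.of(TEMPLATE_TAG, "template1")),
                new Vm(Map.of(TEMPLATE_TAG, "template1")),
                new Vm(Map.of(TEMPLATE_TAG, "template2"))));
        System.out.println(counts.getApproximateVirtualMachineCount());                       // 3
        System.out.println(counts.getApproximateVirtualMachineCountForTemplate("template1")); // 2
        System.out.println(counts.getApproximateVirtualMachineCountForTemplate("template3")); // 0
    }
}
```

A real implementation would also need to decide whether untagged VMs in the resource group should still count toward the cloud-wide limit; this sketch simply skips them.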
Thanks, I had forgotten about this issue. I have a few other pieces of work in flight, but it's in my backlog.
Contributions are very much welcome, though.
Jenkins and plugins versions report
Environment
```text
Jenkins: 2.322
OS: Linux - 5.4.0-1062-azure
---
ace-editor:1.1
apache-httpcomponents-client-4-api:4.5.13-1.0
azure-credentials:198.vf9c2fdfde55c
azure-keyvault:131.v867845ef6ae9
azure-sdk:70.v63f6a95999a7
azure-vm-agents:799.va4c741108611
bootstrap4-api:4.6.0-3
bootstrap5-api:5.1.3-3
bouncycastle-api:2.25
branch-api:2.7.0
build-timestamp:1.0.3
build-user-vars-plugin:1.8
caffeine-api:2.9.2-29.v717aac953ff3
checks-api:1.7.2
cloud-stats:0.27
cloudbees-folder:6.16
command-launcher:1.6
configuration-as-code:1.54
credentials:2.6.2
credentials-binding:1.27
display-url-api:2.3.5
durable-task:493.v195aefbb0ff2
echarts-api:5.2.2-1
extended-read-permission:3.2
font-awesome-api:5.15.4-3
git:4.10.0
git-client:3.10.0
git-server:1.10
handlebars:3.0.8
jackson2-api:2.13.0-230.v59243c64b0a5
jaxb:2.3.0
jdk-tool:1.5
jquery-detached:1.2.1
jquery3-api:3.6.0-2
jsch:0.1.55.2
junit:1.53
ldap:2.7
lockable-resources:2.12
mailer:1.34
matrix-auth:2.6.8
matrix-project:1.19
momentjs:1.1.1
pipeline-build-step:2.15
pipeline-graph-analysis:1.12
pipeline-input-step:2.12
pipeline-milestone-step:1.3.2
pipeline-model-api:1.9.3
pipeline-model-declarative-agent:1.1.1
pipeline-model-definition:1.9.3
pipeline-model-extensions:1.9.3
pipeline-rest-api:2.19
pipeline-stage-step:2.5
pipeline-stage-tags-metadata:1.9.3
pipeline-stage-view:2.19
plain-credentials:1.7
plugin-util-api:2.5.1
popper-api:1.16.1-2
popper2-api:2.10.2-1
rebuild:1.32
resource-disposer:0.16
role-strategy:3.2.0
scm-api:2.6.5
script-security:1.78
snakeyaml-api:1.29.1
ssh-credentials:1.19
sshd:3.1.0
structs:1.24
throttle-concurrents:2.5
timestamper:1.15
trilead-api:1.0.13
workflow-aggregator:2.6
workflow-api:2.47
workflow-basic-steps:2.24
workflow-cps:2633.v6baeedc13805
workflow-cps-global-lib:548.v9085a486966a
workflow-durable-task-step:1101.vf832bc1ac745
workflow-job:2.42
workflow-multibranch:2.26
workflow-scm-step:2.13
workflow-step-api:2.24
workflow-support:3.8
ws-cleanup:0.39
```
What Operating System are you using (both controller, and any agents involved in the problem)?
Jenkins in a Docker container.
Reproduction steps
I have the plugin configured as follows:
As we can see, `maxVirtualMachinesLimit` is set to 90 and `maximumDeploymentSize` is set to 10. However, `maximumDeploymentSize` seems to be ignored, and up to 90 machines are spun up with this configuration. I have more images configured this way, and they all have `maximumDeploymentSize` set to 10, but the option is ignored. In the Jenkins FINE logs I can see:
I checked the source code, and there is a check that is never reached; I cannot see its output in the logs: https://github.com/jenkinsci/azure-vm-agents-plugin/blob/a74208d4f7a1069427145e61b260372d8d6cd50c/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java#L678
Expected Results
`maximumDeploymentSize` is not ignored and correctly limits the number of virtual machines per template.
Actual Results
`maximumDeploymentSize` is ignored, and `maxVirtualMachinesLimit` is used as the hard limit.
Anything else?
No response