jenkinsci / azure-container-agents-plugin

Azure Container Agents Plugin for Jenkins
https://plugins.jenkins.io/azure-container-agents/
MIT License

ACI agents failed to allocate on ci.jenkins.io #80

Closed: MarkEWaite closed this issue 3 years ago

MarkEWaite commented 3 years ago

Version report

Jenkins and plugins versions report:

Details of working configuration:

```
Jenkins: 2.277.4
OS: Linux - 5.4.0-1031-azure
---
ace-editor:1.1
amazon-ecs:1.37
analysis-model-api:10.2.5
ansicolor:1.0.0
ant:1.11
antisamy-markup-formatter:2.1
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
aws-credentials:1.29
aws-java-sdk:1.11.995
azure-commons:1.1.3
azure-container-agents:201.v2afdce22b4cf
azure-credentials:177.v816b81058012
azure-sdk:4.vcb202d9010c1
azure-vm-agents:774.v0cee503baa25
basic-branch-build-strategies:1.3.2
beer:1.3
blueocean-autofavorite:1.2.4
blueocean-bitbucket-pipeline:1.24.7
blueocean-commons:1.24.7
blueocean-config:1.24.7
blueocean-core-js:1.24.7
blueocean-dashboard:1.24.7
blueocean-display-url:2.4.1
blueocean-events:1.24.7
blueocean-git-pipeline:1.24.7
blueocean-github-pipeline:1.24.7
blueocean-i18n:1.24.7
blueocean-jira:1.24.7
blueocean-jwt:1.24.7
blueocean-personalization:1.24.7
blueocean-pipeline-api-impl:1.24.7
blueocean-pipeline-editor:1.24.7
blueocean-pipeline-scm-api:1.24.7
blueocean-rest-impl:1.24.7
blueocean-rest:1.24.7
blueocean-web:1.24.7
blueocean:1.24.7
bootstrap4-api:4.6.0-3
bouncycastle-api:2.20
branch-api:2.6.4
build-timeout:1.20
buildtriggerbadge:2.10
caffeine-api:2.9.1-23.v51c4e2c879c8
checks-api:1.7.0
cloud-stats:0.27
cloudbees-bitbucket-branch-source:2.9.9
cloudbees-folder:6.15
code-coverage-api:1.3.2
command-launcher:1.6
conditional-buildstep:1.4.1
configuration-as-code:1.51
copyartifact:1.46
credentials-binding:1.24
credentials:2.4.1
cvs:2.19
dark-theme:0.0.12
data-tables-api:1.10.23-3
disable-github-multibranch-status:1.2
display-url-api:2.3.5
docker-commons:1.17
docker-workflow:1.26
durable-task:1.36
ec2:1.59
echarts-api:5.1.0-2
embeddable-build-status:2.0.3
extended-read-permission:3.2
external-monitor-job:1.7
favorite:2.3.3
font-awesome-api:5.15.3-2
forensics-api:1.0.0
git-client:3.7.1
git-forensics:1.0.0
git-server:1.9
git:4.7.1
github-api:1.123
github-branch-source:2.10.4
github-checks:1.0.12
github:1.33.1
groovy:2.4
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-1.0
htmlpublisher:1.25
jackson2-api:2.12.3
jacoco:3.2.0
javadoc:1.6
jaxb:2.3.0.1
jdk-tool:1.5
jenkins-design-language:1.24.7
jira:3.3
jjwt-api:0.11.2-9.c8b45b8bb173
jobConfigHistory:2.27
jquery-detached:1.2.1
jquery3-api:3.6.0-1
jquery:1.12.4-1
jsch:0.1.55.2
junit-attachments:1.6
junit-realtime-test-reporter:0.6
junit:1.49
keyboard-shortcuts-plugin:1.4
kubernetes-client-api:4.13.3-1
kubernetes-credentials:0.9.0
kubernetes:1.29.6
ldap:2.7
lockable-resources:2.10
mailer:1.34
mapdb-api:1.0.9.0
matrix-auth:2.6.7
matrix-project:1.18
maven-plugin:3.10
mercurial:2.15
metrics:4.0.2.7
momentjs:1.1.1
node-iterator-api:1.5.0
okhttp-api:3.14.9
pam-auth:1.6
parallel-test-executor:1.13
parameterized-trigger:2.40
pipeline-build-step:2.13
pipeline-github-lib:1.0
pipeline-githubnotify-step:1.0.5
pipeline-graph-analysis:1.10
pipeline-input-step:2.12
pipeline-milestone-step:1.3.2
pipeline-model-api:1.8.4
pipeline-model-definition:1.8.4
pipeline-model-extensions:1.8.4
pipeline-rest-api:2.19
pipeline-stage-step:2.5
pipeline-stage-tags-metadata:1.8.4
pipeline-stage-view:2.19
pipeline-utility-steps:2.8.0
plain-credentials:1.7
plugin-util-api:2.2.0
popper-api:1.16.1-2
pubsub-light:1.14
run-condition:1.5
scm-api:2.6.4
script-security:1.77
snakeyaml-api:1.27.0
sse-gateway:1.24
ssh-agent:1.22
ssh-credentials:1.18.1
ssh-slaves:1.31.5
structs:1.23
subversion:2.14.2
support-core:2.74
theme-manager:0.6
throttle-concurrents:2.2
timestamper:1.13
token-macro:2.15
toolenv:1.2
translation:1.16
trilead-api:1.0.13
variant:1.4
warnings-ng:9.1.0
windows-azure-storage:355.v4da08e72a251
windows-slaves:1.8
workflow-aggregator:2.6
workflow-api:2.42
workflow-basic-steps:2.23
workflow-cps-global-lib:2.19
workflow-cps:2.92
workflow-durable-task-step:2.39
workflow-job:2.40
workflow-multibranch:2.24
workflow-scm-step:2.12
workflow-step-api:2.23
workflow-support:3.8
```
Differences for failing configuration:

```
azure-sdk:12.vc102aedd3c66
azure-credentials:182.v3ccd4a755864
azure-vm-agents:780.v50d067d02f76
windows-azure-storage:355.v4da08e72a251
azure-container-agents:207.v3ad9931bf69e
```

Controller running Ubuntu 18.04 Linux on Azure.  ACI agents running in Docker

Reproduction steps

  1. Upgrade plugins from current installed set to latest releases of azure-sdk, azure-credentials, azure-vm-agents, azure-container-agents, and windows-azure-storage
  2. Note that Azure container agents (maven and maven-11) fail to allocate. The system log reports "Cannot provision: template for label maven is not available now, because it failed to provision last time." The cloud-stats page showed this exception:
    java.lang.Exception
    at com.microsoft.jenkins.containeragents.aci.AciService.createDeployment(AciService.java:143)
    at com.microsoft.jenkins.containeragents.aci.AciContainerTemplate.provisionAgents(AciContainerTemplate.java:128)
    at com.microsoft.jenkins.containeragents.aci.AciCloud.lambda$provision$1(AciCloud.java:109)
    Caused: java.lang.Exception
    at com.microsoft.jenkins.containeragents.aci.AciCloud.lambda$provision$1(AciCloud.java:136)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
  3. Resolve issue by reverting to previously installed versions of those plugins

See the detailed notes in the incident log.

Results

Expected result:

ACI agents expected to be allocated and destroyed on demand.

Actual result:

ACI agents were not allocated. There was a period when many agents (more than 250) were allocated, but none of them connected to the Jenkins controller.

timja commented 3 years ago

Your incident log mentions a no such method error but I can’t see it here?

It’s unlikely to be an Azure plugin causing that issue but probably another plugin bundling something like an old version of Jackson

MarkEWaite commented 3 years ago

> Your incident log mentions a no such method error but I can’t see it here?

I didn't include that log entry here because the error only appeared when I downgraded just the Azure container instances plugin and left the other 4 Azure plugins at their latest versions. I assumed that might be an expected issue because the older ACI plugin might require APIs that had changed or been removed in the newer plugins. The message was:

java.lang.NoSuchMethodError: 'byte[] com.microsoft.azure.util.AzureBaseCredentials.serializeToTokenData()'

> It’s unlikely to be an Azure plugin causing that issue but probably another plugin bundling something like an old version of Jackson

MarkEWaite commented 3 years ago

Looking at the logs on ci.jenkins.io, it seems there were messages like this (not sure if any of them are helpful):

ACI Periodic Clean Task cannot deserialize deploymentsToClean

2021-05-29 22:44:55.033+0000 [id=1111]  INFO    hudson.model.AsyncPeriodicWork#lambda$doRun$0: Started ACI Period Clean Task
2021-05-29 22:44:55.035+0000 [id=1111]  INFO    c.m.j.c.aci.AciCleanTask#cleanLeakedContainer: Starting to clean leaked containers for cloud ACI
2021-05-29 22:44:55.040+0000 [id=1111]  WARNING c.m.j.c.a.AciCleanTask$DeploymentRegistrar#<init>: AzureAciCleanUpTask: readResolve: Cannot deserialize deploymentsToClean
java.io.InvalidClassException: com.microsoft.jenkins.containeragents.aci.AciCleanTask$DeploymentInfo; local class incompatible: stream classdesc serialVersionUID = 6059552212366003515, local class serialVersionUID = -8799107075823754743
        at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
        at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
        at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
        at java.base/java.util.concurrent.ConcurrentLinkedQueue.readObject(ConcurrentLinkedQueue.java:844)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2325)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
        at com.microsoft.jenkins.containeragents.aci.AciCleanTask$DeploymentRegistrar.<init>(AciCleanTask.java:103)
        at com.microsoft.jenkins.containeragents.aci.AciCleanTask$DeploymentRegistrar.<clinit>(AciCleanTask.java:94)
        at com.microsoft.jenkins.containeragents.aci.AciCleanTask.cleanDeployments(AciCleanTask.java:174)
        at com.microsoft.jenkins.containeragents.aci.AciCleanTask.cleanDeployments(AciCleanTask.java:168)
        at com.microsoft.jenkins.containeragents.aci.AciCleanTask.execute(AciCleanTask.java:319)
        at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:100)
        at java.base/java.lang.Thread.run(Thread.java:829)
2021-05-29 22:44:55.041+0000 [id=1111]  INFO    hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished ACI Period Clean Task. 8 ms
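
The `InvalidClassException` above reports two different `serialVersionUID` values for the same class name, which is what Java serialization does when data written by one version of a class is read by another version that does not declare the same UID. The following is a minimal sketch of that mechanism, not the plugin's code: `ImplicitUidInfo`, `PinnedUidInfo`, `SerialUidSketch`, and the agent name are hypothetical names made up for illustration. It shows that a class without a declared `serialVersionUID` gets one computed from its shape, while a declared value stays fixed across compatible changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a persisted record. No serialVersionUID is
// declared, so the JVM derives one from the class name, fields, and methods.
// Changing the class between releases can change that derived value.
final class ImplicitUidInfo implements Serializable {
    String deploymentName;

    ImplicitUidInfo(String deploymentName) {
        this.deploymentName = deploymentName;
    }
}

// Same data, but the UID is pinned. Old streams stay readable as long as
// later changes to the class remain serialization-compatible.
final class PinnedUidInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    String deploymentName;

    PinnedUidInfo(String deploymentName) {
        this.deploymentName = deploymentName;
    }
}

final class SerialUidSketch {
    // Returns the UID that ObjectInputStream compares against the stream.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Serializes a value to bytes and reads it back within the same JVM.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pinned UID:   " + uidOf(PinnedUidInfo.class));
        System.out.println("implicit UID: " + uidOf(ImplicitUidInfo.class));
        PinnedUidInfo copy = (PinnedUidInfo) roundTrip(new PinnedUidInfo("aci-maven-example"));
        System.out.println("round trip:   " + copy.deploymentName);
    }
}
```

In this sketch the round trip succeeds because writer and reader are the same class version; the exception in the log corresponds to the case where the stored bytes carry a UID that the currently loaded class no longer matches.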

Unexpected exception provisioning ACI agent

2021-05-29 22:46:10.573+0000 [id=57]    INFO    c.m.j.c.aci.AciCloud#provision: Start ACI container for label maven workLoad 1
2021-05-29 22:46:10.573+0000 [id=57]    INFO    c.m.j.c.aci.AciCloud#provision: Using ACI Container template: aci-maven
2021-05-29 22:46:10.579+0000 [id=890]   INFO    c.m.j.c.aci.AciCloud#lambda$provision$1: Add ACI node: aci-maven-8bvfp
2021-05-29 22:46:10.598+0000 [id=890]   WARNING c.m.j.c.aci.AciCloud#lambda$provision$1: AciCloud: Provision agent aci-maven-8bvfp failed: null
2021-05-29 22:46:10.605+0000 [id=652]   INFO    c.m.a.m.AcquireTokenByClientCredentialSupplier#execute: SkipCache set to false. Attempting cache lookup
2021-05-29 22:46:10.782+0000 [id=652]   INFO    c.a.c.util.logging.ClientLogger#performLogging: Azure Identity => getToken() result for scopes [https://management.core.windows.net//.default]: SUCCESS
2021-05-29 22:46:10.870+0000 [id=458]   INFO    c.a.c.util.logging.ClientLogger#info: Ignoring decoding of null or empty value to:com.azure.resourcemanager.containerinstance.fluent.models.ContainerGroupInner
2021-05-29 22:46:10.871+0000 [id=855]   INFO    c.m.j.c.aci.AciService#deleteAciContainerGroup: Delete ACI Container Group: aci-maven-8bvfp successfully
2021-05-29 22:46:20.570+0000 [id=29]    WARNING hudson.slaves.NodeProvisioner#lambda$update$6: Unexpected exception encountered while provisioning agent aci-maven-8bvfp
java.lang.Exception
        at com.microsoft.jenkins.containeragents.aci.AciService.createDeployment(AciService.java:143)
        at com.microsoft.jenkins.containeragents.aci.AciContainerTemplate.provisionAgents(AciContainerTemplate.java:128)
        at com.microsoft.jenkins.containeragents.aci.AciCloud.lambda$provision$1(AciCloud.java:109)
Caused: java.lang.Exception
        at com.microsoft.jenkins.containeragents.aci.AciCloud.lambda$provision$1(AciCloud.java:136)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

timja commented 3 years ago

Right, a no such method error is expected if you do that; the plugins need to be upgraded together.

I can’t see anything from a quick look. I’ll take a closer look when I’m back at a computer on Tuesday.

timja commented 3 years ago

I can't reproduce this with the above plugins list.

I've shipped a fix for the logging so we get the actual line numbers: https://github.com/jenkinsci/azure-container-agents-plugin/pull/81
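
For context on why the earlier log was hard to read: the line `Provision agent aci-maven-8bvfp failed: null` is consistent with logging only `getMessage()` of an exception that was created without a message. The sketch below is an illustration under that assumption and is not taken from the linked pull request; `ProvisionLoggingSketch`, the agent name, and the root-cause text are hypothetical. It contrasts message-only formatting with passing the `Throwable` to `java.util.logging`, which keeps the stack trace and any cause.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ProvisionLoggingSketch {
    private static final Logger LOGGER =
            Logger.getLogger(ProvisionLoggingSketch.class.getName());

    // Message-only formatting: yields "... failed: null" when the
    // exception carries no message, which hides the real problem.
    static String messageOnly(String agentName, Throwable failure) {
        return "Provision agent " + agentName + " failed: " + failure.getMessage();
    }

    // Renders the full stack trace as text, including "Caused by:" sections.
    static String withStackTrace(Throwable failure) {
        StringWriter text = new StringWriter();
        failure.printStackTrace(new PrintWriter(text, true));
        return text.toString();
    }

    public static void main(String[] args) {
        // Made-up root cause, wrapped in an exception that has no message.
        Exception root = new IllegalStateException("deployment rejected (illustrative)");
        Exception wrapped = new Exception((String) null, root);

        // Prints: Provision agent aci-maven-example failed: null
        System.out.println(messageOnly("aci-maven-example", wrapped));

        // Passing the Throwable lets the handler print the trace and cause.
        LOGGER.log(Level.WARNING, "Provision agent aci-maven-example failed", wrapped);
    }
}
```

Wrapping with both a descriptive message and the original cause, for example `new Exception("createDeployment failed", root)`, would make either style of log line more useful.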

If you or someone else could retry the upgrade with the latest version, that would be great.

timja commented 3 years ago

I can't see what would be causing this.

It was fixed by re-saving the cloud configuration.