Open atrauzzi opened 6 years ago
you run az container show
combined with az container logs
to get more details.
The initialization failed will produce this problem, but Azure will have two different behaviors to initialization failed. One is deployment shows successful
and container shows terminated
, the other is endless deployment.
I'm tryiing to use my docker image as a JNLP agent (windows). Same problem. But the problem is not the plugin. Is Azure itself
In Linux it is working.
Seems it's a connection issue, but I can't tell why connection timed out. In the new version(0.4.0), we support SSH connection. You can have a try.
I've tried this with an SSH connection. I see similar behaviour. The container is started and appears to run, but exits before doing anything useful.
$ az container logs --name ...
Linux caas-82e05980240a429795153b976151f58a-2347572569-kqlfb 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017 x86_64 GNU/Linux
I also see the ALPN callback dropped message in the logs
ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?
@jamesabbottsmith Can you provide more logs about SSHLauncher in Jenkins logs.
Such as:
Jan 16, 2018 6:16:32 AM INFO com.microsoft.jenkins.containeragents.remote.SSHLauncher launch SSHLauncher: Start to connect node acissh-g0w3t : 52.224.69.226 via SSH Jan 16, 2018 6:16:44 AM INFO com.microsoft.jenkins.containeragents.remote.SSHLauncher launch SSHLauncher: Copy slave.jar to remote host successfully
I've noticed a lot of issues with ACI even in just manually creating containers. It seems like the service has a lot of performance and consistency issues. Definitely to the extent that a plugin like this wouldn't make sense until the start time of any Windows or Linux container can be brought to under ~15 seconds.
I'm trying to setup an Jenkins deployment using the Solution Template together with an ACI based build environment. I'm running into very similar problems with the this setup and it is impossible to debug. az container show
and az container logs
are impossible to use because the containers are gone before you know it.
jenkins.log shows many entries like these:
Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud provision
INFO: Start ACI container for label ACI-container workLoad 1
Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud provision
INFO: Using ACI Container template: ACI-container
Jan 26, 2018 1:01:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning ACI-container from AciAgents with 1 executors. Remaining excess workload: 0
Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud$1 call
INFO: Add ACI node: aci-container-8q2gl
Jan 26, 2018 1:01:10 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Jan 26, 2018 1:01:12 PM com.microsoft.jenkins.containeragents.aci.AciService createDeployment
INFO: Waiting for deployment aci-container-q2m0kn9b
Jan 26, 2018 1:01:22 PM com.microsoft.jenkins.containeragents.aci.AciCloud$1 call
WARNING: AciCloud: Provision agent aci-container-8q2gl failed: Deployment aci-container-q2m0kn9b status: Failed
Jan 26, 2018 1:01:22 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Jan 26, 2018 1:01:22 PM com.microsoft.jenkins.containeragents.aci.AciService deleteAciContainerGroup
INFO: Delete ACI Container Group: aci-container-8q2gl successfully
Jan 26, 2018 1:01:30 PM hudson.slaves.NodeProvisioner$2 run
WARNING: Unexpected exception encountered while provisioning agent ACI-container
java.lang.Exception: java.lang.Exception: Deployment aci-container-q2m0kn9b status: Failed
at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:136)
at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Deployment aci-container-q2m0kn9b status: Failed
at com.microsoft.jenkins.containeragents.aci.AciService.createDeployment(AciService.java:137)
at com.microsoft.jenkins.containeragents.aci.AciContainerTemplate.provisionAgents(AciContainerTemplate.java:128)
at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:104)
... 6 more
So I'm finding my way around the Azure Portal and I finally found some relevant error feedback in the Deployments view of my ResourceGroup. Turned out I needed to issue this command and then it started working!!
az provider register --namespace Microsoft.ContainerInstance
@atrauzzi We recommend to use SSH to launch container as it's much more stable than JNLP. Please have a try and let us know if there is any issue.
I don't have SSH access to launch nodes on certain services. So this is still an issue.
Why did you close this issue before I could reply?
@atrauzzi Please provide more logs and configurations about using SSH to launch ACI as you haven't succeeded. From our test, SSH is more stable than JNLP. Please keep in contact or we don't know what going on now.
I am facing similar issue with jenkins/ssh-slave setup with this plugin. I see the agent gets deployed in ACI but gets deleted and another instance keeps coming up.
2020-03-29T17:18:15.716749905Z: [ERROR] 2020-03-29 17:18:15.716+0000 [id=112036] INFO c.m.j.c.aci.AciService#createDeployment: Deployment ztest-r3tkt376 succeed 2020-03-29T17:18:15.971375792Z: [ERROR] 2020-03-29 17:18:15.971+0000 [id=112036] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path? 2020-03-29T17:18:16.119031025Z: [ERROR] 2020-03-29 17:18:16.114+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: Start to connect node ztest-kctsj : 52.152.198.206 via SSH
2020-03-29T17:18:16.422975930Z: [ERROR] 2020-03-29 17:18:16.422+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: Copy slave.jar to remote host successfully 2020-03-29T17:18:16.605629524Z: [ERROR] 2020-03-29 17:18:16.605+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: launching agent failed 2020-03-29T17:18:16.606118343Z: [ERROR] 2020-03-29 17:18:16.605+0000 [id=112036] WARNING c.m.j.c.aci.AciCloud$1#call: AciCloud: Provision agent ztest-kctsj failed: java.lang.InterruptedException: java.io.EOFException: unexpected stream termination 2020-03-29T17:18:17.050554205Z: [ERROR] 2020-03-29 17:18:17.050+0000 [id=112036] INFO o.j.p.cloudstats.CloudStatistics#getIdFor: No support for cloud-stats-plugin by class com.microsoft.jenkins.containeragents.aci.AciAgent
2020-03-29T17:18:17.241691129Z: [ERROR] 2020-03-29 17:18:17.241+0000 [id=112215] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path? 2020-03-29T17:18:18.162858608Z: [ERROR] 2020-03-29 17:18:18.161+0000 [id=112209] INFO c.m.j.c.aci.AciService#deleteAciContainerGroup: Delete ACI Container Group: ztest-kctsj successfully
2020-03-29T17:18:24.527343702Z: [ERROR] 2020-03-29 17:18:24.526+0000 [id=88] WARNING hudson.slaves.NodeProvisioner#lambda$update$6: Unexpected exception encountered while provisioning agent ztest 2020-03-29T17:18:24.527390804Z: [ERROR] java.lang.InterruptedException: java.io.EOFException: unexpected stream termination 2020-03-29T17:18:24.527398804Z: [ERROR] at com.microsoft.jenkins.containeragents.remote.SSHLauncher.launch(SSHLauncher.java:121) 2020-03-29T17:18:24.527403904Z: [ERROR] at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:292) 2020-03-29T17:18:24.527409204Z: [ERROR] Caused: java.util.concurrent.ExecutionException 2020-03-29T17:18:24.527414405Z: [ERROR] at java.util.concurrent.FutureTask.report(FutureTask.java:122) 2020-03-29T17:18:24.527419405Z: [ERROR] at java.util.concurrent.FutureTask.get(FutureTask.java:192) 2020-03-29T17:18:24.527424205Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:115) 2020-03-29T17:18:24.527429005Z: [ERROR] Caused: java.lang.Exception 2020-03-29T17:18:24.527433705Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:140) 2020-03-29T17:18:24.527438606Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80) 2020-03-29T17:18:24.527443306Z: [ERROR] at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46) 2020-03-29T17:18:24.527448206Z: [ERROR] at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71) 2020-03-29T17:18:24.527452906Z: [ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2020-03-29T17:18:24.527457606Z: [ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 2020-03-29T17:18:24.527462407Z: [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 2020-03-29T17:18:24.527467207Z: [ERROR] at java.lang.Thread.run(Thread.java:748) 2020-03-29T17:18:24.527471907Z: [ERROR] 2020-03-29 17:18:24.527+0000 [id=88] WARNING c.m.j.c.aci.AciCloud#canProvision: Cannot provision: template for label ztest is not available now, because it failed to provision last time.
2020-03-29T17:18:28.893307318Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Started DockerContainerWatchdog Asynchronous Periodic Work 2020-03-29T17:18:28.893558628Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered 2020-03-29T17:18:28.893581829Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 571, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms 2020-03-29T17:18:28.893631631Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 2 nodes assigned to this Jenkins instance, which we will check 2020-03-29T17:18:28.893644931Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed 2020-03-29T17:18:28.893813838Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms
2020-03-29T17:18:34.526268101Z: [ERROR] 2020-03-29 17:18:34.525+0000 [id=96] INFO c.m.j.c.aci.AciCloud#provision: Start ACI container for label ztest workLoad 1 2020-03-29T17:18:34.526304302Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=96] INFO c.m.j.c.aci.AciCloud#provision: Using ACI Container template: ztest 2020-03-29T17:18:34.526324603Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=96] INFO h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning ztest from SaaSBld with 1 executors. Remaining excess workload: 0 2020-03-29T17:18:34.526693417Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=112036] INFO c.m.j.c.aci.AciCloud$1#call: Add ACI node: ztest-3qs59 2020-03-29T17:18:34.538040658Z: [ERROR] 2020-03-29 17:18:34.537+0000 [id=112036] INFO c.m.j.c.a.AciCleanTask$DeploymentRegistrar#registerDeployment: AzureAciCleanUpTask: registerDeployment: Registering deployment ztest-6rmxcrfl in saas-*****-01 2020-03-29T17:18:34.853962320Z: [ERROR] 2020-03-29 17:18:34.853+0000 [id=112036] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-03-29T17:18:35.578729453Z: [ERROR] 2020-03-29 17:18:35.578+0000 [id=112036] INFO c.m.j.c.aci.AciService#createDeployment: Waiting for deployment ztest-6rmxcrfl
Logs from the container before it dies out
TX1-XX:aci$ az container logs --name ztest-23n25 --resource-group XX-01 --subscription XX
I'm trying out the ACI based cloud but am encountering an issue with both Windows and Linux based containers. While I do end up seeing that the plugin attempts to create instances, my tasks never seem to run. Instead, the instances linger for a while and then disappear, having done nothing.
Feedback on the containers both in the portal and via
az
seems sparse:Looking at the Jenkins logs, I do see one exception per type if I leave things to run long enough:
Nothing fancy about my templates, just trying the defaults for now to make sure everything works: