jenkinsci / azure-vm-agents-plugin

This repo is for the Azure VM Agents plugin for Jenkins. The Azure DevOps CI/CD team owns it for now.
https://plugins.jenkins.io/azure-vm-agents/

Latest version of this plugin seems to have inactivity broken for "Azure Pool Retention Strategy" #476

Closed limeman40 closed 10 months ago

limeman40 commented 11 months ago

Jenkins and plugins versions report

Jenkins: 2.429
OS: Linux - 6.2.0-1015-azure
Java: 11.0.20.1 - Ubuntu (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:4.20.0
ace-editor:1.1
ansible:276.vef26df37652f
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
apache-httpcomponents-client-5-api:5.2.1-1.1
async-http-client:1.9.40.0
authentication-tokens:1.53.v1c90fd9191a_b_
azure-acs:1.0.4
azure-ad:412.vdf45b_6a_b_da_81
azure-app-service:1.0.2
azure-artifact-manager:133.vf94ad3455cdc
azure-cli:0.9
azure-commons:1.1.3
azure-container-agents:253.vd2f5cd5c5040
azure-container-registry-tasks:0.6.5
azure-credentials:293.vb_d506148f506
azure-credentials-ext:1.0
azure-function:0.3.3
azure-keyvault:228.va_31b_a_451e7d6
azure-sdk:157.v855da_0b_eb_dc2
azure-vm-agents:883.v63c930b_025dc
azure-vmss:0.2.4
badge:1.9.1
bitbucket:223.vd12f2bca5430
blackduck-detect:9.0.0
block-queued-job:0.2.0
blueocean-bitbucket-pipeline:1.27.8
blueocean-commons:1.27.8
blueocean-core-js:1.27.8
blueocean-jwt:1.27.8
blueocean-pipeline-api-impl:1.27.8
blueocean-pipeline-scm-api:1.27.8
blueocean-rest:1.27.8
blueocean-rest-impl:1.27.8
blueocean-web:1.27.8
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.2-2
bouncycastle-api:2.29
branch-api:2.1128.v717130d4f816
build-user-vars-plugin:1.9
caffeine-api:3.1.8-133.v17b_1ff2e0599
changes-since-last-success:0.6
checks-api:2.0.2
cloud-stats:320.v96b_65297a_4b_b_
cloudbees-bitbucket-branch-source:848.v42c6a_317eda_e
cloudbees-folder:6.858.v898218f3609d
command-launcher:107.v773860566e2e
commons-httpclient3-api:3.1-3
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.10.0-78.v3e7b_ea_d5a_fe1
conditional-buildstep:1.4.3
config-file-provider:959.vcff671a_4518b_
copyartifact:722.v0662a_9b_e22a_c
credentials:1304.v5ec13eecef46
credentials-binding:642.v737c34dea_6c2
crx-content-package-deployer:1.9
data-tables-api:1.13.6-5
datadog:5.5.1
digitalocean-plugin:1.3.1
display-url-api:2.200.vb_9327d658781
docker-commons:439.va_3cb_0a_6a_fb_29
docker-java-api:3.3.1-79.v20b_53427e041
durable-task:523.va_a_22cf15d5e0
echarts-api:5.4.0-7
envinject:2.908.v66a_774b_31d93
envinject-api:1.199.v3ce31253ed13
extended-read-permission:53.v6499940139e5
extensible-choice-parameter:1.8.1
external-monitor-job:215.v2e88e894db_f8
favorite:2.4.3
font-awesome-api:6.4.2-1
generic-webhook-trigger:1.88.0
git:5.2.0
git-client:4.5.0
git-parameter:0.9.19
git-server:99.va_0826a_b_cdfa_d
github:1.37.3.1
github-api:1.316-451.v15738eef3414
github-branch-source:1741.va_3028eb_9fd21
github-pullrequest:0.5.0
gitlab-api:5.3.0-91.v1f9a_fda_d654f
gitlab-branch-source:684.vea_fa_7c1e2fe3
google-metadata-plugin:0.5
google-oauth-plugin:1.318.vb_39c5db_e3041
gradle:2.9
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
htmlpublisher:1.32
instance-identity:173.va_37c494ec4e5
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.15.3-366.vfe8d1fa_f8c87
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jenkins-design-language:1.27.8
jersey2-api:2.41-133.va_03323b_a_1396
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.18-1
jobConfigHistory:1229.v3039470161a_d
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.7.1-1
jsch:0.2.8-65.v052c39de79b_2
junit:1240.vf9529b_881428
kubernetes-cd:2.3.1
kubernetes-client-api:6.8.1-224.vd388fca_4db_3b_
kubernetes-credentials:0.11
label-linked-jobs:6.0.1
ldap:711.vb_d1a_491714dc
lockable-resources:1185.v0c528656ce04
mailer:463.vedf8358e006b_
mapdb-api:1.0.9-28.vf251ce40855d
matrix-auth:3.2.1
matrix-project:818.v7eb_e657db_924
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
metrics:4.2.18-442.v02e107157925
mina-sshd-api-common:2.11.0-86.v836f585d47fa_
mina-sshd-api-core:2.11.0-86.v836f585d47fa_
momentjs:1.1.1
msbuild:1.30
nexus-jenkins-plugin:3.16.510.v4d23e22cf563
node-iterator-api:55.v3b_77d4032326
node-sharing-executor:2.0.8
oauth-credentials:0.646.v02b_66dc03d2e
okhttp-api:4.11.0-157.v6852a_a_fa_ec11
pam-auth:1.10
pipeline-build-step:505.v5f0844d8d126
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:689.veec561a_dee13
pipeline-input-step:477.v339683a_8d55e
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2144.v077a_d1928a_40
pipeline-model-definition:2.2144.v077a_d1928a_40
pipeline-model-extensions:2.2144.v077a_d1928a_40
pipeline-rest-api:2.33
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2144.v077a_d1928a_40
pipeline-stage-view:2.33
pipeline-utility-steps:2.16.0
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.6.0
popper-api:1.16.1-3
popper2-api:2.11.6-4
powershell:2.1
promoted-builds:936.va_571a_a_b_f8da_5
pubsub-light:1.18
rebuild:320.v5a_0933a_e7d61
resource-disposer:0.23
run-condition:1.7
saml:4.429.v9a_781a_61f1da_
scm-api:676.v886669a_199a_a_
script-security:1275.v23895f409fb_d
service-fabric:1.6
shelve-project-plugin:3.2
snakeyaml-api:2.2-111.vc6598e30cc65
ssh:2.6.1
ssh-agent:333.v878b_53c89511
ssh-credentials:308.ve4497b_ccd8f4
ssh-slaves:2.916.vd17b_43357ce4
ssh2easy:1.6
sshd:3.312.v1c601b_c83b_0e
stashNotifier:1.439.v202358346a_7d
strict-crumb-issuer:2.1.1
structs:325.vcb_307d2a_2782
synopsys-coverity:3.0.3
thinBackup:1.18
timestamper:1.26
token-macro:384.vf35b_f26814ec
trilead-api:2.84.v72119de229b_7
uno-choice:2.8.0
variant:60.v7290fc0eb_b_cd
windows-azure-storage:386.v673495b0a5de
windows-slaves:1.8.1
workflow-aggregator:596.v8c21c963d92d
workflow-api:1283.v99c10937efcb_
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3806.va_3a_6988277b_2
workflow-cps-global-lib:609.vd95673f149b_b
workflow-durable-task-step:1289.v4d3e7b_01546b_
workflow-job:1360.vc6700e3136f5
workflow-multibranch:756.v891d88f2cd46
workflow-scm-step:415.v434365564324
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:865.v43e78cc44e0d
ws-cleanup:0.45

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller: Ubuntu 22.04.3 LTS

Agents: Ubuntu 22.04.3 LTS and Windows Server 2019 Datacenter

Reproduction steps

Allow the Jenkins instance to spin up VMs based on gallery images, then wait for them to disconnect.

Expected Results

The VMs stay up and complete the jobs that Jenkins tells them to build.

Actual Results

VMs disconnect at random intervals, often while they are in the middle of building code.

Anything else?

This started happening a couple of weeks ago. We are on the latest release of this plugin. I am unsure what to do. I am also seeing some messages in the Azure logs, but I am not sure what is to blame at this point.

I am seeing a lot of these messages in the Jenkins.log file:

2023-10-29 04:18:15.090+0000 [id=40]    INFO    c.m.a.v.AzureVMCloudPoolRetentionStrategy#check: Delete VM com.microsoft.azure.vmagent.AzureVMComputer@7fc6bc89 for time>
2023-10-29 04:18:58.380+0000 [id=86248] INFO    h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel k4500-6bf4d0
java.io.EOFException
        at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2911)
        at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3406)
        at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:932)
        at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
        at hudson.remoting.Command.readFrom(Command.java:142)
        at hudson.remoting.Command.readFrom(Command.java:128)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
2023-10-29 04:20:08.775+0000 [id=86447] INFO    h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel k2500-2fb110
java.io.EOFException
        at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2911)
        at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3406)
        at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:932)
        at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:375)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:50)
        at hudson.remoting.Command.readFrom(Command.java:142)
        at hudson.remoting.Command.readFrom(Command.java:128)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
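
One way to check whether the channel errors cluster right after the retention strategy's "Delete VM" messages is to scan the log for both and compare timestamps. The following is a minimal sketch, assuming the record format shown in the excerpt above; the function names and the five-minute window are illustrative choices, not part of Jenkins or the plugin. Proximity in time is only a hint, since the "Delete VM" line does not name the channel it affects.

```python
import re
from datetime import datetime, timedelta

# Matches the first line of a log record in the format shown above, e.g.
# "2023-10-29 04:18:15.090+0000 [id=40]    INFO    <source>: <message>"
RECORD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s+"
    r"\[id=\d+\]\s+[A-Z]+\s+(?P<source>\S+): (?P<msg>.*)$"
)


def parse_records(lines):
    """Return (timestamp, source, message) for each record header line.

    Stack-trace continuation lines do not match the pattern and are skipped.
    """
    records = []
    for line in lines:
        match = RECORD_RE.match(line)
        if match is None:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f%z")
        records.append((ts, match["source"], match["msg"]))
    return records


def channel_errors_after_deletes(records, window=timedelta(minutes=5)):
    """Return channel I/O errors logged within `window` after any 'Delete VM' record."""
    deletes = [
        ts
        for ts, source, msg in records
        if source.endswith("AzureVMCloudPoolRetentionStrategy#check")
        and msg.startswith("Delete VM")
    ]
    return [
        (ts, msg)
        for ts, _source, msg in records
        if "I/O error in channel" in msg
        and any(timedelta(0) <= ts - deleted <= window for deleted in deletes)
    ]
```

To use it, pass the lines of the log file to `parse_records` and the result to `channel_errors_after_deletes`. If most channel errors fall inside the window, the retention check is worth a closer look; if they do not, the cause is more likely elsewhere.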

On the Azure side I see things like this:


{
    "channels": "Operation",
    "correlationId": "a4d488fb-e649-4357-897d-c10a2ab9492c",
    "description": "",
    "eventDataId": "223c09e5-2805-e060-5834-4ec3cbae52a7",
    "eventName": {
        "value": "",
        "localizedValue": ""
    },
    "category": {
        "value": "ResourceHealth",
        "localizedValue": "Resource Health"
    },
    "eventTimestamp": "2023-10-31T20:23:30.9701198Z",
    "id": "/SUBSCRIPTIONS/<redacted>/RESOURCEGROUPS/RG-INFRA-DEVOPS/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINES/K4500-817ED0/events/223c09e5-2805-e060-5834-4ec3cbae52a7/ticks/638343806109701198",
    "level": "Critical",
    "operationId": "",
    "operationName": {
        "value": "Microsoft.Resourcehealth/healthevent/Updated/action",
        "localizedValue": "Health Event Updated"
    },
    "resourceGroupName": "RG-INFRA-DEVOPS",
    "resourceProviderName": {
        "value": "MICROSOFT.COMPUTE",
        "localizedValue": "MICROSOFT.COMPUTE"
    },
    "resourceType": {
        "value": "MICROSOFT.COMPUTE/virtualmachines",
        "localizedValue": "MICROSOFT.COMPUTE/virtualmachines"
    },
    "resourceId": "/SUBSCRIPTIONS/<redacted>/RESOURCEGROUPS/<redacted>/PROVIDERS/MICROSOFT.COMPUTE/VIRTUALMACHINES/K4500-817ED0",
    "status": {
        "value": "Updated",
        "localizedValue": "Updated"
    },
    "subStatus": {
        "value": "",
        "localizedValue": ""
    },
    "submissionTimestamp": "2023-10-31T20:23:30.9701198Z",
    "subscriptionId": "<redacted>",
    "tenantId": "",
    "properties": {
        "title": "Down: Virtual machine has been unavailable for 15 minutes",
        "details": "Unknown",
        "currentHealthStatus": "Unavailable",
        "previousHealthStatus": "Unavailable",
        "type": "Downtime",
        "cause": "PlatformInitiated"
    },
    "relatedEvents": []
}
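
With many events like the one above, it can help to group them by cause, health status and title, so that platform-initiated outages stand apart from anything else. This is a rough sketch, assuming the Activity Log entries have been exported to a JSON array of objects shaped like the event above; the export file and the function names are hypothetical, and only field names visible in that event are used.

```python
import json
from collections import Counter


def load_events(path):
    """Load a JSON array of Activity Log events from a file (hypothetical export)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_health_events(events):
    """Count Resource Health events by (cause, currentHealthStatus, title)."""
    counts = Counter()
    for event in events:
        # Skip anything that is not a Resource Health event.
        if event.get("category", {}).get("value") != "ResourceHealth":
            continue
        props = event.get("properties", {})
        key = (
            props.get("cause", ""),
            props.get("currentHealthStatus", ""),
            props.get("title", ""),
        )
        counts[key] += 1
    return counts
```

Printing `summarize_health_events(load_events(path)).most_common()` gives a quick overview of which causes dominate.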

timja commented 11 months ago

Is it possible the VM is getting overloaded? What are the metrics like?

limeman40 commented 11 months ago

I am not seeing high metrics on any of the VMs spun up from gallery images. Is there anything else I should look at on my end?

timja commented 11 months ago

Maybe ask Microsoft about the health events?

limeman40 commented 11 months ago

I do have a support case open with them, and I have asked them to escalate the issue. From the messages I am seeing in the Azure logs, it seems like it may be a hypervisor issue.

limeman40 commented 11 months ago

I had another question. I had an issue with Jenkins where I accidentally deleted its system files, and I had to restore the whole VM from a snapshot. It seems to be working fine. However, it almost seems like this issue lines up with that timeline.

Is there a way I could export the existing plugin configuration, so that I could remove the plugin and completely reinstall it? I am just curious whether that could help with this issue.

timja commented 11 months ago

Using this plugin would probably be the easiest: https://github.com/jenkinsci/configuration-as-code-plugin

Otherwise, you could copy the config out of the clouds section of config.xml.
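
For the second option, the clouds section can be pulled out of a copy of config.xml with a few lines of Python. This is a minimal sketch, assuming the cloud configuration sits in a `clouds` element directly under the root of the controller's config.xml; the function name is hypothetical. The XML declaration is stripped before parsing because Jenkins typically writes an XML 1.1 declaration, which Python's expat-based parser may reject.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def extract_clouds_xml(config_xml_text: str) -> Optional[str]:
    """Return the <clouds> element of a Jenkins config.xml as text, or None if absent."""
    # Drop the XML declaration so an XML 1.1 header does not break parsing.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", config_xml_text, count=1)
    root = ET.fromstring(body)
    # Assumption: <clouds> is a direct child of the root element.
    clouds = root.find("clouds")
    if clouds is None:
        return None
    return ET.tostring(clouds, encoding="unicode").strip()
```

Work on a copy of the file rather than the live one, and keep the extracted text somewhere safe before removing the plugin.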

limeman40 commented 11 months ago

I ended up just jotting down all the configuration in a couple of text files, then pulled the plugin out and reinstalled it. I am curious whether this helps fix it.

I will have to keep an eye on this tomorrow. I will come back and close this bug if this indeed solves it.

limeman40 commented 11 months ago

That did not fix the issue. How does the cleanup process work in the plugin? I am wondering whether there is some hiccup on the Azure side that it recovers from, but the plugin thinks the VM is broken and has it removed.

Is it possible this is some kind of race condition I am seeing?

timja commented 11 months ago

Did not fix the issue how does the cleanup process work in the plugin? I am wondering if there some hiccup on the Azure side it recovers but the plugin thinks the VM is broken and has it removed.

Is it possible this is some kind of race condition I am seeing?

Unsure. I haven't used the Pool retention strategy in a while. I use the idle one with a timeout of 5 minutes and it works fine.

limeman40 commented 11 months ago

Can you do some testing on Pool Retention?

limeman40 commented 11 months ago

I will give idle retention a try and see if it works better.

limeman40 commented 11 months ago

I just saw this in the Jenkins logs:

java.io.IOException: Agent failed to connect, even though the launcher didn't report it. See the log output for details.
        at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:325)
Caused: java.util.concurrent.ExecutionException
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at com.microsoft.azure.vmagent.AzureVMCloud$2.call(AzureVMCloud.java:856)
Caused: com.microsoft.azure.vmagent.exceptions.AzureCloudException
        at com.microsoft.azure.vmagent.exceptions.AzureCloudException.create(AzureCloudException.java:54)
        at com.microsoft.azure.vmagent.exceptions.AzureCloudException.create(AzureCloudException.java:33)
        at com.microsoft.azure.vmagent.AzureVMCloud$2.call(AzureVMCloud.java:885)
        at com.microsoft.azure.vmagent.AzureVMCloud$2.call(AzureVMCloud.java:808)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

And this error as well:

java.lang.Exception: Node ProvisioningActivity for Azure-Cloud/winagent/null (-1363608790) has lost. Mark as failure
        at com.microsoft.azure.vmagent.AzureVMAgentCleanUpTask.cleanCloudStatistics(AzureVMAgentCleanUpTask.java:577)
        at com.microsoft.azure.vmagent.AzureVMAgentCleanUpTask.clean(AzureVMAgentCleanUpTask.java:596)
        at com.microsoft.azure.vmagent.AzureVMAgentCleanUpTask.lambda$execute$1(AzureVMAgentCleanUpTask.java:604)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

This seems to happen when it is trying to spin up a VM.

limeman40 commented 11 months ago

I made this change and it is working better, but I am still getting random disconnects from Azure. Is there anything I can do to get more details from the plugin on why this is happening?

limeman40 commented 11 months ago

I also set up an SSH logger in Jenkins to see whether it might be some kind of SSH disconnect. I am seeing this in those logs:

Failed connecting to host 10.188.0.39:22.
java.net.NoRouteToHostException: No route to host (Host unreachable)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.base/java.net.Socket.connect(Socket.java:609)
        at java.base/java.net.Socket.connect(Socket.java:558)
        at java.base/java.net.Socket.<init>(Socket.java:454)
        at java.base/java.net.Socket.<init>(Socket.java:231)
        at com.jcraft.jsch.Util.lambda$createSocket$0(Util.java:389)
Caused: com.jcraft.jsch.JSchException
        at com.jcraft.jsch.Util.createSocket(Util.java:417)
        at com.jcraft.jsch.Session.connect(Session.java:217)
        at com.jcraft.jsch.Session.connect(Session.java:187)
        at com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher.getRemoteSession(AzureVMAgentSSHLauncher.java:311)
        at com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher.connectToSsh(AzureVMAgentSSHLauncher.java:457)
        at com.microsoft.azure.vmagent.remote.AzureVMAgentSSHLauncher.launch(AzureVMAgentSSHLauncher.java:111)
        at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

We have had better luck after changing to the idle retention strategy, but the disconnects still happen. An agent might now last over an hour before it disconnects.

We have also done a test where we statically connect an agent, and those do not disconnect at all. So I am thinking it is some issue with the cleanup process in this plugin.

Does the cleanup process not take into account any of the agent Node Monitoring settings? I have response time monitoring turned off in mine so that Jenkins will not randomly disconnect agents. Whatever it is seems to be on the plugin side. We enjoy using this plugin but will have to stop if it keeps being an unstable solution for us.

timja commented 11 months ago

Failed connecting to host 10.188.0.39:22. java.net.NoRouteToHostException: No route to host (Host unreachable) at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) at

That could be on initial startup, before the VM is available. If it happens while the VM is running, something is wrong.

I'm really unsure. All I can say is that we use it for 1000s of builds a day and it works really well without this issue.
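
To tell a failure during initial startup from one on a running VM, it can help to probe the agent's SSH port from the controller at the moment a disconnect is reported. This is a small sketch using only the Python standard library; the function name and the default timeout are illustrative, and the address in the usage note is the one from the log above.

```python
import socket


def port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and "no route to host".
        return False
```

For example, `port_reachable("10.188.0.39")` run from the controller shows whether port 22 on that agent can be reached at that moment. A `False` result for a VM that Azure reports as running would point at networking or the VM itself rather than at Jenkins.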

limeman40 commented 11 months ago

I am seeing a lot of Health Event messages in the Activity Logs in Azure. Is it possible that the cleanup process is cleaning up things that are in use? For example:

"details": "This virtual machine is stopped and deallocated as requested by an authorized user or process.",

"title": "Down: Virtual machine has been unavailable for 15 minutes",

I am not sure where to look at this point. It seems like the cleanup process is cleaning up things that should not be cleaned up.

How does this class work, for instance: com.microsoft.azure.vmagent.AzureVMAgentCleanUpTask?

Is it possible something about this is broken in the current version?

What logging can I turn on that might give me more of an idea what is happening?

timja commented 11 months ago

Enabling com.microsoft.azure (there should already be a log recorder set up for this) should give you all of the plugin's logging.

timja commented 10 months ago

Did you figure it out?

limeman40 commented 10 months ago

No, I just opened a new issue. It seems more related to the cleanup task.
