Open hiddenshadow21 opened 1 month ago
Strange the test file is tiny and it works fine for me with no delay (on Mac).
Are you able to get a few thread dumps while this is ongoing? Both on the controller and the agent its running on please? https://www.jenkins.io/doc/book/troubleshooting/obtaining-a-thread-dump/
The issue just happened again on 3 windows agents, attaching thread dumps from controller and agents.
I got curious, so on my side I tried this
def c = { target ->
node('WINDOWS') {
def r = junit(keepTestNames: true, testResults: "test-user_test-host_2024-10-02_05_58_22.xml", testDataPublishers: [attachments(), [$class: 'StabilityTestDataPublisher']])
echo "${r.totalCount} ${r.failCount}"
}
}
def p1c = { c('p1') }
def p2c = { c('p2') }
def m = [ 'p1': p1c, 'p2': p2c ]
parallel m
And got this in one of the builds
java.util.ConcurrentModificationException
at java.base/java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1486)
at java.base/java.util.TreeMap$ValueIterator.next(TreeMap.java:1531)
at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:335)
at java.base/java.util.HashSet.<init>(HashSet.java:121)
at hudson.tasks.test.TestObject.uniquifyName(TestObject.java:395)
at hudson.tasks.junit.PackageResult.getSafeName(PackageResult.java:87)
at hudson.tasks.test.TestObject.getId(TestObject.java:87)
at hudson.tasks.test.TestObject.getId(TestObject.java:91)
at hudson.tasks.test.TestObject.getId(TestObject.java:91)
at de.esailors.jenkins.teststability.StabilityTestData.getTestAction(StabilityTestData.java:61)
at hudson.tasks.junit.TestResultAction.getActions(TestResultAction.java:378)
at hudson.tasks.test.TestObject.getTestActions(TestObject.java:248)
at hudson.tasks.test.TestObject.getTestAction(TestObject.java:261)
at de.esailors.jenkins.teststability.StabilityTestDataPublisher.getPreviousHistory(StabilityTestDataPublisher.java:133)
at de.esailors.jenkins.teststability.StabilityTestDataPublisher.contributeTestData(StabilityTestDataPublisher.java:82)
at hudson.tasks.junit.JUnitResultArchiver.parseAndSummarize(JUnitResultArchiver.java:325)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:62)
at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:27)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
I checked the code and the existing operations around accessing the history while populating it are unsafe but also very slow the way StabilityTestDataPublisher is doing it. I then ran even more parallel parallel builds and they do appear stuck eventually, although they do finish after a long time.
Try adding a lock('result-recording-lock') { timeout(15) { ... } } around the junit step. The bottleneck is around the Map UNIQUIFIED_NAMES, which is synchronized anyway, being my initial assumption after a quick glance-over.
Thanks for the analysis. I will try with lock
and timeout
there. Hope it will resolve this issue.
Jenkins and plugins versions report
Environment
```text Jenkins: 2.462.2 OS: Windows Server 2022 - 10.0 Java: 21.0.3 - Oracle Corporation (Java HotSpot(TM) 64-Bit Server VM) --- ace-editor:1.1 ant:511.v0a_a_1a_334f41b_ antisamy-markup-formatter:162.v0e6ec0fcfcf6 apache-httpcomponents-client-4-api:4.5.14-208.v438351942757 asm-api:9.7-33.v4d23ef79fcc8 authentication-tokens:1.119.v50285141b_7e1 aws-credentials:231.v08a_59f17d742 aws-java-sdk-ec2:1.12.767-467.vb_e93f0c614b_6 aws-java-sdk-minimal:1.12.767-467.vb_e93f0c614b_6 bootstrap4-api:4.6.0-6 bootstrap5-api:5.3.3-1 bouncycastle-api:2.30.1.78.1-248.ve27176eb_46cb_ branch-api:2.1178.v969d9eb_c728e build-timeout:1.33 caffeine-api:3.1.8-133.v17b_1ff2e0599 checks-api:2.2.1 cloudbees-bitbucket-branch-source:888.v8e6d479a_1730 cloudbees-folder:6.955.v81e2a_35c08d3 command-launcher:115.vd8b_301cc15d0 commons-compress-api:1.26.1-2 commons-lang3-api:3.17.0-84.vb_b_938040b_078 commons-text-api:1.12.0-129.v99a_50df237f7 config-file-provider:978.v8e85886ffdc4 configurationslicing:548.ve92d48e66b_f8 copyartifact:749.vfb_dca_a_9b_6549 credentials:1380.va_435002fa_924 credentials-binding:681.vf91669a_32e45 data-tables-api:2.1.6-1 display-url-api:2.204.vf6fddd8a_8b_e9 docker-commons:443.v921729d5611d docker-workflow:580.vc0c340686b_54 dtkit-api:3.0.2 durable-task:577.v2a_8a_4b_7c0247 ec2:1688.v8c07e01d657f echarts-api:5.5.1-1 eddsa-api:0.3.0-4.v84c6f0f4969e email-ext:1844.v3ea_a_b_842374a_ emailext-template:1.5 envinject:2.919.v009a_a_1067cd0 envinject-api:1.199.v3ce31253ed13 extended-read-permission:53.v6499940139e5 external-monitor-job:215.v2e88e894db_f8 favorite:2.221.v19ca_666b_62f5 file-operations:266.v9d4e1eb_235b_a_ font-awesome-api:6.6.0-2 ghprb:1.42.2 git:5.5.1 git-client:5.0.0 git-server:126.v0d945d8d2b_39 github:1.40.0 github-api:1.321-468.v6a_9f5f2d5a_7e github-branch-source:1797.v86fdb_4d57d43 gradle:2.13 greenballs:1.15.1 gson-api:2.11.0-41.v019fcf6125dc handlebars:3.0.8 handy-uri-templates-2-api:2.1.8-30.v7e777411b_148 htmlpublisher:1.36 http_request:1.19 instance-identity:201.vd2a_b_5a_468a_a_6 ionicons-api:74.v93d5eb_813d5f jackson2-api:2.17.0-379.v02de8ec9f64c jakarta-activation-api:2.1.3-1 jakarta-mail-api:2.1.3-1 javadoc:280.v050b_5c849f69 javax-activation-api:1.2.0-7 javax-mail-api:1.6.2-10 jaxb:2.3.9-1 jdk-tool:80.v8a_dee33ed6f0 jersey2-api:2.44-151.v6df377fff741 jira:3.13 jjwt-api:0.11.5-112.ve82dfb_224b_a_d jnr-posix-api:3.1.19-2 jobConfigHistory:1268.v75ce751da_911 joda-time-api:2.13.0-85.vb_64d1c2921f1 jquery:1.12.4-1 jquery-detached:1.2.1 jquery3-api:3.7.1-2 jsch:0.2.16-86.v42e010d9484b_ json-api:20240303-41.v94e11e6de726 json-path-api:2.9.0-58.v62e3e85b_a_655 junit:1302.va_b_878c32eb_b_5 ldap:725.v3cb_b_711b_1a_ef locale:519.v4e20f313cfa_f lockable-resources:1315.v4ea_8e5159ec8 mailer:488.v0c9639c1a_eb_3 mapdb-api:1.0.9-40.v58107308b_7a_7 matrix-auth:3.2.2 matrix-project:838.v4d7b_7b_f9b_d4b_ maven-plugin:3.23 mercurial:1260.vdfb_723cdcc81 metrics:4.2.21-451.vd51df8df52ec mina-sshd-api-common:2.13.2-125.v200281b_61d59 mina-sshd-api-core:2.13.2-125.v200281b_61d59 momentjs:1.1.1 monitoring:1.99.0 msbuild:1.35 mstest:1.0.5 node-iterator-api:55.v3b_77d4032326 nodejs:1.6.2 nodelabelparameter:1.12.0 nunit:485.ve8a_85357320d okhttp-api:4.11.0-172.vda_da_1feeb_c6e pam-auth:1.11 parameterized-scheduler:277.v61a_4b_a_49a_c5c performance:962.v95a_4913d332e perfpublisher:8.09 pipeline-build-step:540.vb_e8849e1a_b_d8 pipeline-github-lib:61.v629f2cc41d83 pipeline-graph-analysis:216.vfd8b_ece330ca_ pipeline-groovy-lib:730.ve57b_34648c63 pipeline-input-step:495.ve9c153f6067b_ pipeline-milestone-step:119.vdfdc43fc3b_9a_ pipeline-model-api:2.2214.vb_b_34b_2ea_9b_83 pipeline-model-definition:2.2214.vb_b_34b_2ea_9b_83 pipeline-model-extensions:2.2214.vb_b_34b_2ea_9b_83 pipeline-rest-api:2.34 pipeline-stage-step:312.v8cd10304c27a_ pipeline-stage-tags-metadata:2.2214.vb_b_34b_2ea_9b_83 pipeline-stage-view:2.34 pipeline-utility-steps:2.17.0 plain-credentials:183.va_de8f1dd5a_2b_ plugin-util-api:5.1.0 popper-api:1.16.1-3 popper2-api:2.11.6-5 powershell:2.1 pubsub-light:1.18 resource-disposer:0.24 role-strategy:743.v142ea_b_d5f1d3 scm-api:696.v778d637b_a_762 script-security:1362.v67dc1f0e1b_b_3 shelve-project-plugin:3.2 simple-queue:1.4.4 simple-theme-plugin:196.v96d9592f4efa_ slack:741.v00f9591c586d snakeyaml-api:2.3-123.v13484c65210a_ sse-gateway:1.27 ssh-agent:376.v8933585c69d3 ssh-credentials:343.v884f71d78167 ssh-slaves:2.973.v0fa_8c0dea_f9f sshd:3.330.vc866a_8389b_58 structs:338.v848422169819 subversion:1275.va_7b_014f3fc2c test-stability:2.4 thinBackup:2.1.1 timestamper:1.27 token-macro:400.v35420b_922dcb_ trilead-api:2.147.vb_73cc728a_32e uno-choice:2.8.3 variant:60.v7290fc0eb_b_cd windows-slaves:1.8.1 workflow-aggregator:600.vb_57cdd26fdd7 workflow-api:1336.vee415d95c521 workflow-basic-steps:1058.vcb_fc1e3a_21a_9 workflow-cps:3969.vdc9d3a_efcc6a_ workflow-cps-global-lib:612.v55f2f80781ef workflow-durable-task-step:1371.vb_7cec8f3b_95e workflow-job:1436.vfa_244484591f workflow-multibranch:795.ve0cb_1f45ca_9a_ workflow-scm-step:427.v4ca_6512e7df1 workflow-step-api:678.v3ee58b_469476 workflow-support:926.v9f4f9b_b_98c19 ws-cleanup:0.46 xunit:3.1.5 ```What Operating System are you using (both controller, and any agents involved in the problem)?
Both on
Windows Server 2022
Reproduction steps
BLAME_HANG_DUMP_TYPE
=full
TEST_BLAME_HANG_TIMEOUT
=15m
RAVEN_LICENSE
={license}
You can get free development Raven License from https://ravendb.net/license/request/devwindows-agent
Run pipeline on agent:
pipeline { agent { label 'windows-agent' }
}
16:23:20 [Pipeline] bat 16:23:20
16:23:20 c:\Jenkins\workspace\PR_Tests\s\test\SlowTests>trx2junit .*.trx 16:23:20 trx2junit (c) gfoidl -- v2.0.4 16:23:20 https://github.com/gfoidl/trx2junit 16:23:20
16:23:20 Converting 1 trx file(s) to JUnit-xml... 16:23:20 Converting '.\test-user_test-host_2024-10-02_05_58_22.trx' to '.\test-user_test-host_2024-10-02_05_58_22.xml' 16:23:23 done in 2.3058151 seconds. bye. 16:23:23 [Pipeline] junit 16:23:23 Recording test results 18:38:14 Cancelling nested steps due to timeout