We have generative tests that generate 10,000 test cases. These have recently started to sometimes hang for one hour before completing. We are not yet sure why (we have some hypotheses), but by taking thread dumps we can see that the coordinating node is stuck on populating a TreeMap which causes expensive string concatenations. This PR optimizes this to avoid concatenations while keeping the same behavior.
Thread dump where the coordinating node apparently gets stuck:
at java.base@21.0.3/java.lang.StringConcatHelper.prepend(StringConcatHelper.java:326)
at java.base@21.0.3/java.lang.StringConcatHelper.prepend(StringConcatHelper.java:343)
at java.base@21.0.3/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
at java.base@21.0.3/java.lang.invoke.LambdaForm$MH/0x00007fbb34008400.invoke(LambdaForm$MH)
at java.base@21.0.3/java.lang.invoke.LambdaForm$MH/0x00007fbb340b9400.invoke(LambdaForm$MH)
at java.base@21.0.3/java.lang.invoke.LambdaForm$MH/0x00007fbb340bb000.invoke(LambdaForm$MH)
at java.base@21.0.3/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
at PluginClassLoader for junit//hudson.tasks.junit.CaseResult.getFullName(CaseResult.java:616)
at PluginClassLoader for junit//hudson.tasks.junit.CaseResult.compareTo(CaseResult.java:958)
at PluginClassLoader for junit//hudson.tasks.junit.CaseResult.compareTo(CaseResult.java:62)
at java.base@21.0.3/java.util.TreeMap.put(TreeMap.java:849)
at java.base@21.0.3/java.util.TreeMap.put(TreeMap.java:569)
at java.base@21.0.3/java.util.TreeSet.add(TreeSet.java:259)
at PluginClassLoader for junit//hudson.tasks.junit.ClassResult.add(ClassResult.java:198)
at PluginClassLoader for junit//hudson.tasks.junit.PackageResult.add(PackageResult.java:304)
at PluginClassLoader for junit//hudson.tasks.junit.TestResult.freeze(TestResult.java:1156)
at PluginClassLoader for junit//hudson.tasks.junit.TestResultAction.loadCached(TestResultAction.java:326)
at PluginClassLoader for junit//hudson.tasks.junit.TestResultAction.load(TestResultAction.java:279)
at PluginClassLoader for junit//hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:173)
- locked hudson.tasks.junit.TestResultAction@25ac2d6e
at PluginClassLoader for junit//hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:70)
at PluginClassLoader for junit//hudson.tasks.test.AbstractTestResultAction.findCorrespondingResult(AbstractTestResultAction.java:286)
at PluginClassLoader for junit//hudson.tasks.test.TestResult.getPreviousResult(TestResult.java:146)
at PluginClassLoader for junit//hudson.tasks.junit.TestResult.getPreviousResult(TestResult.java:467)
at PluginClassLoader for junit//hudson.tasks.junit.CaseResult.getPreviousResult(CaseResult.java:768)
at PluginClassLoader for junit//hudson.tasks.junit.CaseResult.getPreviousResult(CaseResult.java:62)
at PluginClassLoader for test-stability//de.esailors.jenkins.teststability.StabilityTestDataPublisher.getPreviousResultSafely(StabilityTestDataPublisher.java:171)
at PluginClassLoader for test-stability//de.esailors.jenkins.teststability.StabilityTestDataPublisher.getPreviousHistory(StabilityTestDataPublisher.java:130)
at PluginClassLoader for test-stability//de.esailors.jenkins.teststability.StabilityTestDataPublisher.contributeTestData(StabilityTestDataPublisher.java:82)
at PluginClassLoader for junit//hudson.tasks.junit.JUnitResultArchiver.parseAndSummarize(JUnitResultArchiver.java:325)
- locked hudson.model.FreeStyleBuild@6bb44616
at PluginClassLoader for junit//hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:194)
at jenkins.tasks.SimpleBuildStep.perform(SimpleBuildStep.java:123)
at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:80)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:767)
at hudson.model.Build$BuildExecution.post2(Build.java:179)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:711)
at hudson.model.Run.execute(Run.java:1917)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
Testing done
Ran mvn hpi:run and could reach web interface locally. I'm not sure how to trigger this code path from a blank jenkins setup, however judging from the changed lines the behavior should be identical.
Submitter checklist
[x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
[x] Ensure that the pull request title represents the desired changelog entry
[x] Please describe what you did
[ ] Link to relevant issues in GitHub or Jira
[ ] Link to relevant pull requests, esp. upstream and downstream changes
[ ] Ensure you have provided tests - that demonstrates feature works or fixes the issue
Lets see if it helps.
Otherwise if you can raise an issue with a way to generate data in a similar manner that demonstrates the slowness that would be useful too.
We have generative tests that generate 10,000 test cases. These have recently started to sometimes hang for one hour before completing. We are not yet sure why (we have some hypotheses), but by taking thread dumps we can see that the coordinating node is stuck on populating a TreeMap which causes expensive string concatenations. This PR optimizes this to avoid concatenations while keeping the same behavior.
Thread dump where the coordinating node apparently gets stuck:
Testing done
Ran
mvn hpi:run
and could reach web interface locally. I'm not sure how to trigger this code path from a blank jenkins setup, however judging from the changed lines the behavior should be identical.Submitter checklist