jenkinsci / concurrent-step-plugin

Jenkins plugin to use utils in Java concurrent package.
https://plugins.jenkins.io/concurrent-step/
MIT License
21 stars 8 forks source link

Deadlock when using countDownLatch with closure #11

Open pit3k opened 4 years ago

pit3k commented 4 years ago

The following Jenkinsfile results in a deadlock when run with parallelism greater than fork-pool size. On my PC 5 is enough.

def PARALLELISM = 5

def latches = []
for(int i = 0; i < PARALLELISM+1; i++) {
  latches[i] = createLatch count: 1
}

def s = [:]

for(int i = 0; i < PARALLELISM; i++) {
  def id = i
  s.put("latch" + id, {
    countDownLatch(latches[id+1]) {
      echo "awaiting latch " + id
      awaitLatch(latches[id])
      echo "got latch " + id
      sleep time: 100, unit: "MILLISECONDS"
      echo "releasing latch " + (id+1)
    }
    echo "released latch " + (id+1)
  })
}

echo "releasing latch 0"
countDownLatch(latches[0])
parallel s

Here is console out:

Started
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] createLatch
[Pipeline] createLatch
[Pipeline] createLatch
[Pipeline] createLatch
[Pipeline] createLatch
[Pipeline] createLatch
[Pipeline] countDownLatch
[Pipeline] parallel
[Pipeline] { (Branch: latch0)
[Pipeline] { (Branch: latch1)
[Pipeline] { (Branch: latch2)
[Pipeline] { (Branch: latch3)
[Pipeline] { (Branch: latch4)
[Pipeline] countDownLatch
[Pipeline] {
[Pipeline] countDownLatch
[Pipeline] {
[Pipeline] countDownLatch
[Pipeline] {
[Pipeline] countDownLatch
[Pipeline] countDownLatch
[Pipeline] echo
awaiting latch 0
[Pipeline] awaitLatch
[Pipeline] echo
awaiting latch 1
[Pipeline] awaitLatch
[Pipeline] echo
awaiting latch 2
[Pipeline] awaitLatch
[Pipeline] echo
got latch 0
[Pipeline] sleep
Sleeping for 0.1 sec
[Pipeline] echo
releasing latch 1
[Pipeline] }
[Pipeline] // countDownLatch
[Pipeline] {
[Pipeline] echo
released latch 1
[Pipeline] }
[Pipeline] echo
got latch 1
[Pipeline] sleep
Sleeping for 0.1 sec
[Pipeline] echo
awaiting latch 3
[Pipeline] awaitLatch
[Pipeline] echo
releasing latch 2
[Pipeline] }
[Pipeline] {
[Pipeline] echo
awaiting latch 4
[Pipeline] awaitLatch

And relevant threads from tracedump:

ForkJoinPool.commonPool-worker-1
threadId:107 - state:WAITING
stackTrace:
at java.lang.Object.wait(Native Method)
- waiting on org.jenkinsci.plugins.workflow.cps.CpsBodyExecution@1643a3e
at java.lang.Object.wait(Object.java:502)
at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.get(CpsBodyExecution.java:305)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution.lambda$start$0(CountDownStep.java:81)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution$$Lambda$83/961550691.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1618)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

ForkJoinPool.commonPool-worker-2
threadId:108 - state:WAITING
stackTrace:
at java.lang.Object.wait(Native Method)
- waiting on org.jenkinsci.plugins.workflow.cps.CpsBodyExecution@368f088b
at java.lang.Object.wait(Object.java:502)
at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.get(CpsBodyExecution.java:305)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution.lambda$start$0(CountDownStep.java:81)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution$$Lambda$83/961550691.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1618)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

ForkJoinPool.commonPool-worker-3
threadId:109 - state:WAITING
stackTrace:
at java.lang.Object.wait(Native Method)
- waiting on org.jenkinsci.plugins.workflow.cps.CpsBodyExecution@4888f736
at java.lang.Object.wait(Object.java:502)
at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.get(CpsBodyExecution.java:305)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution.lambda$start$0(CountDownStep.java:81)
at com.github.topikachu.jenkins.concurrent.latch.CountDownStep$Execution$$Lambda$83/961550691.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1618)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#1]
threadId:110 - state:WAITING
stackTrace:
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@73d4066e
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.lambda$run$0(AwaitStep.java:91)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution$$Lambda$87/1373648057.apply(Unknown Source)
at java.util.Optional.map(Optional.java:215)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:80)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:69)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$80/1455251952.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@5910cbd9

org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#2]
threadId:112 - state:WAITING
stackTrace:
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@9aa2002
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.lambda$run$0(AwaitStep.java:91)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution$$Lambda$87/1373648057.apply(Unknown Source)
at java.util.Optional.map(Optional.java:215)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:80)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:69)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$80/1455251952.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@694929f6

org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#3]
threadId:113 - state:WAITING
stackTrace:
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@78c1372d
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.lambda$run$0(AwaitStep.java:91)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution$$Lambda$87/1373648057.apply(Unknown Source)
at java.util.Optional.map(Optional.java:215)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:80)
at com.github.topikachu.jenkins.concurrent.latch.AwaitStep$Execution.run(AwaitStep.java:69)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$80/1455251952.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@158f949d

Converting Jenkinsfile to equivalent try-finally construct solves the issue:

def PARALLELISM = $PARALLELISM$

def latches = []
for(int i = 0; i < PARALLELISM+1; i++) {
  latches[i] = createLatch count: 1
}

def s = [:]

for(int i = 0; i < PARALLELISM; i++) {
  def id = i
  s.put("latch" + id, {
    try {
      echo "awaiting latch " + id
      awaitLatch(latches[id])
      echo "got latch " + id
      sleep time: 100, unit: "MILLISECONDS"
    } finally {
      echo "releasing latch " + (id+1)
      countDownLatch(latches[id+1])
    }
    echo "released latch " + (id+1)
  })
}

echo "releasing latch 0"
countDownLatch(latches[0])
parallel s
moshavnik commented 1 month ago

@pit3k

  1. do you think your solution might work with the awaitBarrier implementation?
  2. do you think increasing the value for the JVM parameter "java.util.concurrent.ForkJoinPool.common.parallelism" might be a workaround (NOT A SOLUTION!) for this problem? thanks.