msemys / esjc

EventStore Java Client
MIT License

Read-Thread is blocked forever (state:WAITING) in a Subscription #27

Closed siamak-haschemi closed 7 years ago

siamak-haschemi commented 7 years ago

We sometimes see the esjc library block forever when using subscriptions. The interesting part of our thread dump:

es-1-xxxxxx - priority:5 - threadId:0x............... - nativeId:0x............ - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007af484080> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)          <<<<<<<<<<<<<<<<<<<<<<
at com.github.msemys.esjc.subscription.StreamCatchUpSubscription.readEventsTill(StreamCatchUpSubscription.java:40)           <<<<<<<<<<<<<<<<<<<<<<
at com.github.msemys.esjc.CatchUpSubscription.lambda$runSubscription$1(CatchUpSubscription.java:161)           <<<<<<<<<<<<<<<<<<<<<<
at com.github.msemys.esjc.CatchUpSubscription$$Lambda$25/2142536057.run(Unknown Source)           <<<<<<<<<<<<<<<<<<<<<<
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

We have about 60 subscriptions using the same connection. The threads of all 60 subscriptions show the same WAITING state.

We use the esjc 1.7.0 release.

My question is whether there are any timeouts we can set to avoid blocking forever. Our code could handle such a timeout by retrying.

Thank you for your help.

msemys commented 7 years ago

You can set the operation timeout with EventStoreBuilder.operationTimeout(...) (default 7 seconds).

However, I think you are hitting a bug that was fixed in v1.8.0 (https://github.com/msemys/esjc/pull/20). In v1.0.0 through v1.7.0, operation timeouts are not triggered when there is no server response, so an operation can hang indefinitely.

Could you try with v1.8.0 or v1.8.1?
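
For illustration, here is a minimal sketch of the "timeout plus retry" handling asked about above, for code that waits on a CompletableFuture itself. It uses only JDK classes; BoundedWait and getWithRetry are hypothetical names, not part of esjc, and the supplier stands in for whatever asynchronous call the application makes. It does not change how the library waits internally in the stack trace above; for that, upgrading is the fix.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of esjc.
final class BoundedWait {

    // Starts the operation, waits at most timeoutMillis for its result,
    // and starts a fresh attempt if the wait times out.
    static <T> T getWithRetry(Supplier<CompletableFuture<T>> operation,
                              long timeoutMillis,
                              int maxAttempts)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> future = operation.get();
            try {
                // Unlike get() with no arguments, this call cannot park forever.
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Give up on this attempt; the next loop iteration retries.
                future.cancel(true);
                last = e;
            }
        }
        // Every attempt timed out: report the last timeout to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Assumption for the demo: an already completed future stands in
        // for a real asynchronous read.
        String value = getWithRetry(
                () -> CompletableFuture.completedFuture("event"), 1000, 3);
        System.out.println(value);
    }
}
```

A failure of the operation itself still surfaces as an ExecutionException and is not retried here; only a missing response within the time limit triggers another attempt.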

msemys commented 7 years ago

Closing this issue for now but feel free to reopen it if you have any further questions.