oracle / oracle-r2dbc

R2DBC Driver for Oracle Database
https://oracle.com
Other
194 stars 40 forks source link

Tests deadlock on JDK 11 #48

Closed dstepanov closed 2 years ago

dstepanov commented 2 years ago

Our R2dbc test suite is deadlocking on JDK 11 https://github.com/micronaut-projects/micronaut-data/pull/1152 I have managed to provide the thread dump. blocked.txt

dstepanov commented 2 years ago

One more run:


2021-09-07T12:13:21.7969398Z     "Test worker" 
2021-09-07T12:13:21.7970562Z        java.lang.Thread.State: WAITING
2021-09-07T12:13:21.7971704Z             at java.base@11.0.12/jdk.internal.misc.Unsafe.park(Native Method)
2021-09-07T12:13:21.7973485Z             at java.base@11.0.12/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
2021-09-07T12:13:21.7976564Z             at java.base@11.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
2021-09-07T12:13:21.7981379Z             at java.base@11.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
2021-09-07T12:13:21.7986066Z             at java.base@11.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
2021-09-07T12:13:21.7989700Z             at java.base@11.0.12/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
2021-09-07T12:13:21.7992334Z             at app//io.reactivex.internal.observers.BlockingMultiObserver.blockingGet(BlockingMultiObserver.java:85)
2021-09-07T12:13:21.7994725Z             at app//io.reactivex.Maybe.blockingGet(Maybe.java:2321)
2021-09-07T12:13:21.7996480Z             at jdk.internal.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
2021-09-07T12:13:21.7999144Z             at java.base@11.0.12/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-09-07T12:13:21.8001520Z             at java.base@11.0.12/java.lang.reflect.Method.invoke(Method.java:566)
2021-09-07T12:13:21.8003300Z             at app//org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
2021-09-07T12:13:21.8005230Z             at app//groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
2021-09-07T12:13:21.8006986Z             at app//groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
2021-09-07T12:13:21.8009123Z             at app//groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
2021-09-07T12:13:21.8016684Z             at app//org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1017)
2021-09-07T12:13:21.8027949Z             at app//org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1008)
2021-09-07T12:13:21.8030692Z             at app//org.spockframework.runtime.GroovyRuntimeUtil.invokeMethod(GroovyRuntimeUtil.java:170)
2021-09-07T12:13:21.8033357Z             at app//org.spockframework.runtime.SpockRuntime.verifyMethodCondition(SpockRuntime.java:99)
2021-09-07T12:13:21.8036386Z             at app//io.micronaut.data.tck.tests.AbstractReactiveRepositorySpec.$spock_feature_0_1(AbstractRea

2021-09-07T12:13:21.8325401Z     "ForkJoinPool.commonPool-worker-3" 
2021-09-07T12:13:21.8326192Z        java.lang.Thread.State: WAITING
2021-09-07T12:13:21.8327031Z             at java.base@11.0.12/jdk.internal.misc.Unsafe.park(Native Method)
2021-09-07T12:13:21.8328289Z             at java.base@11.0.12/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
2021-09-07T12:13:21.8330561Z             at java.base@11.0.12/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:2018)
2021-09-07T12:13:21.8332734Z             at app//oracle.jdbc.driver.RestrictedLock.lock(RestrictedLock.java:247)
2021-09-07T12:13:21.8334057Z             at app//oracle.jdbc.internal.Monitor.acquireLock(Monitor.java:131)
2021-09-07T12:13:21.8335525Z             at app//oracle.jdbc.internal.Monitor.acquireCloseableLock(Monitor.java:109)
2021-09-07T12:13:21.8337413Z             at app//oracle.jdbc.driver.PhysicalConnection.closeAsyncOracle(PhysicalConnection.java:2893)
2021-09-07T12:13:21.8339280Z             at app//oracle.r2dbc.impl.OracleReactiveJdbcAdapter$$Lambda$2256/0x0000000101230440.getOrThrow(Unknown Source)
2021-09-07T12:13:21.8341001Z             at app//oracle.r2dbc.impl.OracleR2dbcExceptions.fromJdbc(OracleR2dbcExceptions.java:277)
2021-09-07T12:13:21.8343036Z             at app//oracle.r2dbc.impl.OracleReactiveJdbcAdapter.lambda$deferOnce$24(OracleReactiveJdbcAdapter.java:1041)
2021-09-07T12:13:21.8344938Z             at app//oracle.r2dbc.impl.OracleReactiveJdbcAdapter$$Lambda$2141/0x00000001011f2c40.subscribe(Unknown Source)
2021-09-07T12:13:21.8346479Z             at app//reactor.core.publisher.FluxSource.subscribe(FluxSource.java:67)
2021-09-07T12:13:21.8348360Z             at app//reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
...
2021-09-07T12:13:21.8372471Z             at app//reactor.core.publisher.FluxFilter$FilterSubscriber.onNext(FluxFilter.java:113)
2021-09-07T12:13:21.8374134Z             at app//reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
2021-09-07T12:13:21.8375922Z             at app//org.reactivestreams.FlowAdapters$FlowToReactiveSubscriber.onNext(FlowAdapters.java:218)
2021-09-07T12:13:21.8377679Z             at app//oracle.jdbc.driver.PhasedPublisher$PhasedSubscription.lambda$emitNextItem$0(PhasedPublisher.java:404)
2021-09-07T12:13:21.8379341Z             at app//oracle.jdbc.driver.PhasedPublisher$PhasedSubscription$$Lambda$2304/0x0000000101264840.accept(Unknown Source)
2021-09-07T12:13:21.8380966Z             at app//oracle.jdbc.driver.PhasedPublisher.handleOnNext(PhasedPublisher.java:268)
2021-09-07T12:13:21.8382731Z             at app//oracle.jdbc.driver.PhasedPublisher$PhasedSubscription.lambda$emitNextItem$1(PhasedPublisher.java:402)
2021-09-07T12:13:21.8384340Z             at app//oracle.jdbc.driver.PhasedPublisher$PhasedSubscription$$Lambda$2302/0x0000000101264040.run(Unknown Source)
2021-09-07T12:13:21.8385883Z             at java.base@11.0.12/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
Michael-A-McMahon commented 2 years ago

Thanks, @dstepanov!

From the stack, I can see a ForkJoinPool thread has become blocked when subscribing to a Connection.close() publisher. The thread is blocked because another database call is executing. The thread only becomes unblocked when the database call completes, and the database call only completes when JDBC has an available thread.

This is yet another case of a deadlock I've seen several times before (and most of those times happened because of code that I wrote). I'm thinking that Oracle R2DBC really needs to implement a solution that just eliminates the possibility of threads contending for the Oracle JDBC lock.

I have some time this week, so I'll be looking into a fix.

dstepanov commented 2 years ago

@Michael-A-McMahon I don't think using common ForkJoinPool is a good idea, there could be some user code that is also using it and it might cause some unexpected behavior.

Michael-A-McMahon commented 2 years ago

I know that FJP can be problematic when code is causing its threads to block, or to execute very long running computations. These scenarios will starve the pool, and deprive it from running any other tasks.

@dstepanov: Do you think there are other reasons we should avoid FJP?

In any case, there could be value in allowing a non-default thread pool to be configured. This is supported by Oracle JDBC's Reactive Extensions, with OracleConnectionBuilder.executorOracle(Executor): https://docs.oracle.com/en/database/oracle/oracle-database/21/jajdb/oracle/jdbc/OracleConnectionBuilder.html#executorOracle_java_util_concurrent_Executor_ Oracle R2DBC can expose a ConnectionFactory Option that specifies an Executor, and it can configure Oracle JDBC to use that.

By the way, I've made some progress on having Oracle R2DBC control access to the JDBC connection to avoid blocked threads. I think I'll have the branch up for review later today.

dstepanov commented 2 years ago

The main reason is that the user code can block the common pool, so maybe some custom pool would be better to have by default.

Anyway, that was just a suggestion, you probably know better what should be done.