oracle / oracle-r2dbc

R2DBC Driver for Oracle Database
https://oracle.com
Other
197 stars 40 forks source link

Asynchronous Locking #49

Closed Michael-A-McMahon closed 3 years ago

Michael-A-McMahon commented 3 years ago

This branch adds an asynchronous and non-blocking lock to the OracleReactiveJdbcAdapter. The adapter uses this lock to guard access to a JDBC Connection.

The asynchronous lock allows Oracle R2DBC to avoid contention for the internal lock of the Oracle JDBC Connection. JDBC's internal lock will block threads that contend for it. This internal blocking lock has been the source of numerous deadlock scenarios. These deadlocks have been observed throughout the development of Oracle R2DBC:

48

35

https://github.com/oracle/oracle-r2dbc/issues/25#issuecomment-853508083 https://github.com/oracle/oracle-r2dbc/issues/5 https://github.com/oracle/oracle-r2dbc/pull/43#issuecomment-906883314

We originally thought that the solution to this problem would be to document that Oracle R2DBC does not support concurrent database calls over a single connection, and that programmers wouldn't have much trouble with that.

However, nearly every incident I've linked to above was caused by code that I wrote, and I have far more knowledge about Oracle JDBC's lock than the typical programmer. If I'm having trouble writing code that doesn't contend for that lock, then it is completely unreasonable to expect other programmers to work with this.

There is one incident above that was not caused by code I wrote. That code looked like this:

    Flux.usingWhen(
      connectionFactory.create(),
      connection ->
        Flux.usingWhen(
          Mono.from(connection.beginTransaction())
            .thenReturn(connection),
          nil ->
            connection.createStatement("INSERT INTO mytable VALUES(?)")
              .bind(0, 0)
              .execute(),
          Connection::commitTransaction),
      Connection::close)
      .hasElements()

To most R2DBC programmers, this might look like some top-notch coding. It's making good use of Flux.usingWhen to ensure that the connection gets closed and that the transaction is committed. With any other R2DBC driver, this code does what you'd expect. With the Oracle R2DBC driver, this code can deadlock your system, and the reason is far from obvious.

The next few paragraphs will attempt to explain the reason why the code above causes a deadlock. If you get lost or confused while reading the next few paragraphs, that's exactly the point I'm trying to make.

We simply can not inflict this kind of pain on to another programmer: First, the programmer has to understand the behavior of Flux.usingWhen and Flux.hasElements. The hasElements operator sends a cancel signal to the upstream publisher. Upon receiving the cancel signal, the inner usingWhen operator subscribes to the Connection.commitTransaction() publisher, and then the outer usingWhen operator subscribes to the Connection.close() publisher. Next, the programmer needs to understand the Reactive Streams Specification. When a subscriber sends a cancel signal upstream, that subscriber can no longer expect to receive any signals. This means that the outer usingWhen operator can not know when the inner operator's commitTransaction publisher has completed. So, the outer operator is going to subscribe to the Connection.close() publisher before the commitTransaction Publisher completes. Then, the programmer needs to understand behavior that is specific to Oracle's implementation of R2DBC. Oracle's driver will block your thread if you subscribe to the close() publisher before the commitTransaction() publisher has completed. Finally, the programmer needs to understand the internal implementation of Oracle JDBC's Reactive Extensions. When a Reactive Extension API initiates a database call, it locks the connection. The connection can only become unlocked when the database's response is received and processed. In order to process the response, a thread must be available. If all threads have become blocked, then the response is never processed and the system is deadlocked.

Changes in this branch will spare other programmers from having to understand anything I wrote in the paragraphs above.

Future branches may follow that use the asynchronous lock for synchronous JDBC calls as well. Even seemingly safe APIs like Connection.getMetadata() can contend for JDBC's lock. It would be best if we can avoid that contention as well.