oracle / oracle-r2dbc

R2DBC Driver for Oracle Database
https://oracle.com

Memory Leak Suspect in ForwardOnlyResultSet #85

Closed sgtcortez closed 1 year ago

sgtcortez commented 1 year ago

Hi, I am so glad to be using this wonderful reactive driver for Oracle!
Thanks

Application

Description

I have an application in production which serves a lot of data and handles many hits per second.
Basically, it uses the cache-aside technique with Redis:
check Redis first; if the key is not there, query Oracle, do some mapping, and store the result in Redis.

It works like a charm, but when the application runs for a whole day without interruption, our APM starts to show memory consumption growing: the garbage collector cannot reclaim the memory, and the average usage grows, and grows, and grows.

I took a heap dump, and with Eclipse Memory Analyzer I saw that the heap contains a lot of ForwardOnlyResultSet objects, which, it seems, are not closed after use.

Here we can see that we have a lot of instances of ForwardOnlyResultSet: [heap dump screenshot]

And this is while we are using r2dbc-pool with a pool of just 5 connections: [pool configuration screenshot]

I might be missing something and forgetting to close the result set, which is something I would do in pure JDBC. But I haven't seen anything here in the docs about that.

Example of use

I suspect that I am not closing the result set below; I am just closing the connection (returning it to the pool) after each statement:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Flux.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        ).next();
    }

    private final BiFunction<Row, RowMetadata, CouponDto> mapper = (row, rowMetadata) ->
            new CouponDto(
                    row.get("id", Long.class),
                    row.get("discountPercentage", Double.class),
                    row.get("listPrice", Double.class),
                    row.get("discountType", Integer.class)
            );

Again, thanks for your help & time! Best Regards, Matheus Rambo

Michael-A-McMahon commented 1 year ago

Hi Matheus! Thank you for this wonderfully detailed analysis; it helps me immensely to see it all.

I'm sort of out this week, so a proper investigation might have to wait until next week.

For now, I'll fill you in on how Oracle R2DBC closes the underlying JDBC Statement and ResultSet. As you've noted, the R2DBC SPI does not have an explicit close() method for statements and results. Instead, we have an implicit close() that occurs when your code "fully consumes" the results. By results, I mean the Result objects emitted by the Publisher<Result> that Statement.execute() returns. For each Result emitted, your code must consume it by calling a method like getRowsUpdated or map (I see your code is calling map).

In the case of Result.map(BiFunction), a Publisher<Row> is returned. To fully consume this result, your code must subscribe to this publisher and receive all Rows from it. After the publisher emits the last row, Oracle R2DBC closes the JDBC Statement and ResultSet. Keep in mind that if the Publisher<Result> emits multiple results, each result must be fully consumed to close the statement. So if you have a PL/SQL call returning multiple cursors (i.e. multiple ResultSets via DBMS_SQL.RETURN_RESULT), the Publisher<Result> emits a Result for each cursor, and a subscriber must receive each row of each result.

Final thing, and this is important: a subscriber may cancel the subscription to the Publisher<Result> or the Publisher<Row>. If the subscriber cancels, then Oracle R2DBC needs to handle that just as if the subscriber had received all the rows; Oracle R2DBC should close the JDBC statement if a subscriber cancels their subscription to these publishers.
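
To make "fully consume" concrete, here is a minimal sketch (the query is a placeholder; Project Reactor assumed) that drains every Result and every Row, allowing the implicit close to happen:

    import io.r2dbc.spi.Connection;
    import org.reactivestreams.Publisher;
    import reactor.core.publisher.Flux;

    public class ConsumeAllResults {
      // Drain every Result and every Row so Oracle R2DBC can close the
      // underlying JDBC Statement and ResultSet(s).
      static Publisher<String> selectNames(Connection connection) {
        return Flux.from(connection.createStatement(
                "SELECT name FROM example_table") // placeholder query
            .execute())
          // concatMap subscribes to each Result's row publisher in turn,
          // receiving all rows of all results before completing.
          .concatMap(result ->
            result.map((row, metadata) -> row.get("name", String.class)));
      }
    }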

I suspect that DBMS_SQL.RETURN_RESULT is in play here. In the dump, we see T4CPreparedStatement.openImplicitResultSets. This field is where Oracle JDBC stores ResultSets for RETURN_RESULT cursors.

I mentioned cancellation earlier. Flux.next() is specified to do exactly that (see https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#next--). So what I think your code will do is receive the first row of the first result and then send a cancel signal up to Oracle R2DBC. This should trigger a close of the JDBC statement, but your memory dump suggests it is not happening. There's good reason to think Oracle R2DBC has a bug here. I'll want to investigate this more when I can. This code looks suspect: https://github.com/oracle/oracle-r2dbc/blob/c35a1dd480c51146cf8762090efd81f9914999e1/src/main/java/oracle/r2dbc/impl/OracleStatementImpl.java#L1159
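
To illustrate the cancellation (a minimal Project Reactor sketch, not specific to Oracle R2DBC): next() takes the first element and then cancels the rest of the stream:

    import reactor.core.publisher.Flux;

    public class NextCancellation {
      public static void main(String[] args) {
        Flux.range(1, 100)
          // doOnCancel fires when next() cancels the remaining elements
          .doOnCancel(() -> System.out.println("upstream cancelled"))
          .next() // emits the first element and cancels the subscription
          .subscribe(element -> System.out.println("received " + element));
      }
    }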

sgtcortez commented 1 year ago

Hi @Michael-A-McMahon, thank you for your response.

We do have some queries which return more than one row, but if our application works correctly, it consumes all rows returned by the driver.

I will try a workaround here and see if it works. For example, I do not know why I am using Flux.usingWhen(), because the query returns just one row. So I might be able to switch to Mono.usingWhen().

sgtcortez commented 1 year ago

I was looking at the code that I have, thinking about cancellation on next().

So what I think your code will do is receive the first row of the first result and then send a cancel signal up to Oracle R2DBC. This should trigger a close of the JDBC statement, but your memory dump suggests it is not happening. There's good reason to think Oracle R2DBC has a bug here.

As you commented on this issue some time ago:

For Oracle R2DBC, a Result is no longer valid after the connection is closed.

So, since I have the next() operator downstream, I suspect that by the time it cancels, the connection was already closed and the result set was never closed.

sgtcortez commented 1 year ago

So, instead of:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Flux.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        ).next();
    }

I can write something like:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Mono.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)).next(),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        );
    }

Does it make sense to you? It does for me, because the result is no longer available after the connection is closed. Now the next() should emit the cancellation signal, and the driver must free the resources.

sgtcortez commented 1 year ago

I have tried my guess, without success.
The memory leak still occurs ...

I applied a load test to both forms, and the leak occurs in both. I put a breakpoint in the method that you mentioned (deallocate) and it is called.

So, in theory the PreparedStatement is closed, and according to the docs of ResultSet, if the statement is closed, the result set is automatically closed:

A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.

So I might be wrong, but something strange is happening, and I suspect that the problem is not in oracle-r2dbc.
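
For reference, here is a minimal JDBC sketch of the quoted behavior (the connection string is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class StatementCloseDemo {
      public static void main(String[] args) throws SQLException {
        // placeholder connection string; supply a real one to run this
        String jdbcUrl = "jdbc:oracle:thin:@//host:1521/service";
        try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
          Statement statement = connection.createStatement();
          ResultSet resultSet = statement.executeQuery("SELECT 1 FROM dual");
          statement.close();
          // Per the ResultSet javadoc, closing the statement closes it too
          System.out.println(resultSet.isClosed()); // prints: true
        }
      }
    }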

Douglas-Surber commented 1 year ago

Matheus,

The heap dump suggests that your app is executing PL/SQL stored procedures that return multiple result sets, and that not all of those result sets are being processed. This is not the more common way of getting a result set, which is by executing a query. There's nothing wrong with this; it is just not the most common use case.

As Michael said, there may be a problem in Oracle R2DBC (or Oracle Database JDBC). In particular, closing a CallableStatement may not always close unprocessed result sets as expected. And keep in mind that under the covers, Oracle Database JDBC caches CallableStatements for reuse. This caching may be a link in the chain that is producing this apparent memory leak.

I'm not very knowledgeable about the R2DBC API, so I can't offer specific code suggestions. First, am I correct that the app is executing a stored procedure that returns multiple result sets? Assuming that is true, you should verify that the app subscribes to the stream of results from the stored procedure call and that it processes all the rows of all the results returned.

I'm making a lot of assumptions based on a quick look at the heap dump, but maybe this is of some help.

Douglas


sgtcortez commented 1 year ago

Hi @Douglas-Surber, thanks for your help.

You are right, I do have a call to a stored procedure.

I have the following statement:

  select Pck_Packs_Virtuais.Fnc_Analisa_Cesta_Json(?)
  as retorno from dual

which is called from:

    public Flux<PackDto> fetchPackIdentifier(final PackIdentifierDto input) {
        return Flux.usingWhen(
                        connectionPool.create(),
                        connection -> Mono.from(connection.createStatement(PROCEDURE)
                                .bind(0, JsonUtil.toJson(List.of(input)))
                                .execute()
                        ).flatMapMany(it -> it.map(mapper)),
                        Connection::close,
                        ((connection, throwable) -> connection.close()),
                        Connection::close
                ).map(s -> List.of(JsonUtil.fromJson(s, PackDto[].class)))
                .flatMap(Flux::fromIterable);
    }

    private final BiFunction<Row, RowMetadata, String> mapper = (row, rowMetadata) ->
            Optional.ofNullable(row.get(0, String.class)).orElse("");

JsonUtil is just a static utility class which converts JSON to Java objects and vice versa.

It seems to consume all the results ... One thing that diverges is that I am not making the procedural call the way it is shown in the README.

Michael-A-McMahon commented 1 year ago

I've had some time to look at the Oracle JDBC code, and I've made a concerning discovery: the openImplicitResultSets field we see in your dump is never cleared. Combined with statement caching (which Oracle R2DBC enables), that ArrayDeque will continue to grow until the statement is evicted from the cache.
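
To picture the growth, here is a toy model (hypothetical code, not the actual Oracle JDBC source; only the field name mirrors the dump):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class CachedStatementModel {
      private final Deque<Object> openImplicitResultSets = new ArrayDeque<>();

      void execute() {
        // Every execution appends an implicit result set; nothing removes it
        // while the statement remains in the statement cache.
        openImplicitResultSets.add(new Object());
      }

      public static void main(String[] args) {
        CachedStatementModel cached = new CachedStatementModel();
        for (int i = 0; i < 10_000; i++) {
          cached.execute(); // a cached statement is reused across executions
        }
        System.out.println(cached.openImplicitResultSets.size()); // 10000
      }
    }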

Next steps:

  1. Fix the bug in Oracle JDBC.
  2. Work around the bug in Oracle R2DBC.

Things are quite busy in Oracle land these days, but I'll try to get on this as soon as I can.

Huge THANK YOU to @sgtcortez for finding this issue and bringing it to our attention.

sgtcortez commented 1 year ago

Thank you @Michael-A-McMahon.

I am so glad to help!

sgtcortez commented 1 year ago

Hi @Michael-A-McMahon

It has been working for a week now ... [memory usage screenshot]

I compiled the main branch and added it to my repository. Waiting for the release on Maven Central ...

Thanks

Michael-A-McMahon commented 1 year ago

Wow! Great to see this follow up!

We aim for a quarterly release cadence. 1.0.0 dropped in the summer, so we should have a new release soon.

Thanks again for the excellent data you provided on this issue; it really helped us a lot. But more importantly, it will help other Oracle JDBC and R2DBC users who might have hit this bug.

htejwani commented 1 year ago

Hello, do you see this bug causing an "ORA-01000: maximum open cursors exceeded" error? We are getting this error during our load tests.

Michael-A-McMahon commented 1 year ago

Absent any bug in Oracle R2DBC, ORA-01000 is a symptom of a Result not being consumed. That is: a Result is emitted downstream, but no method is called to consume it: map, getRowsUpdated, or flatMap.

If you can share some code that recreates the failure, I can try to debug it.

On my side, I wrote something just now to try and catch a bug:

    import io.r2dbc.spi.Connection;
    import io.r2dbc.spi.ConnectionFactories;
    import io.r2dbc.spi.ConnectionFactoryOptions;
    import oracle.r2dbc.OracleR2dbcOptions;
    import org.reactivestreams.Publisher;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    import static java.lang.String.format;

    public class LoadTest {

      static Publisher<? extends Connection> connect() {
        var options =
          ConnectionFactoryOptions
            .parse("r2dbc:oracle://phoenix")
            .mutate()
            .option(
              OracleR2dbcOptions.TNS_ADMIN, "/Users/michaelmcmahon/.oracle/database")
            .build();
        return ConnectionFactories.get(options).create();
      }

      static Publisher<Void> query(Connection connection, int value) {
        return Mono.from(connection.createStatement(format(
            "SELECT %d FROM sys.dual", value))
            .execute())
          .flatMap(result ->
            Mono.from(result.map(row -> row.get(0))))
          .doOnNext(System.out::println)
          .then();
      }

      public static void main(String[] args) {
        Flux.usingWhen(
            connect(),
            connection ->
              Flux.range(0, 5000)
                .concatMap(i -> query(connection, i)),
            Connection::close)
          .blockLast();
      }
    }

But this code is not failing, so yours must be doing something different.

htejwani commented 1 year ago

Thanks @Michael-A-McMahon for your prompt response. Yes, all our INSERT/UPDATE queries were leaking cursors because we were not invoking the getRowsUpdated method, since we didn't need that info. Adding this call fixed the cursor leak issue. Thanks.

Michael-A-McMahon commented 1 year ago

I'm glad you've found the issue.

It is understandable that applications won't always call getRowsUpdated for INSERT/UPDATE. So in the 1.1.0 release, I've relaxed the requirement: for INSERT/UPDATE/DELETE, the cursor is automatically closed even if the Result is not consumed. This seemed more sensible because we only need to keep the cursor open if we are fetching row data or PL/SQL OUT parameters from it; there is no need to keep the cursor around if the result is just an update count.
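
For anyone still on a release before 1.1.0, here is a minimal sketch of the pattern that avoids the cursor leak (the table name is a placeholder): consume getRowsUpdated even when the count is not needed:

    import io.r2dbc.spi.Connection;
    import org.reactivestreams.Publisher;
    import reactor.core.publisher.Mono;

    public class InsertExample {
      // Pre-1.1.0: consume the update count so the cursor is released,
      // even when the count itself is not needed.
      static Publisher<Long> insertName(Connection connection, String name) {
        return Mono.from(connection.createStatement(
                "INSERT INTO example_table (name) VALUES (?)") // placeholder table
            .bind(0, name)
            .execute())
          .flatMap(result -> Mono.from(result.getRowsUpdated()));
      }
    }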

I'm mentioning this because, if you can update to 1.1.0, then you won't need to change your code. (As of writing, we are still waiting for the 1.1.0 jar to be published on Maven Central. I believe it will be there very soon though.)