pgjdbc / r2dbc-postgresql

Postgresql R2DBC Driver
https://r2dbc.io
Apache License 2.0
1.01k stars 177 forks source link

No response from the database when same SQL statement is sent to the database multiple times #658

Open spc16670 opened 3 months ago

spc16670 commented 3 months ago

Current Behavior

We decided to use r2dbc in our new new project what required porting some of our existing blocking code onto a reactive stack. This included paging functions - these were rewritten to preserve the semantics from the traditional model. Instead of creating Fluxes and reading items to page limit, the SQL query is written to use LIMIT with OFFSET.

We noticed that that the query would always time out while fetching the 7th page.

Solution

It turned out that 'salting' the SQL statement to make it unique solves the issue. As a workaround we wrap all queries we send to the database into a sub query:

alias = now()
SELECT (a + alias).* FROM (SQL)as (a + alias)

The above fixes the problem - all our queries now return without timeout as expected, even if 10s of pages are involved.

This however seriously undermined our confidence in our new stack. We cannot tell for sure if the problem is with the driver, the DatabaseClient Spring class, or the database. The solution to the problem we found was more of a luck rather than anything else, neither the driver nor the database would offer any clues as to which end was causing the behaviour. We have taken a guess that the database is responding with data but the driver fails to notice the response therefore triggering the statement timeout exception.

I will try to rewrite our code to provide a reproduction here when I have more time.

mp911de commented 3 months ago

You can enable debug logging to see at which point the statement gets stuck. This should help to identify the cause.

spc16670 commented 3 months ago

Hope this is going to help:

2024-07-29T19:14:55.660+01:00  INFO 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] c.i.a.r.a.s.r.q.FindByAssigneesAndScopes : Getting page 6/152 for a batch of 152
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.pool.ConnectionPool             : Obtaining new connection from the pool
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [0] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [1] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [2] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [3] to: In{Inferred: [Ljava.lang.Integer;}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [4] to: In{Inferred: [Ljava.lang.Integer;}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [5] to: In{Inferred: java.lang.Integer}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [6] to: In{Inferred: java.lang.Integer}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.QUERY                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Executing query: SELECT DISTINCT
<fields>
FROM <table>
WHERE <preds> ORDER BY a.id LIMIT $6 OFFSET $7
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Request:  [Bind{name='B_7', parameterFormats=[FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_BINARY, FORMAT_BINARY], parameters=[CompositeByteBuf(ridx: 0, widx: 37, cap: 37, components=1), CompositeByteBuf(ridx: 0, widx: 37, cap: 37, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1), CompositeByteBuf(ridx: 0, widx: 13, cap: 13, components=1), CompositeByteBuf(ridx: 0, widx: 13, cap: 13, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1)], resultFormats=[], source='S_2'}, Describe{name='B_7', type=PORTAL}, Execute{name='B_7', rows=0}, Close{name='B_7', type=PORTAL}, Sync{}]
2024-07-29T19:14:55.661+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] WRITE: 211B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 42 00 00 00 ac 42 5f 37 00 53 5f 32 00 00 07 00 |B....B_7.S_2....|
<HEX FRAME>
|000000d0| 00 00 04                                        |...             |
+--------+-------------------------------------------------+----------------+
2024-07-29T19:14:55.661+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] FLUSH
2024-07-29T19:14:55.662+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ COMPLETE
2024-07-29T19:15:55.709+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ: 350B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 32 00 00 00 04 54 00 00 00 e4 00 07 69 64 00 00 |2....T......id..|
<HEX FRAME>
+--------+-------------------------------------------------+----------------+
2024-07-29T19:15:55.709+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: BindComplete{}
2024-07-29T19:15:55.709+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: RowDescription{<FIELDS>}]}
2024-07-29T19:15:55.710+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: ErrorResponse{fields=[Field{type=SEVERITY_LOCALIZED, value='ERROR'}, Field{type=SEVERITY_NON_LOCALIZED, value='ERROR'}, Field{type=CODE, value='57014'}, Field{type=MESSAGE, value='canceling statement due to statement timeout'}, Field{type=FILE, value='postgres.c'}, Field{type=LINE, value='3514'}, Field{type=ROUTINE, value='ProcessInterrupts'}]}
2024-07-29T19:15:55.710+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : Error: SEVERITY_LOCALIZED=ERROR, SEVERITY_NON_LOCALIZED=ERROR, CODE=57014, MESSAGE=canceling statement due to statement timeout, FILE=postgres.c, LINE=3514, ROUTINE=ProcessInterrupts
2024-07-29T19:15:55.711+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.postgresql.util.FluxDiscardOnCancel  : received cancel signal
2024-07-29T19:15:55.712+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.pool.PooledConnection           : Releasing connection
2024-07-29T19:15:55.712+01:00 ERROR 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.a.r.a.s.r.e.CubeRdsExceptionTranslator : Database query timed out canceling statement due to statement timeout
2024-07-29T19:15:55.712+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: ReadyForQuery{transactionStatus=IDLE}
2024-07-29T19:15:55.713+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ COMPLETE

**<HANGS HERE>**

com.xyz.RdsRepositoryException: io.r2dbc.postgresql.ExceptionFactory$PostgresqlNonTransientResourceException: [57014] canceling statement due to statement timeout

    <LONG STACKTRACE>
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: io.r2dbc.postgresql.ExceptionFactory$PostgresqlNonTransientResourceException: [57014] canceling statement due to statement timeout
    at io.r2dbc.postgresql.ExceptionFactory.createException(ExceptionFactory.java:109)
    at io.r2dbc.postgresql.ExceptionFactory.createException(ExceptionFactory.java:65)
    at io.r2dbc.postgresql.ExceptionFactory.handleErrorResponse(ExceptionFactory.java:132)
    at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:179)
    ... 47 more

2024-07-29T19:15:55.745+01:00 DEBUG 86417 --- [cube-agreement-api] [ionShutdownHook] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Request:  Terminate{}
2024-07-29T19:15:55.747+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] WRITE: 5B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 58 00 00 00 04                                  |X....           |
+--------+-------------------------------------------------+----------------+
2024-07-29T19:15:55.747+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] FLUSH
2024-07-29T19:15:55.748+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] CLOSE
2024-07-29T19:15:55.748+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 ! R:localhost/127.0.0.1:15432] INACTIVE
2024-07-29T19:15:55.749+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 ! R:localhost/127.0.0.1:15432] UNREGISTERED

Process finished with exit code 255
mp911de commented 3 months ago

Alright, the database reports a query timeout after 60 seconds. With you receiving an exception that propagates the timeout, everything is fine here and I'm not sure what else the driver could do.

k163377 commented 3 months ago

It appears that we had the same problem as you. https://github.com/jOOQ/jOOQ/issues/16669#issuecomment-2261738516

It seems to be the same solution you were offering, but I have confirmed that adding a condition, e.g. alias = alias (where alias is the timestamp), can work around the problem. Also, as far as I have tried, it seems to work around the problem by issuing the query as a pure string instead of using a prepared statement.

@spc16670 Very big tip, thanks for your report.

spc16670 commented 3 months ago

When you serialize to a pure string the values you are passing make the SQL query unique enough from the perspective of the database - so it is another way to apply the same workaround. I am still not sure where the fix to this is supposed to be applied - the database?

k163377 commented 3 months ago

Sorry if I have misunderstood something.

What I have tried is to execute a query with embedded parameters (without using a prepared statement) directly against Connection with createStatement. With this method, the query no longer hangs even if I issue the exact same query many times, down to the parameters. So I am wondering if there is some problem with the handling of the prepared statement.

However, I have only been able to reproduce it using production code, and I am sorry that I cannot publish a sample.

k163377 commented 3 months ago

I have tried the check by adding print in the following place. https://github.com/pgjdbc/r2dbc-postgresql/blob/a13c02c09f83d44feaaea6d3416e3d6d5e0a5ad6/src/main/java/io/r2dbc/postgresql/client/ReactorNettyClient.java#L176

When hanging, CommandComplete{command=SET, rowId=null, rows=null} and ReadyForQuery{transactionStatus=IDLE} occurred successfully, then the subsequent BindComplete{} did not occur for 2 or 3 minutes. Therefore, I believe that the problem I am encountering is at least related to the processing of prepared statements.