Closed fgstewart closed 3 years ago
Is this the same as #68?
I found a way to avoid this. Use a config like this:

```yaml
spring:
  r2dbc:
    url: r2dbcs:mysql://127.0.0.1:3306/xxx
    username: xx
    password: xx
    pool:
      max-idle-time: 2s
```
I noticed that the `PoolBuilder<Connection, PoolConfig<Connection>>` is initialized with this eviction predicate:

```java
BiPredicate<Connection, PooledRefMetadata> evictionPredicate = (connection, metadata) -> {
    long maxIdleTimeMills = maxIdleTime.toMillis();
    long maxLifeTimeMillis = maxLifeTime.toMillis();
    boolean isIdleTimeExceeded = maxIdleTimeMills != 0L && metadata.idleTime() >= maxIdleTimeMills;
    boolean isLifeTimeExceeded = maxLifeTimeMillis != 0L && metadata.lifeTime() >= maxLifeTimeMillis;
    return isIdleTimeExceeded || isLifeTimeExceeded;
};
```
The disconnected connections fall back to idle, but the default max-idle-time is 30m. I tested with 2s and it works fine.
I suggest modifying the evictionPredicate: when a connection is not valid, evict it.
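To reason about the suggestion in isolation, here is the idle/lifetime eviction logic extracted into plain Java. `IdleMetadata` is a hypothetical stand-in for reactor-pool's `PooledRefMetadata` (which exposes `idleTime()`/`lifeTime()` in milliseconds), not the real API. Note that a validity check cannot simply be folded into this predicate, because `Connection.validate(ValidationDepth)` returns a `Publisher<Boolean>` (it requires database I/O), while the eviction predicate is synchronous.

```java
import java.time.Duration;
import java.util.function.BiPredicate;

public class EvictionSketch {

    // Hypothetical stand-in for reactor-pool's PooledRefMetadata.
    public interface IdleMetadata {
        long idleTime();  // milliseconds the pooled object has been idle
        long lifeTime();  // milliseconds since the pooled object was created
    }

    // Mirrors the predicate quoted above: evict when the idle or lifetime
    // limit is exceeded; a zero duration disables the corresponding check.
    public static BiPredicate<Object, IdleMetadata> evictionPredicate(
            Duration maxIdleTime, Duration maxLifeTime) {
        return (connection, metadata) -> {
            long maxIdleMillis = maxIdleTime.toMillis();
            long maxLifeMillis = maxLifeTime.toMillis();
            boolean idleExceeded = maxIdleMillis != 0L && metadata.idleTime() >= maxIdleMillis;
            boolean lifeExceeded = maxLifeMillis != 0L && metadata.lifeTime() >= maxLifeMillis;
            return idleExceeded || lifeExceeded;
        };
    }
}
```

With `max-idle-time: 2s` as in the config above, a connection idle for 3 seconds is evicted on the next check, which is why the workaround helps.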
Connection validation requires database I/O (`Publisher<Boolean> Connection.validate(ValidationDepth depth)`). When connection validation fails, we invalidate the pooled object and propagate the error. Validation is guarded against timeouts with `maxAcquireTime`. `acquireRetry` is 1 by default, which means that upon a validation error we retry acquiring a new connection. If that also fails, the acquisition error gets propagated.
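The acquire/validate/retry flow described above can be sketched in plain, synchronous Java. This is only an illustration of the control flow: the `Supplier` factory and `Predicate` validator are hypothetical stand-ins, and the real flow in r2dbc-pool is reactive (`Publisher`-based), not blocking.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class AcquireRetrySketch {

    // Acquire a candidate, validate it, and on validation failure retry up to
    // acquireRetry additional times before propagating the failure.
    public static <C> C acquire(Supplier<C> factory, Predicate<C> validator, int acquireRetry) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= acquireRetry; attempt++) {
            C candidate = factory.get();
            if (validator.test(candidate)) {
                return candidate;
            }
            // Mimics r2dbc-pool's "Connection validation failed" error.
            last = new IllegalStateException("Connection validation failed");
        }
        throw last;
    }
}
```

With `acquireRetry = 1`, a single broken candidate is tolerated; only when the retry also fails does the caller see the error.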
If I set an acquire time of 2 seconds, the behavior changes and I get this error on the timer queries: `Did not observe any item or terminal signal within 2000ms`.
But the connections don't recover. I also tried adding a `SELECT 1` validation query, with the same result.
Using the max idle time of 2 seconds as suggested above does fix this sample app, but if I change the recurring query timer to 1 second it breaks again, i.e. the query happens in that idle window before the connection has been invalidated due to idleness.
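For reference, a combined configuration along these lines might look as follows. This is a sketch: the URL is a placeholder, and exact property names and their availability (e.g. `validation-query`) depend on the Spring Boot version in use.

```yaml
spring:
  r2dbc:
    url: r2dbc:postgresql://127.0.0.1:5432/xxx   # placeholder URL
    pool:
      # Evict connections idle longer than 2s (the workaround above).
      max-idle-time: 2s
      # Query used to validate connections on acquisition; property name
      # assumed from Spring Boot's spring.r2dbc.pool.* properties.
      validation-query: SELECT 1
```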
Thanks.
`Did not observe any item or terminal signal within 2000ms` is a timeout signal. We likely need to dig deeper into why that is. What could happen here is that all connections in the pool are broken and therefore the next connection to obtain is invalid, too.
We've seen the exact same issue during a database restart.
First, all connections drop:

```
WARN 9713 --- [reactor-tcp-epoll-1] i.r.p.client.ReactorNettyClient : Error: SEVERITY_LOCALIZED=FATAL, SEVERITY_NON_LOCALIZED=FATAL, CODE=57P01, MESSAGE=terminating connection due to administrator command, FILE=postgres.c, LINE=2902, ROUTINE=ProcessInterrupts
WARN 9713 --- [reactor-tcp-epoll-1] i.r.p.client.ReactorNettyClient : Error: SEVERITY_LOCALIZED=ERROR, CODE=08P01, MESSAGE=server conn crashed?
```
And then no new connections can be obtained (using Postgres):

```
2021-02-12 10:32:51.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:52.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:53.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:54.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:55.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:56.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:57.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
2021-02-12 10:32:58.790 DEBUG 9713 --- [scheduler-1] io.r2dbc.pool.ConnectionPool : Obtaining new connection from the driver
```
Edit: Only one of our three services was affected by this: the one that polls Postgres every second. The other services were running fine after the Postgres server restart. We're using PgBouncer in front of Postgres, which might explain this.
This issue is risking one of our releases. Is there something we can do to help?
We need to investigate why this is and we haven't gotten to the point yet where we had sufficient time to investigate the issue. Feel free to have a look into the flow and why some guards (like timeouts) don't seem to work.
This may be of use: when we upgraded all our Spring/R2DBC libs, the behavior changed. With the versions below, what I see is that after the DB comes back up there are a couple more failures (which I think might remove bad connections from the pool) and then it reconnects.
I've not isolated this to one lib or the other, but just upgrading the R2DBC libs doesn't seem to fix it alone.
```xml
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.4.2</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
...
<dependency>
    <groupId>io.r2dbc</groupId>
    <artifactId>r2dbc-postgresql</artifactId>
    <version>0.8.6.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.r2dbc</groupId>
    <artifactId>r2dbc-pool</artifactId>
    <version>0.8.5.RELEASE</version>
</dependency>
```
Similar to @yossarian81, upgrading to the above versions fixed the issue. The visible errors in io.r2dbc stayed the same (see below), and we suspect the issue is caused somewhere in the Reactor libraries, presumably reactor-pool. The call `connectionPool.acquire()` never succeeded after the bad state was reached:
```
service-messaging-647b99846-hszbn service-messaging Caused by: io.r2dbc.spi.R2dbcNonTransientResourceException: Connection validation failed
service-messaging-647b99846-hszbn service-messaging at io.r2dbc.pool.Validation.lambda$validate$2(Validation.java:45) ~[r2dbc-pool-0.8.5.RELEASE.jar:0.8.5.RELEASE]
service-messaging-647b99846-hszbn service-messaging at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:163) ~[reactor-core-3.3.12.RELEASE.jar:3.3.12.RELEASE]
service-messaging-647b99846-hszbn service-messaging ... 90 common frames omitted
```
Can you check where the logging gets issued from? That would help us investigate whether failures get propagated and logged, or whether the failure is just logged without affecting the application.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
Bug Report
Versions
Current Behavior
With the following steps and attached minimal test app I can repro a DB connection pool hang where killing and restarting the DB results in the app never reconnecting.
Stack trace

Table schema
Input Code
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
DROP TABLE IF EXISTS test_table;
CREATE TABLE test_table (
    id uuid DEFAULT uuid_generate_v4() NOT NULL,
    name varchar(50) NOT NULL,
    CONSTRAINT "PK_1" PRIMARY KEY (id)
);
```

Steps to reproduce
Link to minimal test app reproducing issue: https://github.com/yossarian81/r2dbc-issue
This app loads a minimal schema and some data, then it uses a Spring scheduled task to periodically read from the DB.
Steps:
The periodic data-read application never recovers: it hangs when trying to read, and with TRACE enabled the connection pool only logs:
"Obtaining new connection from the driver"
I tried many connection pool settings and saw the same behaviour.
Input Code

Expected behavior/code
Expectation is that the periodic task is able to read from the DB once the DB is restarted.
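The shape of the repro (a scheduled task polling the database and expected to recover after an outage) can be sketched with plain JDK scheduling. `pollUntilSuccess` is a hypothetical stand-in for the app's Spring scheduled query; here the "query" simply fails for the first few polls to imitate a database restart.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {

    // Polls at a fixed rate; the first `failuresBeforeRecovery` polls throw
    // (the DB is "down"), later polls succeed. Returns total attempts made.
    public static int pollUntilSuccess(int failuresBeforeRecovery) {
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch recovered = new CountDownLatch(1);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            int n = attempts.incrementAndGet();
            try {
                if (n <= failuresBeforeRecovery) {
                    throw new IllegalStateException("connection refused"); // DB down
                }
                recovered.countDown(); // stand-in for a successful SELECT
            } catch (RuntimeException e) {
                // Swallow and keep polling: an exception that escapes the task
                // silently cancels the fixed-rate schedule.
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
        try {
            recovered.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return attempts.get();
    }
}
```

Note the try/catch inside the task: with `scheduleAtFixedRate`, an exception that escapes the `Runnable` cancels all further executions without any log output, which is one generic way a poller can appear to "never recover" after a transient failure.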
Possible Solution
Additional context