r2dbc / r2dbc-pool

Connection Pooling for Reactive Relational Database Connectivity
https://r2dbc.io
Apache License 2.0
331 stars 55 forks source link

Validation timeout should invalidate connection #175

Closed GabrielCalin closed 2 years ago

GabrielCalin commented 2 years ago

Bug Report

It seems that when validation times out, the connection is reset to a reusable state instead of being invalidated. Because of this, it will be borrowed again to the client and the cycle can continue. I assume this is a bug and not intended.

Versions

Current Behavior

When validation times out, the connection is reset to a reusable state instead of being invalidated.

Steps to reproduce

In my case, the database endpoint is accessible only from the corporate VPN. We can reproduce the issue by following the next steps:

  1. With the VPN enabled, start the application connecting to the DB
  2. Disable VPN
  3. Make a DB query

We have a custom metrics recorder, logging all the method invocations. You can see in the metrics recorder that the recordResetLatency is called after maxAcquireTime. I can only assume that the validation takes forever because the connection is already open when I disable the VPN and no ReadTimeoutHandler is added to the Netty channel by r2dbc-postgresql.

If these steps don't work for you, most probably you could simulate a long validation query.

Expected behavior/code

I expect the connection to be destroyed when validation times out. In terms of metrics recorder, I expect the recordDestroyLatency to be called.

Possible Solution

I cloned the repo and looked over the code.

I can get the desired behaviour by removing the timeout from this line in the ConnectionPool class: mono = mono.timeout(this.maxAcquireTime).contextWrite(context -> { However, after I do that, some unit tests are failing, so maybe I break something.

Removing it works because getValidationFunction already sets a timeout for validation. However, it seems it is overwritten by the piece of code above (considering its the same timeout, it doesn't matter in terms of timing, but if they are different you can see with some added logs that the timeout in getValidationFunction is not taken into consideration).

Even with the current code, this line seems like it's trying to achieve exactly what this issue reports. conn = conn.onErrorResume(throwable -> ref.invalidate().then(Mono.error(throwable))); The code is triggered, however the invalidate method doesn't seem to invalidate the connection. It's somehow related to the 2 timeouts mentioned above - if I remove the above mentioned timeout, the same line will invalidate the connection.

I hope it's a bug, even though I see a unit test called shouldReusePooledConnectionAfterTimeout (I'm a bit afraid the whole behaviour might be intended).

Additional context

It would also be helpful if r2dbc-postgresql would allow us to set a maximum response time. At the moment, it only allows setting the connectTimeout.

mp911de commented 2 years ago

We use the same timeout for allocation and validation, thus it is highly likely that the general allocation timeout cancels the allocation before the validation has a chance to fail.

The only way to fix that is introducing a validation timeout setting that can be set to a lower value than maxAcquireTime.

GabrielCalin commented 2 years ago

Thanks for the answer, Mark!

I think having a different timeout for the validation is a good idea and I could try to push a PR in this direction.

However, I already tried setting a smaller value for the validation timeout and it seems it's not taken into consideration - the validation timeout will still be triggered after the higer, maxAcquireTime, timeout. I'm still trying to understand why that's happening.

mp911de commented 2 years ago

The timeout is activated during publisher subscription (onSubscribe). Operators are subscribed from the last to the first one and therefore the last timeout starts to count first before the inner validation timeout comes into play.

Feel free to submit a pull request that also updates the PoolingConnectionFactoryProvider and the readme.

GabrielCalin commented 2 years ago

I submitted a pull request: https://github.com/r2dbc/r2dbc-pool/pull/176 Please let me know if any changes are needed.

GabrielCalin commented 2 years ago

Hi. Do you have any input regarding the above PR? We would be interested first if it looks OK and second if there is any scheduled merge. Unfortunately, for us the bug in this ticket is major, causing long periods of unavailability.