Create too many connections to server

haohuif commented 1 year ago

Questions

Reids runs in cluster mode. When userReplicas=always is specified, a large number of connections will be created to the server

After analyzing the source code, I think it is because RequestImpl does not implement the hashCode and equals methods. In the getConnection method of RedisConnectionManager, the request object will be specified to form the ConnectionKey together with the url string of the endpoint. Therefore, each time the connection is obtained through RedisClusterClient, a new connection will be built because the key does not exist

Version

4.2.7

PyAntony commented 1 year ago

I am facing the same issue; I believe this is a bug. The issue is simply resolved by usinguserReplicas=never, so this is definitely happening when using connections to replica nodes. What I have observed is the following:

1) Connection to replica node is used. This connection uses readonly mode (https://github.com/vert-x3/vertx-redis-client/issues/160); if it didn't use readonly we would simply get the MOVED exception. 2) Client gets response successfully, but after that new connections to all cluster nodes are set. This happens with each and every request (I can see this in the logging by using quarkus.log.category."io.vertx.redis".level=DEBUG). 3) On the server side the connections in readonly mode are never closed; they keep piling up. We can track this by using the CLIENT LIST command. There is a point in which the server won't accept more connections; in my case my application just halted (as discussed here: https://github.com/smallrye/smallrye-reactive-messaging/issues/2102).

So the problems That I see are:

Why are all connections to the cluster reset after each request to a slave node? Resetting all connections again and again on each request is very expensive. I don't think this is intended isn't?
There is definitely an issue with the way the connections are closed (are they even closed? maybe they are leaked somehow?). The server doesn't close these open connections in readonly mode, so there is something buggy with the client, maybe a swollen exception somewhere that triggers a recreation of the entire connection pool again and again. @haohuif mentioned that "a new connection will be built", but these new connections built are not coming from the same pool since we would quickly run out of available connections and get the pool exception ("not available connections" or something like that). So my hypothesis is that a new pool is created over and over.

Regardless of if and when these issues would be fixed, I think it is a necessity to document this quickly in the site: https://vertx.io/docs/vertx-redis-client/java/. There is a section called Implementing Reconnect on Error, but it doesn't say anything about the fact that you will always get problems with the client in cluster mode. The impression is that (again, for cluster mode only) you simply set up your client by passing all node hosts and you are good to go. This is not the case; the reality is that the client in cluster mode is buggy, or at least it requires additional custom code for it to work properly:

You can use userReplicas=never but all query requests will go to master nodes, not ideal for query intensive applications.
If you use userReplicas=always|share YOU WILL GET the dreaded MOVED exception when using multiple shards (many people are facing this issue).
For a single shard client, when using replicas for queries, we see the bug explained above.

It seems for cluster mode we should always use userReplicas=never, as other modes are unreliable. If we have a query intensive application and want to use replicas then we need to prepare a custom solution.

By the way, these issues have been observed with quarkus 2.16.4. These are the vertx versions I see:

antony@pop-os:~/Desktop/rest-client$ mvn dependency:tree | grep vertx
[INFO] |     |     \- io.vertx:vertx-web-client:jar:4.3.7:compile
[INFO] |     \- io.quarkus.resteasy.reactive:resteasy-reactive-vertx:jar:2.16.4.Final:compile
[INFO] |  +- io.quarkus:quarkus-vertx-http:jar:2.16.4.Final:compile
[INFO] |  |  +- io.smallrye.common:smallrye-common-vertx-context:jar:1.13.2:compile
[INFO] |  |  +- io.quarkus:quarkus-vertx-http-dev-console-runtime-spi:jar:2.16.4.Final:compile
[INFO] |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-web:jar:2.30.1:compile
[INFO] |  |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-web-common:jar:2.30.1:compile
[INFO] |  |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-auth-common:jar:2.30.1:compile
[INFO] |  |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-bridge-common:jar:2.30.1:compile
[INFO] |  |  |  \- io.smallrye.reactive:smallrye-mutiny-vertx-uri-template:jar:2.30.1:compile
[INFO] |  |  |     \- io.vertx:vertx-uri-template:jar:4.3.7:compile
[INFO] |  |  +- io.vertx:vertx-web:jar:4.3.7:compile
[INFO] |  |  |  +- io.vertx:vertx-web-common:jar:4.3.7:compile
[INFO] |  |  |  +- io.vertx:vertx-auth-common:jar:4.3.7:compile
[INFO] |  |  |  \- io.vertx:vertx-bridge-common:jar:4.3.7:compile
[INFO] |  +- io.quarkus:quarkus-vertx:jar:2.16.4.Final:compile
[INFO] |  |  +- io.quarkus:quarkus-vertx-latebound-mdc-provider:jar:2.16.4.Final:compile
[INFO] |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-core:jar:2.30.1:compile
[INFO] |  |  |  +- io.smallrye.reactive:smallrye-mutiny-vertx-runtime:jar:2.30.1:compile
[INFO] |  |  |  \- io.smallrye.reactive:vertx-mutiny-generator:jar:2.30.1:compile
[INFO] |  |  |     \- io.vertx:vertx-codegen:jar:4.3.7:compile
[INFO] |  |  \- io.smallrye:smallrye-fault-tolerance-vertx:jar:5.6.0:compile
[INFO] |  +- io.vertx:vertx-kafka-client:jar:4.3.7:compile
[INFO] |  |  \- io.vertx:vertx-core:jar:4.3.7:compile
[INFO] |  \- io.smallrye.reactive:smallrye-mutiny-vertx-redis-client:jar:2.30.1:compile
[INFO] |     \- io.vertx:vertx-redis-client:jar:4.3.7:compile

Ladicek commented 1 year ago

This is fixed on the master branch by #374. Since this issue basically means that connection pooling doesn't work at all, I believe we need to backport the fix to 4.x. I opened #406 to backport the fix to 4.x.

cescoffier commented 1 year ago

@Ladicek should this issue be fixed?

Ladicek commented 1 year ago

Indeed. This is fixed in 4.4.6 (#411), in the 4.x branch (#406) and in master (#374). Closing.

vert-x3 / vertx-redis-client

Create too many connections to server #365

Questions

Version