Closed 970655147 closed 1 year ago
Actuator is part of Spring Boot, not Spring Data Redis.
but core code is in spring-data-redis, main context in LettuceConnectionFactory$SharedConnection.getConnection if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak spring-boot-acurator is just a trigger in this context
but core code is in spring-data-redis, main context in LettuceConnectionFactory$SharedConnection.getConnection if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak spring-boot-acurator is just a trigger in this context
Actuator is part of Spring Boot, not Spring Data Redis.
sorry, my presentation is not exactly but core code is in spring-data-redis, main context in LettuceConnectionFactory$SharedConnection.getConnection if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak spring-boot-acurator is just a trigger in this context
Let me review and evaluate this further.
Closed by mistake.
After further analysis, I am not convinced that this is necessarily a problem with Spring Data Redis.
To state this case slightly differently, what would you expect to happen in this case (regarding this statement):
"if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak spring-boot-acurator is just a trigger in this context"
And specifically, when an Exception is thrown from ClusterConnectionProvider.getConnection()
? Close the connection? But, by who/what? How? And, when?
I agree that the RedisConnection
should not leak TCP socket resources. And, you could argue that Spring Data Redis should be responsible for closing and cleaning up any resources acquired by the RedisConnection
. But, this depends on the "type" of Exception in the first place, in addition to the context (e.g. Spring Boot Actuator), as well as the (application) use case/requirements, and whether any useful information could be derived from the Exception or captured from the state of the RedisConnection
both before and after the Exception occurs, which would possibly dictate "how" it is handled, or whether it can or should be handled at all (application dependent).
Typically, a connection (e.g. RedisConnection
) to some resource (e.g. Redis server) is established for some purpose, it is handed off to the user (consumer) and the user is left to be responsible for "releasing" that connection, where the "how" and "when" is application and use case dependent. This is not unlike other data access frameworks, such as JDBC. With JDBC, it would be "driver" (vendor) dependent.
You could also argue that 1) Since a RedisConnection
is AutoCloseable
(Javadoc) and 2) the Spring Boot Actuator HealthIndicator
for Redis did not safely acquire a RedisConnection
, that it is actually Spring Boot's responsibility.
The logic inside Spring Boot Actuator's RedisHealthIndicator
used to acquire connection should technically be:
try (RedisConnection connection = RedisConnectionUtils.getConnection(this.redisConnectionFactory)) {
doHealthCheck(builder, connection);
}
finally {
RedisConnectionUtils.releaseConnection(connection, this.redisConnectionFactory);
}
By establishing a RedisConnection
outside the try-finally
block, when an Exception occurs, the logic inside the finally
block (i.e. "release" the RedisConnection
) will NOT get executed.
However, unlike JDBC, and even unlike Jedis (though maybe more similar to Reactive), is that opening RedisConnections
using the Lettuce driver is "async". This certainly complicates the "when" the connection gets established bit.
Although, in this particular case/context (RedisHealthIndicator
for Spring Boot Actuator), we see that the SD Redis RedisConnectUtils.getConnection(:RedisConnectionFactory)
method was invoked (here). If you trace this all the way through, we eventually arrive here.
You can see that Lettuce RedisConnections
are established (opened) "async", and that SD Redis "joins
" on the CompletableFuture
to turn an otherwise asynchronous operation into a synchronous, blocking call.
Given the Lettuce driver is returning a "CompletableFuture
" (extending CompletionStage
) as the wrapper for the RedisConnection
(technically, a Lettuce StatefulRedisClusterConnection
instance in the "clustered" case and a Lettuce StatefulConnection
in the non-clustered case), then it should be possible to maybe pass in an Exception handler (Function
) that releases connection resources immediately when a connection fails on open. This, I could possible agree to. However, it does not excuse Spring Boot Actuator from handling the connection properly.
Something like the following, possibly(??):
interface LettuceConnectionProvider {
default <T extends StatefulConnection<?, ?>> T getConnection(Class<T> connectionType) {
BiFunction<T, Throwable, T> handleExceptionOnConnectionOpen = (connection cause) -> {
try {
if (cause instanceof <WhatTypeOfRuntimeExceptionHere>) {
RedisConnectionUtils.releaseConnection(connection);
throw cause;
}
return connection;
}
catch (Throwable ignore) {
}
} ;
return LettuceFutureUtils.join(getConnectionAsync(connectionType)
.handleAsync(handleExceptionOnConnectionOpen));
}
}
Other factors to consider include, but are not limited to:
1) Whether a "Pool" of connections has been configured, and...
2) Whether it is possible, or whether any SocketOptions
have been supplied (indirectly) to the LettuceClientConfiguration
(from here to here).
Typically, a connection "Pool" is responsible for acquiring and releasing connections along with their associated resources (TCP sockets, etc), especially for "failed" connections, even on "open" no less.
I'd almost argue that your Spring Boot (Data Redis) application should be failing-fast, particularly if the authentication criteria has not been properly specified and configured. Why should the application even be allowed to start much less allow users to access the Spring Boot Actuator HealthIndicator
for Redis to reveal anything? Of course, if Redis is not used as your application's primary data store (you do mention MySQL) and only as a cache, then this would make sense.
Spring Data Redis does offer and allow (disabled by default) validation of RedisConnections
on startup.
Finally, and arguably, this could be something the (Lettuce) Redis driver(s) should be handling specifically, especially as it pertains to low-level resource (like TCP sockets), given drivers might do different things depending on the underlying OS, no less.
In any case, I want to get @mp911de or @christophstrobl's thoughts and opinions on this before we proceed with any changes to the framework.
if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak
Do you have an example exception @970655147 that causes connection leaks for you?
leaks
Hello, reoccur like that new simple spring-boot project as follow
pom.xml include spring-boot-starter-data-redis `
`
application.yml config redis info, following config is mysql & kafka[just for test]
spring: redis: cluster: nodes: "192.168.0.105:3306,192.168.0.105:9092" timeout: "PT3S"
add case code in Application of spring-boot ` package com.hx.boot;
import com.hx.boot.context.SpringContext; import org.mybatis.spring.annotation.MapperScan; import org.springframework.boot.CommandLineRunner; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.data.redis.connection.ReactiveRedisConnection; import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory; import org.springframework.scheduling.annotation.EnableAsync;
@EnableAsync @MapperScan("com.hx.boot.mapper") @SpringBootApplication public class HelloSpringBootApplication implements CommandLineRunner {
public static void main(String[] args) {
SpringApplication.run(HelloSpringBootApplication.class, args);
}
@Override
public void run(String... args) throws Exception {
LettuceConnectionFactory factory = SpringContext.getBean(LettuceConnectionFactory.class);
for (int i = 0; i < 100; i++) {
try {
ReactiveRedisConnection connection = factory.getReactiveConnection();
} catch (Exception e) {
// ignore
}
}
int x = 0;
}
} `
root@ubuntu:~# ll /proc/5795/fd | grep 41 lrwx------ 1 root root 64 Jun 29 06:41 103 -> socket:[328474] lrwx------ 1 root root 64 Jun 29 06:41 104 -> socket:[328475] lrwx------ 1 root root 64 Jun 29 06:41 105 -> socket:[320448] // ignore some file descriptor lrwx------ 1 root root 64 Jun 29 06:41 322 -> socket:[328586] lrwx------ 1 root root 64 Jun 29 06:41 323 -> socket:[323163] lrwx------ 1 root root 64 Jun 29 06:41 324 -> socket:[328587] lrwx------ 1 root root 64 Jun 29 06:41 325 -> socket:[320477]
if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak
Do you have an example exception @970655147 that causes connection leaks for you?
@970655147 - First of all, you "ignored" the very thing (i.e. resulting "Exception") we asked you to provide (in detail) in the first place:
// get connection
} catch (Exception e) {
// ignore
}
Regarding:
"if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak"
What Exception?
By ignoring the Exception, you are implying any Exception, and it does not matter. But, it does! It does because it matters what happens during the "process" of opening the connection.
An Exception resulting from a username/password auth error is different from an Exception occurring because an incorrect IP address or port to the server was configured/resolved is different from an Exception thrown because of a failed TLS handshake (verification) during SSL/TLS identification & authentication (as described by IBM, for instance) is different from other TCP errors, and so on.
In the later case, as with username/password-based auth, a TCP connection is still established and therefore system resources will be allocated and used to complete the steps of the TLS handshake/verification (over SSL/TLS). After all, Secure Sockets, or a "secure" Socket, like SSLSocket
, is nothing more than encrypted data over an otherwise "plaintext" Socket
(i.e. data must be "encrypted" before it is sent over a Socket
/network connection to be "secure"; there is no such thing as a hardware "secured" Socket; it is all "software").
This is also OS/system dependent.
As such, this why (TCP) Sockets have configuration (such as SO_LINGER, among other Socket
configuration options, timeouts, reuse, etc).
Secondarily, this code:
LettuceConnectionFactory factory = SpringContext.getBean(LettuceConnectionFactory.class);
for (int i = 0; i < 100; i++) {
try {
ReactiveRedisConnection connection = factory.getReactiveConnection();
...
..
.
Does not necessarily constitute a connection leak simply because an Exception might be thrown during an attempt to acquire a connection. You are simply iterating and opening connections without proper connection "handling". And, if it were unbounded, your problem would only exasberate the "perceived" problem.
Most file-based system resources (TCP Socket connections are no exception; see here), must be properly handled inside a block, such as the following (using your example):
for (int i = 0; i < 100; i++) {
try (ReactiveRedisConnection connection = factory.getReactiveConnection()) {
}
catch (Exception cause) {
// handle the exceptional condition appropriately; such as releasing/closing related resources and
}
Just because you attempt to establish a connection and 1) an Exception is thrown or 2) the block of code opening/using the connection falls out of scope, does not mean the framework, or any other facility for that matter, will automatically (or automagically) "close" the connection. And, even if "close" is called, it does not mean system resources will be immediately released (again, it depends on the Socket configuration using a factory when the Socket is created/opened).
It is always the user's responsibility to properly handle the connection, whether you are opening a Redis connection (regardless of type, e.g. Reactive), a (SQL) database connection (e.g. with JDBC), or simply opening a file.
For example:
try (FileReader fileReader = new FileReader(file)) {
// do something with the "open" file
}
If not properly handled, there is no guarantee that systems resources (even in the file case) will be properly released. Again, mileage varies based on OS/system, among other factors.
Also see this SO post as further evidence to this case.
1 last thing...
This is also why it is important to use some connection management facility, such as "pooling", which Spring Data Redis supports for both the Jedis and Lettuce drivers (see here).
In addition, the closer you get to system controlled resources, like TCP connections, the more it matters what type of driver you are using, some of which can be system dependent, even. It is also why certain drivers perform better or behave slightly differently on different platforms, especially given no 2 things (like OSes) are equal in this regard.
This could also arguably be a driver responsibility and concern more than a framework-level concern. Frameworks like Spring Data Redis are not in the business of managing connections, rather, only acquiring and using them for some higher purpose, like the data access patterns for querying, etc.
And, in this case, it is allowing your application code to acquire a connection "directly" (I might add) for some low-level purpose, and as such, your application code becomes responsible for that resource. It would be different if the framework were acquiring/opening a connection on your behalf, such as would be the case in Repository.save(entity)
or a Repository.findByXyz(..)
call.
@970655147 - First of all, you "ignored" the very thing (i.e. resulting "Exception") we asked you to provide (in detail) in the first place:
// get connection } catch (Exception e) { // ignore }
Regarding:
"if ClusterConnectionProvider.getConnection throw Exception, there maybe tcp connection leak"
What Exception?
By ignoring the Exception, you are implying any Exception, and it does not matter. But, it does! It does because it matters what happens during the "process" of opening the connection.
An Exception resulting from a username/password auth error is different from an Exception occurring because an incorrect IP address or port to the server was configured/resolved is different from an Exception thrown because of a failed TLS handshake (verification) during SSL/TLS identification & authentication (as described by IBM, for instance) is different from other TCP errors, and so on.
In the later case, as with username/password-based auth, a TCP connection is still established and therefore system resources will be allocated and used to complete the steps of the TLS handshake/verification (over SSL/TLS). After all, Secure Sockets, or a "secure" Socket, like
SSLSocket
, is nothing more than encrypted data over an otherwise "plaintext"Socket
(i.e. data must be "encrypted" before it is sent over aSocket
/network connection to be "secure"; there is no such thing as a hardware "secured" Socket; it is all "software").This is also OS/system dependent.
As such, this why (TCP) Sockets have configuration (such as SO_LINGER, among other
Socket
configuration options, timeouts, reuse, etc).Secondarily, this code:
LettuceConnectionFactory factory = SpringContext.getBean(LettuceConnectionFactory.class); for (int i = 0; i < 100; i++) { try { ReactiveRedisConnection connection = factory.getReactiveConnection(); ... .. .
Does not necessarily constitute a connection leak simply because an Exception might be thrown during an attempt to acquire a connection. You are simply iterating and opening connections without proper connection "handling". And, if it were unbounded, your problem would only exasberate the "perceived" problem.
Most file-based system resources (TCP Socket connections are no exception; see here), must be properly handled inside a block, such as the following (using your example):
for (int i = 0; i < 100; i++) { try (ReactiveRedisConnection connection = factory.getReactiveConnection()) { } catch (Exception cause) { // handle the exceptional condition appropriately; such as releasing/closing related resources and }
Just because you attempt to establish a connection and 1) an Exception is thrown or 2) the block of code opening/using the connection falls out of scope, does not mean the framework, or any other facility for that matter, will automatically (or automagically) "close" the connection. And, even if "close" is called, it does not mean system resources will be immediately released (again, it depends on the Socket configuration using a factory when the Socket is created/opened).
It is always the user's responsibility to properly handle the connection, whether you are opening a Redis connection (regardless of type, e.g. Reactive), a (SQL) database connection (e.g. with JDBC), or simply opening a file.
For example:
try (FileReader fileReader = new FileReader(file)) { // do something with the "open" file }
If not properly handled, there is no guarantee that systems resources (even in the file case) will be properly released. Again, mileage varies based on OS/system, among other factors.
Also see this SO post as further evidence to this case.
following code based on spring-data-redis 2.0.5-RELEASE
"Do you have an example exception", I have a wrong reading, I make sense of "example case" exception as following, throw ex2 from lettuce, then spring-data-redis wrap ex1
ex1. getNativeConnection:964, LettuceConnectionFactory$SharedConnection (org.springframework.data.redis.connection.lettuce): "Unable to connect to Redis", and hold cause as follow
ex2. loadPartitions:790, RedisClusterClient (io.lettuce.core.cluster) - io.lettuce.core.RedisException: Cannot retrieve initial cluster partitions from initial URIs [RedisURI [host='192.168.0.106', port=3306], RedisURI [host='192.168.0.106', port=9092]]
my presentation is not imprecisely "factory.getReactiveConnection();" throw ex, so I just ignore that for casualness
the call stack trace in this case is like following
getConnection:55, ClusterConnectionProvider (org.springframework.data.redis.connection.lettuce)
getNativeConnection:957, LettuceConnectionFactory$SharedConnection (org.springframework.data.redis.connection.lettuce)
getConnection:932, LettuceConnectionFactory$SharedConnection (org.springframework.data.redis.connection.lettuce)
getSharedReactiveConnection:795, LettuceConnectionFactory (org.springframework.data.redis.connection.lettuce)
getReactiveConnection:329, LettuceConnectionFactory (org.springframework.data.redis.connection.lettuce)
if ex "Unable to connect to Redis" thrown from LettuceConnectionFactory$SharedConnection.getNativeConnection, so this.connection is always null an in next, nextnext, .. call, it will connect to redis-server over and over again
LettuceConnectionFactory$SharedConnection.getConnection
StatefulConnection<E, E> getConnection() {
synchronized (this.connectionMonitor) {
if (this.connection == null) {
this.connection = getNativeConnection();
}
if (getValidateConnection()) {
validateConnection();
}
return this.connection;
}
}
I was wondering following processing handle "getConnections(seed)" established connection, but "connections.close()" doesn't release that
loadPartitions:787, RedisClusterClient (io.lettuce.core.cluster)
public Map<RedisURI, Partitions> loadViews(Iterable<RedisURI> seed, boolean discovery) {
long commandTimeoutNs = getCommandTimeoutNs(seed);
Connections connections = null;
try {
connections = getConnections(seed).get(commandTimeoutNs, TimeUnit.NANOSECONDS);
// ignore some code
return nodeSpecificViews.toMap();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new RedisCommandInterruptedException(e);
} finally {
if (connections != null) {
connections.close();
}
}
}
but I'm not sure, I'm not experienced, the "NioSocketChannel" is only increase without decrease until "too many open files"
root@ubuntu:~# jmap -histo:live 6970 | grep NioSocketChannel
38: 558 58032 io.netty.channel.socket.nio.NioSocketChannel
66: 558 35712 io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelConfig
99: 558 22320 io.netty.channel.socket.nio.NioSocketChannel$NioSocketChannelUnsafe
Regarding #3 above, and specifically, this statement:
"an in next, nextnext, .. call, it will connect to redis-server over and over again"
"What" will connect to redis-server over and over again?
There is nothing in the call stack to which your refer that repeatedly (i.e. loops, and) attempts to establish a connection until (implied) successful (e.g. non-null).
Using Spring Data Redis 2.7.13
, you can navigate the call stack as follows (let's assume shared, clustered connections since you often refer to a "cluster" and is the subject of this ticket), then:
NOTE: Your line numbers do not line up. And, you somehow mixed clustered with non-clustered calls in your stack. This does not make sense. Never-the-less, it ultimately leads to a similar call stack since the difference between clustered and non-clustered depends on the Lettuce
RedisClient
configured and created when theLettuceConnectionFactory
bean is "initialized".
caused by: ...
108: ClusterConnectionProvider.getConnectionAsync(:Class)
53: LettuceConnectionProvider.getConnection(:Class)
1383: LettuceConnectionFactory.SharedConnection.getNativeConnection()
1366: LettuceConnectionFactory.SharedConnection.getConnection()
1117: LettuceConnectionFactory.getSharedReactiveConnection()
529: LettuceConnectionFactory.getReactiveClusterConnection()
505: LettuceConnectionFactory.getReactiveConnection(..)
...
..
.
If you are using "pooling" (highly recommended in this case), then the only difference would be an additional call between 1383 and 53, on line 97 in the LettucePoolingConnectionProvider.getConnection(:Class)
.
NOTE: A
LettucePoolingConnectonProvider
simply "wraps" the underlying, actualLettuceConnectionProvider
implementation, such as theClusterConnectionProvider
, to "pool" connections. The pool configuration can also "limit" the number of connections so they do not grow unbounded, and your run out of system resources (like file descriptors).
In any case, this is very much dependent on the caller.
In your specific case, that would be the Spring Boot RedisHealthIndicator
, which, TMK, is only triggered when you trip the URL endpoint exposed by Spring Boot Actuator to access the "health" of your Redis cluster (Redis servers in the cluster). There would need to be some "active" monitoring for the RedisHealthIndicator
to repeatedly attempt to open connections and exhaust your "unbounded" (non-pooled) resources.
Spring Data Redis cannot possibly know all the ways in which callers will attempt to acquire and use connections. Again, outside of RedisTemplate
operations or Spring Data Redis Repository infrastructure, which might acquire connections on the user's (or caller's) behalf, then if the caller or your application (client) code acquires a connection, it is the responsibility of the caller or your application logic to properly manage the connection.
You have 1 of 3 options, here:
1) Increase the hard and soft limits of your File Descriptors (FDs) in your Linus-based system. For example.
2) Configure and use connection pooling (recommended).
3) Fix your underlying Exception: Cannot retrieve initial cluster partitions from initial URIs [RedisURI [host='192.168.0.106', port=3306], RedisURI [host='192.168.0.106', port=9092]]
.
I strongly encourage you to consider #2
and investigate #3
. You clearly have misconfigured your environment in some way or have some other (network) environment or Redis problem. Once control is down inside line 108, you are at the mercy of the Redis driver (Lettuce).
Regarding #4:
I gather, the assessment of the Redis partitions distributed across the clutter is triggered by Spring Data Redis, here, and specifically.
Again, this is very much Redis driver based (Lettuce, in this instance), and largely outside the control of Spring Data Redis.
Also, the Lettuce driver code does not strike me as incorrect. It properly wraps the connection in a try-catch-finally
block, attempting to close the connection if any kind of Throwable
is thrown. What ultimately happens to the underlying system resources (for example, like Sockets
) is runtime and system dependent.
this is a project using spring-data-redis, spring-boot-actuator there are 2 case, might reoccur that
config address that isn't redis server, follow try a mysql services, then visit "http://localhost:8080/actuator/health", there leak 6 tcp connection each visit
spring.redis.cluster.nodes: 192.168.220.133:3306,192.168.220.133:3307,192.168.220.133:3308,192.168.220.133:3309,192.168.220.133:3310,192.168.220.133:3311
config redis server with password, but didn't config "spring.redis.password" then visit "http://localhost:8080/actuator/health", there leak 6 tcp connection each visit
spring.redis.cluster.nodes: 192.168.220.133:7001,192.168.220.133:7002,192.168.220.133:7003,192.168.220.133:7004,192.168.220.133:7005,192.168.220.133:7006
use "ll /proc/$pid/fd | grep socket" to watch connections