Closed lukasz-weareuncapped closed 2 months ago
that cached map does not work anymore
Can you clarify here?
While reading/writing to the map I got an exception. Unfortunately I have no clue what the exact exception is due to a lack of logs, but we do read and write on it.
Sorry, without details I'm unable to help. Redisson handles failover, and you don't need to call redissonClient.getMap() again after that.
Now I know much more. I've turned on debug logs and can see that Redisson keeps trying to reconnect (I scaled the Redis cluster down to 0 and back up to reproduce the issue). Unfortunately it keeps trying to reconnect to an IP that is no longer active. After the reboot the IP changed, but it looks like it is somehow cached, and Redisson is not able to connect to the newly started Redis nodes. It keeps trying forever with no luck. Is there any caching mechanism?
```
2024-04-26 11:03:16.519 CEST Connection created [addr=redis://locking-redis-node-0.locking-redis-headless.default.svc.cluster.local:26379]
2024-04-26 11:03:16.618 CEST reconnecting RedisPubSubConnection@395557197 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0x3d2eea36, L:/10.1.75.4:52138 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0] to 10.1.69.64/10.1.69.64:6379
2024-04-26 11:03:16.618 CEST Connection created [addr=redis://10.1.69.64:6379]
2024-04-26 11:03:17.818 CEST connection released for command (INFO REPLICATION) and params [] from slot NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null] using connection RedisConnection@1350729690 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0xc72cb405, L:/10.1.75.4:52132 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0]
2024-04-26 11:03:17.818 CEST attempt 3 for command (INFO REPLICATION) and params [] to NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null]
```
Which Redisson version?
Redisson client 3.26.1. The configuration we have:
```
"SPRING_DATA_REDIS_PASSWORD": "123",
"SPRING_DATA_REDIS_SENTINEL_MASTER": "mymaster",
"SPRING_DATA_REDIS_SENTINEL_NODES": "locking-redis-headless.default.svc.cluster.local:26379",
```
Can you share Redisson config?
Can you set the trace logging level for the org.redisson package and share the output?
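Since the app uses redisson-spring-boot-starter, the trace level can usually be set through a plain Spring Boot logging property; a minimal sketch:

```yaml
logging:
  level:
    org.redisson: TRACE
```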
The Redisson configuration is injected via redisson-spring-boot-starter:
```yaml
spring:
  data:
    redis:
      password: mypass
      sentinel:
        master: mymaster
        nodes: locking-redis-headless.default.svc.cluster.local:26379
```
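For reference, the same topology can be expressed in Redisson's own YAML format (a sketch useful when bypassing the starter's Spring-properties mapping; values mirror the ones above):

```yaml
sentinelServersConfig:
  masterName: "mymaster"
  password: "mypass"
  sentinelAddresses:
    - "redis://locking-redis-headless.default.svc.cluster.local:26379"
```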
Traces on the way...
Explore-logs-2024-04-26 20_28_22.txt -> logs from 15 minutes of the running application while it was trying to reconnect (after restarting all slaves).
Explore-logs-2024-04-26 20_34_00.txt -> logs from when the Redis slaves were working fine.
I found one exception related to DNS in netty, and found an issue solved by the netty team that is related to DNS caching: https://github.com/netty/netty/issues/11885, which is fixed in 4.1.109 or 4.1.110 (also related: https://github.com/netty/netty/issues/12842), while the latest Redisson 3.29.0 uses 4.1.104 under the hood.
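Note that the netty issues above concern netty's own async DNS resolver; the JVM keeps a separate InetAddress-level cache that is worth ruling out as well when chasing stale IPs. A minimal sketch of disabling the JDK-level cache (the property names are standard JDK security properties, not Redisson settings):

```java
import java.security.Security;

public class JvmDnsCache {
    // JDK-level DNS cache TTLs, in seconds: "0" disables caching,
    // "-1" caches forever. These affect InetAddress lookups only,
    // not netty's own DnsNameResolver.
    public static void disableCaching() {
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    public static void main(String[] args) {
        disableCaching();
        System.out.println("ttl=" + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

This must run before the first lookup of an affected host, since already-cached entries keep their original TTL.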
One more exception I'm facing on a daily basis...
```
ERROR 2024-04-28T02:10:10.190425043Z Can't execute SENTINEL commands on 10.236.135.253/10.236.135.253:26379
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```
and
```
org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```
while the latest 3.29.0 redisson uses 4.1.104 under the hood
It uses 4.1.109 https://repo1.maven.org/maven2/org/redisson/redisson-parent/3.29.0/redisson-parent-3.29.0.pom
As for the timeouts, you need to read this: https://github.com/redisson/redisson/wiki/16.-FAQ#q-what-is-the-cause-of-redistimeoutexception. I'm sure it's a network issue.
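For context, the FAQ's remedies mostly come down to a few timeout/retry knobs. In Redisson's sentinel YAML they look roughly like this (a sketch; the numeric values are illustrative, not recommendations from this thread):

```yaml
sentinelServersConfig:
  connectTimeout: 10000   # ms to establish a connection
  timeout: 3000           # ms to wait for a command response
  retryAttempts: 3        # resend attempts before failing a command
  retryInterval: 1500     # ms between retry attempts
```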
Hi @mrniko, thank you for your advice. We went through the whole network but have not found any problem.
I ran several tests and just made a discovery that confirms for sure it is not a problem with our network. I set up a scheduled test (placed within the app) in a single thread that resolved all of my hosts via InetAddress every 5 minutes. Each time we got the right IP; even after restarting all slaves I got the right IP address (while at the same time Redisson was not able to reconnect).

Based on that I came up with the idea that something changed in Redisson recently, and I tried downgrading to 3.26.0, and boom: it started working again as it should. So basically on version 3.29.0 all of our applications have the same issue of being unable to reconnect, constantly trying to make a connection. Could it be that existing connections in the connection pool are reused to "reconnect" them instead of creating new ones? (That is the suspicion I started forming when seeing the same IPs in the logs after restarts.) The logs below appear after each slave restart, but only on 3.26.0, not on 3.29.0. Now, after a reconnect, I can see in the logs (on 3.26.0):
```
1 connections initialized for 10.1.65.24/10.1.65.24:6379
master 10.1.65.24/10.1.65.24:6379 used as slave
24 connections initialized for 10.1.65.24/10.1.65.24:6379
sentinel: redis://10.1.65.23:26379 is down
sentinel: redis://10.1.69.26:26379 added
1 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
slave: locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379 added
24 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
```
What I've spotted (on 3.26.0) is that the "connections initialized for" line is printed in the logs each time we restart the nodes, so the reconnect actually happens.
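The periodic resolution check described above can be sketched roughly like this (the host name and schedule are placeholders; the real test ran inside the app every 5 minutes against each Redis/Sentinel host):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DnsProbe {
    // Resolve a host via the JDK and return its current IP addresses,
    // to compare against the addresses Redisson is reconnecting to.
    public static List<String> resolve(String host) throws UnknownHostException {
        return Arrays.stream(InetAddress.getAllByName(host))
                .map(InetAddress::getHostAddress)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // In the actual test this ran on a 5-minute schedule;
        // "localhost" here is just a stand-in host.
        System.out.println("localhost -> " + resolve("localhost"));
    }
}
```

If this probe returns the new pod IPs while the client keeps dialing the old ones, the stale address lives in the client, not in DNS.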
Can you try 3.27.0 version?
But if I'm not mistaken, we already ran some tests with the 3.27.0 version. Anyway, it's been a while since I did it, so let me bump it up and verify the results.
Btw, I saw one fix for an issue similar to ours here -> https://github.com/redisson/redisson/issues/5857, made by you @mrniko. Isn't it already released with 3.32.0?
@lukasz-weareuncapped
Btw, I saw one fix for an issue similar to ours here -> https://github.com/redisson/redisson/issues/5857, made by you @mrniko. Isn't it already released with 3.32.0?
Yes
I have a Spring app up and running with an RMap created from RedissonClient.getMap and cached. Once something happens with a few Redis nodes (we had a forced restart), that cached map does not work anymore. What is the recommended approach in that case? Call redissonClient.getMap each time? As I saw, under the hood a new RedissonMap is created, which could be time- and resource-consuming. So please advise. Thank you.