redisson / redisson

Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis or Valkey based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache...
https://redisson.pro
Apache License 2.0
23.26k stars 5.34k forks source link

LOAD SCRIPT fails after failover with MasterSlave topology (3.17.2) #4351

Closed avcad closed 2 years ago

avcad commented 2 years ago

Redisson throws errors after failover and continues until the client is restarted.

org.redisson.client.RedisTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@f37ad6fe[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=null, addr=null, redisClient=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=24, freeConnectionsCounter=value:65:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]], command: (SCRIPT LOAD), params: [return 1] after 3 retry attempts
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:189)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:825)

I am testing in Kubernetes, using the Bitnami chart, one master, one replica.

I created a small program to reproduce. Only the load command fails

Config config = new Config();
config.useMasterSlaveServers().setMasterAddress("redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379")
.addSlaveAddress("redis://redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.:6379");
RedissonClient redissonClient = Redisson.create(config);
while (true) {
   redissonClient.getBucket( "loop", StringCodec.INSTANCE ).get();
   redissonClient.getBucket("loop", StringCodec.INSTANCE ).set( i );

   final RScript rscript = redissonClient.getScript(StringCodec.INSTANCE);
   String sha = rscript.scriptLoad("return 1");
}

Before:

NAME                 READY   STATUS    IP              
redisms-master-0     1/1     Running   192.168.109.226 
redisms-replicas-0   1/1     Running   192.168.127.44

I delete redisms-master-0 and wait for it to recover - IP changes from 192.168.109.226 to 192.168.109.210

NAME                 READY   STATUS    IP              
redisms-master-0     1/1     Running   192.168.109.210 
redisms-replicas-0   1/1     Running   192.168.127.44  

After the system recovers, the client keeps failing on SCRIPT LOAD. Interestingly, SCRIPT LOAD runs on the master and replica ok See log of last SCRIPT LOAD. You can see it is executed on master and replica before failing:

Jun 08, 2022 1:20:21 AM org.redisson.command.RedisExecutor sendCommand
FINE: acquired connection for command (SCRIPT LOAD) and params [return 1] from slot NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=23, freeConnectionsCounter=value:64:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]] using node redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210:6379... RedisConnection@-1661374143 [redisClient=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], channel=[id: 0x95b01351, L:/192.168.127.31:51030 - R:redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210:6379], currentCommand=null, usage=1]
Jun 08, 2022 1:20:21 AM org.redisson.client.handler.CommandEncoder encode
FINEST: channel: [id: 0x95b01351, L:/192.168.127.31:51030 - R:redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210:6379] message: *3

$6

SCRIPT

$4

LOAD

$8

return 1

Jun 08, 2022 1:20:21 AM org.redisson.command.RedisExecutor sendCommand
FINE: acquired connection for command (SCRIPT LOAD) and params [return 1] from slot NodeSource [slot=null, addr=null, redisClient=[addr=redis://redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.:6379], redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=23, freeConnectionsCounter=value:64:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]] using node redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44:6379... RedisConnection@-960916607 [redisClient=[addr=redis://redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.:6379], channel=[id: 0xfd42fb0f, L:/192.168.127.31:42326 - R:redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44:6379], currentCommand=null, usage=1]
Jun 08, 2022 1:20:21 AM org.redisson.client.handler.CommandEncoder encode
FINEST: channel: [id: 0xfd42fb0f, L:/192.168.127.31:42326 - R:redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44:6379] message: *3

$6

SCRIPT

$4

LOAD

$8

return 1

Jun 08, 2022 1:20:21 AM org.redisson.client.handler.CommandDecoder decode
FINEST: reply: $40

e0e1f9fabfc9d4800c877a703b823ac0578ff8db

, channel: [id: 0x95b01351, L:/192.168.127.31:51030 - R:redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210:6379], command: CommandData [promise=java.util.concurrent.CompletableFuture@f93713f[Not completed, 1 dependents], command=(SCRIPT LOAD), params=[return 1], codec=org.redisson.codec.MarshallingCodec]
Jun 08, 2022 1:20:21 AM org.redisson.client.handler.CommandDecoder decode
FINEST: reply: $40

e0e1f9fabfc9d4800c877a703b823ac0578ff8db

, channel: [id: 0xfd42fb0f, L:/192.168.127.31:42326 - R:redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44:6379], command: CommandData [promise=java.util.concurrent.CompletableFuture@59cdfd6c[Not completed, 1 dependents], command=(SCRIPT LOAD), params=[return 1], codec=org.redisson.codec.MarshallingCodec]
Jun 08, 2022 1:20:21 AM org.redisson.client.RedisConnection <init>
FINE: Connection created [addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379]
Jun 08, 2022 1:20:21 AM org.redisson.command.RedisExecutor releaseConnection
FINE: connection released for command (SCRIPT LOAD) and params [return 1] from slot NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=24, freeConnectionsCounter=value:65:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]] using connection RedisConnection@-1661374143 [redisClient=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], channel=[id: 0x95b01351, L:/192.168.127.31:51030 - R:redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210:6379], currentCommand=CommandData [promise=java.util.concurrent.CompletableFuture@f93713f[Completed normally], command=(SCRIPT LOAD), params=[return 1], codec=org.redisson.codec.MarshallingCodec], usage=0]
Jun 08, 2022 1:20:21 AM org.redisson.command.RedisExecutor releaseConnection
FINE: connection released for command (SCRIPT LOAD) and params [return 1] from slot NodeSource [slot=null, addr=null, redisClient=[addr=redis://redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.:6379], redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=24, freeConnectionsCounter=value:65:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]] using connection RedisConnection@-960916607 [redisClient=[addr=redis://redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.:6379], channel=[id: 0xfd42fb0f, L:/192.168.127.31:42326 - R:redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44:6379], currentCommand=CommandData [promise=java.util.concurrent.CompletableFuture@59cdfd6c[Completed normally], command=(SCRIPT LOAD), params=[return 1], codec=org.redisson.codec.MarshallingCodec], usage=0]
Jun 08, 2022 1:20:23 AM org.redisson.command.RedisExecutor$2 run
FINE: attempt 1 for command (SCRIPT LOAD) and params [return 1]
Jun 08, 2022 1:20:23 AM org.redisson.client.RedisConnection <init>
FINE: Connection created [addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379]
Jun 08, 2022 1:20:24 AM org.redisson.command.RedisExecutor$2 run
FINE: attempt 2 for command (SCRIPT LOAD) and params [return 1]
Jun 08, 2022 1:20:24 AM org.redisson.client.RedisConnection <init>
FINE: Connection created [addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379]
Jun 08, 2022 1:20:25 AM org.redisson.connection.DNSMonitor monitorMasters
FINE: Request sent to resolve ip address for master host: redisms-master-0.redisms-headless.redisms.svc.cluster.local.
Jun 08, 2022 1:20:25 AM org.redisson.connection.DNSMonitor monitorSlaves
FINE: Request sent to resolve ip address for slave host: redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.
Jun 08, 2022 1:20:25 AM org.redisson.connection.DNSMonitor lambda$monitorMasters$1
FINE: Resolved ip: redisms-master-0.redisms-headless.redisms.svc.cluster.local./192.168.109.210 for master host: redisms-master-0.redisms-headless.redisms.svc.cluster.local.
Jun 08, 2022 1:20:25 AM org.redisson.connection.DNSMonitor lambda$monitorSlaves$3
FINE: Resolved ip: redisms-replicas-0.redisms-headless.redisms.svc.cluster.local./192.168.127.44 for slave host: redisms-replicas-0.redisms-headless.redisms.svc.cluster.local.
Jun 08, 2022 1:20:26 AM org.redisson.command.RedisExecutor$2 run
FINE: attempt 3 for command (SCRIPT LOAD) and params [return 1]
Jun 08, 2022 1:20:26 AM org.redisson.client.RedisConnection <init>
FINE: Connection created [addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379]
Jun 08, 2022 1:20:28 AM demo.MasterSlaveFailoverTest main(String[] args)
INFO: <<<<< LOAD ERROR - ms 6,392 - ERROR org.redisson.client.RedisTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@f37ad6fe[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=null, addr=null, redisClient=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=24, freeConnectionsCounter=value:65:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]], command: (SCRIPT LOAD), params: [return 1] after 3 retry attempts
Jun 08, 2022 1:20:28 AM demo.MasterSlaveFailoverTest main(String[] args)
SEVERE: SCRIPT LOAD
org.redisson.client.RedisTimeoutException: Unable to acquire connection! java.util.concurrent.CompletableFuture@f37ad6fe[Completed exceptionally]Increase connection pool size. Node source: NodeSource [slot=null, addr=null, redisClient=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=1, freeSubscribeConnectionsCounter=value:50:queue:0, freeConnectionsAmount=24, freeConnectionsCounter=value:65:queue:0, freezeReason=null, client=[addr=redis://redisms-master-0.redisms-headless.redisms.svc.cluster.local.:6379], nodeType=MASTER, firstFail=0]]], command: (SCRIPT LOAD), params: [return 1] after 3 retry attempts
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:189)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:825)
avcad commented 2 years ago

Complete log of the test. Started with Redis up, deleted node, waited for Redis to recover and errors continued... redisson-test-2.log

I also just re-tested with 3.16.8 fails but 3.16.3 works fine.

mrniko commented 2 years ago

Please try version attached.

redisson-3.17.4-SNAPSHOT.jar.zip

avcad commented 2 years ago

I will continue testing, but so far it is working. Thanks @mrniko for the quick turnaround!

mrniko commented 2 years ago

Fixed. Thanks for report