redis / ioredis

🚀 A robust, performance-focused, and full-featured Redis client for Node.js.
MIT License
14.06k stars 1.19k forks source link

Redis Cluster slave nodes and scaleReads, possible bug managing the MOVED response? #1583

Open ligreman opened 2 years ago

ligreman commented 2 years ago

Related to https://github.com/luin/ioredis/issues/1582

In a Redis Cluster, 3 masters + 3 slaves, if you send the reads commands to the slaves by default they will answer with a "MOVED" response, redirecting you ti the master that owns the slot of the key you are asking for.

In order to get a response from the slave directly you mus issue first a "READONLY" command, to enable reads on the connection with the slave. Then, it will give you the value of the key that it is stored in the slave, instead of redirecting you to the master.

I have checked that ioredis, when you use scaleReads "all" or "slave", if you don't send first a READONLY command, you will get constantly MOVED responses.

This can lead to a loop of redirections. Eg, with scaleReads=slave, when sending a GET command ioredis will choose the slave. Then it will receive a MOVED response, but ioredis instead of sending the command to the master pointed in that MOVED, it will send it again to the slave. And you enter a loop.

So here I think there are two issues:

1 - Managing the MOVED response in the right way

ioredis should issue the command to the node pointed by the MOVED response. Maybe this is a bug? In my case when testing it always send it to the slave again and entered a loop.

I had ioredis configured this way:

2 - Should ioredis send a READONLY command to the slaves, on connection to the cluster?

To get the value of the key in the response from the slaves, instead of a MOVED response, you must send READONLY (or READWRITE) to the slaves before issuing the command.

I think that ioredis should send it automatically to enable READONLY in the connection, when allReads is "all" or "slave".

I don't know if the configuration property readOnly has something to do here:

  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }

If it is supposed to work this way, delegating in the client to issue the READONLY command, I would add a note in the documentation to clarify it. Because if not, you might enter this "death-loop".

Thanks!

luin commented 2 years ago

Hey @ligreman 👋

When readOnly is true, ioredis should already ensure that READONLY command is sent before other commands. We have a test case for that: https://github.com/luin/ioredis/blob/b530a0b9fe0f987d6786e5cfccbfae8b5b9c9294/test/functional/connection.ts#L355. In your log:

  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },

It shows readOnly was correctly passed to Redis constructor. So not sure why it didn't work. Can you double-check whether it doesn't send the READONLY command? Thanks!

ligreman commented 2 years ago

Hi @luin !

I've made a test and this is the result. But before that... I create a Redis Cluster with "new Redis.Cluster", I comment this becuse the test case you mention before uses a "new Redis" connection, I don't know if it does not test a Redis.Cluster one and that is the problem here.

So, my configuration for the Redis.Cluster is:

// These are the startUpNodes
[                                    
  { host: 'localhost', port: 63791 },
  { host: 'localhost', port: 63792 },
  { host: 'localhost', port: 63793 },
  { host: 'localhost', port: 63794 },
  { host: 'localhost', port: 63795 },
  { host: 'localhost', port: 63796 } 
]

I have a proxy in my localhost that I use to connect to the Redis Cluster, so I use natmap for that.

// Here we have the options object with the configuration
{
  scaleReads: 'slave',
  maxRedirections: 8,
  retryDelayOnFailover: 100,
  slotsRefreshInterval: 150000,
  natMap: {
    '10.20.30.220:6379': { host: 'localhost', port: 63791 },
    '10.20.30.221:6379': { host: 'localhost', port: 63792 },
    '10.20.30.222:6379': { host: 'localhost', port: 63793 },
    '10.20.30.223:6379': { host: 'localhost', port: 63794 },
    '10.20.30.224:6379': { host: 'localhost', port: 63795 },
    '10.20.30.225:6379': { host: 'localhost', port: 63796 }
  },
  redisOptions: {
    password: 'testpwd',
    username: 'testusr',   
    enableAutoPipelining: true,
    keyPrefix: 'test:',
    connectTimeout: 10000
  },
  keyPrefix: 'test:'
}

1st test: I create a Redis.Cluster and ejust send some commands to Redis (without sending a READONLY command by myself)

I start the application and it connects to the Redis Cluster

  ioredis:cluster status: [empty] -> connecting +0ms
  ioredis:cluster resolved hostname localhost to IP 127.0.0.1 +373ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63791 },
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63792 },
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63793 },
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63794 },
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63795 },
  ioredis:cluster:connectionPool   { host: '127.0.0.1', port: 63796 }
  ioredis:cluster:connectionPool ] +0ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63791 as master +5ms
  ioredis:redis status[127.0.0.1:63791]: [empty] -> wait +0ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63792 as master +3ms
  ioredis:redis status[127.0.0.1:63792]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63793 as master +1ms
  ioredis:redis status[127.0.0.1:63793]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63794 as master +2ms
  ioredis:redis status[127.0.0.1:63794]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63795 as master +1ms
  ioredis:redis status[127.0.0.1:63795]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to 127.0.0.1:63796 as master +1ms
  ioredis:redis status[127.0.0.1:63796]: [empty] -> wait +1ms
  ioredis:cluster getting slot cache from 127.0.0.1:63795 +16ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: [empty] -> wait +2ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: wait -> connecting +3ms
  ioredis:redis queue command[127.0.0.1:63795 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:cluster:subscriber selected a subscriber 127.0.0.1:63793 +0ms
  ioredis:redis status[127.0.0.1:63793 (ioredis-cluster(subscriber))]: [empty] -> wait +2ms
  ioredis:cluster:subscriber started +2ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: connecting -> connect +27ms
  ioredis:redis write command[127.0.0.1:63795 (ioredis-cluster(refresher))]: 0 -> auth([ 'testusr', testpwd ]) +1ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: connect -> ready +2ms
  ioredis:connection set the connection name [ioredis-cluster(refresher)] +0ms
  ioredis:connection send 1 commands in offline queue +2ms
  ioredis:redis write command[127.0.0.1:63795 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +2ms
  ioredis:redis write command[127.0.0.1:63795 (ioredis-cluster(refresher))]: 0 -> client([ 'setname', 'ioredis-cluster(refresher)' ]) +4ms
  ioredis:cluster cluster slots result count: 3 +49ms
  ioredis:cluster NAT mapping 10.20.30.220:6379 -> { host: 'localhost', port: 63791 } +0ms
  ioredis:cluster NAT mapping 10.20.30.224:6379 -> { host: 'localhost', port: 63795 } +1ms
  ioredis:cluster cluster slots result [0]: slots 0~5460 served by [ 'localhost:63791', 'localhost:63795' ] +1ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +14ms
  ioredis:cluster NAT mapping 10.20.30.225:6379 -> { host: 'localhost', port: 63796 } +1ms
  ioredis:cluster cluster slots result [1]: slots 5461~10922 served by [ 'localhost:63792', 'localhost:63796' ] +0ms
  ioredis:cluster NAT mapping 10.20.30.223:6379 -> { host: 'localhost', port: 63794 } +2ms
  ioredis:cluster NAT mapping 10.20.30.222:6379 -> { host: 'localhost', port: 63793 } +1ms
  ioredis:cluster cluster slots result [2]: slots 10923~16383 served by [ 'localhost:63794', 'localhost:63793' ] +1ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63791, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }
  ioredis:cluster:connectionPool ] +92ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63791 because the node does not hold any slot +6ms
  ioredis:redis status[127.0.0.1:63791]: wait -> close +54ms
  ioredis:connection skip reconnecting since the connection is manually closed. +58ms
  ioredis:redis status[127.0.0.1:63791]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63791 from the pool +2ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63792 because the node does not hold any slot +1ms
  ioredis:redis status[127.0.0.1:63792]: wait -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +3ms
  ioredis:redis status[127.0.0.1:63792]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63792 from the pool +1ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63793 because the node does not hold any slot +1ms
  ioredis:redis status[127.0.0.1:63793]: wait -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +2ms
  ioredis:redis status[127.0.0.1:63793]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63793 from the pool +2ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63794 because the node does not hold any slot +0ms
  ioredis:redis status[127.0.0.1:63794]: wait -> close +2ms
  ioredis:connection skip reconnecting since the connection is manually closed. +2ms
  ioredis:redis status[127.0.0.1:63794]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63794 from the pool +2ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63795 because the node does not hold any slot +1ms
  ioredis:redis status[127.0.0.1:63795]: wait -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +3ms
  ioredis:redis status[127.0.0.1:63795]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63795 from the pool +2ms
  ioredis:cluster:connectionPool Disconnect 127.0.0.1:63796 because the node does not hold any slot +0ms
  ioredis:redis status[127.0.0.1:63796]: wait -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +2ms
  ioredis:redis status[127.0.0.1:63796]: close -> end +1ms
  ioredis:cluster:connectionPool Remove 127.0.0.1:63796 from the pool +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63791 as master +0ms
  ioredis:redis status[localhost:63791]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63795 as slave +2ms
  ioredis:redis status[localhost:63795]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63792 as master +1ms
  ioredis:redis status[localhost:63792]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to localhost:63796 as slave +2ms
  ioredis:redis status[localhost:63796]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to localhost:63794 as master +1ms
  ioredis:redis status[localhost:63794]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63793 as slave +2ms
  ioredis:redis status[localhost:63793]: [empty] -> wait +1ms
  ioredis:cluster status: connecting -> connect +50ms
  ioredis:redis status[localhost:63794]: wait -> connecting +3ms
  ioredis:redis queue command[localhost:63794]: 0 -> cluster([ 'info' ]) +1ms
  ioredis:cluster:subscriber subscriber has left, selecting a new one... +115ms
  ioredis:redis status[127.0.0.1:63793 (ioredis-cluster(subscriber))]: wait -> close +2ms
  ioredis:connection skip reconnecting since the connection is manually closed. +15ms
  ioredis:redis status[127.0.0.1:63793 (ioredis-cluster(subscriber))]: close -> end +1ms
  ioredis:cluster:subscriber selected a subscriber localhost:63796 +2ms
  ioredis:redis status[localhost:63796 (ioredis-cluster(subscriber))]: [empty] -> wait +1ms
  ioredis:redis status[127.0.0.1:63794]: connecting -> connect +6ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> auth([ 'testusr', testpwd ]) +0ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> info([]) +2ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: ready -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +11ms
  ioredis:redis status[127.0.0.1:63795 (ioredis-cluster(refresher))]: close -> end +1ms
  ioredis:redis status[127.0.0.1:63794]: connect -> ready +6ms
  ioredis:connection send 1 commands in offline queue +8ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> cluster([ 'info' ]) +2ms
  ioredis:cluster status: connect -> ready +27ms
[2022-05-22 13:56:37] [info] - Connected to Redis (3 master nodes, 3 replica nodes)

The last trace is mine, not from ioredis. I get the cluster info about masters and nodes to print it. Here we have no READONLY so far.

1st test: Now I send some commands using that connection created

The log:

  ioredis:redis status[localhost:63796]: wait -> connecting +1m
  ioredis:redis queue command[localhost:63796]: 0 -> zrangebyscore([ 'prefix:{hash}:mykey', '1', '+inf' ]) +1ms
  ioredis:redis status[127.0.0.1:63796]: connecting -> connect +4ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> auth([ 'testusr', testpwd ]) +0ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> info([]) +2ms
  ioredis:redis status[127.0.0.1:63796]: connect -> ready +4ms
  ioredis:connection set the connection to readonly mode +1m
  ioredis:connection send 1 commands in offline queue +1ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'prefix:{hash}:mykey', '1', '+inf' ]) +1ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> readonly([]) +1ms
  ioredis:cluster command zrangebyscore is moved to 10.20.30.221:6379 +1m
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'prefix:{hash}:mykey', '1', '+inf' ]) +4ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster getting slot cache from localhost:63792 +1ms
  ioredis:redis status[localhost:63792 (ioredis-cluster(refresher))]: [empty] -> wait +3ms
  ioredis:redis status[localhost:63792 (ioredis-cluster(refresher))]: wait -> connecting +0ms
  ioredis:redis queue command[localhost:63792 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +2ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(refresher))]: connecting -> connect +2ms
  ioredis:redis write command[127.0.0.1:63792 (ioredis-cluster(refresher))]: 0 -> auth([ 'testusr', testpwd ]) +1ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(refresher))]: connect -> ready +1ms
  ioredis:connection set the connection name [ioredis-cluster(refresher)] +15ms
  ioredis:connection send 1 commands in offline queue +1ms
  ioredis:redis write command[127.0.0.1:63792 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +2ms
  ioredis:redis write command[127.0.0.1:63792 (ioredis-cluster(refresher))]: 0 -> client([ 'setname', 'ioredis-cluster(refresher)' ]) +1ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +12ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +3ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster cluster slots result count: 3 +1ms
  ioredis:cluster NAT mapping 10.20.30.220:6379 -> { host: 'localhost', port: 63791 } +0ms
  ioredis:cluster NAT mapping 10.20.30.224:6379 -> { host: 'localhost', port: 63795 } +1ms
  ioredis:cluster cluster slots result [0]: slots 0~5460 served by [ 'localhost:63791', 'localhost:63795' ] +1ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:cluster NAT mapping 10.20.30.225:6379 -> { host: 'localhost', port: 63796 } +1ms
  ioredis:cluster cluster slots result [1]: slots 5461~10922 served by [ 'localhost:63792', 'localhost:63796' ] +1ms
  ioredis:cluster NAT mapping 10.20.30.223:6379 -> { host: 'localhost', port: 63794 } +1ms
  ioredis:cluster NAT mapping 10.20.30.222:6379 -> { host: 'localhost', port: 63793 } +3ms
  ioredis:cluster cluster slots result [2]: slots 10923~16383 served by [ 'localhost:63794', 'localhost:63793' ] +0ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63791, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }
  ioredis:cluster:connectionPool ] +1m
  ioredis:cluster command set is moved to 10.20.30.221:6379 +29ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:redis status[localhost:63791]: wait -> connecting +45ms
  ioredis:redis queue command[localhost:63791]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +0ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +5ms
  ioredis:cluster getting slot cache from localhost:63794 +1ms
  ioredis:redis status[localhost:63794 (ioredis-cluster(refresher))]: [empty] -> wait +2ms
  ioredis:redis status[localhost:63794 (ioredis-cluster(refresher))]: wait -> connecting +1ms
  ioredis:redis queue command[localhost:63794 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(refresher))]: ready -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +56ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(refresher))]: close -> end +2ms
  ioredis:redis status[127.0.0.1:63791]: connecting -> connect +3ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> auth([ 'testusr', testpwd ]) +2ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> info([]) +2ms
  ioredis:redis status[127.0.0.1:63794 (ioredis-cluster(refresher))]: connecting -> connect +1ms
  ioredis:redis write command[127.0.0.1:63794 (ioredis-cluster(refresher))]: 0 -> auth([ 'testusr', testpwd ]) +0ms
  ioredis:redis status[127.0.0.1:63794 (ioredis-cluster(refresher))]: connect -> ready +1ms
  ioredis:connection set the connection name [ioredis-cluster(refresher)] +10ms
 ioredis:connection send 1 commands in offline queue +1ms
  ioredis:redis write command[127.0.0.1:63794 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:redis write command[127.0.0.1:63794 (ioredis-cluster(refresher))]: 0 -> client([ 'setname', 'ioredis-cluster(refresher)' ]) +1ms
  ioredis:redis status[127.0.0.1:63791]: connect -> ready +2ms
  ioredis:connection send 1 commands in offline queue +4ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +1ms
  ioredis:cluster cluster slots result count: 3 +21ms
  ioredis:cluster NAT mapping 10.20.30.220:6379 -> { host: 'localhost', port: 63791 } +1ms
  ioredis:cluster NAT mapping 10.20.30.224:6379 -> { host: 'localhost', port: 63795 } +0ms
  ioredis:cluster cluster slots result [0]: slots 0~5460 served by [ 'localhost:63791', 'localhost:63795' ] +1ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +2ms
  ioredis:cluster NAT mapping 10.20.30.225:6379 -> { host: 'localhost', port: 63796 } +1ms
  ioredis:cluster cluster slots result [1]: slots 5461~10922 served by [ 'localhost:63792', 'localhost:63796' ] +0ms
  ioredis:cluster NAT mapping 10.20.30.223:6379 -> { host: 'localhost', port: 63794 } +1ms
  ioredis:cluster NAT mapping 10.20.30.222:6379 -> { host: 'localhost', port: 63793 } +1ms
  ioredis:cluster cluster slots result [2]: slots 10923~16383 served by [ 'localhost:63794', 'localhost:63793' ] +1ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63791, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }
  ioredis:cluster:connectionPool ] +62ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +26ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +37ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster getting slot cache from localhost:63796 +1ms
  ioredis:redis status[localhost:63796 (ioredis-cluster(refresher))]: [empty] -> wait +3ms
  ioredis:redis status[localhost:63796 (ioredis-cluster(refresher))]: wait -> connecting +0ms
  ioredis:redis queue command[localhost:63796 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +4ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +3ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +1ms
  ioredis:redis status[127.0.0.1:63794 (ioredis-cluster(refresher))]: ready -> close +4ms
  ioredis:connection skip reconnecting since the connection is manually closed. +48ms
  ioredis:redis status[127.0.0.1:63794 (ioredis-cluster(refresher))]: close -> end +1ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +6ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +4ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +3ms
  ioredis:redis status[127.0.0.1:63796 (ioredis-cluster(refresher))]: connecting -> connect +3ms
  ioredis:redis write command[127.0.0.1:63796 (ioredis-cluster(refresher))]: 0 -> auth([ 'testusr', testpwd ]) +0ms
  ioredis:redis status[127.0.0.1:63796 (ioredis-cluster(refresher))]: connect -> ready +1ms
  ioredis:connection set the connection name [ioredis-cluster(refresher)] +10ms
  ioredis:connection set the connection to readonly mode +1ms
  ioredis:connection send 1 commands in offline queue +0ms
  ioredis:redis write command[127.0.0.1:63796 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +3ms
  ioredis:redis write command[127.0.0.1:63796 (ioredis-cluster(refresher))]: 0 -> client([ 'setname', 'ioredis-cluster(refresher)' ]) +1ms
  ioredis:redis write command[127.0.0.1:63796 (ioredis-cluster(refresher))]: 0 -> readonly([]) +0ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +8ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:redis write command[127.0.0.1:63794]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +3ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster cluster slots result count: 3 +43ms
  ioredis:cluster NAT mapping 10.20.30.220:6379 -> { host: 'localhost', port: 63791 } +0ms
  ioredis:cluster NAT mapping 10.20.30.224:6379 -> { host: 'localhost', port: 63795 } +1ms
  ioredis:cluster cluster slots result [0]: slots 0~5460 served by [ 'localhost:63791', 'localhost:63795' ] +1ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:cluster NAT mapping 10.20.30.225:6379 -> { host: 'localhost', port: 63796 } +1ms
  ioredis:cluster cluster slots result [1]: slots 5461~10922 served by [ 'localhost:63792', 'localhost:63796' ] +0ms
  ioredis:cluster NAT mapping 10.20.30.223:6379 -> { host: 'localhost', port: 63794 } +1ms
  ioredis:cluster NAT mapping 10.20.30.222:6379 -> { host: 'localhost', port: 63793 } +1ms
  ioredis:cluster cluster slots result [2]: slots 10923~16383 served by [ 'localhost:63794', 'localhost:63793' ] +1ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63791, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }
  ioredis:cluster:connectionPool ] +98ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +20ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +72ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster getting slot cache from localhost:63793 +1ms
  ioredis:redis status[localhost:63793 (ioredis-cluster(refresher))]: [empty] -> wait +2ms
  ioredis:redis status[localhost:63793 (ioredis-cluster(refresher))]: wait -> connecting +1ms
  ioredis:redis queue command[localhost:63793 (ioredis-cluster(refresher))]: 0 -> cluster([ 'slots' ]) +0ms
  ioredis:cluster command set is moved to 10.20.30.221:6379 +3ms
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63791]: 0 -> set([ 'prefix:{hash}:lock:key1', '1653220659708', 'EX', '1800', 'NX' ]) +3ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:redis status[127.0.0.1:63796 (ioredis-cluster(refresher))]: ready -> close +2ms
  ioredis:connection skip reconnecting since the connection is manually closed. +85ms
  ioredis:redis status[127.0.0.1:63796 (ioredis-cluster(refresher))]: close -> end +1ms
[2022-05-22 13:57:39] [error] - Error: Too many Cluster redirections. Last error: ReplyError: MOVED 6034 10.20.30.221:6379

Here we can see in the first lines that first a "zrangebyscore" command is sent, and just after that a "READONLY" send by ioredis (not me), but it goes after the zrangebyscore, so it gets a "MOVED" response instead of the value (the readonly was not processed yet I suppose).

We can see another thing, the command is sent to "63796" and the MOVED response points to "63792" (after the natmap resolution). But just after that the command is redirected to "63796" again, not where the MOVED pointed. Can this be another bug?

  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'prefix:{hash}:mykey', '1', '+inf' ]) +1ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> readonly([]) +1ms
  ioredis:cluster command zrangebyscore is moved to 10.20.30.221:6379 +1m
  ioredis:cluster NAT mapping 10.20.30.221:6379 -> { host: 'localhost', port: 63792 } +0ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'prefix:{hash}:mykey', '1', '+inf' ]) +4ms

Anyway, reading all this log, we see just 2 READONLY commands, both sent to the same slave node "63796". After the first READONLY there are other commands sent to other nodes but they are SET commands and are sent to masters so nothing here.

2nd test: I'll send READONLY just after connecting Redis and the send the same commands

I add this code just when I receive the "ready" event of the connection to the cluster:

        // I get the slave nodes

        const slaves = this._redisConnection.nodes('slave');

        // Issue READONLY to them
        Promise.all(slaves.map((node) => node.readonly())).then().catch(e => {
            console.error(e);
        });

At the end of the connection log we see the READONLY commands sent to each slave node (I just put here the last lines, not all the log again, the previous lines are the same as the connection log in the 1st test):

  ioredis:cluster:connectionPool Remove 127.0.0.1:63796 from the pool +3ms
  ioredis:cluster:connectionPool Connecting to localhost:63791 as master +0ms
  ioredis:redis status[localhost:63791]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63795 as slave +2ms
  ioredis:redis status[localhost:63795]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to localhost:63792 as master +1ms
  ioredis:redis status[localhost:63792]: [empty] -> wait +3ms
  ioredis:cluster:connectionPool Connecting to localhost:63796 as slave +3ms
  ioredis:redis status[localhost:63796]: [empty] -> wait +1ms
  ioredis:cluster:connectionPool Connecting to localhost:63794 as master +1ms
  ioredis:redis status[localhost:63794]: [empty] -> wait +2ms
  ioredis:cluster:connectionPool Connecting to localhost:63793 as slave +1ms
  ioredis:redis status[localhost:63793]: [empty] -> wait +1ms
  ioredis:cluster status: connecting -> connect +65ms
  ioredis:redis status[localhost:63792]: wait -> connecting +2ms
  ioredis:redis queue command[localhost:63792]: 0 -> cluster([ 'info' ]) +0ms
  ioredis:cluster:subscriber subscriber has left, selecting a new one... +119ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(subscriber))]: wait -> close +2ms
  ioredis:connection skip reconnecting since the connection is manually closed. +14ms
  ioredis:redis status[127.0.0.1:63792 (ioredis-cluster(subscriber))]: close -> end +1ms
  ioredis:cluster:subscriber selected a subscriber localhost:63793 +3ms
  ioredis:redis status[localhost:63793 (ioredis-cluster(subscriber))]: [empty] -> wait +1ms
  ioredis:redis status[127.0.0.1:63792]: connecting -> connect +13ms

  ioredis:redis write command[127.0.0.1:63795]: 0 -> readonly([]) +1ms
  ioredis:redis status[127.0.0.1:63793]: connect -> ready +1ms
  ioredis:connection set the connection to readonly mode +4ms
  ioredis:connection send 1 commands in offline queue +0ms
  ioredis:redis write command[127.0.0.1:63793]: 0 -> readonly([]) +2ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> readonly([]) +1ms
  ioredis:redis write command[127.0.0.1:63793]: 0 -> readonly([]) +2ms

Don't know why there is 2 "write command" to the 63793 node... but at least a command is sent to every node.

And now, if I send commands to Redis, all goes flawlessly, without printing lots of lines in the log, I receive the value in the key directlyin the slave node, not a MOVED answer.

I'm not an expert in Redis, and just starting using ioredis, so maybe I'm doing something wrong. Is there another way to check what commands are sent to Redis? Some log in the Redis servers?

Maybe the best could be to test it with a minimal test code, I'm testing with my project application code. And maybe include a test case with Redis.Cluster, in addition to the "normal Redis" one?

Thanks for your support and for this library!

pandzicivan commented 7 months ago

@ligreman did You ever manage to solve it? I am having the same issue...

ligreman commented 7 months ago

@ligreman did You ever manage to solve it? I am having the same issue...

Yes, the workaround I described in my last comment (2nd test) works, and it is what I'm currently using. I write here againt the workaround to make it clearer:

Send a READONLY command just after connecting to Redis to every slave in the cluster. Just with that, ioredis works fine. Example of my NodeJS code:

        // Connect to Redis Cluster as usual
        // Then get the slave nodes
        const slaves = this._redisConnection.nodes('slave');

        // Issue a READONLY command to every of them
        Promise.all(slaves.map((node) => node.readonly())).then().catch(e => {
            console.error(e);
        });