jdwit opened this issue 8 months ago
I'm having the same problem with ioredis 5.3.2.

I believe it's because every 11 days or so, AWS ElastiCache Serverless closes the underlying connection (resulting in ECONNRESET) and swaps in a new underlying node available at the same <host>:<port>. But ioredis assumes this can't happen, hence `retryStrategy` is always null in cluster mode (source). Note that I already tried `retryStrategy` and `reconnectOnError`, both to no avail.
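For reference, the shape of the client I'm creating is roughly the following (the endpoint, TLS, and retry values here are illustrative placeholders, not my exact config):

import Redis from "ioredis";

// Illustrative sketch only - endpoint and option values are placeholders
const redisClient = new Redis.Cluster(
  [{ host: "<REDACTED>.serverless.use2.cache.amazonaws.com", port: 6379 }],
  {
    // Per the ioredis README note on ElastiCache clusters with TLS
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: {},
      // Neither of these helped: ioredis forces retryStrategy to null in cluster mode anyway
      retryStrategy: (times) => Math.min(times * 200, 2000),
      reconnectOnError: (err) => err.message.includes("ECONNRESET"),
    },
  }
);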
I ran with debug logs enabled and got this:
# GET and SET working fine before this point
2024-03-13T11:48:01.363Z ioredis:connection error: Error: read ECONNRESET
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}
2024-03-13T11:48:01.368Z ioredis:redis status[<REDACTED>:6379]: ready -> close
2024-03-13T11:48:01.368Z ioredis:connection skip reconnecting because `retryStrategy` is not a function
2024-03-13T11:48:01.368Z ioredis:redis status[<REDACTED>:6379]: close -> end
2024-03-13T11:48:01.369Z ioredis:cluster:connectionPool Remove <REDACTED>.serverless.use2.cache.amazonaws.com:6379 from the pool
2024-03-13T11:48:01.369Z ioredis:cluster:subscriber subscriber has left, selecting a new one...
2024-03-13T11:48:01.370Z ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(subscriber))]: wait -> close
2024-03-13T11:48:01.370Z ioredis:connection skip reconnecting since the connection is manually closed.
2024-03-13T11:48:01.370Z ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(subscriber))]: close -> end
2024-03-13T11:48:01.370Z ioredis:cluster:subscriber selected a subscriber <REDACTED>.serverless.use2.cache.amazonaws.com:6380
2024-03-13T11:48:01.371Z ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6380 (ioredis-cluster(subscriber))]: wait -> wait
# GET returning null, and SET throwing error past this point
Errors after this point look like:
{
  "name": "ReplyError",
  "message": "ERR MOVED 7351 <REDACTED>.serverless.use2.cache.amazonaws.com:6379",
  "stack": [
    "parseError (node_modules/.pnpm/redis-parser@3.0.0/node_modules/redis-parser/lib/parser.js:179:12)",
    "parseType (node_modules/.pnpm/redis-parser@3.0.0/node_modules/redis-parser/lib/parser.js:302:14)"
  ]
}
Due to writes like this one being routed to a node that no longer serves the slot:
2024-03-14T05:14:47.495Z ioredis:redis write command[<REDACTED>:6380]: 0 -> set([ ... ])
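In application code the failure surfaces roughly like this (the key and error handling are hypothetical; the `ReplyError` name/message checks match the shape shown above):

// Hypothetical wrapper, just to show where the MOVED reply lands
const safeSet = async (key, value) => {
  try {
    await redisClient.set(key, value);
  } catch (err) {
    // After the node swap, writes go to a stale node and come back as "MOVED <slot> <host>:<port>"
    if (err.name === "ReplyError" && err.message.includes("MOVED")) {
      console.error("Cluster slot map appears to be stale:", err.message);
    }
    throw err;
  }
};

To reproduce this without waiting for AWS to recycle the node, here is a minimal index.js that exposes a PUT /connreset endpoint which destroys the socket of every node in the connection pool: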
import http from "http";
import { parse } from "url";
import { getRedisClient } from "your-redis-client-constructor-here";

const hostname = "127.0.0.1";
const port = 4020;

// Needs to be Redis.Cluster
const redisClient = getRedisClient();

const econnresetAllNodes = (clusterClient) => {
  const allNodes = Object.values(clusterClient.connectionPool.nodes.all);
  for (const node of allNodes) {
    if (!node.stream) continue;
    // Even though it's named "stream", it's actually a NodeJS socket
    node.stream.destroy();
  }
};

const server = http.createServer((req, res) => {
  const url = parse(req.url, true);
  if (req.method === "PUT" && url.pathname === "/connreset") {
    req.on("data", (chunk) => {});
    req.on("end", () => {
      econnresetAllNodes(redisClient);
      res.statusCode = 200;
      res.setHeader("Content-Type", "text/plain");
      res.end("Connection reset successfully\n");
    });
  } else {
    res.statusCode = 404;
    res.end("Not Found\n");
  }
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});
Then after `DEBUG=ioredis:* node index.js`, run `curl -X PUT localhost:4020/connreset`. The same issue results:
ioredis:redis status[<REDACTED>:6379]: ready -> close +3s
ioredis:connection skip reconnecting because `retryStrategy` is not a function +3s
ioredis:redis status[<REDACTED>:6379]: close -> end +1ms
ioredis:cluster:connectionPool Remove <REDACTED>.serverless.use2.cache.amazonaws.com:6379 from the pool +3s
ioredis:cluster:subscriber subscriber has left, selecting a new one... +3s
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(subscriber))]: wait -> close +0ms
ioredis:connection skip reconnecting since the connection is manually closed. +1ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(subscriber))]: close -> end +0ms
ioredis:cluster:subscriber selected a subscriber <REDACTED>.serverless.use2.cache.amazonaws.com:6380 +1ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6380 (ioredis-cluster(subscriber))]: wait -> wait +1ms
ioredis:cluster status: [empty] -> connecting +0ms
ioredis:cluster resolved hostname <REDACTED>.serverless.use2.cache.amazonaws.com to IP <REDACTED>.serverless.use2.cache.amazonaws.com +3ms
ioredis:cluster:connectionPool Reset with [
ioredis:cluster:connectionPool {
ioredis:cluster:connectionPool port: 6379,
ioredis:cluster:connectionPool host: '<REDACTED>.serverless.use2.cache.amazonaws.com',
ioredis:cluster:connectionPool tls: {},
ioredis:cluster:connectionPool reconnectOnError: [Function: reconnectOnError]
ioredis:cluster:connectionPool }
ioredis:cluster:connectionPool ] +0ms
ioredis:cluster:connectionPool Connecting to <REDACTED>.serverless.use2.cache.amazonaws.com:6379 as master +3ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379]: wait -> wait +0ms
ioredis:cluster getting slot cache from <REDACTED>.serverless.use2.cache.amazonaws.com:6379 +7ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(refresher))]: wait -> wait +1ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(refresher))]: wait -> connecting +1ms
ioredis:redis queue command[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(refresher))]: 0 -> cluster([ 'SLOTS' ]) +1ms
ioredis:cluster:subscriber selected a subscriber <REDACTED>.serverless.use2.cache.amazonaws.com:6379 +0ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379 (ioredis-cluster(subscriber))]: wait -> wait +1ms
ioredis:cluster:subscriber started +0ms
ioredis:redis status[<REDACTED>:6379 (ioredis-cluster(refresher))]: connecting -> connect +62ms
ioredis:redis write command[<REDACTED>:6379 (ioredis-cluster(refresher))]: 0 -> auth([ 'default', '' ]) +2ms
ioredis:redis status[<REDACTED>:6379 (ioredis-cluster(refresher))]: connect -> ready +1ms
ioredis:connection set the connection name [ioredis-cluster(refresher)] +0ms
ioredis:redis write command[<REDACTED>:6379 (ioredis-cluster(refresher))]: 0 -> client([ 'setname', 'ioredis-cluster(refresher)' ]) +0ms
ioredis:connection send 1 commands in offline queue +0ms
ioredis:redis write command[<REDACTED>:6379 (ioredis-cluster(refresher))]: 0 -> cluster([ 'SLOTS' ]) +1ms
ioredis:cluster cluster slots result count: 1 +86ms
ioredis:cluster cluster slots result [0]: slots 0~16383 served by [
'<REDACTED>.serverless.use2.cache.amazonaws.com:6379',
'<REDACTED>.serverless.use2.cache.amazonaws.com:6380'
] +0ms
ioredis:cluster:connectionPool Reset with [
ioredis:cluster:connectionPool {
ioredis:cluster:connectionPool host: '<REDACTED>.serverless.use2.cache.amazonaws.com',
ioredis:cluster:connectionPool port: 6379,
ioredis:cluster:connectionPool readOnly: false
ioredis:cluster:connectionPool },
ioredis:cluster:connectionPool {
ioredis:cluster:connectionPool host: '<REDACTED>.serverless.use2.cache.amazonaws.com',
ioredis:cluster:connectionPool port: 6380,
ioredis:cluster:connectionPool readOnly: true
ioredis:cluster:connectionPool }
ioredis:cluster:connectionPool ] +103ms
ioredis:cluster:connectionPool Connecting to <REDACTED>.serverless.use2.cache.amazonaws.com:6380 as slave +1ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6380]: wait -> wait +33ms
ioredis:cluster status: connecting -> connect +17ms
ioredis:redis status[<REDACTED>.serverless.use2.cache.amazonaws.com:6379]: wait -> connecting +2ms
ioredis:redis queue command[<REDACTED>.serverless.use2.cache.amazonaws.com:6379]: 0 -> cluster([ 'INFO' ]) +0ms
ioredis:redis status[<REDACTED>:6379 (ioredis-cluster(refresher))]: ready -> close +3ms
ioredis:connection skip reconnecting since the connection is manually closed. +39ms
ioredis:redis status[<REDACTED>:6379 (ioredis-cluster(refresher))]: close -> end +0ms
ioredis:redis status[<REDACTED>:6379]: connecting -> connect +14ms
ioredis:redis write command[<REDACTED>:6379]: 0 -> auth([ 'default', '' ]) +1ms
ioredis:redis write command[<REDACTED>:6379]: 0 -> info([]) +0ms
ioredis:redis status[<REDACTED>:6379]: connect -> ready +3ms
ioredis:connection send 1 commands in offline queue +19ms
ioredis:redis write command[<REDACTED>:6379]: 0 -> cluster([ 'INFO' ]) +1ms
ioredis:cluster status: connect -> ready +25ms
After further investigation, we decided to move to node-redis; it seems like this project is now in maintenance mode.
@adamshugar Thanks for the comment - I'm having a similar problem and came to the same conclusion. Would you be willing to post your working node-redis client config for Serverless ElastiCache in this thread?
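Not speaking for @adamshugar, but for anyone looking for a starting point: a minimal sketch of a node-redis v4 cluster client against a TLS ElastiCache Serverless endpoint might look like the following. The endpoint and option values are placeholders, not a verified working config.

import { createCluster } from "redis";

// Placeholder endpoint; the rediss:// scheme enables TLS
const client = createCluster({
  rootNodes: [
    { url: "rediss://<REDACTED>.serverless.use2.cache.amazonaws.com:6379" },
  ],
  defaults: {
    socket: { tls: true },
  },
  useReplicas: true,
});

client.on("error", (err) => console.error("Redis Cluster error", err));

await client.connect();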
I'm using bullmq, which unfortunately has decided not to move: https://github.com/taskforcesh/bullmq/issues/2490
Wondering if anyone has solved this one...
Occasionally the following error is reported:
When the app container is restarted (and a new Redis.Cluster client instance is created) the problem does not occur anymore.
The client is created as follows:
I added the `retryDelayOnMoved` setting after this was reported for the first time, but it does not seem to have any effect. Can this be an issue in the ioredis client? The message `ERR MOVED 7572 url:port` seems completely valid - see the Redis docs section Moved Redirection. Can this be related to this note: Aws Elasticache Clusters with TLS?
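For context, `retryDelayOnMoved` is a cluster-level option rather than a per-node `redisOptions` entry; what I set is roughly the following (the node list and delay value are illustrative, not my exact config):

import Redis from "ioredis";

// Illustrative only - host and delay are placeholders
const client = new Redis.Cluster(
  [{ host: "<redis-host>", port: 6379 }],
  {
    // Delay (ms) before retrying a command that was answered with a MOVED redirection
    retryDelayOnMoved: 1000,
  }
);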
Any help is welcome. This issue is hard for us to debug since we are not able to reproduce it.