JoeNyland closed this issue 1 year ago.
@JoeNyland
Interestingly, I came across a very similar error today (I am also running an AWS Lambda, though with a different Redis host). I took a look at the redis.status property between invocations. The first time it runs, the status is 'connecting'; however, on subsequent invocations (after I had called .disconnect()) the status was 'end'. I think Lambdas are caching some state between runs. Just a thought, best of luck with your issue.
@philipgloyne The behaviour you describe (I think) is expected. If your Lambda invocation ends before the connection(s) to the Redis node(s) are closed, Lambda will freeze the state at that point in time and thaw it out on the next invocation of the Lambda.
This is exactly the reason why we create a new connection at the start of the function, so that every Lambda invocation uses a new (not the cached and possibly closed) connection. This is shown in the example above.
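For illustration, a minimal sketch of that per-invocation pattern (hypothetical handler, placeholder host and command, not the actual code from this issue) might look like:

```js
const Redis = require('ioredis');

// Build a fresh cluster connection for each invocation, do the work, then quit
// so no half-open socket is frozen with the container and thawed (already
// closed) on the next invocation.
exports.handler = async () => {
  const cluster = new Redis.Cluster([process.env.REDIS_HOST]);
  try {
    return await cluster.get('some-key'); // placeholder for the real work
  } finally {
    await cluster.quit(); // let the event loop drain so the invocation can end
  }
};
```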
I'd like to point out that we have been able to recreate this issue outside of AWS. This was on a local Redis 3.2.10 cluster running under Docker using the above script.
This means that the issue is not isolated to AWS Lambda or Elasticache and is either with our code or ioredis.
I've also been able to reproduce this problem but only in AWS.
I believe the problem is related to the offline queue. The error originates when the close() method is called from the event_handler, and it eventually bubbles up in the redis class when flushQueue() is executed with a non-empty offline queue.
The commandQueue also occasionally causes this problem but it's much less frequent.
@elliotttf I also encountered this problem. How did you solve it?
@ipengyo Sorry, I don't see a solution here. Have you found a solution to this problem? If so, would you mind sharing it?
In my case, there were a couple of things that helped to mitigate the error (although it's still not completely gone):
I also hit this problem; my original code is almost the same as @JoeNyland's:
setInterval(function() {
  var cluster = new Redis.Cluster(...);
  cluster.nodes('all').forEach(function(node) {
    node.bgsave(...);
  });
}, timeInterval);
However, I tried something like this:
setInterval(function() {
  var cluster = new Redis.Cluster(...);
  // delay the command for 3 seconds
  setTimeout(function() {
    cluster.nodes('all').forEach(function(node) {
      node.bgsave(...);
    });
  }, 3 * 1000);
}, timeInterval);
Then the code never fails, so I guess there might be some initialization work inside new Redis.Cluster(...) that takes some time to complete, so executing commands immediately might fail (not quite sure, just my guess).
Probably a good idea to add an event handler and wait for the cluster to emit 'ready' before trying to use it. Waiting an arbitrary amount of time may work most of the time, but waiting until the client reports that it is connected to the cluster and that the cluster is ready to receive commands would be much cleaner.
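A rough sketch of that suggestion, reusing the bgsave loop and timeInterval from the snippet above (the cluster constructor arguments remain elided):

```js
var Redis = require('ioredis');

setInterval(function () {
  var cluster = new Redis.Cluster(/* ... */);
  // Wait for the cluster to report it is ready instead of sleeping 3 seconds.
  cluster.once('ready', function () {
    cluster.nodes('all').forEach(function (node) {
      node.bgsave();
    });
  });
}, timeInterval);
```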
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm commenting here to confirm that this issue is still cropping up for us.
I'm not really sure why the bot above adds a "wontfix" label to an issue that hasn't had any recent activity 🤔
Ioredis just does not work on AWS Lambda!
Having this issue as well. We are running on AWS, but not using Lambda.
@luin I see that this has a milestone set of v4, but this issue is still open. Is this issue meant to be fixed in v4.0.0 and you want us to test it or is it still being worked on? We're hitting this issue quite a lot at the moment and we're desperate to find a fix.
Sorry for the late response. I missed this issue earlier. My apologies.
Running conn.nodes immediately after conn = new Redis.Cluster() doesn't work since ioredis hasn't fetched the slot info from the Redis cluster yet. The recommended way to do this is to call Cluster#nodes() after the ready event:
conn.on('ready', () => {
  conn.nodes("master").map((node) => {
    return node.keys("*:*");
  })
})
Does this solve your issue @JoeNyland ?
Hi @luin.
I work with @JoeNyland so I'll answer while he isn't here.
We made the change you just posted a few days ago and, while it has reduced the frequency of the Connection is closed error, it hasn't stopped it from happening.
Our current method looks like:
const nodes = clusterConn.nodes('master');
Promise.all(nodes.map((node) => {
  return node.keys('*:*');
})).then((responses) => {
  ...
})
@kierxn Could you enable the showFriendlyErrorStack option to see which line of the code causes the Connection is closed error?
new Redis.Cluster(["cluster.id.clustercfg.euw1.cache.amazonaws.com"], {redisOptions: {showFriendlyErrorStack: true}});
@luin Thanks for your input so far. We've had the following in place today:
- Waiting for the ready event on the Redis.Cluster instance
- The showFriendlyErrorStack option

We've got the following output from two errors in the last hour:
2018-09-11T16:34:02.317Z 7db8e0df-b5e0-11e8-9f45-27657abca6d1 Error: Connection is closed.
at Socket.g (events.js:292:16)
at emitNone (events.js:91:20)
at Socket.emit (events.js:185:7)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1073:10)
To confirm, we're also logging out the current node
connection when this issue is hit and it looks like so:
{
"options": {
"retryStrategy": null,
"readOnly": false,
"host": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com",
"port": 6379,
"key": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com:6379",
"showFriendlyErrorStack": true,
"lazyConnect": true,
"family": 4,
"connectTimeout": 10000,
"keepAlive": 0,
"noDelay": true,
"connectionName": null,
"sentinels": null,
"name": null,
"role": "master",
"password": null,
"db": 0,
"dropBufferSupport": false,
"enableOfflineQueue": true,
"enableReadyCheck": true,
"autoResubscribe": true,
"autoResendUnfulfilledCommands": true,
"keyPrefix": "",
"reconnectOnError": null,
"stringNumbers": false
},
"domain": null,
"_events": {},
"_eventsCount": 5,
"scriptsSet": {},
"commandQueue": {
"_head": 0,
"_tail": 0,
"_capacityMask": 3,
"_list": [
null,
null,
null,
null
]
},
"offlineQueue": {
"_head": 2,
"_tail": 2,
"_capacityMask": 3,
"_list": [
null,
null,
null,
null
]
},
"connector": {
"options": {
"retryStrategy": null,
"readOnly": false,
"host": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com",
"port": 6379,
"key": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com:6379",
"showFriendlyErrorStack": true,
"lazyConnect": true,
"family": 4,
"connectTimeout": 10000,
"keepAlive": 0,
"noDelay": true,
"connectionName": null,
"sentinels": null,
"name": null,
"role": "master",
"password": null,
"db": 0,
"dropBufferSupport": false,
"enableOfflineQueue": true,
"enableReadyCheck": true,
"autoResubscribe": true,
"autoResendUnfulfilledCommands": true,
"keyPrefix": "",
"reconnectOnError": null,
"stringNumbers": false
},
"connecting": false,
"stream": {
"connecting": false,
"_hadError": false,
"_handle": null,
"_parent": null,
"_host": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com",
"_readableState": {
"objectMode": false,
"highWaterMark": 16384,
"buffer": {
"head": null,
"tail": null,
"length": 0
},
"length": 0,
"pipes": null,
"pipesCount": 0,
"flowing": true,
"ended": true,
"endEmitted": true,
"reading": false,
"sync": false,
"needReadable": false,
"emittedReadable": false,
"readableListening": false,
"resumeScheduled": false,
"defaultEncoding": "utf8",
"ranOut": false,
"awaitDrain": 0,
"readingMore": false,
"decoder": null,
"encoding": null
},
"readable": false,
"domain": null,
"_events": {
"_socketEnd": [
null
]
},
"_eventsCount": 5,
"_writableState": {
"objectMode": false,
"highWaterMark": 16384,
"needDrain": false,
"ending": true,
"ended": true,
"finished": true,
"decodeStrings": false,
"defaultEncoding": "utf8",
"length": 0,
"writing": false,
"corked": 0,
"sync": true,
"bufferProcessing": false,
"writecb": null,
"writelen": 0,
"bufferedRequest": null,
"lastBufferedRequest": null,
"pendingcb": 0,
"prefinished": true,
"errorEmitted": false,
"bufferedRequestCount": 0,
"corkedRequestsFree": {
"next": null,
"entry": null
}
},
"writable": false,
"allowHalfOpen": false,
"destroyed": true,
"_bytesDispatched": 0,
"_sockname": null,
"_pendingData": null,
"_pendingEncoding": "",
"server": null,
"_server": null,
"_idleTimeout": -1,
"_idleNext": null,
"_idlePrev": null,
"_idleStart": 5699454,
"_consuming": true,
"_peername": {
"address": "10.1.1.131",
"family": "IPv4",
"port": 6379
}
}
},
"retryAttempts": 0,
"status": "end",
"condition": {
"select": 0,
"auth": null,
"subscriber": false
},
"stream": {
"connecting": false,
"_hadError": false,
"_handle": null,
"_parent": null,
"_host": "cluster-id-0001-001.sdbngd.0001.euw1.cache.amazonaws.com",
"_readableState": {
"objectMode": false,
"highWaterMark": 16384,
"buffer": {
"head": null,
"tail": null,
"length": 0
},
"length": 0,
"pipes": null,
"pipesCount": 0,
"flowing": true,
"ended": true,
"endEmitted": true,
"reading": false,
"sync": false,
"needReadable": false,
"emittedReadable": false,
"readableListening": false,
"resumeScheduled": false,
"defaultEncoding": "utf8",
"ranOut": false,
"awaitDrain": 0,
"readingMore": false,
"decoder": null,
"encoding": null
},
"readable": false,
"domain": null,
"_events": {
"_socketEnd": [
null
]
},
"_eventsCount": 5,
"_writableState": {
"objectMode": false,
"highWaterMark": 16384,
"needDrain": false,
"ending": true,
"ended": true,
"finished": true,
"decodeStrings": false,
"defaultEncoding": "utf8",
"length": 0,
"writing": false,
"corked": 0,
"sync": true,
"bufferProcessing": false,
"writecb": null,
"writelen": 0,
"bufferedRequest": null,
"lastBufferedRequest": null,
"pendingcb": 0,
"prefinished": true,
"errorEmitted": false,
"bufferedRequestCount": 0,
"corkedRequestsFree": {
"next": null,
"entry": null
}
},
"writable": false,
"allowHalfOpen": false,
"destroyed": true,
"_bytesDispatched": 0,
"_sockname": null,
"_pendingData": null,
"_pendingEncoding": "",
"server": null,
"_server": null,
"_idleTimeout": -1,
"_idleNext": null,
"_idlePrev": null,
"_idleStart": 5699454,
"_consuming": true,
"_peername": {
"address": "10.1.1.131",
"family": "IPv4",
"port": 6379
}
},
"manuallyClosing": false,
"replyParser": {
"optionReturnBuffers": true,
"optionStringNumbers": false,
"name": "javascript",
"offset": 0,
"buffer": null,
"bigStrSize": 0,
"bigOffset": 0,
"totalChunkSize": 0,
"bufferCache": [],
"arrayCache": [],
"arrayPos": []
},
"prevCondition": {
"select": 0,
"auth": null,
"subscriber": false
}
}
Connection is closed happens when a command has been sent to a node whose status is "end". There are several reasons a node may end up in the "end" status. The most common one is when a failover happens in the cluster (a master is down and a slave becomes the master). In your case, a KEYS command may block the Redis server for seconds, so the node may be considered down and a failover process will be triggered.
Thanks for that. I've migrated the code to use the scanStream() function on each node instead of keys(), but we're still hitting the same intermittent Connection is closed error. Here's an example:
const Redis = require('ioredis');

const collectNodeKeys = (node) => {
  const stream = node.scanStream({
    match: '*:*',
    count: process.env.SCAN_PAGE_SIZE || 10,
  });
  return new Promise((resolve) => {
    let keys = [];
    stream.on('data', (page) => {
      keys = keys.concat(page);
    });
    stream.on('end', () => {
      resolve(keys);
    });
  });
};

const collectClusterKeys = (cluster) => {
  return new Promise((resolve, reject) => {
    cluster.on('ready', () => {
      const nodes = cluster.nodes('master');
      Promise
        .all(nodes.map(collectNodeKeys))
        .then((responses) => {
          let r = [];
          for (let response of responses) {
            r = r.concat(response);
          }
          resolve(r);
        })
        .catch(reject);
    });
  });
};

const interval = setInterval(() => {
  const cluster = new Redis.Cluster([process.env.REDIS_HOST], {
    slotsRefreshTimeout: parseInt(process.env.REDIS_SLOTS_REFRESH_TIMEOUT) || 1000,
    enableReadyCheck: true,
    redisOptions: {
      showFriendlyErrorStack: true,
    },
  });
  collectClusterKeys(cluster).then((keys) => {
    console.log(`Collected ${keys.length} keys`);
    cluster.disconnect();
  }).catch((e) => {
    clearInterval(interval);
    cluster.disconnect();
    console.log(e);
  })
}, 1000);
Collected 1386 keys
Collected 1387 keys
... snip ...
Collected 1423 keys
/app/node_modules/bluebird/js/release/async.js:61
fn = function () { throw arg; };
^
Error: Connection is closed.
at ScanStream.Readable.read (_stream_readable.js:348:10)
at resume_ (_stream_readable.js:737:12)
at _combinedTickCallback (internal/process/next_tick.js:80:11)
at process._tickCallback (internal/process/next_tick.js:104:9)
In addition to the above, if I monitor CLUSTER NODES
whilst recreating this issue, I'm not seeing any failover events in the cluster.
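One note on the collectNodeKeys snippet above: the promise only listens for 'data' and 'end'. A hedged variation, assuming the scan stream emits 'error' on failure, would also subscribe to that event so a closed connection rejects the promise instead of being thrown asynchronously:

```js
const collectNodeKeys = (node) => {
  const stream = node.scanStream({
    match: '*:*',
    count: process.env.SCAN_PAGE_SIZE || 10,
  });
  return new Promise((resolve, reject) => {
    let keys = [];
    stream.on('data', (page) => {
      keys = keys.concat(page);
    });
    // Surface "Connection is closed." as a rejection the caller can catch.
    stream.on('error', reject);
    stream.on('end', () => {
      resolve(keys);
    });
  });
};
```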
@JoeNyland are you setting context.callbackWaitsForEmptyEventLoop = false
? (AWS Lambda Context Object in Node.js - AWS Lambda)
@mfulton26 that's not a good idea, and it is not how we solved this. Setting that to false basically leaves some Redis writes pending until the next HTTP request.
@heri16, you are correct, it is not a good idea. I wasn't proposing that it was. If @JoeNyland or others who are getting intermittent "Connection is closed" errors are setting context.callbackWaitsForEmptyEventLoop = false, then that may be the cause of those errors.
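For context, the flag being discussed is set inside the Lambda handler; a minimal, hypothetical example (not taken from this thread):

```js
exports.handler = (event, context, callback) => {
  // With this set to false, Lambda returns as soon as callback() is called,
  // even if sockets or timers are still open in the event loop.
  context.callbackWaitsForEmptyEventLoop = false;

  // ... Redis work would go here ...
  callback(null, 'done');
};
```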
Not sure if this helps (or is even related to the cause), but I started seeing this error when upgrading from ~3.1.4
to ~4.3.0
. None of our infrastructure is on AWS.
@mfulton26 In production, yes we are setting context.callbackWaitsForEmptyEventLoop = false
. With the default of this being set to true
, we couldn't get the Lambda invocation to stop cleanly without a timeout, even though we were closing the connection after the callbacks for all calls had finished. It was as if there was still stuff in Node's event loop that was keeping the process open, even though ioredis' .quit()
resolves.
We are only calling Lambda's callback()
(which ends the invocation) once all callbacks for all Redis calls have completed, so I'm not sure I see the issue with what we are doing.
However, as I said above this issue has been recreated with the default Lambda config of context.callbackWaitsForEmptyEventLoop = true
so it looks like that's not the issue.
My case is using ioredis
while doing integration tests. Every test opens and closes a connection in its run. I'm doing .quit()
at the end of every test and it successfully resolves, but I still get the error for some reason.
The solution for me was what @elliotttf suggested: switching enableOfflineQueue to false.
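For reference, a minimal sketch of that option on a standalone client (the host and port below are placeholders; for a cluster the flag can also be passed, e.g. under redisOptions):

```js
const Redis = require('ioredis');

// With the offline queue disabled, commands issued while the connection is not
// ready fail immediately instead of waiting in offlineQueue, which is the queue
// flushQueue() rejects with "Connection is closed." when the connection ends.
const redis = new Redis({
  host: '127.0.0.1', // placeholder host
  port: 6379,
  enableOfflineQueue: false,
});
```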
For anyone still having issues with Promise.all(), I was able to get it working without having to use a setTimeout, as seen below.
const Redis = require('ioredis')

let cluster = new Redis.Cluster(...)
let scans = []

cluster.nodes('all').forEach(node => {
  scans.push(createMyPromise(node))
})

Promise.all(scans).then(keys => {
  console.log('All keys', keys)
})

----------------------------------------------

function createMyPromise(node) {
  return new Promise((resolve, reject) => {
    let results = []
    var stream = node.scanStream({
      match: 'sessions:*',
      count: 1000
    })
    stream.on('data', keys => {
      keys.forEach(key => {
        results.push(key)
      })
    })
    stream.on('end', () => {
      resolve(results)
    })
  })
}
Is there a recommended alternative client to ioredis
that does not have this issue?
It seems that if you call quit()/disconnect() before the SELECT command finishes, ioredis will throw an error. In my case this failed some E2E tests that finished so quickly that ioredis didn't get a chance to initialize properly.
A quick hack where flushQueue doesn't reject the SELECT command's promise seems to work, at the very least. Obviously the real solution would (probably) be to somehow handle rejection of these promises in the ioredis code that calls SELECT.
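A possible workaround for that test scenario, sketched here as a hypothetical teardown helper (not from this thread): wait until the client reports ready before quitting, so quit() isn't racing the initial SELECT.

```js
// Hypothetical test-teardown helper: only quit once the handshake has finished.
async function closeRedis(redis) {
  if (redis.status !== 'ready') {
    await new Promise((resolve, reject) => {
      redis.once('ready', resolve);
      redis.once('error', reject);
    });
  }
  await redis.quit();
}
```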
We don't seem to be getting this error anymore, so I'm going to close this.
We are currently working on a Lambda function, which connects to a Redis 3.2.10 cluster on AWS Elasticache.
This Lambda function will connect to the Redis cluster, run KEYS on each master node, collect the responses from each node and return an array of keys. We then publish an SNS message for each key in this array, then close the cluster connection before the Lambda ends.

AWS Lambda freezes and thaws the container in which programs run, so ideally we would create a connection once and re-use it on every invocation. However, we have found that for the Lambda to end, we must explicitly end the client connection to the cluster, as Lambda waits for the Node event loop to empty before the Lambda ends. This is why we create the connection at the start of the function (representing a Lambda invocation), run our queries and then, when this completes, attempt to gracefully .quit() the Redis.Cluster connection.

I can't share the actual code that we're working on, but I've been able to extract the logic and create a simple example of the issue we're facing:
test.js
Example output:
Why would we be getting the
Connection is closed
rejection error? This feels like a bug, as I think we are going about this in the correct way, but I'm happy to be proved wrong!