Uncaught Exception in tls when using aws lambda with elastic redis cache

mpalomera commented 4 years ago

We are using ioredis with ElastiCache for Redis. Most of the time it works fine; however, from time to time our Lambda functions fail with the following error:

@message 2020-08-18T13:35:07.727Z 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae ERROR Uncaught Exception {"errorType":"Error","errorMessage":"connect ETIMEDOUT","code":"ETIMEDOUT","errorno":"ETIMEDOUT","syscall":"connect","stack":["Error: connect ETIMEDOUT"," at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)"," at Object.onceWrapper (events.js:421:28)"," at TLSSocket.emit (events.js:315:20)"," at TLSSocket.EventEmitter.emit (domain.js:482:12)"," at TLSSocket.Socket._onTimeout (net.js:481:8)"," at listOnTimeout (internal/timers.js:549:17)"," at processTimers (internal/timers.js:492:7)"]}
@requestId 06faa96f-4fd3-43ce-8b8c-e369ec7b7aae
@timestamp 1597757707732
code ETIMEDOUT
errorMessage connect ETIMEDOUT
errorno ETIMEDOUT
errorType Error
stack.0 Error: connect ETIMEDOUT
stack.1 at TLSSocket.<anonymous> (/opt/nodejs/node_modules/ioredis/built/redis/index.js:285:37)
stack.2 at Object.onceWrapper (events.js:421:28)
stack.3 at TLSSocket.emit (events.js:315:20)
stack.4 at TLSSocket.EventEmitter.emit (domain.js:482:12)
stack.5 at TLSSocket.Socket._onTimeout (net.js:481:8)
stack.6 at listOnTimeout (internal/timers.js:549:17)
stack.7 at processTimers (internal/timers.js:492:7)
syscall connect

The main problem is that, since this is an uncaught exception, the Lambda is terminated without calling either retryStrategy or reconnectOnError. Here are our settings:

const IORedis = require('ioredis');

// redisHost and redisPort are assumed to be defined elsewhere.
const redisParams = {
  host: redisHost,
  port: redisPort,
  maxRetriesPerRequest: 4,
  connectTimeout: 60000, // ms allowed for the initial connection before "connect ETIMEDOUT"
  showFriendlyErrorStack: true,
  // Called when the connection is lost; returns the delay (ms) before the next reconnect attempt.
  retryStrategy(times) {
    console.error('IORedis retry error', { action: 'reconnecting' });
    return Math.min(times * 30, 1000);
  },
  // Called when Redis replies to a command with an error; returning true triggers a reconnect.
  reconnectOnError(error) {
    console.error('IORedis connection error', { error });
    const targetErrors = ['READONLY', 'ETIMEDOUT'];
    for (let i = 0; i < targetErrors.length; i += 1) {
      const targetError = targetErrors[i];
      if (error.message.includes(targetError)) {
        return true;
      }
    }
    console.error('IORedis connection error', { action: 'terminating' });
    return false;
  },
};

const redisClient = new IORedis(redisParams);
redisClient.on('error', (error) => {
  console.error('IORedis error', { error });
});
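For reference, the retryStrategy in these settings waits `Math.min(times * 30, 1000)` milliseconds before reconnect attempt number `times`. The sketch below evaluates the same formula on its own (the helper names `retryDelay` and `retrySchedule` are hypothetical, not part of ioredis): delays grow by 30 ms per attempt and reach the 1000 ms cap from the 34th attempt onward.

```javascript
// Same formula as the retryStrategy above: delay in ms before reconnect attempt `times`.
function retryDelay(times) {
  return Math.min(times * 30, 1000);
}

// Delays for attempts 1..attempts (hypothetical helper for illustration).
function retrySchedule(attempts) {
  const delays = [];
  for (let times = 1; times <= attempts; times += 1) {
    delays.push(retryDelay(times));
  }
  return delays;
}

console.log(retrySchedule(5)); // [ 30, 60, 90, 120, 150 ]
console.log(retryDelay(33), retryDelay(34)); // 990 1000
```

The first 33 delays add up to 16830 ms, and these are only the pauses between attempts: each attempt can itself take up to `connectTimeout` (60000 ms here) before failing with `connect ETIMEDOUT`.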

Notice that the 'error' event handler is called without problems; however, neither reconnectOnError nor retryStrategy is called. It looks like the error is raised asynchronously in the tls module.
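The stack trace above ends in `Socket._onTimeout`, `listOnTimeout` and `processTimers`, so the error is created inside a timer callback rather than in the code that called `new IORedis(...)`. The standalone sketch below (plain Node.js, no ioredis; `demonstrateAsyncThrow` is a hypothetical name) illustrates the general consequence: a `try/catch` around the code that schedules such a callback cannot catch what the callback throws, and the error only reaches process-level `'uncaughtException'` listeners. It shows the mechanism in general; where exactly the exception escapes in this issue is not established in the thread.

```javascript
// Illustrates that an error thrown from a timer callback escapes the try/catch
// that surrounded the code which scheduled the timer.
function demonstrateAsyncThrow(delayMs = 10) {
  return new Promise((resolve) => {
    // Observe the error at process level so this demo does not crash Node.
    process.once('uncaughtException', (err) => {
      resolve(`uncaughtException: ${err.message}`);
    });

    try {
      // Stands in for a socket timeout firing after the connect call has returned.
      setTimeout(() => {
        throw new Error('connect ETIMEDOUT');
      }, delayMs);
    } catch (err) {
      // Never reached: this block has exited before the timer fires.
      resolve(`caught by try/catch: ${err.message}`);
    }
  });
}
```

Running `demonstrateAsyncThrow().then(console.log)` prints `uncaughtException: connect ETIMEDOUT`.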

Expected behavior: the exception should be caught and reconnectOnError should be called.
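One detail worth checking against this expectation: the ioredis README describes `reconnectOnError` as a hook for errors that Redis returns in reply to a command (its example checks for `READONLY`), not for socket-level failures such as `connect ETIMEDOUT`. The matching logic itself can still be tested in isolation; below is an equivalent, more compact version of the loop from the settings (the name `shouldReconnect` is hypothetical).

```javascript
// Error substrings that should trigger a reconnect (same list as in the settings above).
const TARGET_ERRORS = ['READONLY', 'ETIMEDOUT'];

// Equivalent to the loop in reconnectOnError: true if any target substring appears.
function shouldReconnect(error) {
  return TARGET_ERRORS.some((target) => error.message.includes(target));
}

console.log(shouldReconnect(new Error("READONLY You can't write against a read only replica."))); // true
console.log(shouldReconnect(new Error('ERR unknown command'))); // false
```

It could be plugged in directly as `reconnectOnError: shouldReconnect`, keeping the logging separate.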

mpalomera commented 4 years ago

Moreover, we enabled debugging. Here is an example of when the error happens (split across several comments due to the comment size limit):

@timestamp	@message
2020-08-17 14:43:10.383
2020-08-17 14:43:10.384

mpalomera commented 4 years ago

@timestamp	@message
2020-08-17 14:43:10.394
2020-08-17 14:44:23.059

mpalomera commented 4 years ago

@timestamp	@message
2020-08-17 14:44:26.863
2020-08-17 14:44:26.863
2020-08-17 14:44:26.865

yurik94 commented 4 years ago

The same "ETIMEDOUT" is happening for us on a few thousand provisioned Lambdas. They seem to disconnect about 10 minutes after being provisioned, and reconnecting slows Lambda execution by about 5 seconds, so our Lambdas are all stuck...
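When a reconnect can stall an invocation for several seconds, one option is to bound how long a handler waits for any single Redis call. A minimal sketch, assuming the call is available as a promise (ioredis commands return promises); `withTimeout` is a hypothetical helper, and the timeout only stops the caller from waiting, it does not cancel the underlying command.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
// The original operation keeps running; only the caller stops waiting for it.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it does not keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, `await withTimeout(redisClient.get(key), 1000, 'GET')` (with `key` standing in for whatever the handler looks up) lets the handler fall back to its primary data source instead of waiting out a slow reconnect.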

xuliangs commented 4 years ago

We also see this error randomly in Lambdas, with "ioredis": "^4.16.2", connecting to an ElastiCache Redis cluster's primary node endpoint.

sandipmohod commented 4 years ago

Same issue during a Heroku Redis add-on failover. It happens randomly.

d3s4x commented 4 years ago

The same happens in our Lambda. Most of the time it works fine, but from time to time a connection timeout occurs at random.

darrinholst commented 4 years ago

Seeing this in our Lambda that's configured in a VPC. The failure rate seems correlated with the number of concurrent executions: high concurrency means many timeouts. I switched to the standard node-redis client and saw the same random timeouts, so I'm not sure this has anything to do with ioredis, but I'd appreciate hearing back from anyone who figures this out.
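Related to the concurrency observation: a common Lambda pattern (not confirmed in this thread to fix the timeouts) is to create the Redis client once per execution environment, outside the handler, and reuse it across invocations, so a warm environment pays the TCP/TLS connect cost once instead of on every invocation. A sketch with a hypothetical `getClient` helper; the factory is injected so the sketch runs without Redis, and the `status === 'end'` check assumes ioredis's convention of setting that status once a client will make no further reconnection attempts.

```javascript
// Module-scope cache: survives across invocations in the same warm execution environment.
let cachedClient = null;

// Returns the cached client, creating one on first use or after the previous one
// has permanently ended. In a real handler the factory might be
// () => new IORedis(redisParams).
function getClient(createClient) {
  if (cachedClient === null || cachedClient.status === 'end') {
    cachedClient = createClient();
  }
  return cachedClient;
}
```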

selected-pixel-jameson commented 3 years ago

I'm also seeing this same issue.

dsandi commented 3 years ago

Same issue

hilkeheremans commented 2 years ago

Same issue here.

mukundrv commented 2 years ago

Is the service client connecting to the right Redis host?
Is the Redis instance reachable from the network you are connecting from?
Is the Redis connection closed properly? I have observed timeouts and connection issues from containers, Lambdas, etc. when connections were not closed properly.
If it is in the cloud, does the server's firewall/security group allow connections from the network you are connecting from?
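The reachability and security-group questions can be checked from inside the Lambda's VPC with a plain TCP connect, independently of any Redis client. A minimal sketch using only Node's `net` module; `checkTcpReachable` is a hypothetical helper, and a successful result only proves the TCP path (it performs no TLS handshake and sends no Redis commands).

```javascript
const net = require('net');

// Attempts a plain TCP connection and reports how it ended:
// 'connected', 'timeout', or the socket error code (e.g. 'ECONNREFUSED').
function checkTcpReachable(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (result) => {
      socket.destroy();
      resolve(result);
    };
    // Inactivity timeout; it also covers the time spent waiting for the connect.
    socket.setTimeout(timeoutMs, () => finish('timeout'));
    socket.once('connect', () => finish('connected'));
    socket.once('error', (err) => finish(err.code || err.message));
  });
}
```

A `'timeout'` result is consistent with packets being silently dropped (for example by a security group or a missing route), while `'ECONNREFUSED'` means the host answered but nothing accepted connections on that port.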

quyetinforma commented 1 year ago

Hey all, I am facing the same issue in v5.2.3. Is there any workaround, or a suggestion for switching to another Redis client? Thanks.

ojousima commented 1 year ago

I'm occasionally seeing this one too; commenting here to get notified if someone finds out why.

evandroduarte commented 1 year ago

Did anyone ever get a fix for this?

redis / ioredis

Uncaught Exception in tls when using aws lambda with elastic redis cache #1183