redis / node-redis

Redis Node.js client
https://redis.js.org/
MIT License
16.86k stars 1.87k forks source link

Node-redis Client Refuses to Buffer Commands Upon An ECONNRESET Error From Socket Connection #2785

Open sohhaofeng1989 opened 2 months ago

sohhaofeng1989 commented 2 months ago

Description

Recently, we noticed that the node-redis client refuses to buffer redis commands upon encountering an ECONNRESET Error from the underlying socket connection. This is demonstrate-able by doing the following.

  1. Set up a node-redis client that connects to a locally spun-up redis-server
  2. Perform the following hardcoded 10s sleep/delay before a SET command gets executed
    try {
        console.log(
          'Execute iptables command to reject TCP connections with RST: iptables -A OUTPUT -p tcp --dport 6379 -j REJECT --reject-with tcp-reset\n' +
            'Execute iptables command to drop packets received on TCP connection (simulating request timeouts): iptables -A OUTPUT -p tcp --dport 6379 -j DROP\n',
          'Execute iptables command to simulate host unreachable (icmp-host-unreachable): iptables -A OUTPUT -p tcp --dport 6379 -j REJECT --reject-with icmp-host-unreachable',
        )
       await new Promise((resolve) => setTimeout(resolve, 10000))
       await this.client.set(key, value, setConfig)
    } catch (error) {
      console.log(error)
    }
  3. Execute the script/server and once the 10s timer starts, run the following iptables command to simulate an ECONNRESET.
    iptables -A OUTPUT -p tcp --dport 6379 -j REJECT --reject-with tcp-reset
  4. Wait for the 10s to be up and notice that the SET command did not get buffered into the offline queue (to be re-sent once the socket connection gets re-established). Instead, it throws an ECONNRESET error right away like the below
    2024-07-05 19:40:22     error: {
    2024-07-05 19:40:22       "errno": -104,
    2024-07-05 19:40:22       "code": "ECONNRESET",
    2024-07-05 19:40:22       "syscall": "read",
    2024-07-05 19:40:22       "name": "Error",
    2024-07-05 19:40:22       "message": "read ECONNRESET",
    2024-07-05 19:40:22       "stack":
    2024-07-05 19:40:22           Error: read ECONNRESET
    2024-07-05 19:40:22               at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
    2024-07-05 19:40:22               at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
    2024-07-05 19:40:22     }
  5. Corresponding logs that were captured by subscribing to the error Connection Events.
    2024-07-05 19:40:22 [11:40:22.131] INFO (410): redis store client disconnected Error: read ECONNRESET
    2024-07-05 19:40:22     status: "INFO"
    2024-07-05 19:40:22     scope: "RedisStore"
    2024-07-05 19:40:22 [11:40:22.128] INFO (410): redis store client attempting to reconnect to redis server
    2024-07-05 19:40:22     status: "INFO"
    2024-07-05 19:40:22     scope: "RedisStore"
    2024-07-05 19:40:22 [11:40:22.134] INFO (410): redis store client disconnected Error: connect ECONNREFUSED 172.18.0.3:6379
    2024-07-05 19:40:22     status: "INFO"
    2024-07-05 19:40:22     scope: "RedisStore"
    2024-07-05 19:40:22 [11:40:22.136] INFO (410): redis store client attempting to reconnect to redis server
    2024-07-05 19:40:22     status: "INFO"
    2024-07-05 19:40:22     scope: "RedisStore"
    2024-07-05 19:40:22 [11:40:22.137] INFO (410): redis store client disconnected Error: connect ECONNREFUSED 172.18.0.3:6379
    2024-07-05 19:40:22     status: "INFO"
    2024-07-05 19:40:22     scope: "RedisStore"

It's noteworthy to point out 3 things

  1. node-redis does buffer redis commands for all other socket connection errors like ECONNREFUSED , EADDRNOTAVAIL etc. Only ECONNRESET doesn't trigger the offline queue buffering mechanism
  2. Just to make sure that we ain't trippin or this isn't some Redis-prescribed agreement to handle ECONNRESET Errors. We did a side-by-side comparison with the other popular redis client library ioredis and found out that ioredis does buffer redis commands into its offline queue upon ECONNRESET.
  3. Some context here is that we're using AWS Elasticache Serverless for Redis. From time to time, AWS does some Service Updates that are transparent to its users. These Service Updates runs upgrades on their Elasticache Serverless proxies every 1 to 3 weeks, and whenever the proxy is upgraded, there will be disconnections that result in the aforementioned ECONNRESET errors. Whenever this happens, we sometimes drop user traffic which is definitely undesirable.

Node.js Version

20.8.1

Redis Server Version

7.0.7-alpine

Node Redis Version

4.6.12

Platform

Alpine Linux

Logs

2024-07-05 19:40:22     error: {
2024-07-05 19:40:22       "errno": -104,
2024-07-05 19:40:22       "code": "ECONNRESET",
2024-07-05 19:40:22       "syscall": "read",
2024-07-05 19:40:22       "name": "Error",
2024-07-05 19:40:22       "message": "read ECONNRESET",
2024-07-05 19:40:22       "stack":
2024-07-05 19:40:22           Error: read ECONNRESET
2024-07-05 19:40:22               at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
2024-07-05 19:40:22               at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
2024-07-05 19:40:22     }
leibale commented 1 month ago

I'll need to debug a bit more, but for now I'll at least explain what should happen: node-redis have 2 queues: waiting to be sent, and waiting for reply. When you execute a command it starts in the "waiting to be sent" queue, and after writing it to the socket (using nodejs socket.write) it moves to the "waiting for reply" queue. When a socket drops, the client rejects all "waiting for reply" commands since they might have been executed and might not, and then if the "offline queue" is enabled (see disabledOfflineQueue), it'll keep the "waiting to be sent" queue, and executed them once a new connection established. maybe socket.write does not write the data immediately in some cases (?)