redis / ioredis

🚀 A robust, performance-focused, and full-featured Redis client for Node.js.
MIT License

READONLY error when the same ioredis-backed queue is used in two separate services connected to an ElastiCache (Redis) cluster #1835

Open divysts opened 8 months ago

divysts commented 8 months ago

When using ioredis with an AWS ElastiCache Redis cluster (cluster mode enabled) through the configuration endpoint, there is no primary endpoint, so there is no permanent primary. I have noticed that the primary node changes fairly often. When this configuration endpoint is used to create a Queue, and a Queue with the same topic and prefix is created and processed in two different services connected to the same ElastiCache cluster, it fails with "READONLY You can't write against a read only replica" after some requests have been added and processed.

Is there any way to catch these primary node change events and update the cluster information in case of a failover?
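ioredis documents three topology events on `Redis.Cluster`: `+node`, `-node`, and `node error`. A minimal sketch for observing them, assuming logging is enough as a first step (the helper name `watchClusterTopology` is hypothetical, and whether every ElastiCache failover surfaces through these events is not verified here):

```javascript
// Hypothetical helper: logs the topology events ioredis documents for
// Redis.Cluster. `cluster` is a Redis.Cluster (or any EventEmitter).
function watchClusterTopology(cluster, log = console.log) {
  // Emitted when a node connection is added to the cluster's pool.
  cluster.on('+node', (node) => {
    log(`+node ${node.options.host}:${node.options.port}`);
  });
  // Emitted when a node connection is removed from the pool.
  cluster.on('-node', (node) => {
    log(`-node ${node.options.host}:${node.options.port}`);
  });
  // Emitted when connecting to a node fails; the second argument is its address.
  cluster.on('node error', (err, address) => {
    log(`node error at ${address}: ${err.message}`);
  });
  return cluster;
}
```

With the issue's code, this would be called as `watchClusterTopology(cluster)` right after the cluster is constructed.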

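```javascript
const Queue = require('bull');
const Redis = require('ioredis');

// Seed node: the ElastiCache configuration endpoint (placeholder value).
const nodes = [{ host: '<configuration-endpoint>', port: 6379 }];

const options = {
    // Skip DNS resolution so TLS certificate checks see the hostname (ElastiCache + TLS).
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
        tls: {},                // enables TLS with default options
        password: 'password',
        failoverDetector: true  // Sentinel option; does not apply to Redis.Cluster
    },
    scaleReads: 'all',          // send read commands to primaries and replicas
    // Cluster-level reconnect strategy (a top-level `retryStrategy` is not a Cluster option).
    clusterRetryStrategy: (times) => {
        if (times <= 10) {
            return Math.min(times * 100, 2000);
        }
        return null;            // stop retrying after 10 attempts
    }
};

const cluster = new Redis.Cluster(nodes, options);
```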

Steps to reproduce: use AWS ElastiCache with cluster mode enabled and its cluster configuration endpoint, then declare these queues in two different services that both add and process jobs. After some time you will see the error "READONLY You can't write against a read only replica".
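Bull's documentation asks for the queue prefix to be a Redis Cluster hash tag (`{...}`) so that all of a queue's keys land in one slot, and lets you supply connections through `createClient(type)`. A sketch of wiring the issue's cluster options into a Bull queue, assuming both services use the same prefix; the helper names are hypothetical, and the `enableReadyCheck: false` / `maxRetriesPerRequest: null` settings on the `bclient` and `subscriber` connections reflect my understanding of Bull 4's connection checks rather than anything stated in this issue:

```javascript
// Hypothetical helper: wraps a prefix in braces so Redis Cluster hashes
// every Bull key for the queue to the same slot.
function hashTaggedPrefix(name) {
  return name.startsWith('{') && name.endsWith('}') ? name : `{${name}}`;
}

// Hypothetical helper: Bull queue options that open each connection
// (client, subscriber, bclient) as its own Redis.Cluster on the same seeds.
function clusterQueueOptions(prefix, nodes, clusterOptions) {
  return {
    prefix: hashTaggedPrefix(prefix),
    createClient(type) {
      const Redis = require('ioredis'); // loaded only when Bull opens a connection
      const opts = { ...clusterOptions };
      if (type === 'bclient' || type === 'subscriber') {
        // Blocking/subscriber connections: no ready check, no per-request retry limit.
        opts.enableReadyCheck = false;
        opts.redisOptions = { ...opts.redisOptions, maxRetriesPerRequest: null };
      }
      return new Redis.Cluster(nodes, opts);
    },
  };
}
```

In each service this would be used as `new Queue('jobs', clusterQueueOptions('myprefix', nodes, options))`, with the same queue name and prefix on both sides.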

Additional information: "ioredis": "^5.3.2"

RESPONSE FROM AWS

ElastiCache triggers a failover to switch to a new primary and updates the DNS record to point to the new primary. However, if the client has cached the old node's IP and connects to the old primary, you will get the error "READONLY You can't write against a read only replica", because the old primary no longer accepts writes.
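The ioredis README documents a `reconnectOnError` hook aimed at this READONLY-after-failover symptom: returning `1`/`true` reconnects, and returning `2` reconnects and then resends the failed command. A sketch, assuming the hook is placed in the cluster's `redisOptions` so every node connection gets it; whether reconnecting a node in cluster mode is enough on its own, without a slot refresh, is untested here, and the names below are hypothetical:

```javascript
// Returns 2 ("reconnect, then resend the failed command") for READONLY
// errors; any other error keeps ioredis's default handling.
function reconnectOnReadOnly(err) {
  return err.message.includes('READONLY') ? 2 : false;
}

// Hypothetical fragment for new Redis.Cluster(nodes, { redisOptions: ... }):
// redisOptions are applied to each node connection the cluster opens.
const failoverAwareRedisOptions = {
  reconnectOnError: reconnectOnReadOnly,
};
```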

Client applications usually resolve the primary endpoint once and cache the primary's IP address locally. During a failover, the IP address behind the primary endpoint changes to that of the new primary node. If your client does not detect this change, it keeps connecting to the old IP, which results in prolonged downtime.

To minimize interruption, configure retries on the client side so that it detects the failover and picks up the new IPs. You can also consider reducing the DNS caching TTL values (if any).
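On the ioredis side, "retry configuration" maps onto several `Redis.Cluster` options: `clusterRetryStrategy` (reconnecting the cluster), `redisOptions.retryStrategy` (reconnecting a single node), `slotsRefreshInterval` (how often the slot-to-node map is re-read, 5000 ms by default), and `retryDelayOnFailover` (delay before retrying a command whose target node dropped). A sketch, assuming these are the knobs worth tuning; the values are illustrative, not recommendations from AWS or ioredis, and the names are hypothetical:

```javascript
// Same backoff as the retry function in the issue: 100 ms per attempt,
// returning null (give up) after 10 attempts.
function backoff(times) {
  return times <= 10 ? Math.min(times * 100, 2000) : null;
}

// Hypothetical Redis.Cluster options focused on failover recovery.
const failoverClusterOptions = {
  clusterRetryStrategy: backoff, // reconnect the cluster as a whole
  slotsRefreshInterval: 2000,    // re-read the slot -> node map more often than the default (ms)
  retryDelayOnFailover: 200,     // wait before retrying a command whose node disconnected (ms)
  redisOptions: {
    retryStrategy: backoff,      // per-node reconnect delay
  },
};
```

A shorter `slotsRefreshInterval` lets the client notice a promoted replica sooner at the cost of more `CLUSTER SLOTS` traffic.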

divysts commented 8 months ago

@luin, can you please have a look at this?