Open mkdewidar opened 1 year ago
What version of Nchan are you running? What does your Nginx conf look like? (edit out the private details)
Currently on 1.2.10, and our resultant config looks something along the lines of:
...
http {
    ...
    upstream redis_cluster {
        nchan_redis_server redis://clustername.clustercfg.region.cache.amazonaws.com:6379;
        nchan_redis_storage_mode nostore;
        nchan_redis_nostore_fastpublish on;
        nchan_redis_subscribe_weights master=1 slave=1000;
    }
    nchan_shared_memory_size 256M;
    nchan_message_timeout 1m;
    nchan_message_buffer_length 3;
    ...
    server {
        ...
        location ~ /someurl {
            internal;
            nchan_subscriber;
            nchan_subscriber_first_message newest;
            nchan_channel_id $1;
            nchan_redis_pass redis_cluster;
            nchan_eventsource_ping_interval 60;
            nchan_eventsource_ping_comment "ping";
            nchan_eventsource_ping_event "";
            nchan_subscribe_request /notifyurl;
            nchan_unsubscribe_request /othernotifyurl;
        }
        location ~ /someotherurl {
            internal;
            nchan_publisher;
            nchan_channel_id $1;
            nchan_redis_pass redis_cluster;
            nchan_channel_id_split_delimiter ",";
            nchan_max_channel_id_length 32768;
        }
    }
}
1.2.10 is over 2 years and 10 releases behind. Please try the latest version (1.3.6). Elasticache should work just fine.
Sorry, I think I might've made things a bit confusing. 1.2.10 is working fine. We are facing these issues when we try to upgrade to 1.3.6. We are seeing this issue only when using Elasticache with clustered Redis. Our other service that uses non-clustered Redis is working fine with 1.3.6.
I think I misunderstood what you meant by "what version are you running", sorry!
Hi @slact, have you had a chance to look into this further?
What is your ElastiCache configuration? Is TLS enabled? AUTH?
No TLS or AUTH in this case. Just a cluster with a couple of shards and replicas running on Redis 7.
Strange. I have no problem whatsoever using ElastiCache, any version, on any modern Nchan version.
Please try the following (separately):
- set nchan_redis_server to just the config endpoint, no "redis://", no port
- nchan_redis_subscribe_weights
- set nchan_redis_server to the DNS address of one of the cluster's nodes directly (from the node listing)
Let me know which of these work, if any.
How is nchan discovering nodes from ElastiCache? I would like to simulate it locally with DNS-based HAProxy setup.
@slact Sorry for the delay in getting back to you. I tried these separately as you suggested:
- Setting nchan_redis_server to just the config endpoint (i.e. nchan_redis_server clustername.clustercfg.region.cache.amazonaws.com;) did not work.
- The nchan_redis_subscribe_weights change also didn't work.
- Setting nchan_redis_server to one of the nodes directly (i.e. to "cluster-name-0001-001.randomchars.0001.region.cache.amazonaws.com") works. But I should point out that AWS recommends using the configuration endpoint (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Endpoints.html) for Elasticache Redis clusters with cluster mode enabled. In fact, some tools, like Terraform, don't expose those per-node endpoints, and so one would have to construct the URL manually.
You mentioned you had no issues using ElastiCache; were you using the configuration endpoint or that of the individual nodes? The issue is specific to using the configuration endpoint due to its use of some form of round-robin DNS. The configuration endpoint worked with NChan in releases before v1.2.15.
Yeah, I had no problem using the shared config endpoint with round-robin DNS.
Please try the following: set the logging level to 'debug', and grep through the log for anything with redis. Please post the results, or email them to me.
Hi,
It seems that starting from v1.2.15 (technically, v1.2.13 but that was withdrawn), NChan can no longer be used with AWS Elasticache Redis clusters. It fails to establish connections with the cluster citing (with debug logs enabled):
Elasticache Redis clusters are not behind a proxy, though there is some sort of DNS load balancing that happens. Clients use DNS to resolve a fixed hostname (called the "configuration endpoint") to any one of the cluster's nodes, and then discover the IP addresses of the other nodes in the cluster using standard Redis cluster commands.
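The discovery flow described above (resolve the configuration endpoint to some node, then ask that node for the rest of the cluster) can be sketched roughly as follows. This is only an illustration and not Nchan's code: the names `parse_cluster_nodes`, `resolve_any`, `Node`, and the sample reply are hypothetical, and the parser assumes the standard space-separated `CLUSTER NODES` reply format.

```python
import socket
from typing import List, NamedTuple


class Node(NamedTuple):
    node_id: str
    host: str
    port: int
    is_master: bool
    slots: List[str]


def resolve_any(endpoint: str, port: int = 6379) -> str:
    """Resolve a hostname to one IP address.

    With round-robin DNS (as assumed for the configuration endpoint),
    repeated calls may return different cluster nodes.
    """
    infos = socket.getaddrinfo(endpoint, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]  # sockaddr is the last tuple element; host comes first


def parse_cluster_nodes(reply: str) -> List[Node]:
    """Parse the text reply of the Redis CLUSTER NODES command."""
    nodes = []
    for line in reply.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip malformed lines
        flags = fields[2].split(",")
        # Address looks like ip:port@cport, optionally followed by ,hostname
        hostport = fields[1].split("@", 1)[0]
        host, _, port = hostport.rpartition(":")
        nodes.append(Node(fields[0], host, int(port), "master" in flags, fields[8:]))
    return nodes


# Hypothetical reply from whichever node the endpoint resolved to.
SAMPLE = """\
07c3 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191
9a1f 10.0.0.2:6379@16379 master - 0 1700000000000 2 connected 8192-16383
e7d1 10.0.0.3:6379@16379 slave 07c3 0 1700000000000 1 connected
"""

for node in parse_cluster_nodes(SAMPLE):
    role = "master" if node.is_master else "replica"
    print(node.host, node.port, role, node.slots)
```

In a real client the reply would come from sending `CLUSTER NODES` to the address returned by `resolve_any`; the sample string stands in for that here so the sketch runs without a cluster.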
From what I can tell, the root of the issue here seems to be that as part of the Redis TLS support changes, the pubsub connection now connects to Redis by cp->hostname, rather than cp->peername, in node_connector_callback for the REDIS_NODE_CMD_CONNECTING state.
I applied the following patch and tested it, and it seemed to fix the issue; however, I don't know enough about TLS or NChan to know if this is a reliable solution or not.
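The block below is not that patch. It is a toy model, under stated assumptions, of why connecting by hostname versus by peer address could matter behind a round-robin DNS name: if the second (pubsub) connection re-resolves the hostname, it may land on a different node than the first connection did. The `RoundRobinDNS` class, `open_connections` function, hostname, and IP addresses are all hypothetical and do not reflect Nchan's internals.

```python
from itertools import cycle
from typing import Dict, List, Tuple


class RoundRobinDNS:
    """Fake resolver that rotates through the records for each name."""

    def __init__(self, records: Dict[str, List[str]]):
        self._records = {name: cycle(ips) for name, ips in records.items()}

    def resolve(self, name: str) -> str:
        return next(self._records[name])


def open_connections(dns: RoundRobinDNS, hostname: str, by_peername: bool) -> Tuple[str, str]:
    """Return the (command, pubsub) peer addresses for one logical node.

    by_peername=True  -> pubsub reuses the address the command connection reached.
    by_peername=False -> pubsub resolves the hostname again.
    """
    cmd_peer = dns.resolve(hostname)
    pubsub_peer = cmd_peer if by_peername else dns.resolve(hostname)
    return cmd_peer, pubsub_peer


records = {"cfg.example": ["10.0.0.1", "10.0.0.2"]}

# Pinned to the peer address: both connections reach the same node.
print(open_connections(RoundRobinDNS(records), "cfg.example", by_peername=True))

# Re-resolving the hostname: the two connections can reach different nodes.
print(open_connections(RoundRobinDNS(records), "cfg.example", by_peername=False))
```

Whether this mismatch is what actually breaks the connection in Nchan is the hypothesis of this report, not something the sketch demonstrates.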