Closed: azhurbilo closed this issue 4 years ago
Hello @azhurbilo. I'm sorry to hear that you're having issues with our synchronizer. We'll dig into this and let you know as soon as we can reproduce it and come up with a solution.
Just to be sure: were you able to verify that the cluster was indeed recreated with the same IP address?
Thanks.
Were you able to verify that the cluster was indeed recreated with the same IP address?
We only see a message in the AWS log about the instance being recreated. Our monitoring system, which connects directly to the Redis port from another VM host, tracked this downtime and reconnected automatically within 1 minute.
Hello @azhurbilo, we've been taking a look at this issue and we suspect that it's related to your Redis cluster being re-created with the same hostname but a different IP. According to your logs, it looks like the IP of your cluster used to be 10.10.100.100 before Amazon re-created your instance. Would you mind checking if the host that you're currently connected to (after being re-created) points to the same address?
On Linux you can do this by typing dig <REDIS_HOST>
and you should see something like:
;; ANSWER SECTION:
<REDIS_HOST> 234 IN A XXX.XXX.XXX.XXX
This is just to confirm that the host IP has indeed changed.
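The same check can be scripted so it is easy to repeat after a node replacement. The following is only a minimal Python sketch, not part of the synchronizer: the function names `resolve_ipv4` and `ip_changed` are hypothetical, and the injectable `resolver` argument exists purely so the logic can be exercised without a real DNS lookup. Unlike dig, `socket.getaddrinfo` goes through the OS resolver, so it reflects what a client process on that host would typically see, including any OS-level caching.

```python
import socket


def resolve_ipv4(host, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses that `host` currently resolves to."""
    infos = resolver(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})


def ip_changed(host, previous_ip, resolver=socket.getaddrinfo):
    """True if `previous_ip` is no longer among the addresses for `host`."""
    return previous_ip not in resolve_ipv4(host, resolver)
```

For example, `ip_changed("<REDIS_HOST>", "10.10.100.100")` would return True only if the host no longer resolves to the address seen in the logs.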
Also, do you remember how much time you waited for the synchronizer to re-connect to redis until you restarted it?
There's a chance that our timeouts and TCP keep-alive configurations might be on the high side as well.
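To illustrate what those keep-alive settings control, here is a generic Python sketch (it is not the synchronizer's code, and the values 30/10/3 are assumptions chosen for illustration, not its actual defaults). The helper name `enable_keepalive` is hypothetical. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are platform-dependent, so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on TCP keep-alive and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:  # not every platform defines these constants
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With settings like these, an idle connection to a vanished peer is detected after roughly idle + interval × count seconds; larger values mean a dead connection can linger for correspondingly longer.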
In the meantime we will try to replicate the issue with an ElastiCache instance within our AWS environment.
We will keep you posted with our findings.
Thank you very much for reaching out, and once again, our apologies for the inconvenience caused.
would you mind checking if the host that you're currently connected to (after being re-created) points to the same address?
yes, it's the same 10.10.100.100 IP address
Also, do you remember how much time you waited for the synchronizer to re-connect to redis until you restarted it?
9 min
In the meantime we will try to replicate the issue with an elasticache instance within our AWS environment.
great, thnx!
Today we faced the same issue in Prod :( We use version 2.5.2. @mredolatti any news?
Hi @azhurbilo ,
We apologize for the delay on this. We were never able to reproduce this issue and we eventually had other issues to tackle.
It seems like every time an ElastiCache cluster is created, a new hostname and IP address are generated, different from the previous ones.
How exactly are you generating this? Are you updating a CNAME entry in your DNS records every time you create an ElastiCache cluster, and using that CNAME as the Redis host?
We want to replicate your environment as much as possible for debugging this.
Thanks! Martin
I can tell that in this case AWS triggers the event ElastiCache:CacheNodeReplaceStarted
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/ElastiCacheSNS.html https://aws.amazon.com/elasticache/faqs/
ElastiCache has detected that the host running a cache node is degraded
or unreachable and has started replacing the cache node.
Note
The DNS entry for the replaced cache node is not changed.
In most instances, you do not need to refresh the server-list for your clients
when this event occurs. However, some cache client libraries may stop using
the cache node even after ElastiCache has replaced the cache node;
in this case, the application should refresh the server-list when this event occurs.
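One way a client could act on that advice is to drop the old connection and open a fresh one, so the hostname is resolved again on each attempt. The sketch below is a generic Python illustration under stated assumptions, not how split-synchronizer or any particular Redis library behaves: `connect_with_retry` is a hypothetical helper, the retry count, delays and timeout are arbitrary, and the `connect` and `sleep` parameters are injectable only to make the logic testable.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, base_delay=0.5, timeout=3.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Open a new TCP connection, retrying with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            # create_connection resolves `host` on every call, so each
            # attempt uses whatever address the resolver returns right now.
            return connect((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise ConnectionError(
        f"could not reach {host}:{port} after {attempts} attempts"
    ) from last_error
```

A caller would invoke this whenever a read or write on the existing connection fails, rather than continuing to reuse the broken socket.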
We also use an additional CNAME (which never changes) that points to the DNS name of the AWS Redis endpoint.
Maybe that helps somehow.
@azhurbilo out of curiosity, does your ElastiCache cluster have 0 replicas?
Yes, it's a single-node Redis ElastiCache instance.
Hi @azhurbilo,
What's the TTL on your DNS records? We've been doing some tests and there's no DNS cache in our synchronizer, and no cache in the Redis library or the underlying TCP module. It seems that only an OS-level DNS cache might be in use.
Hi @azhurbilo,
It's been a while, so I wanted to check the status on this. Are you still experiencing any issues with it? Could you look into the TTL of your DNS records? Have there been any other developments on your end?
We'll wait for a few days to hear back from you and then close the issue but feel free to reopen at any time if needed.
Best, Nico.
Sorry for not responding. Yes, we can close this ticket, as I no longer work at the company where this issue was reproduced.
We have faced a Split-synchronizer issue in Prod using AWS ElastiCache Redis 3.2.10 (single mode) when Amazon recreates the node.
Note: the IP address of the Redis endpoint is preserved: https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/ClientConfig.DNS.html https://forums.aws.amazon.com/thread.jspa?threadID=140869
But in the split-synchronizer logs we kept seeing error logs for several minutes, until we redeployed the split-synchronizer service: the reconnection is not handled properly.
I've tried these steps with a local Redis, but could not reproduce the issue :(
Maybe you have some idea?