redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

Unable to connect to AWS ElastiCache cluster #3018

Open hudda10 opened 2 days ago

hudda10 commented 2 days ago

Bug Report

Current Behavior

I am using Spring Cloud Gateway with Spring Data Redis (LettuceConnectionFactory). I have created a RedisMessageListenerContainer bean, shown below, to listen for key expiry events.

    @Bean
    public RedisMessageListenerContainer redisMessageListenerContainer(LettuceConnectionFactory lettuceConnectionFactory) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(lettuceConnectionFactory);
        return container;
    }

Bean creation fails because LettuceConnectionFactory gets a connection error when connecting to the AWS ElastiCache cluster.

Stack trace:

```java
io.netty.resolver.dns.DnsResolveContext$SearchDomainUnknownHostException: Failed to resolve 'test-cache-xxx-0001-002.test-cache-xxxx.cache.amazonaws.com' [A(1)] and search domain query for configured domains failed as well: [ai-ml-core.svc.cluster.local, svc.cluster.local, cluster.local, ec2.internal]
    at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:1151)
    at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1098)
    at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:457)
    at io.netty.resolver.dns.DnsResolveContext.access$700(DnsResolveContext.java:69)
    at io.netty.resolver.dns.DnsResolveContext$2.operationComplete(DnsResolveContext.java:526)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
    at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
    at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
    at io.netty.resolver.dns.DnsQueryContext.finishFailure(DnsQueryContext.java:380)
    at io.netty.resolver.dns.DnsQueryContext$5.run(DnsQueryContext.java:315)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.resolver.dns.DnsNameResolverTimeoutException: [7551: /172.20.0.10:53] DefaultDnsQuestion(test-cache-xxx-0001-002.test-cache-xxxx.cache.amazonaws.com. IN A) query '7551' via UDP timed out after 5000 milliseconds (no stack trace available)
```

Can somebody please help?

Additional message

Logger: io.lettuce.core.protocol.ConnectionWatchdog

Cannot reconnect to [test-cache-xxx-0001-002.test-cache-xxxx.cache.amazonaws.com/:6379]: Failed to resolve 'test-cache-xxx-0001-002.test-cache-xxxx.cache.amazonaws.com' [A(1)] and search domain query for configured domains failed as well: [ai-ml-core.svc.cluster.local, svc.cluster.local, cluster.local, ec2.internal]

Deleting the pod sometimes fixes the issue, after which the connection works fine.

Environment

hudda10 commented 1 day ago

Hi team, can somebody please suggest what can be done to resolve this issue?

tishun commented 13 hours ago

Hey @hudda10,

I would be happy to help, but this looks like a connectivity issue on your application's side, specifically a DNS resolution issue:

Failed to resolve 'test-cache-xxx-0001-002.test-cache-xxxx.cache.amazonaws.com'

All the Lettuce driver can do is use the connection string provided to attempt to connect. If the OS reports that the address could not be resolved, then either:

a) the address is wrong or misspelled, or
b) the container or physical machine could not resolve the domain name to an IP address (network settings).
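
One way to tell these two cases apart is to try resolving the endpoint from inside the affected pod, independently of Lettuce. The following is only a minimal sketch: the class name `DnsCheck` is hypothetical, and it uses the JVM's built-in resolver (`InetAddress`) rather than the Netty DNS resolver that appears in the stack trace, so a successful lookup here does not guarantee that the Netty resolver would succeed too.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lettuce or Spring Data Redis.
final class DnsCheck {

    /**
     * Resolves a host name with the JVM's default resolver.
     * Returns the resolved IP addresses, or an empty list if the name cannot be resolved.
     */
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Resolution failed: leave the list empty so the caller can report it.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Pass the ElastiCache node endpoint as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        long start = System.nanoTime();
        List<String> addresses = resolve(host);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (addresses.isEmpty()) {
            System.out.println("Could not resolve " + host + " (" + elapsedMs + " ms)");
        } else {
            System.out.println(host + " resolved to " + addresses + " (" + elapsedMs + " ms)");
        }
    }
}
```

Running this in the failing pod with the node endpoint from the log as the argument should show whether the name resolves at all and roughly how long the lookup takes. If it fails only intermittently, that would be consistent with the report that deleting the pod sometimes fixes the problem, and would point toward the cluster's DNS setup rather than the connection string.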