Closed charalampos-micholias closed 2 months ago
/cc @evanchooly (mongodb), @loicmathieu (mongodb)
Hi, Can you try with the latest Quarkus version, it should be 3.6.1. If it still fails, can you provide a reproducer project?
Hello, good afternoon, After several further tests we have identified the exact pattern that this issue occurs. More specific issue occurs on the below combination:
keep as important note -as a tested workaround- that issue does not appear upon combining Quarkus 3.1.0 and connection string containing exact mongo shards formatted as: mongodb://USERNAME:PASSWORD@XXXXX-shard-00-00.nuler.mongodb.net:27017,XXXXX-shard-00-01-nuler.mongodb.net:27017/
So after many thorough test seems that even Quarkus 3.1.0 version faces inability to establish connection when a SRV mongo connection string is provided.
Exact logs for aforementioned combination:
15:09:53 INFO traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out
15:09:56 INFO traceId=0caf882656bffbdbd9cae0ca6fbfeb3f, parentId=, spanId=c9077c52e865326b, sampled=false [or.mo.dr.cluster] (executor-thread-2) No server chosen by com.mongodb.reactivestreams.client.internal.ClientSessionHelper$$Lambda$790/0x00000000903f70f8@4180c2b from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out
15:09:57 INFO traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-6578779b5cf3934a85277080) No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out_
_15:10:23 ERROR traceId=ad5511d672b3e0370c637d1fb918f448, parentId=, spanId=12c2d61cc5444467, sampled=true [io.qu.mu.ru.MutinyInfrastructure] (vert.x-eventloop-thread-0) Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.dxl-gr-preprod-nuler.mongodb.net'., servers=[]
at com.mongodb.internal.connection.BaseCluster.createTimeoutException(BaseCluster.java:428)
at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:324)
at com.mongodb.internal.connection.BaseCluster.access$700(BaseCluster.java:69)
at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:503)
at java.base/java.lang.Thread.run(Unknown Source)
As upgrading further in 3.6.1 will require extended effort in a series of services, would you please update us if there are implemented enhancements in mongo connection establishment area in Quarkus 3.6.1 that could resolve the above issue and which are they in high level?
Thank you all for your assistance
Hello, we are experiencing similar issues with Quarkus 3.5.3, we have these stacktraces showing every once in a while when connected to a MongoDB Atlas cluster:
Unable to look up SRV record for host _mongodb._tcp.option-pl-0.91p5n.mongodb.net
we are using the reactive client. Sorry I don't have th full stacktrace, can't find a way to export it cleanly for datadog.
hello all! @loicmathieu what kind of corrections are available on 3.6.1?
Hello @loicmathieu, we would like to ask if there is an ongoing investigation on the above issue. Thank you,
Facing the same problem here
@va1erian @roiar @charalampos-micholias in JVM mode we use the default MongoDB DNS resolver, nothing specific to Quarkus. However, in native mode there is an experimental DNS resolver as the MongoDB DNS resolver uses JNI which is not supported in native, this is explained here: https://quarkus.io/guides/mongodb#using-mongosrv-urls
Did you deploy to native and use the experimental DNS resolver?
@loicmathieu we deploy a regular JVM without touching the dns resolver configuration. However note that this issue disappeared at one point when we upgraded Quarkus, I can't recall the exact version but it has been a couple of month already. We are currently running Quarkus 3.10
We're facing exactly the same issue running Quarkus 3.10. Don't know if this is relevant, but we're using JVM Images for Java 17, so, if I read the documentation correctly, it should be this base image: registry.access.redhat.com/ubi8/openjdk-17-runtime:1.19
Relevant Log Messages:
Exception while resolving SRV records
No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.some.name.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out
If we remove the srv Record from the config and add all the single database entries, it is working.
See https://github.com/quarkusio/quarkus/discussions/31971#discussioncomment-10613900 for a possible workaround
We believe that #43222 fixes this, so I will close it for now, but if the issue persists with 3.15
(which will contain the fix), we can reopen
Hello @geoand , we are still facing the issue with Quarkus 3.14.4 (native), the only workaround we were able to apply with major effects, was ditching mongo+srv:// in favor of simple mongo:// connection string. (ref. we used: https://stackoverflow.com/questions/78713411/com-mongodb-mongoconfigurationexception-failed-looking-up-txt-record-for-host)
We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.
We contacted Mongo Atlas support to little help, except an upsell that we bought, and the problem persisted.
Additional info: We have at least 4 Quarkus microservices running 3.8.6 (LTS) (Native) and 3.14 that are not affected by the issue.
We inspected the solution in https://github.com/quarkusio/quarkus/pull/43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.
Can you elaborate on this a little more please?
We inspected the solution in #43222, and concluded that to our case the retry is not effective, since that once the error is emitted by the Mongo connection attempt, nothing will work until the pod / application is restarted and some period (1 - 10min) passes.
Can you elaborate on this a little more please?
Thanks for the reply, I'll try to elaborate a bit more:
Application will emit the following exception for a couple o seconds, then health-check will fail and app restarted, then we will randomly face the same problem for a few minutes, until connection is successful, then the cycle repeats, so in this case it seems the retry (part of the solution of #43222 ) doesn't work because the srv resolution will persistently fail for minutes at times.
Mutiny had to drop the following exception [Error Occurred After Shutdown]: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
at com.mongodb.internal.connection.BaseCluster.selectServer(BaseCluster.java:148)
at com.mongodb.internal.connection.AbstractMultiServerCluster.selectServer(AbstractMultiServerCluster.java:54)
at com.mongodb.internal.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:108)
at com.mongodb.client.internal.ClientSessionBinding.getConnectionSource(ClientSessionBinding.java:128)
at com.mongodb.client.internal.ClientSessionBinding.getReadConnectionSource(ClientSessionBinding.java:92)
at com.mongodb.internal.operation.SyncOperationHelper.withSuppliedResource(SyncOperationHelper.java:141)
at com.mongodb.internal.operation.SyncOperationHelper.withSourceAndConnection(SyncOperationHelper.java:122)
at com.mongodb.internal.operation.SyncOperationHelper.lambda$executeRetryableRead$4(SyncOperationHelper.java:186)
at com.mongodb.internal.operation.SyncOperationHelper.lambda$decorateReadWithRetries$12(SyncOperationHelper.java:289)
at com.mongodb.internal.async.function.RetryingSyncSupplier.get(RetryingSyncSupplier.java:67)
at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:191)
at com.mongodb.internal.operation.SyncOperationHelper.executeRetryableRead(SyncOperationHelper.java:173)
at com.mongodb.internal.operation.CommandReadOperation.execute(CommandReadOperation.java:48)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:153)
at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:196)
at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:165)
at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:160)
at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:150)
at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:185)
at io.quarkus.mongodb.health.MongoHealthCheck$MongoClientCheck$1.get(MongoHealthCheck.java:182)
at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)
at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:635)
at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1495)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@17.0.12/java.lang.Thread.run(Thread.java:840)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)
this shorter stack trace is also logged:
Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.<our-mongo-sub-domain>.mongodb.net'., servers=[]
at com.mongodb.internal.connection.BaseCluster.createAndLogTimeoutException(BaseCluster.java:392)
at com.mongodb.internal.connection.BaseCluster.handleServerSelectionRequest(BaseCluster.java:293)
at com.mongodb.internal.connection.BaseCluster.access$800(BaseCluster.java:85)
at com.mongodb.internal.connection.BaseCluster$WaitQueueHandler.run(BaseCluster.java:475)
at java.base@17.0.12/java.lang.Thread.run(Thread.java:840)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:885)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:860)
We are also trying find if there are any problems with DNS resolution in our infrastructure, but so far, nothing.
Edit: seems to be related to https://github.com/quarkusio/quarkus/issues/32313
I wonder if quarkus.mongodb.dns.lookup-timeout
will help in your case.
I wonder if
quarkus.mongodb.dns.lookup-timeout
will help in your case.
Great, we will try-it out and report the results, thank you very much!
🙏🏽
Describe the bug
Issue using Quarkus 2.15.0 version
When trying a connection attempt in a mongo shard through a Quarkus application version >= 2.15.0 below error is propagated:
Issue using Quarkus 3.1.0 version
As described on #32991 this issue is resolved in "Quarkus 3.0.2.Final" version
Based on the aforementioned proposal, upon upgrading on Quarkus "3.1.0.Final" a new error (resulting to same mongo connection inability) takes place.
Below you may find exact logs:
Dedicated Script for further logs: In attempt to have further view on Mongo SRV attributes we have created a dedicated script - triggered each time mongo data retrieval- is attempted that provides below logs:
Expected behavior
Quarkus should connect to Mongo Atlas without this SRV lookup or MongoSocketOpenException error
Actual behavior
Connection failure takes place when reaching the established time thresshold "Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]}. Waiting for 30000 ms before timing out"
How to Reproduce?
Using the free tier mongo atlas account,
Output of
uname -a
orver
MINGW64_NT-10.0-22621 E0624331 3.3.3-341.x86_64 2022-01-17 11:45 UTC x86_64 Msys
Output of
java -version
java version "17.0.7" 2023-04-18 LTS
Quarkus version or git rev
2.15.0 and 3.1.0
Build tool (ie. output of
mvnw --version
orgradlew --version
)No response
Additional information
Dedicated script getMongoDebugLogs() :