quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

MongoDB SRV Record not resolved inside WebSocket OnMessage method #32901

Closed amoscatelli closed 1 month ago

amoscatelli commented 1 year ago

Describe the bug

This is a follow-up to the investigation/discussion in #31971.

The issue is that MongoDB SRV records are not resolved properly in some cases.

I have been investigating this issue for weeks, and I finally discovered that it only affects WebSocket endpoints, not REST endpoints, which is why other Quarkus devs couldn't reproduce it. Also, when I run "quarkus dev" locally, everything works. The issue is triggered only in my remote Okteto Kubernetes environment.

This is the current state of my investigation:

| | Inside Rest Resource | Inside WebSocket OnMessage |
|---|---|---|
| quarkus dev | works | works |
| K8s (Okteto) JVM | works | error (*) |
| K8s (Okteto) Native | works | error (*) |

(*) Failed looking up SRV record / io.smallrye.mutiny.TimeoutException

I managed to identify the real culprit and created a small reproducer.

I also quote @cescoffier's reply:

> Ok, that's interesting! I may have an idea. onMessage at the moment is not invoked on a duplicated context. I can see a few reasons why this could affect the DNS resolution.

Expected behavior

MongoDB SRV records should be resolved normally

Actual behavior

MongoDB SRV records are not resolved :

2023-03-31 12:28:00.87 UTCchat-5bbc5746fc-9g6dzchat12:28:00 INFO traceId=, parentId=, spanId=, sampled= [co.te.we.ChatSocket_Subclass] (vert.x-eventloop-thread-0) 34.27.203.79
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat12:28:30 INFO traceId=, parentId=, spanId=, sampled= [co.te.co.AuditedInterceptor] (vert.x-eventloop-thread-0) alessandro.moscatelli called method: publish
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat12:28:30 INFO traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (cluster-ClusterId{value='6426d1d0c11d2a13b94ccbaa', description='null'}-srv-quarkus.uvyzu7h.mongodb.net) Exception while resolving SRV records: com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.quarkus.uvyzu7h.mongodb.net'.
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at com.mongodb.internal.dns.DefaultDnsResolver.resolveHostFromSrvRecords(DefaultDnsResolver.java:92)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at com.mongodb.internal.connection.DefaultDnsSrvRecordMonitor$DnsSrvRecordMonitorRunnable.run(DefaultDnsSrvRecordMonitor.java:80)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:775)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:203)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchatCaused by: com.mongodb.MongoConfigurationException: Unable to look up SRV record for host _mongodb._tcp.quarkus.uvyzu7h.mongodb.net
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at io.quarkus.mongodb.runtime.dns.MongoDnsClient.resolveSrvRequest(MongoDnsClient.java:152)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at io.quarkus.mongodb.runtime.dns.MongoDnsClient.getResourceRecordData(MongoDnsClient.java:104)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at com.mongodb.internal.dns.DefaultDnsResolver.resolveHostFromSrvRecords(DefaultDnsResolver.java:74)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat ... 4 more
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchatCaused by: io.smallrye.mutiny.TimeoutException
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat at io.quarkus.mongodb.runtime.dns.MongoDnsClient.resolveSrvRequest(MongoDnsClient.java:138)
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat ... 6 more
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat
2023-03-31 12:28:30.88 UTCchat-5bbc5746fc-9g6dzchat12:28:30 INFO traceId=, parentId=, spanId=, sampled= [or.mo.dr.cluster] (vert.x-eventloop-thread-0) No server chosen by com.mongodb.client.internal.MongoClientDelegate$$Lambda$2a9a67aed79fbe983935f57d61b3ec1607afa5d6@33c8af87 from cluster description ClusterDescription{type=REPLICA_SET, srvResolutionException=com.mongodb.MongoConfigurationException: Failed looking up SRV record for '_mongodb._tcp.quarkus.uvyzu7h.mongodb.net'., connectionMode=MULTIPLE, serverDescriptions=[]}. Waiting for 30000 ms before timing out

How to Reproduce?

https://github.com/amoscatelli/quarkus-test

These are the environment variables I set to build and deploy to Okteto Kubernetes:

QUARKUS_PACKAGE_TYPE=native
TESTCONTAINERS_RYUK_DISABLED=true
QUARKUS_CONTAINER_IMAGE_REGISTRY=registry.cloud.okteto.net
QUARKUS_KUBERNETES_DEPLOY=true
QUARKUS_KUBERNETES_INGRESS_EXPOSE=true
QUARKUS_KUBERNETES_INGRESS_HOST=amoscatelli.cloud.okteto.net
QUARKUS_KUBERNETES_REPLICAS=1
QUARKUS_CONTAINER_IMAGE_BUILD=true
QUARKUS_CONTAINER_IMAGE_PUSH=true
QUARKUS_CONTAINER_IMAGE_GROUP=amoscatelli
QUARKUS_CONTAINER_IMAGE_USERNAME=amoscatelli
QUARKUS_CONTAINER_IMAGE_PASSWORD=
QUARKUS_NATIVE_CONTAINER_BUILD=true
QUARKUS_KUBERNETES_ENV_CONFIGMAPS=mongodb
quarkus.kubernetes.env.mapping.quarkus-mongodb-connection-string.from-configmap=mongodb
quarkus.kubernetes.env.mapping.quarkus-mongodb-connection-string.with-key=url
QUARKUS_KUBERNETES_ENV_SECRETS=mongodb
quarkus.kubernetes.env.mapping.quarkus-mongodb-credentials-username.from-secret=mongodb
quarkus.kubernetes.env.mapping.quarkus-mongodb-credentials-username.with-key=username
quarkus.kubernetes.env.mapping.quarkus-mongodb-credentials-password.from-secret=mongodb
quarkus.kubernetes.env.mapping.quarkus-mongodb-credentials-password.with-key=password 

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.0.0.Final

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @Sgitario (kubernetes), @evanchooly (mongodb), @geoand (kubernetes), @iocanel (kubernetes), @loicmathieu (mongodb)

amoscatelli commented 1 year ago

Any update on this? I think it would be better to add the websocket area to the issue. Is that possible?

Thank you in advance

f0nZ commented 11 months ago

Any update on this topic? Having the same error while resolving SRV records.

This is strange, because if I move the same code into a simple REST endpoint, the same DNS resolution works.

It seems something in the websockets dependency is behaving differently.

cescoffier commented 11 months ago

Yes, it's because of the web socket. The work has not started yet.

shrikanthkr commented 11 months ago

Any workarounds on this? I have the same issue.

cescoffier commented 11 months ago

@shrikanthkr just connect to mongo outside of the web socket, that should work around the issue.
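
To illustrate why connecting outside the web socket might help, here is a minimal plain-JDK simulation. It does not use the Quarkus, Vert.x, or MongoDB APIs. It assumes (this is not confirmed in this thread) that the first use of the client inside `onMessage` blocks the event-loop thread while the SRV lookup needs that same thread to deliver its result, which would be consistent with the `vert.x-eventloop-thread-0` and `TimeoutException` entries in the log above. All names (`LazyClientSketch`, `connect`, `onMessage`, `host-a:27017`) are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class LazyClientSketch {
    // Stand-in for a single event-loop thread (daemon, so the JVM can exit).
    static final ExecutorService EVENT_LOOP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop-sim");
        t.setDaemon(true);
        return t;
    });

    // Cached result of the simulated SRV lookup; null until the first successful connect.
    static volatile String resolvedHost;

    // Hypothetical lazy "connect": the lookup result is delivered via the event loop,
    // and the caller blocks for at most timeoutMs waiting for it.
    static String connect(long timeoutMs) throws Exception {
        if (resolvedHost == null) {
            resolvedHost = CompletableFuture
                    .supplyAsync(() -> "host-a:27017", EVENT_LOOP)
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        return resolvedHost;
    }

    // Simulates a message handler that runs on the event loop and uses the client.
    static String onMessage(long timeoutMs) throws Exception {
        return EVENT_LOOP.submit(() -> {
            try {
                return connect(timeoutMs);
            } catch (TimeoutException e) {
                // The lookup is queued behind this very task, so it cannot complete in time.
                return "timeout";
            }
        }).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("first use inside handler: " + onMessage(200));
        System.out.println("warm-up outside handler:  " + connect(200));
        System.out.println("handler after warm-up:    " + onMessage(200));
    }
}
```

In this simulation the first call from inside the handler times out, the warm-up call from the main thread succeeds, and later handler calls reuse the cached result. In a real application the equivalent would be to trigger the first Mongo operation at startup or from a worker thread rather than from `onMessage`. Whether that matches the actual root cause here is an assumption.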

gsmet commented 2 months ago

@mkouba is the issue solved with the new WebSockets next?

If so, I would be in favor of closing this one and documenting it in the MongoDB guide.

mkouba commented 2 months ago

> @mkouba is the issue solved with the new WebSockets next?
>
> If so, I would be in favor of closing this one and documenting it in the MongoDB guide.

I have no idea. @amoscatelli would you care to try your reproducer with WS Next instead?

geoand commented 1 month ago

Let's close this for now and we can reopen if necessary