quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

ReactiveMongoClient performance on large result sets #36177

Open tomas1885 opened 1 year ago

tomas1885 commented 1 year ago

Describe the bug

When using ReactiveMongoClient for getting large result sets there's a huge performance penalty (orders of magnitude) compared to using the blocking client.

The issue stems from the usage of io.quarkus.mongodb.impl.Wrappers#toMulti, since it executes each `onItem` on the current Vert.x context if possible.

The difference for our use case is 2s vs 28s. I understand the need to emit each item on the calling Vert.x context, but we need to find a different way, as the overhead is just too high. Simply using AdaptersToFlow.publisher without the call to Vert.x results in low latency, but the items are not emitted on the calling Vert.x context. The only way I found to keep emitting items on the Vert.x context without suffering the performance penalty is to use AdaptersToFlow.publisher and wrap it in a replaying Multi with emitOn, using either MutinyHelper.executor(currentContext) or simply emitOn(cmd -> ctx.runOnContext(cmd)). I'm not sure if there are any side effects, and there might be better ways of solving this issue, but for now we're stuck on using the underlying MongoClient (unwrap) with this custom solution.
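For illustration, here is a minimal, self-contained sketch of that wrapping; the class name, method name, and the boolean switch between the two emitOn variants are made up for this sketch, and the Publisher is assumed to come from the unwrapped reactive-streams driver (e.g. collection.find() or collection.aggregate(...)):

```java
import java.util.concurrent.Executor;

import org.bson.Document;
import org.reactivestreams.Publisher;

import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.vertx.MutinyHelper;
import io.vertx.core.Context;
import io.vertx.core.Vertx;
import mutiny.zero.flow.adapters.AdaptersToFlow;

public class CallerContextMulti {

    // Wraps a raw Reactive Streams publisher so items are still emitted on the
    // calling Vert.x context, following the approach described above.
    public static Multi<Document> onCallerContext(Publisher<Document> publisher,
                                                  boolean useMutinyHelper) {
        Context ctx = Vertx.currentContext(); // assumed non-null here for brevity

        // The two variants mentioned above: MutinyHelper, or a plain Executor lambda.
        Executor backToContext = useMutinyHelper
                ? MutinyHelper.executor(ctx)
                : cmd -> ctx.runOnContext(x -> cmd.run());

        return Multi.createBy().replaying()
                .ofMulti(Multi.createFrom().publisher(AdaptersToFlow.publisher(publisher)))
                .emitOn(backToContext);
    }
}
```

A caller would then use something like onCallerContext(collection.find(), true), with the collection obtained via the unwrapped client.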

Any help would be appreciated

Expected behavior

ReactiveMongoClient performance should be on par with the blocking client or with using MongoDB's reactive driver directly.

Actual behavior

The performance for large result sets is worse by orders of magnitude.

How to Reproduce?

Query a large collection with a large result set.

Output of uname -a or ver

Darwin Kernel Version 22.6.0: Fri Sep 15 13:41:28 PDT 2023; root:xnu-8796.141.3.700.8~1/RELEASE_ARM64_T6020 arm64

Output of java -version

openjdk version "17.0.7" 2023-04-18

GraalVM version (if different from Java)

No response

Quarkus version or git rev

3.3.3

Build tool (ie. output of mvnw --version or gradlew --version)

Build time: 2023-08-17 07:06:47 UTC
Revision: 8afbf24b469158b714b36e84c6f4d4976c86fcd5
Kotlin: 1.9.0
Groovy: 3.0.17
Ant: Apache Ant(TM) version 1.10.13 compiled on January 4 2023
JVM: 17.0.7 (Eclipse Adoptium 17.0.7+7)
OS: Mac OS X 13.6 aarch64

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @evanchooly (kotlin,mongodb), @geoand (kotlin), @loicmathieu (mongodb)

geoand commented 1 year ago

@tomas1885 thanks a lot for reporting.

It would be super useful to us if you either attached profiler output (async-profiler, for example) for both the reactive and the blocking calls, or provided a sample application we can use to reproduce the problem.

cc @cescoffier @franz1981

tomas1885 commented 1 year ago

I tried using the AsyncProfiler, but I don't think this is the profile you're looking for. Do you have any specific modes (profiler flags) you want me to use? Providing a sample app would require filling a local MongoDB with lots of data.

geoand commented 1 year ago

If the difference between the two styles is so large, I would assume that a simple CPU profile would show it (and I am pretty sure a memory profile would also show huge allocations of Mutiny types).

That said, @franz1981 is the expert at async-profiler settings :)

tomas1885 commented 1 year ago

profiling.zip Here's a zip containing the profile HTML (I followed the instructions) for the reactive and blocking calls respectively. If you want it in a different format, please let me know.

franz1981 commented 1 year ago

Got a couple of questions:

I suppose it is running on macOS, but you should still be able to observe the issue, correct?

tomas1885 commented 1 year ago

I did add those JVM args, but I didn't see much valuable information there, so I took the time to write a simple reproducer, attached here as a zip file. There are 3 endpoints: one blocking, one reactive, and another with the hacky workaround. You can clearly see the issue. mongoreproducer.zip
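For readers who don't open the zip: a rough, hedged sketch of what such a three-endpoint resource might look like (database and collection names, the injection setup, and the exact workaround wiring are illustrative assumptions rather than the contents of the actual attachment; the unwrap() call mirrors the reporter's snippet later in this thread):

```java
import java.util.ArrayList;
import java.util.List;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

import org.bson.Document;

import com.mongodb.client.MongoClient;

import io.quarkus.mongodb.reactive.ReactiveMongoClient;
import io.smallrye.mutiny.Multi;
import io.vertx.core.Context;
import io.vertx.core.Vertx;
import mutiny.zero.flow.adapters.AdaptersToFlow;

@Path("/bench")
public class BenchResource {

    @Inject
    MongoClient blockingClient;          // blocking driver client

    @Inject
    ReactiveMongoClient reactiveClient;  // Quarkus reactive facade

    @GET
    @Path("/blocking")
    public List<Document> blocking() {
        // Blocking client: fast even for large result sets.
        return blockingClient.getDatabase("test").getCollection("items")
                .find().into(new ArrayList<>());
    }

    @GET
    @Path("/reactive")
    public Multi<Document> reactive() {
        // Reactive facade: goes through the wrapper discussed in this issue.
        return reactiveClient.getDatabase("test").getCollection("items").find();
    }

    @GET
    @Path("/workaround")
    public Multi<Document> workaround() {
        // Unwrapped driver + replaying Multi + emitOn, as in the workaround
        // posted further down in this thread.
        var collection = reactiveClient.getDatabase("test").unwrap().getCollection("items");
        Context ctx = Vertx.currentContext(); // on the event loop, assumed non-null
        return Multi.createBy().replaying()
                .ofMulti(Multi.createFrom().publisher(AdaptersToFlow.publisher(collection.find())))
                .emitOn(cmd -> ctx.runOnContext(x -> cmd.run()));
    }
}
```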

geoand commented 1 year ago

Thanks a lot, I'll have a look when I can.

cescoffier commented 1 year ago

(Adding @jponge)

Thanks for this report. The mongo facade was implemented a very long time ago (before we had Mutiny, actually). Many improvements can be made, and I think you found one. Using emitOn and a trampoline would provide much better performance.

WDYT @jponge ?
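For context on the trampoline idea: a trampoline-style executor runs tasks submitted from the same thread by draining a queue in a loop instead of dispatching or recursing per task. A generic sketch of the pattern (not Quarkus or Mutiny code, just an illustration of the term):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Generic trampoline-style Executor: if a task is submitted while another task is
// already running on this thread, it is queued and executed by the outer drain
// loop, so there is no per-task dispatch and no unbounded recursion.
public final class TrampolineExecutor implements Executor {

    private final ThreadLocal<Queue<Runnable>> draining = new ThreadLocal<>();

    @Override
    public void execute(Runnable command) {
        Queue<Runnable> queue = draining.get();
        if (queue != null) {
            // Re-entrant call on the same thread: just enqueue.
            queue.add(command);
            return;
        }
        queue = new ArrayDeque<>();
        draining.set(queue);
        try {
            command.run();
            Runnable next;
            while ((next = queue.poll()) != null) {
                next.run();
            }
        } finally {
            draining.remove();
        }
    }
}
```

How this would be combined with getting emissions back onto the caller's Vert.x context is for the maintainers to decide; the sketch only illustrates what "trampoline" refers to here.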

jponge commented 1 year ago

Perhaps ContextAwareScheduler could be used to simplify pushing execution back onto the correct Vert.x duplicated context

jponge commented 1 year ago

Ah my bad, this doesn't require a scheduler 🤦

tomas1885 commented 1 year ago

FYI, for now, as a workaround, replace aggregate and find calls with something like this:

    var collection = db.unwrap().getCollection(PayoutsMongoRepository.COLLECTION);
    Context context = Vertx.currentContext();
    if (context != null) {
        return Multi.createBy().replaying()
                .ofMulti(Multi.createFrom().publisher(
                        AdaptersToFlow.publisher(collection.aggregate(pipeline, clazz))))
                .emitOn(cmd -> context.runOnContext(x -> cmd.run()));
    } else {
        return Multi.createFrom().publisher(
                AdaptersToFlow.publisher(collection.aggregate(pipeline, clazz)));
    }

This doesn't seem to impact performance, and item emission still happens on the calling Vert.x context.