spring-projects / spring-data-mongodb

Provides support to increase developer productivity in Java when using MongoDB. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-mongodb/
Apache License 2.0

Misleading Change Streams documentation #4724

Open michaldo opened 3 weeks ago

michaldo commented 3 weeks ago

Change Streams can be consumed with both, the imperative and the reactive MongoDB Java driver. It is highly recommended to use the reactive variant, as it is less resource-intensive.

src/main/antora/modules/ROOT/pages/mongodb/change-streams.adoc

I do not understand why the reactive variant is highly recommended. I followed the example given in the imperative part of the documentation, and I see that one thread is started and listens for changes on the Mongo socket. Is one extra thread resource-intensive? What about virtual threads?

If I use the standard Mongo client in my application and want to use Change Streams, should I a) rewrite the application, b) use two clients in a single application, or c) choose the non-recommended variant?

IMHO the documentation preface presents a false statement, or a rough opinion without justification or explanation. The preface is misleading and may lead developers to wrong decisions.

I propose removing the sentence altogether or clarifying the resource utilization point.

mp911de commented 3 weeks ago

Reactive Change Streams start their activity from a push-based I/O mechanism while imperative Change Streams (also, tailable cursors) require constant polling. Polling is a repetitive activity that consumes computation time while there might be no data at all. Additionally, each imperative change stream requires one thread resulting in a linear relationship between the number of change streams being consumed vs. CPU and memory cost.

Reactive Change Streams are constrained in the number of their I/O threads to roughly the number of CPU cores, regardless of the number of concurrent change streams being consumed.

We introduced change stream and tailable cursor support in 2018. I think in late 2018, the JDK team started considering lightweight threads if I'm not mistaken. I think it is worth exploring how Virtual Threads behave if you enable these on your Executor, let us know how this goes.
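As a starting point for that experiment, here is a minimal JDK-only sketch (Java 21 assumed). It does not touch MongoDB at all: the class name `VirtualThreadListeners`, the method `runListeners`, and the in-memory queue standing in for a blocking cursor read are all hypothetical. It only shows that an `Executor` backed by `Executors.newVirtualThreadPerTaskExecutor()` runs each blocking listener loop on its own virtual thread. As far as I know, `DefaultMessageListenerContainer` accepts an `Executor` in its constructor, so a similar executor could presumably be passed there, but that wiring is an assumption and is not shown.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadListeners {

    /**
     * Starts one blocking "listener" per stream on a virtual-thread executor
     * and returns how many of them actually ran on a virtual thread.
     */
    static int runListeners(int streams) throws InterruptedException {
        AtomicInteger onVirtual = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(streams);

        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < streams; i++) {
                // Hypothetical stand-in for a change stream cursor: an empty queue.
                BlockingQueue<String> events = new LinkedBlockingQueue<>();
                executor.execute(() -> {
                    try {
                        // Blocking wait; a virtual thread parks here instead of holding an OS thread.
                        events.poll(50, TimeUnit.MILLISECONDS);
                        if (Thread.currentThread().isVirtual()) {
                            onVirtual.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
        }
        return onVirtual.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // One thousand concurrent blocking listeners, each on its own virtual thread.
        System.out.println(runListeners(1_000));
    }
}
```

With platform threads the same loop would need one OS thread per listener, which is the linear cost described above. Whether virtual threads remove that cost for real change streams still has to be measured against the driver.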

michaldo commented 3 weeks ago

Thanks, Mark, for the answer. I was focused on the thread issue and missed the polling issue.

My findings:

  1. Both the standard and the reactive solution capture changes by issuing a query once per second: DEBUG o.m.driver.protocol.command - Command "getMore" started... This is where my Reactor knowledge ends. For the standard client, when I used a platform thread, the thread dump is:
"my-platform-cdc" #58 [34188] prio=5 os_prio=0 cpu=0.00ms elapsed=4.93s tid=0x000001e6153831b0 nid=34188 runnable [0x0000003f225fe000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.Net.poll(java.base@21.0.3/Native Method)
    at sun.nio.ch.NioSocketImpl.park(java.base@21.0.3/NioSocketImpl.java:191)
    at sun.nio.ch.NioSocketImpl.park(java.base@21.0.3/NioSocketImpl.java:201)
    at sun.nio.ch.NioSocketImpl.implRead(java.base@21.0.3/NioSocketImpl.java:309)
    at sun.nio.ch.NioSocketImpl.read(java.base@21.0.3/NioSocketImpl.java:346)
    at sun.nio.ch.NioSocketImpl$1.read(java.base@21.0.3/NioSocketImpl.java:796)
    at java.net.Socket$SocketInputStream.read(java.base@21.0.3/Socket.java:1109)
    at com.mongodb.internal.connection.SocketStream.read(SocketStream.java:176)

When I use a virtual thread, the thread dump is:

#56 "my-virt-cdc" virtual
      java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
      java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
      java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:369)
      java.base/sun.nio.ch.Poller.pollIndirect(Poller.java:139)
      java.base/sun.nio.ch.Poller.poll(Poller.java:102)
      java.base/sun.nio.ch.Poller.poll(Poller.java:87)
      java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:175)
      java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:201)
      java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
      java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
      java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
      java.base/java.net.Socket$SocketInputStream.read(Socket.java:1109)
      com.mongodb.internal.connection.SocketStream.read(SocketStream.java:176)

My conclusions are:

  1. Both standard and reactive are effectively polling solutions, because the "getMore" command is executed once per second.
  2. I guess it is not good that the platform thread is RUNNABLE; otherwise you would not prefer reactive in the documentation preface.
  3. I guess it is good that the virtual thread is parked (java.lang.VirtualThread.park(VirtualThread.java:582)).
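To put conclusion 2 in context, a small JDK-only check (Java 21 assumed) may help. The Javadoc of `Thread.State.RUNNABLE` says such a thread "may be waiting for other resources from the operating system", and a platform thread blocked in a socket read is reported that way. The sketch below has nothing to do with the MongoDB driver: the class name `BlockedReadState`, the method name, and the loopback socket pair are hypothetical, and the 300 ms sleep is just an assumed grace period for the reader to reach the blocking call.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockedReadState {

    /** Returns the state reported for a platform thread that is blocked in a socket read. */
    static Thread.State stateWhileBlockedInRead() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            Thread reader = Thread.ofPlatform()
                    .name("demo-platform-reader")
                    .daemon(true)
                    .start(() -> {
                        try {
                            // Blocks: the accepted peer never writes anything.
                            client.getInputStream().read();
                        } catch (IOException ignored) {
                            // Expected once the sockets are closed on exit.
                        }
                    });

            // Assumed grace period so the reader reaches the blocking read.
            Thread.sleep(300);
            return reader.getState();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stateWhileBlockedInRead());
    }
}
```

So RUNNABLE in the platform thread dump does not by itself mean the thread is burning CPU; the `cpu=0.00ms` in the dump above points the same way. The thread still occupies an OS thread and its stack while it waits, which is the per-stream cost Mark described.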