Improving MongoDB Oplog Tailing Mode Scalability with minResultFetchIntervalMs

vlasky commented 4 years ago

This is my proposal for an enhancement to the Meteor MongoDB code to improve the scalability of Meteor apps that use MongoDB in oplog tailing mode.

This would further add to the excellent work done by @benjamin and @theodorDiaconu.

The Meteor MongoDB code needs a throttling feature to set a minimum time interval between successive result set fetches for a given reactive query. This allows the developer to impose a hard limit on the maximum update rate of a given reactive query.

This would improve scalability by greatly reducing the CPU time and memory consumed by reactive queries, as well as the network bandwidth used when those reactive queries are published and subscribed to by clients.

In many cases, it is not necessary to fetch updates at top speed in response to every change. For example, if we are using a Mongo publication to update a user interface component such as a table, map or chart on a web page, we gain nothing from updating it more than, say, once per second.

A new option, minResultFetchIntervalMs, would be added to Mongo.Collection.find(). It represents the minimum allowable time delay in milliseconds between successive result set fetches for a given reactive query.

For example, a publication that can send reactive updates at a maximum rate of once per second would have a minResultFetchIntervalMs of 1000. A maximum rate of twice per second would be a minResultFetchIntervalMs of 500 and a maximum rate of once every 5 seconds would be a minResultFetchIntervalMs of 5000 and so on.
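The relationship between a desired maximum update rate and the option value is simple arithmetic. A minimal sketch follows; `intervalForMaxRate` is a hypothetical helper used only for illustration and is not part of Meteor:

```javascript
// Hypothetical helper (not part of Meteor): convert a maximum update rate,
// in fetches per second, into a minResultFetchIntervalMs value.
function intervalForMaxRate(maxFetchesPerSecond) {
    if (!(maxFetchesPerSecond > 0)) {
        throw new RangeError('maxFetchesPerSecond must be a positive number');
    }
    return Math.round(1000 / maxFetchesPerSecond);
}

console.log(intervalForMaxRate(1));   // 1000 (once per second)
console.log(intervalForMaxRate(2));   // 500  (twice per second)
console.log(intervalForMaxRate(0.2)); // 5000 (once every 5 seconds)
```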

Equivalent functionality has existed from the beginning in the mysql-live-select package, the key component of the Meteor MySQL integration. It has been crucial in enabling our reactive Meteor MySQL apps to scale.

The lack of this feature in the Meteor MongoDB code is the biggest remaining obstacle to making reactive Meteor MongoDB apps scale. It should be incorporated without delay.

Example:

Let's imagine our Meteor application displays a map with real-time vehicle locations which are stored in a MongoDB collection published by the server.

    Meteor.publish('vehicleLocations', function() {
        return Locations.find();
    });

Let's imagine that this collection receives 100 separate vehicle position updates in one second. That would result in 100 extra entries inserted into the MongoDB oplog.

In the current Meteor MongoDB code, that would result in the publication potentially being triggered by each oplog entry, sending up to 100 updates to each subscribed client, resulting in lots of network bandwidth, CPU time and memory being needlessly consumed.

How this would be improved with minResultFetchIntervalMs:

Instead, let's imagine that we could publish the collection and specify a minResultFetchIntervalMs of 1000ms (1 second):

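    Meteor.publish('vehicleLocations', function() {
        // minResultFetchIntervalMs is the option proposed in this issue;
        // it does not exist in Meteor today.
        return Locations.find({}, { minResultFetchIntervalMs: 1000 });
    });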

At time=0, oplog entry 1 causes the result set to be fetched, but then no further result fetch is allowed to take place until 1000ms (1 second) has elapsed.

Between time=0 and time=1000ms, Meteor's MongoDB observer code notices oplog entries 2-100, but will not take any immediate action. Instead, it will schedule the next result set fetch to occur at time=1000ms.

The same pattern repeats in every subsequent interval for as long as oplog entries keep arriving.

By the end of that one second, only 2 result set fetches (at time=0 and time=1000ms) would have been performed instead of 100.
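The timeline above can be sketched as a small simulation. This is only an illustration of the proposed rule under the assumption that a deferred fetch runs at the earliest time the interval allows; `simulateFetchTimes` and its parameters are hypothetical names, not Meteor's observer code:

```javascript
// Sketch only: simulates the proposed throttling rule on a list of oplog
// entry timestamps (in ms). None of these names exist in Meteor.
function simulateFetchTimes(oplogEntryTimesMs, minResultFetchIntervalMs) {
    const fetchTimes = [];
    let scheduledAt = null; // time of the pending deferred fetch, if any

    for (const t of [...oplogEntryTimesMs].sort((a, b) => a - b)) {
        // A deferred fetch whose time has arrived runs before entry t is handled.
        if (scheduledAt !== null && scheduledAt <= t) {
            fetchTimes.push(scheduledAt);
            scheduledAt = null;
        }
        const last = fetchTimes.length ? fetchTimes[fetchTimes.length - 1] : -Infinity;
        if (t - last >= minResultFetchIntervalMs) {
            fetchTimes.push(t); // interval has elapsed: fetch immediately
        } else if (scheduledAt === null) {
            // Too soon: defer a single fetch to the earliest allowed time.
            scheduledAt = last + minResultFetchIntervalMs;
        }
        // Otherwise a fetch is already pending and will cover this entry too.
    }
    if (scheduledAt !== null) fetchTimes.push(scheduledAt); // run the pending fetch
    return fetchTimes;
}

// 100 oplog entries spread over the first second: t = 0, 10, 20, ..., 990 ms.
const entries = Array.from({ length: 100 }, (_, i) => i * 10);
console.log(simulateFetchTimes(entries, 1000)); // [ 0, 1000 ]
```

With an interval of 0 the same function performs one fetch per entry, which corresponds to the unthrottled behaviour described earlier.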

Answers to Expected Questions:

Q: So you are ignoring events in the oplog. How is that good?

A: They are not being ignored - we just don't react to each one of them - kind of like when someone rings your doorbell multiple times - the first ring is enough to set you in motion towards the door.

Q: How is this more efficient than just using poll and diff?

A: This approach avoids needless polling and provides predictable response times to events.

Q: What is the scalability limit with this approach?

A: How quickly the Meteor MongoDB observer code can read the oplog. For best performance, one would store their MongoDB database & oplog on an SSD (preferably NVMe).

mitar commented 4 years ago

Yes, throttling would be cool.

Even cooler would be if DDP supported backpressure. Then the client could simply decide how quickly it wants to consume changes, and that would slow down the whole pipeline. The client would control the FPS at which it renders, and that would propagate all the way down to how often refreshes are requested from the database.

reactive-postgres supports that through the Node stream interface and its support for backpressure.

But yes, manual throttling would be useful as well.

The problem is that it breaks the semantics of DDP and observeChanges, which currently work on the assumption that all changes are seen.

vlasky commented 4 years ago

> The problem is that it breaks the semantics of DDP and observeChanges, which currently work on the assumption that all changes are seen.

I think people will see that it really isn't a significant issue for most use cases - especially those involving updating GUI components.

Not all intermediate changes need to be seen - what's important is usually just the cumulative difference between before and after the configured result fetch interval has elapsed.
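To illustrate what "cumulative difference" means here, the following sketch compares two result-set snapshots keyed by _id. It is a simplified illustration, not Meteor's actual diffing code; `netChanges` is a hypothetical name and the JSON-based comparison is a naive stand-in for a proper deep-equality check:

```javascript
// Sketch only: compute the net change between two result-set snapshots,
// keyed by _id. Intermediate states between the two fetches are never seen.
function netChanges(before, after) {
    const beforeById = new Map(before.map(doc => [doc._id, doc]));
    const afterById = new Map(after.map(doc => [doc._id, doc]));
    const added = [], changed = [], removed = [];

    for (const [id, doc] of afterById) {
        if (!beforeById.has(id)) {
            added.push(id);
        } else if (JSON.stringify(beforeById.get(id)) !== JSON.stringify(doc)) {
            changed.push(id); // naive comparison; assumes stable key order
        }
    }
    for (const id of beforeById.keys()) {
        if (!afterById.has(id)) removed.push(id);
    }
    return { added, changed, removed };
}

const before = [{ _id: 'v1', lat: 1, lng: 1 }, { _id: 'v2', lat: 5, lng: 5 }];
// v1 may have moved several times during the interval; only its final
// position appears in the next fetch.
const after = [{ _id: 'v1', lat: 4, lng: 2 }, { _id: 'v3', lat: 9, lng: 9 }];
console.log(netChanges(before, after));
// { added: [ 'v3' ], changed: [ 'v1' ], removed: [ 'v2' ] }
```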

In our 4 years using reactive MySQL queries, we have never had a use case on either the client side or the server side that required every single MySQL event to cause the reactive query to be triggered.

meteor / meteor-feature-requests

Improving MongoDB Oplog Tailing Mode Scalability with minResultFetchIntervalMs #367