Optionally disable merge box in Meteor for publish functions

mitar commented 7 years ago

Migrated from: meteor/meteor#5645

I think there should be multiple levels of disabling this:

Disable field-level merging, but keep track of which documents are published (this mostly retains the existing semantics).
Disable even tracking which documents are published (this makes reasoning tricky, especially if you are removing documents and have multiple publish functions).
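
To make the levels concrete, here is a deliberately simplified model of one session's view of one collection. It is only a sketch: the class name, the mode names ('full', 'ids-only', 'none') and the shallow field comparison are assumptions made for illustration, not Meteor options or Meteor's actual SessionCollectionView logic.

```javascript
// Simplified, hypothetical model of one session's view of one collection.
// The mode names are made up for this sketch and are not Meteor options.
class SessionViewSketch {
  constructor(mode) {
    this.mode = mode; // 'full' | 'ids-only' | 'none'
    this.docs = new Map(); // id -> copy of fields ('full') or true ('ids-only')
  }

  // Returns the message the server would send when a publish function
  // reports `fields` for document `id`, or null if nothing needs sending.
  upsert(id, fields) {
    if (this.mode === 'none') {
      // Nothing is remembered, so the server cannot tell added from changed.
      return { msg: 'added', id, fields };
    }
    const known = this.docs.get(id);
    if (this.mode === 'ids-only') {
      // Only the id is remembered, so all fields are resent every time.
      this.docs.set(id, true);
      return { msg: known ? 'changed' : 'added', id, fields };
    }
    // 'full': remember field values and send only those that differ.
    if (!known) {
      this.docs.set(id, { ...fields });
      return { msg: 'added', id, fields };
    }
    const diff = {};
    for (const [key, value] of Object.entries(fields)) {
      if (known[key] !== value) { // shallow comparison, enough for a sketch
        diff[key] = value;
        known[key] = value;
      }
    }
    return Object.keys(diff).length ? { msg: 'changed', id, fields: diff } : null;
  }
}
```

The memory cost shrinks from full field copies ('full') to one flag per document ('ids-only') to nothing ('none'), while the amount of data resent grows in the same order.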

evolross commented 6 years ago

Question related to this... say I have 5000 simultaneous users all subscribing to the same publication identically. They only need the data, no reactivity, so I only implement this.added in the publication. I've read that Meteor caches the data and doesn't requery mongo for each user. Great. But I've also read that the server keeps an instance of the shared data in memory for every user in SessionCollectionView.

It sounds like disabling the merge box would possibly solve the duplicated memory data in SessionCollectionView but that it would also lose any caching of the data with the shared observer. Could anyone shed any light?

dr-dimitru commented 6 years ago

@evolross isn't it better in your case to use Meteor.methods to return the non-reactive portion of data from MongoDB? Actually, it's the right approach: "Use Meteor.methods to retrieve data from Mongo as much as possible, unless you really need this data to be reactive."

mitar commented 6 years ago

If you do not need your publication to be reactive, then just do not make it reactive, but still use pub/sub. Like:

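Meteor.publish('nonReactive', function () {
  // Before observeChanges returns, `added` is called once for each document
  // in the initial results, so stopping the handle right away publishes a
  // one-time snapshot with no further updates.
  MyCollection.find({}).observeChanges({
    added: (id, fields) => {
      this.added('MyCollection', id, fields);
    },
  }).stop();
  // Tell the client that the initial data set is complete.
  this.ready();
});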

This still uses the merge box, so only the diff against what the client already has is sent over. Of course, this depends on what you are optimizing for. The whole pub/sub is about bandwidth optimization and low latency. If you do not care about bandwidth, then you can just fetch data every time it changes (with a bit higher latency). But if you want low latency, then use such non-reactive pub/sub.

And if you do want reactivity, but do not want to use memory on the server, then you can disable the merge box to keep low latency, at the cost of more bandwidth.

aadamsx commented 6 years ago

If you do not need your publication to be reactive, then just do not make it reactive, but still use pub/sub.

Taking your advice, and using Meteor Publish for both reactive and non-reactive situations, under what condition would you ever need to use Meteor Methods?

mitar commented 6 years ago

I use Meteor methods only to send data from the client to the server, that is, for mutation operations on data. So mutations go over Meteor methods, while data flows back to the client over pub/sub. I also use only server-side Meteor methods, because I found that latency compensation confuses users more than it benefits them. And I never do anything on the client in response to a Meteor method call returning; I always wait for data to arrive back from the server over pub/sub before I continue. In general I love a declarative coding style, where my code reacts to the state of the data and not to how that data changed, so it does not matter whether a method call changed the data, or some server-side script or job.

paulincai commented 6 years ago

@aadamsx I run 95% of my queries via methods. When you use pub/sub, you "extend" a version of your Mongo to the client. However, with Methods you can inject data in other places like Redux. In large React flavors of Meteor with Redux as a data pipe you would most likely use methods. For most apps reactivity is a limited case, unless you have a couple of features with reactivity at the core (chat, for instance).

Methods secure your data because they (mostly) run on the server. Example: get some users' tokens from Mongo in an array to query for those users. Your tokens never make it to the client, even though the method is called by the client and the results go to the client.

With methods you can acquire data from other APIs on a local network. For instance, get fixtures from a server in Amazon when your Meteor app runs in the same Amazon region, as opposed to getting the data into the client and sending it to Meteor for processing. You cannot pub/sub all data that you are going to use.

dr-dimitru commented 6 years ago

This still uses the merge box, so only the diff against what the client already has is sent over. Of course, this depends on what you are optimizing for. The whole pub/sub is about bandwidth optimization and low latency. If you do not care about bandwidth, then you can just fetch data every time it changes (with a bit higher latency).

@mitar sometimes data is not expected to change, or, if it does change, it shouldn't be updated on the client side. As I mentioned, use pub/sub only when you need reactivity, or a little more: when you want to access that data through the Minimongo API.

mitar commented 6 years ago

OK, I do not want to make this issue a discussion, but it is interesting, so I cannot resist. :-)

Because I think it really just depends on the coding style you want to use. So it is more like a debate of which programming language you prefer. :-)

However, with Methods you can inject data in other places like Redux.

With pub/sub + tracker autorun you can do that as well. Every time data arrives from the server, you update the Redux state. And when you want to modify data, you call a Meteor method through Redux.
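
One way to picture this wiring is sketched below. It is an illustration under assumptions, not mitar's code: it uses observeChanges-style callbacks instead of a Tracker autorun, the store is a minimal stand-in rather than the Redux library, and the action names are hypothetical.

```javascript
// Minimal stand-in for a Redux-style store; use the real library in an app.
function createStore(reducer, initialState) {
  let state = initialState;
  return {
    getState: () => state,
    dispatch: (action) => {
      state = reducer(state, action);
    },
  };
}

// Reducer keeping published documents keyed by id (hypothetical action names).
function docsReducer(state = {}, action) {
  switch (action.type) {
    case 'doc/added':
      return { ...state, [action.id]: { ...action.fields } };
    case 'doc/changed':
      return { ...state, [action.id]: { ...state[action.id], ...action.fields } };
    case 'doc/removed': {
      const { [action.id]: removed, ...rest } = state;
      return rest;
    }
    default:
      return state;
  }
}

// Callbacks in the { added, changed, removed } shape that a cursor's
// observeChanges accepts; each one forwards the change to the store.
function storeCallbacks(store) {
  return {
    added: (id, fields) => store.dispatch({ type: 'doc/added', id, fields }),
    changed: (id, fields) => store.dispatch({ type: 'doc/changed', id, fields }),
    removed: (id) => store.dispatch({ type: 'doc/removed', id }),
  };
}
```

On a Meteor client, after subscribing, this could be hooked up with something like MyCollection.find().observeChanges(storeCallbacks(store)), so the store follows whatever the publication sends.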

Methods secure your data because they (mostly) run on the server.

You can do that in a pub/sub handler on the server as well?

You cannot pub/sub all data that you are going to use.

Oh, you definitely can. Nothing requires you to do a MongoDB query inside the pub/sub handler on the server. Example: I have done queries to ElasticSearch and it works great. You get a JSON structure out of ElasticSearch, publish it to the client with this.added, and then on the client you can search over it using Minimongo. Pretty nice.
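
A rough sketch of that idea follows. The helper name publishHits, the collection name and the { _id, _source } shape of the hits are assumptions for illustration; the fake publish context exists only so the snippet runs outside Meteor, where `sub` would be the publish handler's `this`.

```javascript
// `sub` is the publish handler's `this`; `hits` is assumed to look like
// [{ _id, _source }], which is the shape used for illustration here.
function publishHits(sub, collectionName, hits) {
  for (const hit of hits) {
    // Document ids are sent as strings; the fields come from the search hit.
    sub.added(collectionName, String(hit._id), hit._source);
  }
  sub.ready();
}

// Stand-in for the publish context, so the sketch runs outside Meteor.
function createFakeSub() {
  const messages = [];
  return {
    messages,
    added: (collection, id, fields) =>
      messages.push({ msg: 'added', collection, id, fields }),
    ready: () => messages.push({ msg: 'ready' }),
  };
}

const sub = createFakeSub();
publishHits(sub, 'searchResults', [
  { _id: 1, _source: { title: 'first' } },
  { _id: 2, _source: { title: 'second' } },
]);
console.log(sub.messages.length); // 3 (two added messages, then ready)
```

Inside a real publication, the same helper would be called with `this` as the first argument once the search results are available.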

But I just wanted to point out that all this can be done, not open a fight. So, it is not really a question what can be done with what, but what more aligns with what you are used to and how you think about your app. And of course, what your app is doing. Almost all my apps are highly collaborative apps so reactivity for me is really useful.

Use pub/sub only when you need reactivity.

Yes, and my suggestion is that you can always use it, if you prefer this style of thinking about your app. That way, reactivity vs. non-reactivity does not change how you get data. It is always the same: I get data through pub/sub, and I only control how often the data is updated. No reactivity and full reactivity are just two extremes of a spectrum. By using pub/sub you can then also decide where on the spectrum you want to be. This is why I prefer this approach.

So I disagree that "using methods to get data is the right approach". There is no right approach. But it is an approach. :-)

paulincai commented 6 years ago

@mitar very interesting. Question for a more specific case: you have friends and friends of friends and all their feeds/posts. Each post has a like (blue if I liked it, like Facebook), a forward, and counters for likes and forwards, with infinite scroll in batches of 25. What would you choose? Method, non-reactive pub/sub, or pub/sub? Or, better said, leaving convenience aside, what is ... "healthier" for the ecosystem (server, DB, bandwidth)? (I feel this whole thing needs to fly into forums to avoid cluttering here :))))

dr-dimitru commented 6 years ago

So I disagree that "using methods to get data is the right approach". There is no right approach. But it is an approach. :-)

I agree, "right" for what case? I meant only in terms of efficiency and server resource consumption. I admit that I don't really understand what is happening under the hood in both cases, but with 16K+ users per day, moving from pub/sub (we've tried non-reactive too) to Methods improved efficiency and decreased server load on our end.

@mitar if you could give more info about what is happening under the hood, it would help a lot in picking the "right" approach in different cases.

OK, I do not want to make this issue a discussion, but it is interesting, so I cannot resist. :-)

If three of us are having this discussion, it means some of us don't understand what to use and in what case.

Because I think it really just depends on the coding style you want to use.

I believe it comes down to the specific case. The same applies to programming languages and libs/frameworks: we should pick the right tool/approach to solve the task we have in the most efficient (time, resources, etc.) way.

mitar commented 6 years ago

I believe it comes down to the specific case. The same applies to programming languages and libs/frameworks: we should pick the right tool/approach to solve the task we have in the most efficient (time, resources, etc.) way.

+1000

I just wanted to point out that you can do it through pub/sub. Whether this is good for your use case is another question; determining that is why programming is not trivial and why different approaches exist.

Let's finish here and somebody can move the rest of the discussion to the forum.

evolross commented 6 years ago

One last chime-in:

My use-case was related to serving 5000 simultaneous users the exact same non-reactive data. I was contemplating using non-reactive pub/sub because of its cursor sharing, so that I wasn't querying the database 5000 times for the same data, as I would be with a plain Meteor Method for each user. But I discussed this with Theo of redis-oplog and he said there's a lot of overhead with using pub/sub: both the 5000 copies of SessionCollectionView and a lot of other server overhead that gets built up for each subscription.

My solution ended up being a Meteor Method that serves the data from a memory cache. This is lightning fast, only hits Mongo once, and has the smallest amount of overhead.

paulincai commented 6 years ago

@evolross 'serves the data from a memory cache' - did you use redis?

evolross commented 6 years ago

Nope, I just made a global dictionary-like {} object (e.g. gamesCache = {}) right in the Meteor collection code. It stores each dataset under the _id of a parent object that relates to my app (e.g. a game document that has references to lots of other collections). The cache is ephemeral, which is fine in my use-case, and each dataset in the cache has a TTL. If there isn't an entry in the cache for a particular game's _id, it queries Mongo, populates the entry, and sets a new TTL. I also set a querying flag while a query is happening, so the thousands of other Meteor.call requests pouring in all get a response telling them to call back shortly (250ms later) if the cache is in the process of querying Mongo for a particular _id's dataset. This makes it so only one Mongo query ever happens, even with 5000+ users, if they all request within three or four seconds. It's obviously a few more Mongo calls if the requests are more spread out.

It's pretty slick. I've applied caches all over my app and it's done wonders for performance and response time. It's amazing to see a Kadira chart that shows less response time as more users hit the app! But you have to be careful, as caches get tricky when data changes. I use redis-oplog's Vent functionality to send around cacheReset messages from other parts of the app if my data changes. This then dumps a particular _id's dataset from the cache, causing the next query to get fresh data. EDIT: So I guess I do use redis a little bit with the Vent messages, but it doesn't deliver any data to clients. The Vent messages only carry reset notifications to the server(s) from other parts of the app.
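
The cache described above could look roughly like the sketch below. This is not evolross's code: the function and parameter names are hypothetical, and instead of a querying flag with a 250ms call-back it shares a single in-flight promise among concurrent callers, which is a different technique with the same goal of one Mongo query per dataset.

```javascript
// Hypothetical helper: a TTL cache that loads each key at most once at a time.
// `loadFn(key)` stands for the Mongo query; `now` is injectable for testing.
function createTtlCache(loadFn, ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }
  const inFlight = new Map(); // key -> promise of a load already in progress

  function get(key) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > now()) {
      return Promise.resolve(entry.value); // fresh hit, no database work
    }
    if (inFlight.has(key)) {
      return inFlight.get(key); // piggyback on the load already running
    }
    // The executor runs synchronously, so loadFn is called exactly once here.
    const promise = new Promise((resolve) => resolve(loadFn(key)))
      .then((value) => {
        entries.set(key, { value, expiresAt: now() + ttlMs });
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, promise);
    return promise;
  }

  // Drop one dataset so the next get() loads fresh data,
  // e.g. in response to a cache-reset message.
  function reset(key) {
    entries.delete(key);
  }

  return { get, reset };
}
```

A Meteor method would then simply return the result of get(gameId), and a handler for the reset messages would call reset(gameId) when the underlying data changes.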

mitar commented 6 years ago

And just to summarize now the initial issue here: the idea here is that for those who do want to use pub/sub and would like to keep low latency, but would like to make memory consumption smaller at the expense of more bandwidth consumption, there could be a way to do so by disabling merge box.

crapthings commented 5 years ago

@mitar

What is SessionCollectionView? Is it the same as the merge box?

sebakerckhof commented 4 years ago

@crapthings yes, it's the same. SessionCollectionView is what it is called in the code.

I want to implement the case where we don't save anything. This is mainly useful for implementing new use cases like volatile message queues etc.

All the other cases are very tricky as soon as you have more than one publication. A couple of cases where this is very hard:

When switching users, we re-run all subscriptions and send the diff. For this we still need to remember the fields.
When stopping one subscription, you may still have another subscription on the same collection with different fields. So there's no correct way of acting here if we don't know the fields in the collection view.

If all publications had the same fields, it would've been easy, but when they have different fields it becomes impossible to do in a way that is meaningful to the client (and secure).

Therefore I'll limit my PR to not saving anything at all, which is useful for the volatile data use case explained above.

sebakerckhof commented 4 years ago

However, one thing that would be possible is to add the subscriptionId to the sent messages and have the merge box on the client side. Then the semantics would stay the same, and the trade-off is really server memory vs. bandwidth.
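
As a thought experiment, a client-side merge box keyed by subscription could be sketched as follows. Everything here is hypothetical (class name, method names, conflict rule); it is not the PoC mentioned in this thread, only an illustration of why knowing the subscriptionId lets the client handle a stopped subscription correctly when another one still publishes the same document with different fields.

```javascript
// Hypothetical client-side merge box keyed by subscription id.
class ClientMergeBoxSketch {
  constructor() {
    this.docs = new Map(); // docId -> Map(subscriptionId -> fields)
  }

  // Record the fields one subscription publishes for a document.
  added(subscriptionId, docId, fields) {
    if (!this.docs.has(docId)) this.docs.set(docId, new Map());
    this.docs.get(docId).set(subscriptionId, { ...fields });
  }

  // Forget one subscription's contribution, e.g. when it is stopped.
  removed(subscriptionId, docId) {
    const bySub = this.docs.get(docId);
    if (!bySub) return;
    bySub.delete(subscriptionId);
    if (bySub.size === 0) this.docs.delete(docId);
  }

  // Merged document as the client would see it, or undefined if no
  // subscription publishes it any more. On conflicting fields, later Map
  // entries overwrite earlier ones; an arbitrary choice for this sketch.
  view(docId) {
    const bySub = this.docs.get(docId);
    if (!bySub) return undefined;
    return Object.assign({}, ...bySub.values());
  }
}
```

With this bookkeeping on the client, the server no longer needs to remember fields per session, at the price of sending each subscription's fields in full.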

vlasky commented 4 years ago

The ability to disable the mergebox needs to be urgently added to Meteor as it is proving to be an enemy of Meteor app scalability.

Any Meteor app utilising pub/sub will rapidly hit the Node.js memory limit on the server given enough simultaneous subscribers and/or sufficiently large published collections.

I have been testing the package peerlibrary:control-mergebox as a stop-gap solution, but it needs further enhancement to work correctly when setUserId() is called, which affects any app that performs user authentication and secures publications with user-based permissions.

I have written up the issue here: https://github.com/peerlibrary/meteor-control-mergebox/issues/11

sebakerckhof commented 4 years ago

@vlasky My PR here: https://github.com/meteor/meteor/pull/11151 handles the case when a userId changes. However, this does change the semantics, since there is no merge box anymore. A next step, if that PR makes it in, would be to allow a merge box on the client side. I have a PoC of that. Then you can keep the current semantics but really trade off bandwidth for memory.

linegel commented 3 years ago

@sebakerckhof this feature seems like a real must-have for any application that has a growing user base and plans to scale. Do you have any plans to finish this PR in the future?

filipenevola commented 3 years ago

Hi @linegel, I believe this PR will move forward soon even if Seba is busy, either with @vlasky or with me.

Let's check back in a few days.

meteor / meteor-feature-requests

Optionally disable merge box in Meteor for publish functions #79