javaknight opened this issue 10 years ago
Hey @javaknight, do you have any more info on the crash? And how many records are in the collection?
If the collection is truly large, it might be better to do something a little different:
```js
Meteor.publish('count', function() {
  var self = this, first = true;
  var count = function() {
    var thisCount = Collection.find().count();
    if (first) {
      self.added('counts', 'X', {count: thisCount});
    } else {
      self.changed('counts', 'X', {count: thisCount});
    }
    first = false;
  };
  var interval = Meteor.setInterval(count, 1000); // re-count every 1s
  count();
  self.ready();
  self.onStop(function() {
    Meteor.clearInterval(interval);
  });
});
```
Of course ideally you'd share the interval between multiple users subbing to the same publication. Sounds like a whole package of its own :)
(This package pulls down and caches the `_id` from every record. If there are a lot of them, this is a terrible idea. But it allows it to be truly realtime.)
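A minimal sketch of the "share the interval" idea in plain JS (all names here are hypothetical, not from any existing package): ref-count subscribers and keep a single polling timer alive only while at least one subscriber remains.

```javascript
// Hypothetical helper: one shared polling interval for any number of
// subscribers to the same count. getCount is polled; each subscriber's
// callback receives the latest value.
function makeSharedCounter(getCount, intervalMs) {
  var subscribers = new Set();
  var timer = null;

  function broadcast() {
    var n = getCount();
    subscribers.forEach(function (cb) { cb(n); });
  }

  return {
    subscribe: function (cb) {
      subscribers.add(cb);
      cb(getCount()); // deliver the initial value immediately
      if (timer === null) timer = setInterval(broadcast, intervalMs);
      // return an unsubscribe function (call it from self.onStop)
      return function stop() {
        subscribers.delete(cb);
        if (subscribers.size === 0 && timer !== null) {
          clearInterval(timer); // last subscriber gone: stop polling
          timer = null;
        }
      };
    },
    active: function () { return timer !== null; }
  };
}
```

In a Meteor publication you'd call `subscribe` in the publish function and the returned `stop` in `self.onStop`, so N concurrent subscribers cost one `count()` query per interval instead of N.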
@tmeasday: I think you mean `Meteor.setInterval` instead of `Meteor.setTimeout`.
Ahh, thanks @chhib - I updated the code so people aren't confused.
@tmeasday Why is it necessary to cache the `_id` from every record? Compared to just incrementing on `added` and decrementing on `removed`?
@colllin if you are talking about livedata's `added` and `removed` -- well, it'll need to cache the `_id` to work properly anyway. The underlying reason is basically timing issues on the oplog: if the server sees an oplog inserted message, it needs to check that it hasn't already counted that document (thus the cached `_id`), otherwise there are edge cases in which double counting could happen.
That's my understanding of it anyway. Possibly someone could figure out a way to use the low-level oplog driver and deal with these issues, not sure.
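To make the double-counting concern concrete, here is a plain-JS illustration (not the actual livedata implementation) of why a cached `_id` set makes the counter idempotent: if the initial query and the oplog tail overlap, the same insert can be reported twice, and a naive `count += 1` would drift.

```javascript
// Idempotent counter keyed on document _ids: a duplicate "added" event
// for an _id we've already seen leaves the count unchanged.
function makeIdCounter() {
  var seen = new Set(); // the cached _ids
  return {
    added: function (id) {
      if (seen.has(id)) return seen.size; // duplicate event: ignore
      seen.add(id);
      return seen.size;
    },
    removed: function (id) {
      seen.delete(id);
      return seen.size;
    }
  };
}

var c = makeIdCounter();
c.added('a');   // count is 1
c.added('b');   // count is 2
c.added('a');   // still 2 -- the duplicate is ignored, no double count
c.removed('a'); // count is 1
```

The memory cost of `seen` is exactly the "caches the `_id` from every record" caveat discussed above.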
@tmeasday Yes, that's what I was talking about. I didn't realize `observe()`ing added and removed documents was imperfect (could send duplicate events)... interesting. Thank you for the explanation.
To be clear it's the oplog that is imperfect (I think there are a bunch of issues around the exact timing of doing your initial query vs where you start observing the oplog from).
`.added()` and `.removed()` in livedata are "perfect", but have the aforementioned performance caveat (you don't want to do them on a huge cursor).
@javaknight, I had this same problem because my collection is at 100,000+ rows -- I am implementing a "scrollbox" that loads a sliding window over a collection to emulate the browser loading the entire collection. I implemented the solution @tmeasday posted above at https://github.com/jchristman/meteor-collection-scroller/blob/master/lib/collections.js if you want to check it out (also at http://scroller.meteor.com). Atmosphere Link
@tmeasday regarding your setInterval example: instead, couldn't you just make the observer based on a cursor that finds a limit of one row, sorted newest to oldest, and then increment the count only when needed rather than on an interval? And of course call `Collection.find().count()` just once at the beginning. Then set the removed observer as usual. You'd just need to accept the collection as an argument instead of a cursor -- perhaps a collection plus selector plus dateColumn.
```js
Counts.publish = function(self, name, collection, selector, dateColumn, options) {
  var sort = {};
  sort[dateColumn] = -1;
  // initial count over the whole selector, computed just once
  var count = collection.find(selector).count();
  var initializing = true;
  var observers = {
    added: function(id, fields) {
      count += 1;
      if (!initializing) self.changed('counts', name, {count: count});
    },
    removed: function(id, fields) {
      count -= 1;
      self.changed('counts', name, {count: count});
    }
  };
  // etc. -- observe only the cheap limit-1 cursor:
  // collection.find(selector, {sort: sort, limit: 1}).observeChanges(observers)
};
```
@faceyspacey - Seems like a good idea for collections where you do have a date field to work with.
I'm not sure that the `removed` will work, however? What if I remove a document that isn't the latest?
Is there really no way for Meteor's observers to skip calling all the `added` handlers on first run? Like a way internal to how Meteor's `observeChanges` works. It seems everyone is doing the `!initializing` thing. ...I guess another cursor without a limit could be created just for the removed observer. Or `Collection.remove` could be overwritten to somehow notify this code -- obviously that won't address direct changes to the Mongo collection outside Meteor code. The first solution seems fine to me. Whatchu think?
Then I guess overwriting `collection.remove` is the only answer, coupled with a REST API endpoint to ping if you remove rows outside of Meteor. You just have to ping that API every time you directly remove rows. For me -- and I'm willing to bet the vast majority of Meteor developers -- we wouldn't even need that. Maybe just a simple `reset()` method to call from time to time.
So I guess `collection.remove` would store, in another collection, the name of the collection (only if a count was published for that collection). No more than the collection name would need to be stored. Then in `Counts.publish` we just observe this auxiliary collection for newly added documents (selecting only documents with the appropriate collection name) and decrement the count when one is found. We would also remove the row from the auxiliary collection after decrementing, so it too never grows large (never more than one row lol).
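A rough plain-JS sketch of that override idea (all names hypothetical; a real Meteor version would wrap the collection's `remove` and write to an actual Mongo collection instead of an array):

```javascript
// Wrap a collection's remove so every removal is logged to a tiny
// auxiliary store that a count publication can watch and then drain.
function trackRemovals(collection, removalLog, collectionName) {
  var origRemove = collection.remove;
  collection.remove = function (selector) {
    var result = origRemove.call(collection, selector);
    // one row naming the collection; the count publication observes
    // removalLog, decrements its count, then deletes the row
    removalLog.push({ collection: collectionName });
    return result;
  };
}
```

As noted above, this only catches removals that go through Meteor code; direct writes to Mongo from outside would still need a separate ping or a periodic `reset()`.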
@faceyspacey if you are going to think about wacky solutions like this, I'd suggest just denormalizing the count somewhere.
Well, then just resetting the count on removes would be the solution: using a counts collection with the count from one publication denormalized into one row there.
@tmeasday the setInterval approach can also be improved by keeping track of the previous count and only sending data to the client when the current count (thisCount) differs from the previous one.
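That optimization might look like this (hypothetical helper in plain JS; in the publication above, `emit` would be the `self.added`/`self.changed` call):

```javascript
// Poll on an interval but only emit when the value actually changed,
// so an unchanged count sends nothing over the wire.
function makeChangeDetector(getCount, emit) {
  var last = null; // previous count; null means "nothing sent yet"
  return function poll() {
    var current = getCount();
    if (current !== last) {
      last = current;
      emit(current); // only notify clients on a real change
    }
  };
}
```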
Knowing the current limitations of the oplog in combination with the existing observer API, I think the best solution for scale is to compute and store counts (in a MongoDB collection or on the relevant doc) on insert and remove. E.g. on adding or removing comments from a post, update a comments counter in storage (possibly on the post itself, i.e. `post.commentsCount`). This might look silly, but it works and scales very well.
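A plain-JS sketch of that denormalization (hypothetical names; in Mongo the counter would be bumped with a `$inc` update in the same write path as the insert/remove):

```javascript
// Keep commentsCount on the post document and adjust it on every
// comment insert and remove, so reading the count is O(1).
function addComment(post, comments, comment) {
  comments.push(comment);
  post.commentsCount += 1; // Mongo: Posts.update(postId, {$inc: {commentsCount: 1}})
}

function removeComment(post, comments, index) {
  comments.splice(index, 1);
  post.commentsCount -= 1; // Mongo: Posts.update(postId, {$inc: {commentsCount: -1}})
}
```

Publishing the count then just means publishing the post document (or the one counter row), with no large cursor to observe.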
I think we could make this better if we used `publish-counts` but didn't store any cache, relying on the cursor's `count` method.

Just one point I'm a bit unclear on: if I have a large collection but only want to count a small subset of it (like unread notifications for a particular online user, not all notifications for all users), then I am only caching the documents in the cursor, right, not the entire collection? So this package would work very well for counting small numbers of things.
However I suppose if I had 500 online users each with only 10 unread notifications I'd still be caching 5000 documents right?
I need this to watch a big collection, but I don't want to actually publish the big collection to the client. I just need the count of the collection to be published, and if that count changes, then I need the change to be reactive and sent to the client.
Currently when I fire this up on a large collection, it just crashes my server after showing me the initial count.