percolatestudio / publish-counts

Meteor package to help you publish the count of a cursor, in real time
https://atmospherejs.com/tmeasday/publish-counts
MIT License

big collections redux #14

Open javaknight opened 10 years ago

javaknight commented 10 years ago

I need this to watch a big collection, but I don't want to actually publish the big collection to the client. I just need the count of the collection to be published, and if that count changes, then I need the change to be reactive and sent to the client.

Currently when I fire this up on a large collection, it just crashes my server after showing me the initial count.

dburles commented 10 years ago

Hey @javaknight, do you have any more info on the crash? And how many records are in the collection?

tmeasday commented 10 years ago

If the collection is truly large, it might be better to do something a little different:

Meteor.publish('count', function() {
  var self = this, first = true;

  // Poll the count and push it down the 'counts' pseudo-collection.
  var count = function() {
    var thisCount = Collection.find().count();
    if (first) {
      self.added('counts', 'X', {count: thisCount});
    } else {
      self.changed('counts', 'X', {count: thisCount});
    }
    first = false;
  };

  var interval = Meteor.setInterval(count, 1000); // re-count every 1s
  count();
  self.ready();

  self.onStop(function() {
    Meteor.clearInterval(interval); // clearInterval, not clearTimeout
  });
});

Of course ideally you'd share the interval between multiple users subbing to the same publication. Sounds like a whole package of its own :)
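
For reference, consuming this publication on the client could look something like the sketch below. The local collection name 'counts' and the 'X' id mirror the server code above; 'ClientCounts' and the subscription wiring are illustrative, not part of this package.

// client.js -- a minimal sketch. 'counts' must match the name used in
// self.added()/self.changed() on the server.
var ClientCounts = new Mongo.Collection('counts');

Meteor.subscribe('count');

Tracker.autorun(function() {
  var doc = ClientCounts.findOne('X'); // the 'X' id from the publication
  if (doc) console.log('current count:', doc.count);
});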

tmeasday commented 10 years ago

(This package pulls down and caches the _id of every record. If there are a lot of them, this is a terrible idea, but it's what allows the count to be truly realtime.)

chhib commented 10 years ago

@tmeasday: I think you mean Meteor.setInterval instead of Meteor.setTimeout.

tmeasday commented 10 years ago

Ahh, thanks @chhib - I updated the code so people aren't confused.

colllin commented 9 years ago

@tmeasday Why is it necessary to cache the _id from every record? Compared to just incrementing on added and decrementing on removed?

tmeasday commented 9 years ago

@colllin if you are talking about livedata's added and removed -- well, it'll need to cache the _ids to work properly anyway. The underlying reason is basically timing issues on the oplog: if the server sees an oplog insert message, it needs to check that it hasn't already counted that document (hence the cached _ids), otherwise there are edge cases in which double counting could happen.

That's my understanding of it anyway. Possibly someone could figure out a way to use the low-level oplog driver and deal with these issues, not sure.
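
As a rough illustration of that guard (not the package's actual code), caching the counted _ids makes the counting idempotent against a replayed oplog entry:

// Sketch only: a set of already-counted _ids guards against double counting.
var countedIds = {};
var count = 0;

function onOplogInsert(id) {
  if (countedIds[id]) return; // already counted: oplog replay / race
  countedIds[id] = true;
  count += 1;
}

function onOplogRemove(id) {
  if (!countedIds[id]) return; // never counted it in the first place
  delete countedIds[id];
  count -= 1;
}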

colllin commented 9 years ago

@tmeasday Yes, that's what I was talking about. I didn't realize observe()ing added and removed documents was imperfect (could send duplicate events)... interesting. Thank you for the explanation.

tmeasday commented 9 years ago

To be clear it's the oplog that is imperfect (I think there are a bunch of issues around the exact timing of doing your initial query vs where you start observing the oplog from).

.added() and .removed() in livedata are "perfect", but have the aforementioned performance caveat (you don't want to do them on a huge cursor).

jchristman commented 9 years ago

@javaknight, I had this same problem because my collection has 100,000+ rows. I am implementing a "scrollbox" that loads a sliding window over a collection to emulate the browser loading the entire collection. I implemented the solution @tmeasday posted above at https://github.com/jchristman/meteor-collection-scroller/blob/master/lib/collections.js if you wanna check it out (also at http://scroller.meteor.com).

faceyspacey commented 9 years ago

@tmeasday regarding your setInterval example: instead, couldn't you just make the observer based on a cursor that finds a limit of one row, sorted newest to oldest, and then increment the count only when needed rather than on an interval? And of course call Collection.find().count() just once at the beginning, and set the removed observer as usual. You'd just need to accept the collection as an argument instead of a cursor -- perhaps a collection plus selector plus dateColumn.

Counts.publish = function(self, name, collection, selector, dateColumn, options) {
  var initializing = true; // referenced below, so it needs declaring
  var sort = {};
  sort[dateColumn] = -1;

  // Take the full count once up front; the limit-1 cursor is only observed.
  var count = collection.find(selector).count();
  var cursor = collection.find(selector, {sort: sort, limit: 1});

  var observers = {
    added: function(id, fields) {
      count += 1;
      if (!initializing) self.changed('counts', name, {count: count});
    },
    removed: function(id, fields) {
      count -= 1;
      self.changed('counts', name, {count: count});
    }
  };

  // etc
};
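
For completeness, the elided wiring would presumably look something like the fragment below (a sketch only; note the objection just after this about how removed behaves on a limit-1 cursor):

// Hypothetical continuation, standing in for the "// etc" above.
var handle = cursor.observeChanges(observers);
initializing = false;

self.added('counts', name, {count: count});
self.ready();

self.onStop(function() {
  handle.stop();
});
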
tmeasday commented 9 years ago

@faceyspacey - Seems like a good idea for collections where you do have a date field to work with.

I'm not sure that the removed will work however? What if I remove a document that isn't the latest?

faceyspacey commented 9 years ago

Is there really no way for Meteor's observers to skip calling all the added handlers on the first run -- something internal to how observeChanges works? It seems everyone is doing the !initializing thing. ...I guess another cursor without a limit could be created just for the removed observer. Collection.remove could be overwritten to somehow notify this code -- obviously that won't address direct changes to the mongo collection outside Meteor code. The first solution seems fine to me. Whatchu think?

tmeasday commented 9 years ago

1. Well, you'll have the problem of a huge cursor again, which means an unacceptably large data set cached on the server.
2. Nope, doing anything off the oplog isn't going to work if you horizontally scale.

faceyspacey commented 9 years ago

Then I guess overwriting collection.remove is the only answer, coupled with a REST API endpoint to ping if you remove rows outside of Meteor. You'd just have to ping that API every time you directly remove rows. For me -- and I'm willing to bet the vast majority of Meteor developers -- we wouldn't even need that. Maybe just a simple reset() method to call from time to time.

faceyspacey commented 9 years ago

So I guess collection.remove would store, in another collection, the name of the collection (only if a count was published for it). No more than the collection name would need to be stored. Then in Counts.publish we just observe this collection for newly added documents (selecting only documents with the appropriate collection name) and decrement the count when one is found. We would also remove the row from this auxiliary collection after decrementing, so it too never gets large (never more than one row lol).
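
A loose sketch of that scheme, with all names hypothetical and the observer fragment assumed to run inside the publication where count, name, and self are in scope:

// Hypothetical sketch of the auxiliary-collection idea; names are made up.
var Removals = new Mongo.Collection('_countRemovals');

var originalRemove = MyCollection.remove.bind(MyCollection);
MyCollection.remove = function(selector, callback) {
  var n = MyCollection.find(selector).count(); // how many are about to go
  var result = originalRemove(selector, callback);
  if (n > 0) Removals.insert({collection: 'mycollection', n: n});
  return result;
};

// Inside the publication: decrement on each removal marker, then delete the
// marker so the auxiliary collection stays tiny.
Removals.find({collection: 'mycollection'}).observeChanges({
  added: function(id, fields) {
    count -= fields.n;
    self.changed('counts', name, {count: count});
    Removals.remove(id);
  }
});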

tmeasday commented 9 years ago

@faceyspacey if you are going to think about wacky solutions like this, I'd suggest just denormalizing the count somewhere.

faceyspacey commented 9 years ago

Well then, just resetting the count on removes would be the solution: using a counts collection with the count from one publication denormalized into one row there.

emmanuelbuah commented 9 years ago

@tmeasday the setInterval approach can also be improved by keeping track of the previous count and only sending data to the client when the current count (thisCount) differs from the previous one.
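
Adapting the interval body from the example further up, that might look like this sketch:

// Sketch: only push a message when the count actually changes.
var lastCount = null;
var count = function() {
  var thisCount = Collection.find().count();
  if (thisCount === lastCount) return; // unchanged: send nothing
  if (lastCount === null) {
    self.added('counts', 'X', {count: thisCount});
  } else {
    self.changed('counts', 'X', {count: thisCount});
  }
  lastCount = thisCount;
};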

emmanuelbuah commented 9 years ago

Knowing the current limitations of the oplog in combination with the existing observer API, I think the best solution at scale is to compute and store counts (in a mongodb collection or on the relevant doc) on insert and remove. E.g. on adding or removing comments from a post, update a comments counter (possibly on the post itself, i.e. post.commentsCount). This might look silly, but it works and scales very well.
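
For instance, with a comments counter denormalized onto the post (collection and field names are assumptions):

// Sketch: maintain post.commentsCount at write time instead of observing a
// large cursor. Collection and field names are assumptions.
Meteor.methods({
  addComment: function(postId, text) {
    Comments.insert({postId: postId, text: text, createdAt: new Date()});
    Posts.update(postId, {$inc: {commentsCount: 1}});
  },
  removeComment: function(commentId) {
    var comment = Comments.findOne(commentId);
    if (!comment) return;
    Comments.remove(commentId);
    Posts.update(comment.postId, {$inc: {commentsCount: -1}});
  }
});

Publishing the post then carries the count along for free, with no extra cursor to observe.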

Slava commented 9 years ago

I think we could make this better if we do the following steps:

sean-stanley commented 8 years ago

Just one point I'm a bit unclear on: if I have a large collection but only want to count a small subset of it (like unread notifications for a particular online user, not all notifications for all users), then I am only caching the documents in the cursor, not the entire collection, right? So this package would work very well for counting small numbers of things.

However, I suppose if I had 500 online users, each with only 10 unread notifications, I'd still be caching 5,000 documents, right?
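
For what it's worth, a per-user publication along these lines (collection and field names assumed) only caches the documents matched by the filtered cursor -- though across 500 such subscriptions that is still roughly the 5,000 cached documents estimated above:

// Sketch: count only the current user's unread notifications.
// 'Notifications' and its fields are assumptions, not part of the package.
Meteor.publish('unread-notifications-count', function() {
  if (!this.userId) return this.ready();
  Counts.publish(this, 'unread-notifications', Notifications.find({
    userId: this.userId,
    read: false
  }));
});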