Closed elie222 closed 8 years ago
First thing first, have you read the source for publish-counts?
No I haven't. I read the article. Has the problem been fixed since? On 6 Dec 2015 18:28, "boxofrox" notifications@github.com wrote:
First thing first, have you read the source for publish-counts?
— Reply to this email directly or view it on GitHub https://github.com/percolatestudio/publish-counts/issues/76#issuecomment-162327138 .
So why don't we fix this package so that it gets Mongo to count instead of fetching all the docs itself?
I don't know exactly whether you're question was meant to be sincere or pejorative. You imply that this package is broken (why don't we fix this) and suggest a succint solution ([get] Mongo to count instead). I like that you do say we to suggest that we work together on this, since we're all volunteers, and none of us are paid for the work we do (as far as I know), but you don't appear to be earnest in including yourself in this endeavor to fix that which is broken, because you neglected the opportunity to read the source code (it's very short) which implies you expect someone else to do the work. The question reads terribly and definitely so by cranky developers already in a bad mood.
The short answer to your question is, "because it's not broken." Though there may be room for improvement.
If you're curious, "Why doesn't publish-counts use Mongo to count instead of fetching all the documents." is a much better and well-received question.
And the answer is, "It does! [1] Sort of... we ask Meteor who asks Mongo. And sometimes."
Why doesn't it do it all the time? Well, for two reasons.
First, is Kadira correct when they say, "...it is always a better idea to count inside the DB." After Mongo counts 50k+ documents for us, do we really want to ask Mongo to count 50k again when we add or remove documents? To the development team behind publish-counts, it seemed obvious to ask Mongo for the first count, then observe changes to the collection, and add/subtract one from the count, and resend the new count to the client. I don't have a large dataset to test this on, so I don't know if this worked out or not. Do you disagree with the choice made here?
Second, not everyone using publish-counts was happy counting documents, what the publish-counts refers to as basic counts. Some users wanted to count totals of fields or the number of items in a field's list. I'm not aware of any capability--except aggregates--in Mongo to offload countByField
[2] or countByFieldLength
[3]. But again, we're not working directly with Mongo, we're using Meteor's Collection API, which presents a few problems.
monbro:mongodb-mapreduce-aggregation
to supplement, but...
Meteor.Collection
,countByField
or countByFieldLength
to a Mongo aggregate/map-reduce query trivial or complicated? If trivial, then mayhap that publish-counts could do it. If complicated, then the publish-counts API would defer generating Mongo queries to the user, in which case, toss out publish-counts and write your own aggregate/map-reduce queries against Mongo directly. Though it would be nice to piggy-back off publish-counts channel for reactive updates on the client. We could add an option to set the initial count, so publish-counts doesn't have to bother there.Now, there are some problems I have with that article from Kadira.
countByField
or countByFieldLength
, neither of which are trivial to implement in Meteor. The CPU usage is not sustained. It only occurs when setting up a new connection for a user, not for reactive updates of the count.countByField
and countByFieldLength
, it's partially correct. publish-counts will fetch all records to calculate the initial count, then anything reactive after that point should not spike the CPU (for that particular user/connection/client).When attaching an observer to a Meteor.Collection
, the Meteor docs specify [4]:
Before observe returns, added (or addedAt) will be called zero or more times to deliver the initial results of the query.
and that's where the initial fetch of all 50k+ documents comes into play. Since publish-counts always use an observer, this probably affects basic counts even though collection.count
is used instead.
publish-counts is not a perfect solution to counting in all instances. Software development is mostly about compromises. In order to keep it simple, we risk pegging the CPU for a short while (admittedly this repeats from each new connection).
So where does that leave us? I'm quite busy these days, and don't have enough time to spend on this (hence part of my fustration with the initial question). A solution isn't as simple as "use Mongo", we need to address the concerns I mentioned above.
What would be great is having a large dataset to run benchmarks against, then at least there's a way to measure the effect of throwing collection.observe
out the window and letting Mongo count 50k+ records reactively. Setting up the dataset may be as trivial as running Collection.insert({abc: 123})
50k times. I don't know, but you're welcome to take a stab at this.
As always, pull requests are welcome. I think quite a bit of work is required to get where you think publish-counts should be. I think @tmeasday will likely need some convincing on the technical merits of a new approach, but benchmarks demonstrating better performance should make it easy.
@tmeasday is welcome to backhand me if my response has been unduly indignant.
Thank you for the detailed response. I apologise for offending you. I realise there are no direct benefits for creating and maintaining this package and I certainly don't give you any sort of payment.
I used the word "we" because I see myself as part of the Meteor community and although I haven't contributed with a package of my own, I have contributed to the community in general; with articles, answering questions on SO and contributing to other packages (which you will probably make use of/have already made use of). Not that I feel a need to justify my contributions to society/the Meteor community by asking a question on GitHub repo.
Thank you for the detailed response. I'm sure I'll learn from it when I fully digest the contents. I didn't realise it was such a problem to count documents inside MongoDB and didn't realise all the issues that arise from doing so, but now I do.
My pleasure. It's usually pretty quiet around here, so I overlook that we are part of a larger community. I'm happy :smiley: to answer questions, less happy :unamused: to answer insinuating questions, and generally happy to see effort as it pertains to the problem at hand.
If one replaces collection.observe
with Mongo queries these are the basic hurdles I see:
Mongo.Cursor
into a Mongo query for...
countByField
counts. Map query into Mongo map/reduce query.countByFieldLength
counts. Map query into Mongo map/reduce query.In writing my previous response, I did notice the Meteor docs offer collection.rawCollection
and collection.rawDatabase
. Either of these may be a decent method of accessing Mongo directly which would solve (1). These functions were introduced in March 2015 [1] and will require a bump of the supported Meteor version from 0.9.2 to [I think] 1.2. These functions also come with the warning from MDG:
You can use MongoInternals.NpmModules.mongodb.version to tell what version of the mongodb npm module is the backend for HTTP.call. This version may change incompatibly from version to version of Meteor; use at your own risk. (For example, we expect to upgrade from the 1.4.x series to the 2.x series in the not-too-distant future.)
Which should be expected to break this patch to publish-counts
anytime Meteor upgrades its underlying major version of mongodb npm module.
Cheers.
updates: mentioned pitfalls with new Meteor functions, add item (4) to hurdles list
I think this is just a documentation issue.
publish-counts is not intended to be used for counting over that kind of number of documents (50k?).
It's designed (for a fairly common use-case) of counting a small number of documents associated with a given other document. Typically this is in the order of dozens to hundreds.
I'm not sure what @arunoda's advice is for this use case (as you pointed out, his solution is behind a paywall), but I don't see a problem with using an observer for it.
I think we should just add a caveat to the top of the README explaining what the purpose of the package is. We could add a note somewhere saying "think we should reimplement this using .count()
? Here's why not..."
As I've explained a few times before, AFAIK it's difficult if not impossible to get counts exactly right without using an observer. However if you are counting over 50k documents, it's very possible that you don't need it to be exact and a different technique should apply. I encourage someone else to publish a package that does this.
Hi guys,
This is a great package and my intention was not to discourage people from using this package. But use it with care.
In order to meteor to observe a query. It needs to fetch all the related documents. When the app is at initial stage there won't be much data. But later on, there could be issues.
Our solution is to check it carefully. You can do it with Kadira's live query monitoring. It's FREE
@tmeasday I got it wrong when you mentioned it's behind a paywall. It was about the bulletproof meteor lesson.
Here's the solution we are proposed there when you got a lot of data in the DB. See: https://github.com/bulletproof-meteor/bullet-counter/blob/solution/lib/server.js
I would like to see, someone could publish it as a package or so. But it's fairly easy to use without a package.
I think this solution is great for the large document set use-case. I think someone should make a package for it.
Yeah! I have a lot of packages to manage. That's why I didn't publish this. Every code we open source is a liability.
From an article a few weeks ago: https://kadira.io/blog/meteor/success-stories-meteor-live-query-monitoring
So why don't we fix this package so that it gets Mongo to count instead of fetching all the docs itself?