Open jywarren opened 4 years ago
The explanatory text currently says:
The graphs above are stacked, and questions are counted both on their own as well as part of the tally for notes (because they are a form of note).
So the text could be expanded to:
The graphs above are stacked, and questions are counted both on their own as well as part of the tally for notes (because they are a form of note). Additional discrepancies may come from the tag page also listing questions tagged with "question:_____" but lacking the base tag, and also listing notes with only "child tags" of the base tag, in a system we are planning to slowly deprecate.
Link to "deprecating tag aliasing" - https://github.com/publiclab/plots2/issues/6367
Jeanette from the PL staff noted a discrepancy - when downloading a CSV and summing notes, questions, and wikis, the totals Jeanette got are:
From /stats: notes = 206; questions = 97; wikis = 42
However this was for a range of: https://publiclab.org/tag/air-quality/stats?utf8=%E2%9C%93&start=01-01-2010&end=14-10-2020&commit=Go
These don't match the tab totals shown at https://publiclab.org/tag/air-quality, of:
247 notes, 140 questions, 53 wikis
(note one more question was shown since Jeanette's screenshot)
Exact discrepancy
A full date range CSV I got showed:
303 notes | 97 questions | 42 wikis
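As a quick check of how these CSV totals compare with the tab totals, here is a minimal Ruby sketch of the subtraction. The hash names are just for illustration, and it assumes 139 questions (the tab count at the time of the screenshot, before one more question appeared):

```ruby
# Totals copied from this thread. 139 is the questions tab count at the
# time of the screenshot; the tab showed 140 afterwards.
tag_page  = { notes: 247, questions: 139, wikis: 53 }
stats_csv = { notes: 303, questions: 97, wikis: 42 }

# Positive numbers mean the /tag page shows more than the stats CSV.
differences = tag_page.to_h { |kind, count| [kind, count - stats_csv[kind]] }

puts differences.map { |kind, diff| "#{diff} #{kind}" }.join(" | ")
```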
That means we are showing discrepancies of
-56 notes | 42 questions | 11 wikis
(where the /tag page has this # MORE than the stats CSV)
Known sources of discrepancy
First, noting that some of the questions are for notes tagged with `question:air-quality` but which lack `air-quality` - this accounts for some or all of the 139-97 = 42 questions discrepancy.
Second, the stats pages do not count notes, questions, or wikis which bear tags which have a parent tag (a system we are trying to phase out) of `air-quality`. The last line of this section of code shows those extra nodes getting included for the `/tag/air-quality` page.
I was able to find 61 notes and 11 wikis that bear a child tag of `air-quality`, which has affected this count. That seems to account for the wikis discrepancy.
After accounting for 61 extra notes, we actually have 61 + 56 = 117 notes shown on the CSV which were not shown on the /tag page.
But, according to these lines, we exclude all questions of any kind from this note count. Let's see how that affects the count:
So, that took us from 365 to 247, if we are including parent tags. That's the number shown on `/tag/air-quality`.
Without counting parent tags OR questions, we get 206 notes - that's vs. 303 in the CSV.
Let's look at where the CSV is being compiled:
https://github.com/publiclab/plots2/blob/27a3839154e0cec071860448f99697f9c831042c/app/models/tag.rb#L216-L239
This is a little convoluted, but I traced through it and it seems OK.
Running `Tag.nodes_for_period()` on the whole 10 year span returned 248, which is only 1 off:
That's for the same `nids` collection as we got for the tags page - with parent tags, and excluding questions. Let's try running it without the parent tags, but leaving the questions in...
OK, so the discrepancy seems to be (within an error of 2 notes) that the stats are excluding parent tags and including questions.
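To make the two counting rules concrete, here is a toy model in plain Ruby. It is only a sketch of the behavior described in this thread, not the actual plots2 queries: the five nodes and the `example-child` tag are made up, and the rule that the stats only count questions which also carry the base tag is an assumption drawn from the discussion above.

```ruby
# Hypothetical nodes for illustration only; none of these are real records.
base_tag   = "air-quality"
child_tags = ["example-child"] # made-up tag whose parent tag is air-quality

nodes = [
  { id: 1, tags: ["air-quality"] },                             # plain note
  { id: 2, tags: ["air-quality", "question:air-quality"] },     # question with the base tag
  { id: 3, tags: ["question:air-quality"] },                    # question lacking the base tag
  { id: 4, tags: ["example-child"] },                           # note with only a child tag
  { id: 5, tags: ["example-child", "question:example-child"] }  # question with only a child tag
]

question  = ->(node) { node[:tags].any? { |tag| tag.start_with?("question:") } }
has_base  = ->(node) { node[:tags].include?(base_tag) }
has_child = ->(node) { (node[:tags] & child_tags).any? }
ids       = ->(list) { list.map { |node| node[:id] } }

# /tag page notes tab: base tag or child tags, with all questions excluded.
tag_page_notes = nodes.select { |n| (has_base.(n) || has_child.(n)) && !question.(n) }

# Stats notes: base tag only, with questions left in.
stats_notes = nodes.select { |n| has_base.(n) }

# /tag page questions tab: the question: tag is enough, base tag not required.
tag_page_questions = nodes.select { |n| n[:tags].include?("question:#{base_tag}") }

# Stats questions (assumed): questions that also carry the base tag.
stats_questions = nodes.select { |n| has_base.(n) && question.(n) }

puts "tag page notes:     #{ids.(tag_page_notes)}"
puts "stats notes:        #{ids.(stats_notes)}"
puts "tag page questions: #{ids.(tag_page_questions)}"
puts "stats questions:    #{ids.(stats_questions)}"
```

In this toy data the two pages agree only on node 1 for notes: the /tag page adds the child-tagged note 4, while the stats add the question 2. That is the same pattern as the real counts, where the stats exclude parent tags and include questions.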
Takeaway
I believe this means that we don't need to change any queries, but we should add some of these caveats to the stats pages for those wondering. I can make an FTO once we settle on explanatory text!
Linking this thread to this explanation of questions counts on tag pages: https://github.com/publiclab/plots2/issues/8246