Open AndrewHunterRedGate opened 10 years ago
My immediate suspicion is that something is wrong with the way that we delete user data. I think nearly any other kind of issue would result in the numbers soaring - returning something like number of users multiplied by the number of times the query has run.
This can be worked around by recreating the query (which flushes out the results). We get this behaviour because we use actualised query views, and presumably there's an error somewhere when we update state more than once.
Hm. One possibility is that there's 'residual' data that a query hasn't ingested yet at the point where a deletion has been requested but not yet processed. If the logic is broken in the right way, that data gets removed before it gets added, and the add then re-adds a previously deleted user. This would break queries that aren't using CountUniqueValues as well, though.
I'm looking at the more likely possibility that something is broken in the chaining logic at the moment.
Hm, so I've found some broken behaviour relating to deletions, but it doesn't have an obvious tie-in to this issue: the query above results in a user count of 0 immediately after a deletion. However, as soon as any new event arrives, that situation is resolved.
I can get the count to go wrong if there are both documents waiting to be added and documents waiting to be removed for the same user. The add happens after the remove, but the remove stops when the document count reaches 0.
That is, if you add and remove the same document in the same operation and also remove all the other documents for a user simultaneously, the remove does not cancel the add and the user receives a count of one.
I think this is fixed now: I've moved the unreduction stage to after the reduction stage so add/remove operations cancel properly. Queries should automatically recalculate, so any that were affected by this bug should now be OK. I've also added a test case that reproduces the issue I found so it won't reoccur.
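For anyone following along, here is a rough sketch of the ordering problem as I understand it from the comments above. It is not the real query engine: `apply_batch`, `count_unique_users` and the per-user document counter are made-up names for illustration, and the "remove stops at 0" rule is modelled as simply ignoring removes once the count is 0.

```python
from collections import Counter


def apply_batch(doc_counts, adds, removes, removes_first):
    """Apply one batch of adds/removes to per-user document counts.

    Hypothetical model: `adds` and `removes` are lists of user ids,
    one entry per document added or removed in the batch.
    """
    counts = Counter(doc_counts)

    def reduce_stage():
        # Each added document bumps the user's document count.
        for user in adds:
            counts[user] += 1

    def unreduce_stage():
        # Assumption: a remove is ignored once the count has reached 0.
        for user in removes:
            if counts[user] > 0:
                counts[user] -= 1

    if removes_first:
        stages = [unreduce_stage, reduce_stage]   # old, buggy order
    else:
        stages = [reduce_stage, unreduce_stage]   # order after the fix
    for stage in stages:
        stage()

    # Users with no documents left drop out of the view entirely.
    return {user: n for user, n in counts.items() if n > 0}


def count_unique_users(doc_counts):
    return len(doc_counts)


# u1 has two documents; the batch adds a third document, removes it
# again, and also removes the two existing ones.
existing = {"u1": 2}
adds = ["u1"]
removes = ["u1", "u1", "u1"]

buggy = apply_batch(existing, adds, removes, removes_first=True)
fixed = apply_batch(existing, adds, removes, removes_first=False)

print(count_unique_users(buggy))  # 1: the add survives the remove
print(count_unique_users(fixed))  # 0: add and remove cancel
```

With removes first, the third remove finds nothing left to remove, so the later add leaves the user with one document. Running the adds first means every remove has something to cancel against.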
I'll assign this to David for now in case there's some other cause in the wild I didn't spot.
Looks like there's a new failure condition that sometimes results in a negative count.
Turned out to be down to an error in the way data caching was handled, so a later stage of query evaluation was using out-of-date data.
This query:
{ "verb":"CountUniqueValues", "key":"user-id", "name":"value", "applies-to":{ "verb":"AllEvents", "applies-to":{ } } }
creeps up over time when multiple queries are made but new users aren't being added.
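To make the expected behaviour concrete, here is a minimal sketch of what I'd assume a CountUniqueValues query over "user-id" should compute. The `count_unique_values` function and the event dictionaries are hypothetical stand-ins, not the actual implementation: the point is only that evaluating the query again over the same events should give the same number.

```python
def count_unique_values(events, key):
    """Count the distinct values of `key` across all events that have it."""
    return len({event[key] for event in events if key in event})


# Three events from two distinct users (illustrative data only).
events = [
    {"user-id": "a"},
    {"user-id": "b"},
    {"user-id": "a"},
]

first = count_unique_values(events, "user-id")
second = count_unique_values(events, "user-id")

# Re-running the query without new users must not change the count.
print(first, second)  # 2 2
```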