@wisniewskit @past thoughts on this?
Andreas had the concern that you could "game the system" by just filing a ton of duplicate bugs.
> (I'm confused how this math works, because there are more than 81 dupes there...)
I think I had verified in the past that after putting all the dupes in a set we end up with 81 distinct bugs.
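For reference, a minimal sketch of that dedup step (the function name is mine, not the actual tsci code):

```ts
// Count each duplicate bug once, even if it shows up in multiple
// reports: a Set keeps only the distinct Bugzilla bug IDs.
function countDistinctDupes(dupeIds: number[]): number {
  return new Set(dupeIds).size;
}
```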
Implementation seems straightforward: `newScore = (oldScore / 10) + 0.9`
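A minimal sketch of what that could look like (the helper name is mine, not the actual tsci code). Note that `oldScore / 10 + 0.9` is just `1 + 0.1 * (oldScore - 1)`, i.e. the first dupe counts as 1 and every later one as 0.1:

```ts
// First dupe counts as 1, each additional dupe as 0.1:
// n dupes => 1 + 0.1 * (n - 1) = n / 10 + 0.9.
function dupeScore(dupeCount: number): number {
  if (dupeCount === 0) return 0;
  return dupeCount / 10 + 0.9;
}

dupeScore(1);  // 1
dupeScore(3);  // 1.2
dupeScore(63); // 7.2
```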
I'm not sure how important this is, given that we don't prioritize automatically through this list. But if the main purpose is to prevent gaming, this formula only raises the bar by one order of magnitude: each extra duplicate is worth 0.1 instead of 1, so gaming the score just takes ten times as many bugs. We could go even further and make the weight a power-law function (e.g. Zipf's law; see the sketch below) to ensure a steeper falloff.
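For illustration only, a Zipf-style weighting could make the k-th duplicate contribute 1/k^s, so the marginal value of each extra report keeps shrinking (the exponent `s` is a tunable assumption, not something we've settled on):

```ts
// Zipf-style weighting: the k-th duplicate contributes 1 / k^s.
// With s = 1 (the harmonic series), 63 dupes score about 4.7,
// versus 7.2 under the linear 0.1-per-dupe rule.
function zipfDupeScore(dupeCount: number, s = 1): number {
  let score = 0;
  for (let k = 1; k <= dupeCount; k++) {
    score += 1 / Math.pow(k, s);
  }
  return score;
}
```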
There is also the point we have discussed in the past about the relative importance of duplicates versus criticals: I think the latter should carry more weight, and this change supports that.
The other thing I've been contemplating is what these numbers stand for. Counting duplicates separately makes the unweighted TSCI similar to a call-center load metric, like "number of calls for support". This change would make the numbers more like "tens of calls for support", which is not that different.
> I'm not sure how important this is
That's what I'm trying to figure out. I think if we do this, we should experiment with what it might look like and decide after the fact. As-is, not more important than #38.
Note: https://github.com/mozilla/tsci/pull/73 looks good, if we decide to explore this further.
[Reduce the weight of dupes beyond the first to 10% (fixes #63)](/mozilla/tsci/commit/07f497f394b99b8e8411e873fd803b67609d3968)
This commit is obsolete now, but I have one locally that makes it work.
We do this now!
Talking with @andreasbovens and Sean Voisen, they suggested we count the first instance of a dupe as 1, and then each dupe after that as a fraction (say, 0.1).
It sounds interesting. I'm curious how hard it would be to implement, because we want to make sure we're capturing unique dupes (if that makes sense).
To clarify: for google.com, we report 81 dupes on 6/29, but there are only 13 Bugzilla bugs:
975444 -> 63 dupes
1524772 -> 1
1409257 -> 28 dupes (this shouldn't appear... it's a bug!)
1552124 -> 1
1545659 -> 1
1545703 -> 1 (sci-exclude, shouldn't appear)
1488334 -> 1 (sci-exclude, shouldn't appear)
1503241 -> 3
1558362 -> 1 (sci-exclude, shouldn't appear)
1521655 -> 3
1494623 -> 1
1494623 -> 5 (meta bug, shouldn't appear)
1392460 -> 3
So rather than counting this as 81 (I'm confused how this math works, because there are more than 81 dupes there... but I'll file a bug so we stop counting dupes for sci-exclude and meta bugs), we would have something like:
975444 -> 7.2
1524772 -> 1
1552124 -> 1
1545659 -> 1
1503241 -> 1.2
1521655 -> 1.2
1494623 -> 1
1392460 -> 1.2
total: 14.8
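Applying the `oldScore / 10 + 0.9` formula to the raw counts (after dropping the sci-exclude and meta bugs) reproduces that total:

```ts
// Raw dupe counts for the eight bugs that should be counted.
const counts = [63, 1, 1, 1, 3, 3, 1, 3];
const total = counts
  .map((n) => n / 10 + 0.9)
  .reduce((sum, score) => sum + score, 0);
// 7.2 + 1 + 1 + 1 + 1.2 + 1.2 + 1 + 1.2 ≈ 14.8
// (up to floating-point rounding)
```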