nmalkin closed this issue 12 years ago.
@ozten, I feel like I brought this issue up at some point, but I don't remember if we decided anything about it.
@ozten suggests keeping the metric as-is, but giving it a "cartoony" name, like "Persona adoption," to emphasize that the value itself is meaningless (or, more charitably, inaccurate).
The report has been renamed as discussed (with a clarification message added to the description). If more drastic action is required, reopen this issue.
Report #1 presents the average number of sites a user logs into with Persona (as of #29, it is the mean, not the median). The value is computed as the mean of all values of the `number_sites_signed_in` KPI across all data points on a given day.

**Problem**
Consider this sequence of operations by a single user:
To compute the mean value, we will take the sum of all values for `number_sites_signed_in` (0+0+0+1+2+3=6) and divide by the total number of data points (6) to get a mean value of 1, while the correct value is, of course, 4.

In general, the problem is that multiple interactions by a single user are treated as equivalent to a single interaction by multiple users.
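A minimal sketch of the computation described above, using the six data points from the single-user example (the list literal is taken from the example; this is not the actual report code):

```python
# The report takes the mean of number_sites_signed_in over
# all data points recorded on a given day.
points = [0, 0, 0, 1, 2, 3]  # six data points, all from one user

mean = sum(points) / len(points)
print(mean)  # 1.0 -- the value the report would show
```

Because the six points are averaged as if they came from six different users, the single user's repeated interactions drag the mean down.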
One way to account for this would be to try to aggregate the data points by user (i.e., figure out which data points came from the same person) and use only the maximum value. However, this is costly and has undesirable privacy implications.
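A hypothetical sketch of that per-user aggregation, assuming data points could be attributed to users (the user IDs and second user are invented for illustration; the issue notes this attribution is exactly what is costly and privacy-problematic):

```python
from collections import defaultdict

# (user, number_sites_signed_in) pairs -- attribution is assumed here.
data_points = [
    ("user_a", 0), ("user_a", 0), ("user_a", 0),
    ("user_a", 1), ("user_a", 2), ("user_a", 3),
    ("user_b", 2),
]

# Keep only each user's maximum value...
per_user_max = defaultdict(int)
for user, value in data_points:
    per_user_max[user] = max(per_user_max[user], value)

# ...then average across users rather than across raw data points.
mean = sum(per_user_max.values()) / len(per_user_max)
print(mean)  # (3 + 2) / 2 = 2.5
```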
Another way to handle it would be to weight higher values of `number_sites_signed_in` (e.g., count a 2 with twice the weight of a 1, and so on). This is equivalent to saying, "oh, I just saw a 2. That means I also saw a 1, but that 1 shouldn't count." This is a more sensible approach, but note that it wouldn't fully correct the bias in the example above; nor (for the same reason) can it account for repeated sign-ins to the same site.

One more possibility is to do nothing, since we keep saying that this is not a very meaningful metric and we only care about its derivative. This is obviously the easiest, though we would probably have to stop calling it "average number of sites logged in."