kevinkreiser closed 7 years ago
Why not just emit a histogram? A vector of count/duration?
@dnesbitt61 because then non-anonymised data would leave the reporter, for example when the count is 1 for a given slot in the histogram
just to be clear i think things would be vastly simplified if we do what @dnesbitt61 is suggesting, so yeah i hope when we get clarification the answer is make it so
:smile:
Currently, if the reporter gets 5 observations for a given segment-next-segment pair, it averages them all together and reports that average when it's time. This means we lose some of the ability to measure variance unless we get more observations for this pair later on in wall time (but for the same point in gps time). So what we'll want to do is not just average all the measurements together. At the point when we go to emit these measurements, we'll want to group them in such a way that we can still measure variance without skewing the averages.
Say you have 5 observations for a given segment-next-segment pair. Your privacy setting is 2, which means you have enough data to emit these observations in some form. Today we average all of these into one measurement with a count of 5. But to preserve the ability to measure variance we should probably emit 2 measurements, one with a count of 2 and one with a count of 3. We need a heuristic to do that though. Let's say the 5 observations have durations: 10, 12, 20, 25, 65
How do we group these observations so that we most accurately represent the data?
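One possible heuristic, sketched here only as a starting point for discussion: sort the durations, then split them into contiguous groups of at least `privacy` observations each, choosing the split that minimises the total squared deviation from each group's mean. The function name `group_observations` and its parameters are made up for this example and are not taken from the reporter code.

```python
from typing import List, Tuple


def group_observations(durations: List[float], privacy: int) -> List[Tuple[float, int]]:
    """Hypothetical helper: split durations into contiguous (sorted) groups of
    at least `privacy` items, minimising the total within-group squared error.
    Returns one (mean_duration, count) pair per group."""
    if privacy < 1:
        raise ValueError("privacy must be at least 1")
    xs = sorted(durations)
    n = len(xs)
    if n < privacy:
        return []  # not enough observations to emit anything

    # Prefix sums of x and x^2 so the squared error of any slice is O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] = lowest total error for grouping xs[:j]; cut[j] = start of last group.
    inf = float("inf")
    best = [inf] * (n + 1)
    cut = [0] * (n + 1)
    best[0] = 0.0
    for j in range(privacy, n + 1):
        for i in range(0, j - privacy + 1):
            if best[i] == inf:
                continue  # xs[:i] cannot be split into valid groups
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j], cut[j] = cost, i

    # Walk the cut points backwards to recover the groups.
    groups = []
    j = n
    while j > 0:
        i = cut[j]
        groups.append(((s1[j] - s1[i]) / (j - i), j - i))
        j = i
    groups.reverse()
    return groups


print(group_observations([10, 12, 20, 25, 65], privacy=2))
# → [(14.0, 3), (45.0, 2)]
```

For the durations above this picks {10, 12, 20} and {25, 65} (squared error 56 + 800 = 856) over {10, 12} and {20, 25, 65} (about 1219) or a single group of 5 (about 2009). Every emitted measurement still has a count of at least the privacy setting, and the count-weighted mean of the groups is still the overall mean of 26.4, so the averages aren't skewed.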