Interpolate percentiles when appropriate

hotsphink commented 1 month ago

This also came up in #1639 and #2054. The latter discussed two different solutions to two mostly different problems, and so the issue ended up being about visual smoothing over time. I didn't reuse #1639 because I didn't totally follow the discussion there (particularly @chutten's second point), and that issue might have different solutions.

We have many probes that produce bucketed samples of continuous values (e.g., pretty much anything that measures time). The percentile calculation appears to use a nearest-rank method. As can be seen in @jonco3's example in #2054, this produces choppy results that do not correspond to the underlying continuous distribution. (As the Wikipedia article puts it, "The simplest are nearest-rank methods... although compared to interpolation methods, results can be a bit crude.") This is especially problematic for high percentiles (95th and 99th), which are more likely to flip-flop between buckets. Interpolation would make the display more sensitive to the changes we care about.

Given that we know the underlying distribution is continuous, we'd like to be able to see an interpolation between closest ranks. We're fine if it must be selected manually, or if it requires labeling the probe as continuous or whatever.
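
To make the difference concrete, here is a minimal sketch of both approaches on a bucketed histogram. It is not GLAM's implementation: the function names, the `edges`/`counts` layout, the convention of reporting a bucket by its lower edge, and the assumption that samples are spread uniformly within a bucket are all illustrative choices.

```python
from math import ceil


def nearest_rank_percentile(edges, counts, p):
    """Lower edge of the bucket that holds the p-th percentile sample.

    edges has len(counts) + 1 entries; bucket i covers [edges[i], edges[i + 1]).
    """
    total = sum(counts)
    rank = max(1, ceil(p * total / 100))  # 1-based rank of the wanted sample
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        if cumulative >= rank:
            return edges[i]
    raise ValueError("empty histogram")


def interpolated_percentile(edges, counts, p):
    """Percentile estimate that interpolates linearly inside the bucket.

    Assumes samples are uniformly distributed within each bucket.
    """
    total = sum(counts)
    target = p * total / 100  # fractional rank we are looking for
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= target:
            fraction = (target - cumulative) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += count
    raise ValueError("empty histogram")


# Two hypothetical builds whose histograms differ by a single sample.
edges = [0, 10, 20, 40]
build_a = [50, 45, 5]
build_b = [50, 44, 6]

nearest_a = nearest_rank_percentile(edges, build_a, 95)
nearest_b = nearest_rank_percentile(edges, build_b, 95)
interp_a = interpolated_percentile(edges, build_a, 95)
interp_b = interpolated_percentile(edges, build_b, 95)
```

With these made-up counts, moving one sample out of 100 across a bucket boundary makes the nearest-rank 95th percentile jump from 10 to 20, while the interpolated estimate moves from 20 to roughly 23.3.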

edugfilho commented 1 month ago

I've been working on a PoC for interpolation. My first try used a nearest-rank method and that didn't change much (the percentile lines were still choppy). Then I experimented with a moving average over the percentile values, with a window size of 1% of the dataset, and that did yield some results, as expected (see video below). I invite you to check it out on GLAM dev, pick a different probe and tell me what you think. In the meantime I'll implement interpolation between closest ranks and put together a side-by-side comparison.
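
For readers unfamiliar with the technique, a moving average over a percentile series could be sketched as below. This is only an illustration of the idea described in the comment: the helper names are hypothetical, and the choice of a trailing window (rather than a centered one) is an assumption, since the PoC's details are not given here.

```python
def window_size(n_points, fraction=0.01):
    """Window length as a fraction of the series, never smaller than one point."""
    return max(1, round(n_points * fraction))


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the point that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# A made-up 95th-percentile series that flips between two bucket values.
choppy_p95 = [10, 20, 10, 20, 10, 20]
smoothed_p95 = moving_average(choppy_p95, 2)
```

On this toy series, a window of two points flattens the flip-flopping into a steady 15 after the first point, which is the visual smoothing effect described above.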

hotsphink commented 1 month ago

Oh oops, I misread your comment as saying you had already done an interpolation between closest ranks, which was confusing because that's not what the result looked like to me. But I now see you're talking about the time-based smoothing.

I would prefer interpolation between closest buckets and not having any smoothing over time. Smoothing over time (the moving average) makes for nicer graphs and is a good option to have available, but it adds delay and attenuation. Given that most of the choppiness is artificial for samples of continuous values, I'd rather see a pixel based only on the buckets from the time range corresponding to that pixel, if that makes sense? If there were to be a code change that legitimately changed a value, that should show up and be right at the time when the change occurred. There will still be lots of wiggles and variance that will obscure any changes, but at least that's innate.
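
The delay and attenuation mentioned above can be seen in a small sketch. The series, the window length of four builds, and the function name are made up for illustration; they are not taken from GLAM.

```python
def trailing_mean(values, window):
    """Average each point with up to window - 1 points before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# A genuine regression lands at index 5: the value steps from 100 to 120.
step = [100] * 5 + [120] * 5
smoothed_step = trailing_mean(step, 4)

# Delay: index at which the smoothed line first shows the full new value.
first_full = next(i for i, v in enumerate(smoothed_step) if v == 120)

# Attenuation: a one-build spike to 150 is flattened by the window.
spike = [100] * 4 + [150] + [100] * 5
peak_after_smoothing = max(trailing_mean(spike, 4))
```

Here the step that happens at index 5 only reaches its full height in the smoothed line at index 8, and the one-build spike of 150 shows up as a bump of 112.5 spread over four builds.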

jonco3 commented 1 month ago

As Steve said, smoothing over time hides information and makes the graphs harder to interpret. Ideally we would do something to help us extract more information from the large amount of data we collect. Some kind of interpolation between buckets would be a great help.

hotsphink commented 1 month ago

Also, thank you very much for looking into this! It's one of the main reasons we continued using the legacy telemetry displays and custom queries.

edugfilho commented 1 month ago

I really appreciate all the input!

edugfilho commented 1 month ago

https://dev.glam.nonprod.dataops.mozgcp.net can you please test and give feedback?

hotsphink commented 1 month ago

I just checked it out, and it looks exactly how I was expecting it to. Though @jonco3 has been using his own custom query for this, and would probably have a better sense for it.

Thank you!

edugfilho commented 1 month ago

I'm really glad this is finally coming out and that it matches expectations. It wasn't my intention to close this with the last PR. I'll leave this open for feedback and discussion until tomorrow, when I'll promote it to prod :)

mozilla / glam

Interpolate percentiles when appropriate #2985