In this issue, I want to start a conversation about how we can make readable, descriptive plots when the data series we are plotting are quite noisy.

Problem

I am currently using synthetic data, but this can be revisited once we finish the remaining experiment scripts. The synthetic data has the form

y = signal(x) + noise(x)

where

signal(x) := L / (1 + exp(-k(x-x0)))

is the logistic function (which is, roughly, what our models give us) and

noise(x) ~ N(mu, sigma)
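
For concreteness, here is a minimal sketch of how data of this form could be generated. It is not the actual experiment code: the function names, parameter values, and seed are made up for illustration, and it assumes that sigma is a standard deviation and that the noise is drawn independently at every x.

```python
import numpy as np


def signal(x, L, k, x0):
    """Logistic curve L / (1 + exp(-k (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))


def make_realizations(x, n_realizations, L, k, x0, mu, sigma, rng):
    """Return an array of shape (n_realizations, len(x)) holding signal + noise.

    Assumption: sigma is the standard deviation of the noise, and the noise
    is independent across x and across realizations.
    """
    noise = rng.normal(loc=mu, scale=sigma, size=(n_realizations, x.size))
    return signal(x, L, k, x0) + noise


# Illustrative values only; none of these come from the real experiments.
rng = np.random.default_rng(0)
x = np.arange(50)
ys = make_realizations(
    x, n_realizations=10, L=1.0, k=0.2, x0=25.0, mu=0.0, sigma=0.1, rng=rng
)
```

Each row of `ys` is one realization, so statistics over realizations are taken along axis 0.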

Suppose I have two sets of signal/noise parameters, with a fixed number of realizations of each. Below, I plot the mean over realizations for each set, along with the 95% (Student's t) confidence interval for each.

plot
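
The mean and confidence band in a plot like this can be computed pointwise over realizations. The following is a sketch under assumptions, not the code that produced the figure: `mean_and_ci` is a hypothetical helper, and it assumes the realizations are stacked in an array of shape (n_realizations, n_steps) and that SciPy is available.

```python
import numpy as np
from scipy import stats


def mean_and_ci(ys, confidence=0.95):
    """Pointwise mean and Student's t confidence interval over realizations.

    ys has shape (n_realizations, n_steps); statistics are taken over axis 0.
    """
    ys = np.asarray(ys, dtype=float)
    n = ys.shape[0]
    mean = ys.mean(axis=0)
    # Standard error of the mean, using the sample standard deviation.
    sem = ys.std(axis=0, ddof=1) / np.sqrt(n)
    # Two-sided critical value of the t distribution with n - 1 degrees of freedom.
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Placeholder data standing in for one parameter set.
rng = np.random.default_rng(0)
ys = rng.normal(loc=0.0, scale=0.1, size=(10, 50))
mean, lower, upper = mean_and_ci(ys)
```

With matplotlib, the band can then be drawn with `fill_between(x, lower, upper, alpha=0.3)` underneath a line plot of `mean`.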

Obviously there are stylistic questions to be addressed, but the plot actually looks pretty good. But look at how few steps there are on the x-axis: only 50. Let me show what happens when we go from 50 to 500 steps.

plot

It's starting to get hard to read this figure. Of course, we are actually working on the scale of 50,000 steps or even 500,000. Let's see those too.

plot plot

These figures are basically junk.

Some solutions

One option for these large series is direct downsampling: pick a downsampling stride ds, and then plot only every ds-th point, i.e. series[::ds]. This gives us something like (50,000 steps, ds=500):

plot
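
A minimal sketch of the downsampling option, assuming the series is a NumPy array with steps along the last axis. `downsample` is a hypothetical helper and the data are placeholders. Note that each plotted point is still a single noisy sample, so this thins the figure without reducing the noise in the points that remain.

```python
import numpy as np


def downsample(x, series, ds):
    """Keep every ds-th step of x and series (steps along the last axis)."""
    if ds < 1:
        raise ValueError("ds must be a positive integer")
    return x[::ds], series[..., ::ds]


# Placeholder series with 50,000 steps.
rng = np.random.default_rng(0)
x = np.arange(50_000)
series = rng.normal(size=x.size)

x_ds, series_ds = downsample(x, series, ds=500)
```

Slicing `x` with the same stride keeps the x-axis aligned with the retained samples.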

Alternatively, we could do moving averages. As an example, here is a simple moving average (window of 500):

plot
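
A simple moving average can be sketched with a uniform convolution kernel. As before, this is illustrative rather than the code behind the figure: `moving_average` is a hypothetical helper, and it uses `mode="valid"` so that only full windows are averaged, which shortens the output by window - 1 points.

```python
import numpy as np


def moving_average(series, window):
    """Simple (unweighted) moving average over full windows only."""
    series = np.asarray(series, dtype=float)
    if not 1 <= window <= series.size:
        raise ValueError("window must be between 1 and len(series)")
    kernel = np.ones(window) / window
    # "valid" drops the edges where the window would run off the series.
    return np.convolve(series, kernel, mode="valid")


# Placeholder series with 50,000 steps.
rng = np.random.default_rng(0)
series = rng.normal(size=50_000)
window = 500

smoothed = moving_average(series, window)
# Trailing-window convention: each average is plotted at its window's last step.
x_smoothed = np.arange(series.size)[window - 1:]
```

Unlike strided downsampling, every input sample contributes to the result, but neighbouring points of the smoothed curve are strongly correlated, which matters if a confidence band is computed after smoothing.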

But it's unclear to me which of these is preferable, or if we should take an entirely different approach.

wessle / costaware

Plotting noisy data #8
