In this issue, I want to start a conversation about how we can make readable, descriptive plots when the data series we are plotting are quite noisy.
Problem
I am currently using synthetic data, but this can be revisited once we finish the remaining experiment scripts. The synthetic data has the form
y = signal(x) + noise(x)
where
signal(x) := L / (1 + exp(-k(x-x0)))
is the generalized logistic function (which is—roughly—what our models give us) and
noise(x) ~ N(mu, sigma)
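For anyone who wants to reproduce the setup, here is a minimal sketch of how such synthetic data could be generated with NumPy. The function names, the parameter values, the x-range, and the number of realizations are all illustrative assumptions, not the values used for the figures, and sigma is assumed to be a standard deviation.

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Generalized logistic signal: L / (1 + exp(-k (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

def make_realizations(x, n_realizations, L, k, x0, mu, sigma, rng):
    """Return an array of shape (n_realizations, len(x)) holding signal + Gaussian noise."""
    signal = logistic(x, L=L, k=k, x0=x0)
    # Assumption: sigma is the standard deviation of the noise, not the variance.
    noise = rng.normal(loc=mu, scale=sigma, size=(n_realizations, x.size))
    return signal + noise

rng = np.random.default_rng(0)
x = np.linspace(-6.0, 6.0, 50)  # 50 steps; the x-range is an arbitrary choice
ys = make_realizations(x, n_realizations=10, L=1.0, k=1.0, x0=0.0,
                       mu=0.0, sigma=0.2, rng=rng)
mean_curve = ys.mean(axis=0)  # mean realization at each step
```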
Suppose I have two sets of signal/noise parameters, with a fixed number of realizations of each. Below, I plot the mean realization for each set, along with its 95% (Student's t) confidence interval.
Obviously there are stylistic questions to be addressed, but the plot actually looks pretty good. Note, however, how few steps it covers. Let me show what happens when we go from 50 to 500 steps.
It's starting to get hard to read this figure. Of course, we are actually working on the scale of 50,000 steps or even 500,000. Let's see those too.
These figures are basically junk.
Some solutions
One option for these large series is direct downsampling: pick a stride ds and plot only every ds-th point, i.e. series[::ds]. This gives us something like (50,000 steps, ds=500):
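As a rough sketch of the downsampling option (the noisy series here is a placeholder drawn from a standard normal, not our actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, ds = 50_000, 500

steps = np.arange(n_steps)
series = rng.normal(size=n_steps)  # placeholder for one noisy series

# Keep every ds-th point; slice the x-values the same way so they stay aligned.
steps_ds = steps[::ds]
series_ds = series[::ds]
```

One thing to keep in mind: each retained point is still a single noisy sample, so this thins the figure out without reducing the noise in the points that remain.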
Alternatively, we could do moving averages. As an example, here is a simple moving average (window of 500):
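A simple moving average could be sketched like this. The helper name moving_average is hypothetical, the series is again a placeholder, and the choice to align each average with the last step in its window (a trailing window) is an assumption; a centered window would work just as well.

```python
import numpy as np

def moving_average(series, window):
    """Simple moving average; the output has len(series) - window + 1 points."""
    kernel = np.ones(window) / window
    # mode="valid" only returns positions where the window fits entirely,
    # so there are no edge artifacts from zero padding.
    return np.convolve(series, kernel, mode="valid")

rng = np.random.default_rng(0)
n_steps, window = 50_000, 500

steps = np.arange(n_steps)
series = rng.normal(size=n_steps)  # placeholder for one noisy series

smoothed = moving_average(series, window)
# smoothed[i] is the mean of series[i : i + window]; plot it against the
# last step of each window (assumed trailing alignment).
steps_smoothed = steps[window - 1:]
```

Unlike plain downsampling, every plotted point here averages 500 samples, so the noise is actually reduced, at the cost of smearing out any feature narrower than the window.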
But it's unclear to me which of these is preferable, or if we should take an entirely different approach.