tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Bounded densities have discontinuities due to incomplete estimation of tails in the reflection method #5641

Closed · mjskay closed this issue 6 months ago

mjskay commented 10 months ago

I believe the current implementation of the reflection method for bounded densities has a minor bug which causes discontinuities in the density estimate.

For example, see this plot of a density estimate on uniform quantiles:

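library(ggplot2)

x = ppoints(100)  # 100 evenly spaced probability points in (0, 1)
p = ggplot(data.frame(x), aes(x)) +
    geom_density(bounds = c(0,1))  # bounded density via the reflection method
p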

[Plot: geom_density() estimate with bounds = c(0, 1)]

The discontinuity is visible, but small. If you zoom in:

p + coord_cartesian(ylim = c(.99, 1.01))

[Plot: the same estimate zoomed in to ylim = c(0.99, 1.01)]

I believe this happens because the tails are not estimated to their full length but only out to 3*bw beyond each boundary. When those tails are reflected back, the reflected tail density drops to exactly 0 at 3*bw inside the boundary, which produces the discontinuity.
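To illustrate the mechanism, here is a minimal sketch of the reflection method in base R. It assumes the unbounded estimate is only evaluated out to 3*bw beyond each bound and is treated as exactly 0 outside that grid; it mirrors the description above and is not ggplot2's actual source code.

```r
# Assumption: the unbounded estimate is only evaluated out to 3 * bw beyond
# each bound (as described above); this is not ggplot2's actual code.
x <- ppoints(100)
bounds <- c(0, 1)
bw <- bw.nrd0(x)  # default bandwidth rule used by stats::density()

# Unbounded Gaussian KDE evaluated only on [0 - 3*bw, 1 + 3*bw]
dens <- density(x, bw = bw, from = bounds[1] - 3 * bw,
                to = bounds[2] + 3 * bw, n = 2048)

# Outside that grid the unbounded density is treated as exactly 0
f <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)

# Reflection: add the tail mass that fell beyond each bound back inside it
reflected <- function(t) f(t) + f(2 * bounds[1] - t) + f(2 * bounds[2] - t)

# At t = 3*bw the mirrored point 2*0 - t leaves the evaluated grid,
# so the reflected left tail drops abruptly to 0
eps <- 1e-6
jump <- reflected(3 * bw - eps) - reflected(3 * bw + eps)
jump  # small but nonzero: roughly the unbounded density at -3*bw
```

The size of the jump is approximately the unbounded density at the edge of the evaluated grid, which is why it only shows up when zooming in.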

One solution to this is to estimate the tail density out to diff(range(x)) beyond the boundary. This is what ggdist's implementation of the reflection method does, so it does not have the discontinuity:
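A minimal sketch of that idea in base R follows: pad the evaluation grid by diff(range(x)) on each side before reflecting. The helper name reflect_density_sketch() is hypothetical and is not a ggplot2 or ggdist function; it only illustrates the widened range.

```r
# reflect_density_sketch() is a hypothetical helper, not a ggplot2/ggdist API.
# It evaluates the unbounded estimate out to diff(range(x)) beyond each bound
# before reflecting, instead of stopping at 3 * bw.
reflect_density_sketch <- function(x, bounds, bw = bw.nrd0(x), n = 2048) {
  pad <- diff(range(x))
  dens <- density(x, bw = bw, from = bounds[1] - pad,
                  to = bounds[2] + pad, n = n)
  f <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)
  # Reflected density: unbounded estimate plus its mirror images across each bound
  function(t) f(t) + f(2 * bounds[1] - t) + f(2 * bounds[2] - t)
}

x <- ppoints(100)
bw <- bw.nrd0(x)
g <- reflect_density_sketch(x, bounds = c(0, 1), bw = bw)

# The point 3*bw inside the lower bound is no longer a jump
eps <- 1e-6
jump <- abs(g(3 * bw - eps) - g(3 * bw + eps))
jump  # effectively 0
```

In this example the padded grid covers almost the entire mirrored range, and the small uncovered part lies roughly ten bandwidths from the data, where the unbounded estimate is negligible.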

plot(ggdist::density_bounded(x, bounds = c(0,1)), ylim = c(0.99, 1.01))

[Plot: ggdist::density_bounded() estimate over the same y range, without the discontinuity]

Looking at the implementation in ggplot2, I'm not sure what the best fix is, since by the time reflect_density() is called the unbounded estimator has already been run. At a minimum, you would probably have to change the bounds used by the unbounded estimator before calling reflect_density().

teunbrand commented 10 months ago

Thanks for the report, Matthew! I trust your density knowledge over mine any day, so if widening the range before running the estimator is the way to go, I'm on board.