Closed dkahle closed 3 years ago
I've been digging into this, it's a very strange bug. It looks like uniroot
is the culprit, it is not finding the correct density values which yield the desired HDRs. Below, I have included a reprex which matches what's going on internally (I've verified through the use of browser
). Notice, the output of find_cutoff
makes no sense -- it should be on the same scale as df$fhat
and should be non-zero.
I've also included the full output of uniroot
for the 50% HDR. Notice, f.root
is nowhere near 0 and iter
is small. (I have tried with higher values of maxiter
, but it hasn't fixed the issue -- it still stops at 27). The value of estim.prec
indicates to me that uniroot
"knows" something is wrong, but I don't know how to fix it.
Have you run into something like this before? Do you know what might be causing this?
library("tidyverse"); theme_set(theme_minimal())
library("ggdensity")
N <- 1e3
df <- tibble(x = rnorm(N, 0, 200), y = rnorm(N, 0, 200))
h <- c(MASS::bandwidth.nrd(df$x), MASS::bandwidth.nrd(df$y))
kdeout <- MASS::kde2d(
x = df$x, y = df$y, n = c(100, 100), h = h,
lims = c(
scales::expand_range(range(df$x), .10),
scales::expand_range(range(df$y), .10)
)
)
df <- expand.grid(x = kdeout$x, y = kdeout$y)
normalize <- function(v) v / sum(v)
df$fhat <- as.vector(kdeout$z)
df$fhat_discretized <- normalize(df$fhat)
prob_above_c <- function(df, c) {
if (length(c) > 1) return(sapply(c, prob_above_c))
sum(df$fhat_discretized[df$fhat >= c])
}
find_cutoff <- function(df, conf) {
if (length(conf) > 1) return(vapply(conf, \(x) find_cutoff(df, x), numeric(1)))
uniroot(\(c) prob_above_c(df, c) - conf, lower = 0, upper = 1e4, check.conv = TRUE)$root
}
find_cutoff(df, c(.99, .95, .80, .5))
#> [1] 0.000000e+00 0.000000e+00 0.000000e+00 7.450581e-05
uniroot(\(c) prob_above_c(df, c) - .5, lower = 0, upper = 1e4, check.conv = TRUE)
#> $root
#> [1] 7.450581e-05
#>
#> $f.root
#> [1] -0.5
#>
#> $iter
#> [1] 27
#>
#> $init.it
#> [1] NA
#>
#> $estim.prec
#> [1] 7.450581e-05
Created on 2021-10-04 by the reprex package (v2.0.0)
Okay, I've fixed the problem. It seems that if df$fhat
is on too small of a scale, uniroot
fails in the find_cutoff
function. I've implemented the fix here: 3ada8f0. I have also included a reprex below, showing that your example no longer fails.
library("tidyverse"); theme_set(theme_minimal())
library("ggdensity")
N <- 1e3
df <- tibble(x = rnorm(N, 0, 200), y = rnorm(N, 0, 200))
ggplot(df, aes(x, y)) + geom_point()
ggplot(df, aes(x, y)) + geom_hdr()
df <- df / 200
ggplot(df, aes(x, y)) + geom_point()
ggplot(df, aes(x, y)) + geom_hdr()
Created on 2021-10-21 by the reprex package (v2.0.0)
Any ideas what's going on here? Seems like the scale of the numbers matters.
Created on 2021-09-25 by the reprex package (v2.0.1)