Closed. seananderson closed this issue 3 years ago.
Sean, yes, this is a coding error. A sum of beta-binomial or Dirichlet-multinomial draws is not itself beta-binomial or Dirichlet-multinomial. Thank you for finding it. We'll add it to the list. Fortunately, as far as I know, this simulation feature hasn't really been used yet.
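To make the point concrete, here is a short sketch of the variances involved. It assumes the parameterization used in the demo quoted below (Dirichlet parameters p * phi, which sum to phi). The symbols X_i (count in category i) and N_eff are introduced here for illustration only. When the Dirichlet is redrawn for every individual, each size-1 draw is simply categorical with probabilities p, so the sum of N of them is an ordinary multinomial and the overdispersion disappears.

```latex
\begin{align*}
\text{Dirichlet drawn once per sample:}\quad
  \operatorname{Var}(X_i) &= N p_i (1 - p_i)\,\frac{N + \phi}{1 + \phi},\\
\text{Dirichlet redrawn for each of the $N$ individuals:}\quad
  \operatorname{Var}(X_i) &= N p_i (1 - p_i),\\
\text{effective sample size:}\quad
  N_{\text{eff}} &= N\,\frac{1 + \phi}{N + \phi} = \frac{N + N\phi}{N + \phi}.
\end{align*}
```

For example, with N = 300 and phi = 40 this gives N_eff = 300 * 41 / 340, or about 36.2, which is the value the get_neff helper in the demo returns.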
On Thu, Sep 9, 2021 at 2:11 PM Sean Anderson @.***> wrote:
I could be wrong, but should the rdirichlet part here https://github.com/timjmiller/wham/blob/0c53be0043ec67963b1ae1b1519ac5004a68b8b2/src/helper_functions.hpp#L144 in the rdirmultinom function be pulled out of the loop? By sampling from the Dirichlet on every observation, the average works out to the correct proportions (even with low phi values) and you lose the degraded sample-size effect. Here's a graphical demo:
rdirichlet <- function(p, phi) {
  alpha <- p * phi
  obs <- rgamma(length(p), alpha, 1)
  obs / sum(obs)
}

rdirmultinom_wham <- function(N, p, phi) {
  obs <- rep(0, length(p))
  for (i in seq_len(N)) {
    dp <- rdirichlet(p, phi) # should be outside the loop?
    obs <- obs + rmultinom(n = 1, prob = dp, size = 1)
  }
  obs
}

rdirmultinom2 <- function(N, p, phi) {
  dp <- rdirichlet(p, phi)
  rmultinom(n = 1, prob = dp, size = N) # or add up 1 by 1 as above with dp outside the loop
}

sim_rdirmult <- function(N, phi, type = c("wham", "2"), p = c(0.1, 0.1, 0.4, 0.4)) {
  type <- match.arg(type)
  out <- matrix(ncol = length(p), nrow = 100, data = 0)
  if (type == "wham") {
    for (i in seq_len(nrow(out))) out[i, ] <- rdirmultinom_wham(N, p, phi)
  } else {
    for (i in seq_len(nrow(out))) out[i, ] <- rdirmultinom2(N, p, phi)
  }
  plot(1, ylim = c(1, ncol(out)), xlim = c(1, nrow(out)), type = "n")
  out <- out / N
  for (i in seq(1, nrow(out))) {
    symbols(rep(i, ncol(out)), seq_along(out[1, ]),
      circles = out[i, ] * 4, inches = FALSE, add = TRUE
    )
  }
}
These two look nearly identical because the Dirichlet is resampled for every observation:
sim_rdirmult(100, 0.1)
sim_rdirmult(100, 1e3)
with this version, they look different:
sim_rdirmult(100, 0.1, type = "2")
sim_rdirmult(100, 1e3, type = "2")
get_neff <- function(N, phi) { (N + N * phi) / (N + phi) }
these should be about the same but aren't here:
sim_rdirmult(300, 40)
sim_rdirmult(get_neff(300, 40), 1e9)
with this version they approximately match:
sim_rdirmult(300, 40, type = "2")
sim_rdirmult(get_neff(300, 40), 1e9, type = "2")
Created on 2021-09-09 by the reprex package https://reprex.tidyverse.org (v2.0.0)
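The visual comparison above can also be checked numerically. The following is a minimal sketch, not code from wham: it reuses the two sampling strategies from the demo under the hypothetical names draw_inside and draw_outside, and compares the simulated variance of one category's count with two textbook values, N p (1 - p) for a plain multinomial and N p (1 - p) (N + phi) / (1 + phi) for a Dirichlet-multinomial whose Dirichlet parameters sum to phi. The settings (N = 100, phi = 5, 2000 replicates) are arbitrary choices for illustration.

```r
set.seed(1)

rdirichlet <- function(p, phi) {
  g <- rgamma(length(p), shape = p * phi, rate = 1)
  g / sum(g)
}

# Dirichlet redrawn for every individual (the behaviour reported in this issue).
draw_inside <- function(N, p, phi) {
  rowSums(replicate(N, rmultinom(1, size = 1, prob = rdirichlet(p, phi))[, 1]))
}

# Dirichlet drawn once per sample (the proposed behaviour).
draw_outside <- function(N, p, phi) {
  rmultinom(1, size = N, prob = rdirichlet(p, phi))[, 1]
}

N <- 100
phi <- 5
p <- c(0.1, 0.1, 0.4, 0.4)
n_sim <- 2000

# Keep only the count in category 3 from each simulated sample.
inside <- replicate(n_sim, draw_inside(N, p, phi)[3])
outside <- replicate(n_sim, draw_outside(N, p, phi)[3])

var_multinom <- N * p[3] * (1 - p[3])               # plain multinomial variance
var_dirmult <- var_multinom * (N + phi) / (1 + phi) # Dirichlet-multinomial variance

round(c(
  inside = var(inside), multinomial = var_multinom,
  outside = var(outside), dirichlet_multinomial = var_dirmult
), 1)
```

If the reasoning in this thread holds, the "inside" variance should sit close to the multinomial value and the "outside" variance close to the much larger Dirichlet-multinomial value.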
View it on GitHub: https://github.com/timjmiller/wham/issues/49
-- Timothy J. Miller, PhD (he, him, his) Research Fishery Biologist NOAA, Northeast Fisheries Science Center Woods Hole, MA 508-495-2365
Thanks for reporting this bug. Tim fixed this on devel, see commit 39d356.