biased estimation of model parameters?

cdiener commented 3 years ago

Hi,

I noticed that the estimation of model parameters seems to struggle a bit with converging when phi >> mu. In particular for the following example:

library(corncob)
library(data.table)
library(parallel)
library(magrittr)
library(patchwork)
library(ggplot2)

theme_minimal() %>% theme_set()

p <- 0.01
rho <- 0.1
depth <- 10000

get_pars <- function(n) {
    M <- VGAM::rbetabinom(n, size = depth, p = p, rho = rho)
    df <- data.frame(M = M, W = depth)
    fit <- bbdml(
        formula = cbind(M, W-M) ~ 1,
        phi.formula = ~ 1,
        data = df
    )
    return(data.frame(mu = fit$mu.resp[1], phi = fit$phi.resp[1, 1], n=n, var = var(M)))
}

estimates <- mclapply(round(10^seq(1, 4, by = 0.25)), get_pars) %>% rbindlist()

mu_plot <- ggplot(estimates, aes(x=n, y=mu)) +
    geom_point() + geom_hline(yintercept = p) +
    scale_x_log10()

phi_plot <- ggplot(estimates, aes(x=n, y=phi)) +
    geom_point() + geom_hline(yintercept = rho) +
    scale_x_log10()

print(mu_plot + phi_plot)

params

You get a good estimation for mu and phi eventually but it requires a very large n (>1000) and seems to be biased before convergence (underestimation). Do you know a strategy to improve on that? We observed that this often hampers the power for some taxa and we get phi>mu quite often for negative samples (estimated from healthy untreated mice fecal samples).

EDIT: had a typo in the code

adw96 commented 3 years ago

Hi @cdiener -- maximum likelihood estimates aren't unbiased (only consistent). So this isn't really surprising to me, and we can't "fix" this without invalidating our hypothesis tests. That's my rapidfire 2c on this :) Amy

cdiener commented 3 years ago

Ok, I see, Thanks! Yeah, I guess that is maybe a limitation of the data.

bryandmartin commented 3 years ago

Hi @cdiener ,

I don't have much to add to that, so I'm going to close this issue. But feel free to re-open if you have a follow up question!

Bryan

statdivlab / corncob

biased estimation of model parameters? #115