meghapsimatrix / simhelpers

Helper package to assist in running simulation studies
10 stars 3 forks source link

Issue with monte carlo error of RMSE? #12

Closed edbonneville closed 3 years ago

edbonneville commented 3 years ago

Hi! I ran into this package when looking to calculate Monte-Carlo standard errors for RMSE estimates in a simulation study I have run.

I was wondering whether there is an issue with the implementation of the approximate jackknife estimator for the MCSE of the RMSE in calc_absolute() ? Below is an illustration comparing it to a bootstrap estimate of the MCSE:

library(simhelpers)

# Create dummy data
set.seed(2021)
dat <- data.frame("est" = rnorm(1000), "true" = 0)

# Calculate with simhelpers
res_simhelpers <- calc_absolute(
  res_dat = dat,
  estimates = "est",
  true = "true",
  perfm_criteria = "rmse"
)

# Estimate MCSE with bootstrap instead
B <- 1000
boots <- replicate(
  n = B,
  expr = {
    boot_sample <- dat[sample(1:nrow(dat), size = nrow(dat), replace = TRUE), ]
    mse <- mean((boot_sample$est - boot_sample$true)^2)
    return(sqrt(mse))
  }
)

# Check bootstrap RMSE is correct
paste0(
  "RMSE = ", round(res_simhelpers$rmse, 3), 
  "; RMSE (boot) = ", round(mean(boots), 3)
)
#> [1] "RMSE = 1.019; RMSE (boot) = 1.018"

# Compare MCSE's
paste0(
  "RMSE MCSE = ", round(res_simhelpers$rmse_mcse, 3), 
  "; RMSE MCSE (boot) = ", round(sd(boots), 3)
)
#> [1] "RMSE MCSE = 0.001; RMSE MCSE (boot) = 0.022"

So the MCSE estimate with calc_absolute() is a whole order of magnitude smaller that the one obtained by bootstrap - these should normally more or less coincide?

Thanks :-)

meghapsimatrix commented 3 years ago

@edbonneville thank you very much for your comment! I will look further into this soon.

meghapsimatrix commented 3 years ago

@jepusto flagging this. There were differences in the mcse's derived from the jackknife way and the delta method too! I'm going to check through the formulas once I have some mental bandwidth.

jepusto commented 3 years ago

I think we’re missing a leading (K - 1) term in the jackknife MCSE formula, which would explain the order-of-magnitude discrepancy. See equation 1.2 here: https://projecteuclid.org/download/pdf_1/euclid.aos/1176345462

meghapsimatrix commented 3 years ago

@edbonneville I incorporated @jepusto 's fix and the mcse of rmse from the development version of package should produce mcse that is similar to the one from bootstrapping. @jepusto I've fixed the other jacknife mcses as well. will double check soon.

@edbonneville thank you very much for catching this! @jepusto thanks for looking up the difference in the formula 😸

#devtools::install_github("meghapsimatrix/simhelpers")
library(simhelpers)

# Create dummy data
set.seed(2021)
dat <- data.frame("est" = rnorm(1000), "true" = 0)

# Calculate with simhelpers
res_simhelpers <- calc_absolute(
  res_dat = dat,
  estimates = "est",
  true = "true",
  perfm_criteria = "rmse"
)

# Estimate MCSE with bootstrap instead
B <- 10000
boots <- replicate(
  n = B,
  expr = {
    boot_sample <- dat[sample(1:nrow(dat), size = nrow(dat), replace = TRUE), ]
    mse <- mean((boot_sample$est - boot_sample$true)^2)
    return(sqrt(mse))
  }
)

# Check bootstrap RMSE is correct
paste0(
  "RMSE = ", round(res_simhelpers$rmse, 3),
  "; RMSE (boot) = ", round(mean(boots), 3)
)
#> [1] "RMSE = 1.019; RMSE (boot) = 1.019"

# Compare MCSE's
paste0(
  "RMSE MCSE = ", round(res_simhelpers$rmse_mcse, 3),
  "; RMSE MCSE (boot) = ", round(sd(boots), 3)
)
#> [1] "RMSE MCSE = 0.027; RMSE MCSE (boot) = 0.022"
edbonneville commented 3 years ago

@meghapsimatrix looks good - glad it could be easily fixed! Feel free to close this issue once you have finished any double checking for code/formulas :-)

jepusto commented 3 years ago

Thanks Megha! Pretty sure this was my error, from hastily writing out notes on jackknifing.

On Feb 13, 2021, at 7:50 AM, Edouard Bonneville notifications@github.com wrote:

 @meghapsimatrix looks good - glad it could be easily fixed! Feel free to close this issue once you have finished any double checking for code/formulas :-)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

meghapsimatrix commented 3 years ago

@jepusto no worries! thanks for looking up the formula :)