s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0
22 stars 4 forks source link

uniroot error in boottest when r is too close to coefficient #138

Closed zeileis closed 10 months ago

zeileis commented 10 months ago

When running some simulations we stumbled upon the following problem: When the value under the null hypothesis r is too close to the estimated coefficient param to be tested, then a uniroot() call in boottest() fails.

A minimal example using the PetersenCL data from sandwich with R 4.3.1 and fwildclusterboot 0.13.0 is:

data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)
boottest(m, param = "x", B = 199, clustid = "firm", r = 1.03)
## Error in if (!is.numeric(lower) || !is.numeric(upper) || lower >= upper) stop("lower < upper  is not fulfilled") : 
##   missing value where TRUE/FALSE needed

When digging into the code, I observed that the error occurs in invert_p_val(). This sets up grid_vals around point_estimate and computes p_val_null2_x() for each of these. But as all of these return -0.05 here, there are no crossings which leads to NAs being passed to uniroot(). Maybe this is helpful for you to see where this comes from.

s3alfisc commented 10 months ago

Hi @zeileis , thanks for reporting!

I think I know what's going on: in order to invert the bootstrap test, I evaluate a function that produces p-values under different nulls. This gives me a set of p-values. You can inspect them via the plot() method. E.g. for a null of beta = 1.01, boottest produces results and the following plot below:

data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)
rm(boot)
boot <- boottest(m, param = "x", B = 199, clustid = "firm", r = 1.01)
plot(boot)

image

What happens in the example that you describe is that the algorithms likely does not find proper starting values (as marked by the blue lines) - or in other words, there is only one computed bootstrapped p-value above the significance level of 0.05.

I'll try to tackle this early next week - reworking the p-value inversion has been on my to do list for a while =) I also have a discussion of how to obtain p-values by CI inversion here - maybe it helps if by "we" you are referring to one of your students =)

If you do not need CIs for your simulations, you can set the conf_int = FALSE function argument, and then you'll get rid of this error.

Btw, I'm looking forward to learn more about your simulation results! =)

Best, Alex

s3alfisc commented 10 months ago

Hi, I have put a fix in the dev branch. In a nutshell, I add a user option to overwrite the default starting value for finding a starting value (😅 ). This is how you get around the error:

library(fwildclusterboot)

data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)
coef(m) #  0.02967972  1.03483344 

dqrng::dqset.seed(3)
# throws an error
boot <- boottest(m, param = "x", B = 199, clustid = "firm", r = 1.03)
# runs with better starting value
boot <- boottest(m, param = "x", B = 1999, clustid = "firm", r = 1.03, se_guess = 0.001)
plot(boot)
summary(boot)
# boottest.lm(object = m, param = "x", B = 1999, clustid = "firm", 
#             r = 1.03, se_guess = 0.001)
# 
# Hypothesis: 1*x = 1.03
# Observations: 5000
# Bootstr. Type: rademacher
# Clustering: 1-way
# Confidence Sets: 95%
# Number of Clusters: 500
# 
# term estimate statistic p.value conf.low conf.high
# 1 1*x = 1.03    1.035     0.096   0.925    0.934     1.134

image

I still have to fix some issues in the dev branch before merging, but this feature should be good to use.

zeileis commented 10 months ago

Thanks for looking into this!

In fact, we are only using boottest() for computing the confidence interval (because we want to compare it to Wald-type confidence intervals with different bootstrap standard errors). As the confidence interval is almost invariant to the value of r, we simply make sure that the value of r we use is not too close to the empirical estimate.

This made me wonder whether it is not possible to find a better workaround that does not require the user to change a rather technical argument. Wouldn't it be possible to just shift or extend the search grid if needed?

s3alfisc commented 10 months ago

This made me wonder whether it is not possible to find a better workaround that does not require the user to change a rather technical argument. Wouldn't it be possible to just shift or extend the search grid if needed?

Yes, you're right - it should not even be that hard. I was just somewhat lazy yesterday, and so I went for the lazy solution 😅 . I'll try to get to it tomorrow evening!

s3alfisc commented 10 months ago

This should be fixed with #142. Out of the box, you can now impose r = 1.0348 and get sensible results:

library(fwildclusterboot)

data("PetersenCL", package = "sandwich")
m <- lm(y ~ x, data = PetersenCL)
summary(m)
boot <- boottest(m, param = "x", B = 199, clustid = "firm", r = 1.0348)
summary(boot)
# boottest.lm(object = m, param = "x", B = 199, clustid = "firm", 
#             r = 1.0348)
# 
# Hypothesis: 1*x = 1.0348
# Observations: 5000
# Bootstr. Type: rademacher
# Clustering: 1-way
# Confidence Sets: 95%
# Number of Clusters: 500
# 
# term estimate statistic p.value conf.low conf.high
# 1 1*x = 1.0348    1.035     0.001       1    0.951     1.117

plot(boot)

image

You should be able to install a compiled version from r-universe by running

install.packages('fwildclusterboot', repos ='https://s3alfisc.r-universe.dev')

in a bit =) It will be a while before I submit to CRAN - there are a couple of changes in the dev branch that I still need to finalize.

zeileis commented 10 months ago

Nice, thanks for your quick fix!