Bug: CI not calculated with "R-lean" engine and not working with "R" engine

kv9898 commented 11 months ago

Hi fwildclusterboot team!

I'm trying to replicate Eckhouse (2021) with the fwildclusterboot package. I've attached the relevant code and data. However, with the default "R" engine, it fails to generate a result and reports an error "Error in solve.default(crossprod(weights_sq * X)) : system is computationally singular: reciprocal condition number = 1.92859e-17"

However, it does generate a result when I use the "R-lean" engine. Nevertheless, no confidence intervals are calculated. I noted that I could generate confidence intervals using boottest in Stata, so theoretically this should work with R! Also, I'm looking forward to the confidence interval/weighted regression functionalities for heteroskedastic wild boot!

Many thanks!

bugreport.zip

kv9898 commented 11 months ago

Also, every time I run heteroskedastic (no cluster) wild bootstrap with the Julia engine, I get an error of "Error in integer(nrow_df) : invalid 'length' argument" - I tried it with different datasets and this error repeated.

s3alfisc commented 11 months ago

Hi @kv9898 - some of this is a bug! Thanks for reporting =)

If you try to run the heteroskedastic bootstrap via the R engine, it should throw an error - the heteroskedastic bootstrap is only implemented via the "R-lean" algo. It is also not supported via the Julia engine, even though WildBootTests.jl can clearly run a heteroskedastic wild bootstrap. Sorry! If I find the time, I will add support for both.
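For illustration, a minimal sketch of requesting the "R-lean" engine explicitly for a heteroskedastic (unclustered) wild bootstrap. It assumes an installed fwildclusterboot version whose boottest() accepts the engine argument and a formula for param, and it uses the package's bundled voters data rather than the data from the bug report. Treat it as an example of the call shape, not as a statement of how every version behaves.

```r
library(fwildclusterboot)
data(voters)

fit <- lm(proposition_vote ~ treatment, data = voters)

# No clustid is supplied, so this is the heteroskedastic wild bootstrap.
# engine = "R-lean" asks for the only engine that implements it,
# according to the comment above.
hc_lean <- boottest(fit, param = ~ treatment, B = 999, engine = "R-lean")

# Confidence intervals are expected to be missing for this bootstrap type.
summary(hc_lean)
```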

Confidence intervals are not supported via the heteroskedastic bootstrap. If you really need CIs, and your sample is not too large, there is a quick 'hack' that consists of creating singleton clusters (as the CRV1 and HC1 bootstraps collapse into one another when all clusters are singletons):

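library(fwildclusterboot)
data(voters)
# give every observation its own cluster id (singleton clusters)
voters$clustid = 1:nrow(voters)

fit = lm(proposition_vote ~ treatment, data = voters)

# heteroskedastic wild bootstrap (no clustid): no CIs are computed
hc = boottest(fit, param = ~ treatment, B = 999)
# the 'hack': cluster bootstrap on singleton clusters, without the cluster small-sample adjustment
hc_hack = boottest(fit, param = ~ treatment, clustid = ~clustid, B = 999, ssc = boot_ssc(cluster.adj = FALSE))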

summary(hc)
# boottest.lm(object = fit, param = ~treatment, B = 999)
# 
# Hypothesis: 1*treatment = 0
# Observations: 300
# Bootstr. Type: rademacher
# Clustering: 0-way
# Confidence Sets: 95%
# 
#              term estimate statistic p.value conf.low conf.high
# 1 1*treatment = 0    0.088     3.144       0       NA        NA

summary(hc_hack)
# boottest.lm(object = fit, param = ~treatment, B = 999, clustid = ~clustid, 
#             ssc = boot_ssc(cluster.adj = FALSE))
# 
# Hypothesis: 1*treatment = 0
# Observations: 300
# Bootstr. Type: rademacher
# Clustering: 1-way
# Confidence Sets: 95%
# Number of Clusters: 300
# 
#              term estimate statistic p.value conf.low conf.high
# 1 1*treatment = 0    0.088     3.144   0.002    0.032     0.143

As you can see, the non-bootstrapped t-statistics are identical. The p-values differ slightly because the two runs use different random number generators. But the second option produces CIs! Note that you need to set cluster.adj = FALSE in boot_ssc() to stop boottest() from applying the small-sample adjustment for clustering.
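To see why the singleton-cluster trick works, here is a small base-R sketch. It is an illustration under stated assumptions: it uses the built-in mtcars data as a stand-in (not the data from this thread) and leaves out all small-sample correction factors. It shows that the unadjusted cluster-robust sandwich estimator coincides with the unadjusted heteroskedasticity-robust one when every observation is its own cluster.

```r
# Stand-in regression on a built-in dataset (an assumption for illustration).
fit <- lm(mpg ~ wt, data = mtcars)
X <- model.matrix(fit)
u <- unname(residuals(fit))
n <- nrow(X)

# "Bread" of the sandwich: (X'X)^{-1}
bread <- solve(crossprod(X))

# Heteroskedasticity-robust meat: sum_i u_i^2 * x_i x_i'
meat_hc <- crossprod(X * u)

# Cluster-robust meat: sum_g (X_g' u_g)(X_g' u_g)'
cluster <- seq_len(n)                      # every observation is its own cluster
scores <- rowsum(X * u, group = cluster)   # per-cluster score sums
meat_cr <- crossprod(scores)

vcov_hc <- bread %*% meat_hc %*% bread
vcov_cr <- bread %*% meat_cr %*% bread

# With singleton clusters the two covariance matrices agree.
all.equal(vcov_hc, vcov_cr)
```

With singleton clusters each per-cluster score sum is just one observation's score, so the two "meat" matrices are built from the same terms. Any remaining difference in practice comes from the small-sample factors, which is where cluster.adj = FALSE comes in.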

I hope this helps!

kv9898 commented 11 months ago

Thank you so much!!!

s3alfisc / fwildclusterboot, issue #135