Closed by s3alfisc 2 years ago
A related question: if small = FALSE, are all small-sample adjustments dropped from the covariance calculation, i.e. both the (N-k)/N and the (G-1)/G factors, and not only the (G-1)/G part or the (N-k)/N part?
My Julia and Stata implementations should simply drop the infeasible test statistics. The Julia code seems to do that in this example:
using WildBootTests
using StatFiles, StatsModels, DataFrames, DataFramesMeta, BenchmarkTools, Plots, CategoricalArrays, Random, StableRNGs
df = DataFrame(load(raw"d:\OneDrive\Documents\Macros\nlsw88.dta"))
df = df[:, [:wage, :ttl_exp, :collgrad, :tenure, :age, :industry, :occupation, :hours]]
dropmissing!(df)
f = @formula(wage ~ ttl_exp + collgrad + tenure) # (no constant in this model...)
f = apply_schema(f, schema(f, df))
resp, predexog = modelcols(f, df)
test=wildboottest([0 0 1], [0]; resp, predexog, clustid=Matrix(df[:, [:occupation, :age]]), nbootclustvar=2, nerrclustvar=2, reps=15, rng=StableRNG(1231))
dist(test)
Here, I get
WildBootTests.BoottestResult{Float32}
t = -0.774
p = 0.571
CI = Float32[-0.17448837 -0.15408598; -0.13368359 0.049937926]
julia> dist(test)
15×1 Matrix{Float32}:
-4.1656623
-2.953216
-1.9951928
-1.0425929
-0.856438
-0.566741
-0.3556045
-0.14136912
-0.020604081
0.29054025
0.39634967
1.0857376
1.1633673
1.2886974
NaN
So 1 of the 15 replications is infeasible and the p-value is 8/14. (Data set is here.)
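For reference, here is a minimal Python sketch (illustrative only, not the package's actual code) of that rule: drop the infeasible replications, then compute the two-tailed p-value over the remaining valid ones.

```python
import math

def bootstrap_p(t_obs, t_boot):
    """Two-tailed symmetric p-value, dropping infeasible (NaN) replications."""
    valid = [t for t in t_boot if not math.isnan(t)]
    return sum(abs(t) >= abs(t_obs) for t in valid) / len(valid)

# the 15 bootstrap t-stats printed above, one of them NaN
dist_vals = [-4.1656623, -2.953216, -1.9951928, -1.0425929, -0.856438,
             -0.566741, -0.3556045, -0.14136912, -0.020604081, 0.29054025,
             0.39634967, 1.0857376, 1.1633673, 1.2886974, float("nan")]
print(bootstrap_p(-0.774, dist_vals))  # 8/14 ≈ 0.571
```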
Related: for one-way clustered SEs, I fail to replicate equality of t-stats between Julia and R. I check the R t-stat against fixest::feols(). Maybe I don't understand how the small sample correction works after all?
library(fixest)
library(fwildclusterboot)
library(wildboottestjlr)
N <- 1000
N_G1 <- 5
voters2 <- fwildclusterboot:::create_data(N = N,
N_G1 = N_G1,
icc1 = 0.5,
N_G2 = 2,
icc2 = 0.2,
numb_fe1 = 5,
numb_fe2 = 5,
seed = 411124,
weights = 1:N)
# calculate t-stat without the (N-1) / (N-k) adjustment, but with G / (G-1),
# as this is how it is implemented in fwildclusterboot
feols_fit <- feols(proposition_vote ~ treatment,
data = voters2,
cluster = "group_id1",
ssc = ssc(adj = FALSE,
cluster.adj = TRUE,
cluster.df = "conventional"))
dof_tstat <- fixest::coeftable(feols_fit)[c("treatment"), 3]
# need to estimate via lm, as boottest() currently throws an error if it
# detects the ssc() function argument in fixest
lm_fit <- lm(proposition_vote ~ treatment,
data = voters2)
k <- 2 # two estimated parameters
# oneway clustering
boot_r <- fwildclusterboot::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed")
boot_jl_small <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = TRUE)
boot_jl_nosmall <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = FALSE)
boot_r$p_val; boot_jl_small$p_val[1]
k <- 2
G <- N_G1
cat("fwildclusterboot", boot_r$t_stat)
# fwildclusterboot 1.796745
cat("fixest", dof_tstat)
# fixest 1.796745
cat("wildboottestjlr no small adj", boot_jl_small$t_stat * (G -1 ) / G)
# wildboottestjlr no small adj 1.436676
cat("wildboottestjlr small adj", boot_jl_small$t_stat * (N-k) / (N-1))
# wildboottestjlr small adj 1.794048
Thanks for your feedback!
After I've understood why the t-stats don't match and why p-values for multi-way clustering differ for enumeration, I think I can make the repo public and then check the WRE against STATA once I have renewed my license.
After reconsidering, the small sample adjustments based on N and k should not have an effect on which test statistics are invalid, as all covariances are multiplied by the same positive value. So I can rule this out as the source of the difference.
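That cancellation argument can be sketched in a few lines of Python (illustrative only, not either package's code): a common positive factor on the variance can never flip a t-stat between feasible and infeasible.

```python
import math

def t_stat(beta, var, c=1.0):
    """t = beta / sqrt(c * var); infeasible (NaN) whenever the scaled
    variance is non-positive, for any positive factor c."""
    v = c * var
    return beta / math.sqrt(v) if v > 0 else float("nan")

# factors such as (N-k)/(N-1) rescale the variance but leave feasibility unchanged
for c in (0.5, (1000 - 2) / (1000 - 1), 2.0):
    assert math.isnan(t_stat(0.3, -0.01, c))      # stays infeasible
    assert not math.isnan(t_stat(0.3, 0.01, c))   # stays feasible
```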
Nevertheless, either my understanding or something in Julia is off: I cannot match t-stats for small = FALSE and small = TRUE when applying the small sample correction G / (G-1) * (N-1) / (N-k) by hand:
N <- 1000
k <- 2
G <- N_G1
boot_jl_small <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = TRUE)
boot_jl_nosmall <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = FALSE)
cat("wildboottestjlr no small adj", boot_jl_nosmall$t_stat * G / (G-1) * (N-1) / (N-k))
# wildboottestjlr no small adj 2.513543
cat("wildboottestjlr small adj", boot_jl_small$t_stat )
# wildboottestjlr small adj 1.795845
Anyway, the differences between the t-stats and the bootstrapped t-stats are very small, so maybe it's just numerical differences between R and Julia (e.g. due to different linear algebra libraries)?
cbind(sort(boot_r$t_boot),
c(sort(boot_jl_small$t_boot * 998/999)))
[,1] [,2]
[1,] -2.43926552 -2.43560387
[2,] -1.82433044 -1.82159189
[3,] -1.79674504 -1.79404790
[4,] -1.49384805 -1.49160559
...
Beyond that: I must be doing something wrong with the bootcluster. Update will follow.
I think I fixed a bug that led to the error with multiway clustering. p-values and t-stats under full enumeration are now almost exactly identical (differences of about 1e-03 for t-stats, much less for p-values).
Good... I think that G / (G-1) * (N-1) / (N-k) needs to be square-rooted.
Oh yes, that was a stupid mistake! Thanks for spotting it.
With the correct adjustment, I can now exactly match t-stats and bootstrap t-stats computed via fwildclusterboot and wildboottestjlr.
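A quick Python sanity check using the numbers reported above (2.513543 is the unadjusted t-stat after the erroneous linear scaling, 1.795845 the adjusted one): since the factor multiplies the variance, the t-stat shrinks by the square root of the factor, not by the factor itself.

```python
import math

N, k, G = 1000, 2, 5
factor = G / (G - 1) * (N - 1) / (N - k)

t_nosmall = 2.513543 / factor           # undo the erroneous linear scaling
t_small = t_nosmall / math.sqrt(factor) # variance grows by factor -> t shrinks by sqrt
print(round(t_small, 4))                # 1.7958, matching the adjusted t-stat above
```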
Btw, as we have discussed how the (N-1)/(N-k) small sample adjustment affects inference: for multi-way clustering it of course matters, because it influences whether the bootstrap covariance matrix is positive semi-definite or not.
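A toy Python sketch of that point, with made-up component variances and assuming the usual inclusion-exclusion form V1 + V2 - V12 of the two-way variance with balanced intersection clusters: because each component gets its own cluster factor, the adjustment can flip the sign of the variance estimate.

```python
G1, G2 = 5, 2
G12 = G1 * G2                                # intersection clusters (assumed balanced)
c1, c2, c12 = G1 / (G1 - 1), G2 / (G2 - 1), G12 / (G12 - 1)

V1, V2, V12 = 1.0, 1.0, 2.1                  # hypothetical component variances
unadjusted = V1 + V2 - V12                   # -0.1  -> not PSD, t-stat infeasible
adjusted = c1 * V1 + c2 * V2 - c12 * V12     # ~0.92 -> PSD, t-stat feasible
print(unadjusted < 0 < adjusted)             # True
```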
Hi David,
I have started to test wildboottestjlr against fwildclusterboot. In general, results are equivalent for a large number of bootstrap iterations. Testing under full enumeration, both packages lead to exactly identical p-values for oneway clustering.
But for multi-way clustering & full enumeration, p-values sometimes differ, namely when fwildclusterboot::boottest() drops t-statistics due to non-positive-definite covariance matrices. E.g. for N_G1 = 2 & N_G2 = 2, with the bootcluster being the intersection of the two, there are 2^4 = 16 bootstrap replications under full enumeration. Now fwildclusterboot::boottest might sometimes drop two t-statistics, so that e.g. only 14 valid t-stats are computed. The p-value is then computed based on the 14 valid t-stats and is always in 1:14 / 14. wildboottestjlr::boottest() instead does not seem to drop the t-stats, and p-values are always correctly within 1:16 / 16. I have checked whether this is due to different small sample corrections by setting small = FALSE, but the differences remain.
How do you handle invalid t-stats? Are they deleted?
If yes, there must be a bug in my code.
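The granularity mismatch can be made concrete with a small Python check: with 2 of the 16 enumerated t-stats dropped, the attainable p-values lie on the 1:14/14 grid, which shares only 1/2 and 1 with the 1:16/16 grid, so the two packages' p-values will generally differ.

```python
from fractions import Fraction

p_drop = {Fraction(r, 14) for r in range(1, 15)}   # grid when 2 NaNs are dropped
p_keep = {Fraction(r, 16) for r in range(1, 17)}   # grid over all 16 replications
print(sorted(p_drop & p_keep))                     # [Fraction(1, 2), Fraction(1, 1)]
```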