Multi-way clustering and invalid test statistics

s3alfisc commented 2 years ago

Hi David,

I have started to test wildboottestjlr against fwildclusterboot. In general, results are equivalent for a large number of bootstrap iterations.

Testing under full enumeration, both packages lead to exactly identical p-values for oneway-clustering.

But for multi-way clustering & full enumeration, p-values sometimes differ, namely when fwildclusterboot::boottest() drops t-statistics due to non-positive definite covariance matrices.

E.g. for for N_G1 = 2 & N_G2 = 2 and the bootcluster is the inersection of the two. Then there are 2^4 unique p-values. Now sometimes fwildclusterboot::boottest might drop to t-statistics, e.g. there are only 14 valid t-stats computed. The p-value is then computed based on the 14 valid t-stats and is always in 1:14 / 14. wildboottestjlr::boottest() instead does not seem to drop the t-stats, and p-values are always correctly within 1:16/16.

I have checked if this is due to different small sample corrections by setting small = FALSE, but these differences remain.

How do you handle invalid t-stats? Are they deleted?

If yes, there must be a bug in my code.

s3alfisc commented 2 years ago

A related questions: if small = FALSE, then all small sample adjustment are dropped from the covariate calculation, e.g. (N-k)/N (G-1)/G, and not only the (G-1)/G part or the (N-k)/N part?

droodman commented 2 years ago

My Julia and Stata implementations should be just deleting the infeasible test stats. The Julia code seems to be doing that in this example:

using WildBootTests
using StatFiles, StatsModels, DataFrames, DataFramesMeta, BenchmarkTools, Plots, CategoricalArrays, Random, StableRNGs
df = DataFrame(load(raw"d:\OneDrive\Documents\Macros\nlsw88.dta"))
df = df[:, [:wage, :ttl_exp, :collgrad, :tenure, :age, :industry, :occupation, :hours]]
dropmissing!(df)
f = @formula(wage ~ ttl_exp + collgrad + tenure)  # (no constant in this model...)
f = apply_schema(f, schema(f, df))
resp, predexog = modelcols(f, df)
test=wildboottest([0 0 1], [0]; resp, predexog, clustid=Matrix(df[:, [:occupation, :age]]), nbootclustvar=2, nerrclustvar=2, reps=15, rng=StableRNG(1231))
dist(test)

Here, I get

WildBootTests.BoottestResult{Float32}

t  = -0.774
p  = 0.571
CI = Float32[-0.17448837 -0.15408598; -0.13368359 0.049937926]

julia> dist(test)
15×1 Matrix{Float32}:
  -4.1656623
  -2.953216
  -1.9951928
  -1.0425929
  -0.856438
  -0.566741
  -0.3556045
  -0.14136912
  -0.020604081
   0.29054025
   0.39634967
   1.0857376
   1.1633673
   1.2886974
 NaN

So 1 of the 15 replications is infeasible and the p value is 8/14. (Data set is here.)

s3alfisc commented 2 years ago

Related: for one-way clustered se, I fail to replicate equality of t-stats between Julia and R. I check the R t-stat against fixest::feols(). Maybe I don't understand how the small sample correction works after all?

library(fixest)
library(fwildclusterboot)
library(wildboottestjlr)

N <- 1000
N_G1 <- 5
voters2 <- fwildclusterboot:::create_data(N = N,
                                          N_G1 = N_G1,
                                          icc1 = 0.5,
                                          N_G2 = 2,
                                          icc2 = 0.2,
                                          numb_fe1 = 5,
                                          numb_fe2 = 5,
                                          seed = 411124,
                                          weights = 1:N)

# calculate t-stat without (N-1) / (N - k) adjustment, but with G / (G-1)
# as this is how it is implemented if fwildclusterboot
feols_fit <- feols(proposition_vote ~ treatment, 
                         data = voters2, 
                         cluster = "group_id1", 
                         ssc = ssc(adj = FALSE,
                                   cluster.adj = TRUE, 
                                   cluster.df = "conventional"))    
dof_tstat <- fixest::coeftable(feols_fit)[c("treatment"), 3]

# need to estimate via lm, as boottest() currently throws an error if it
# detects the ssc() function argument in fixest

lm_fit <- lm(proposition_vote ~ treatment, 
                   data = voters2)

k <- 2 # two estimated parameters

# oneway clustering
boot_r <- fwildclusterboot::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed")
boot_jl_small <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = TRUE)
boot_jl_nosmall <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = FALSE)

boot_r$p_val; boot_jl1$p_val[1]

k <- 2
G <- N_G1

cat("fwildclusterboot", boot_r$t_stat)
# > cat("fwildclusterboot", boot_r$t_stat)
# fwildclusterboot 1.796745
cat("fixest", dof_tstat)
# > cat("fixest", dof_tstat)
# fixest 1.796745
cat("wildboottestjlr no small adj", boot_jl_small$t_stat *  (G -1 ) / G)
# > cat("wildboottestjlr no small adj", boot_jl_small$t_stat *  (G -1 ) / G)
# wildboottestjlr no small adj 1.436676
cat("wildboottestjlr small adj", boot_jl_small$t_stat * (N-k) / (N-1))
# > cat("wildboottestjlr small adj", boot_jl_small$t_stat * (N-k) / (N-1))
# wildboottestjlr small adj 1.794048

s3alfisc commented 2 years ago

Thanks for your feedback!

After I've understood why the t-stats don't match and why p-values for multi-way clustering differ for enumeration, I think I can make the repo public and then check the WRE against STATA once I have renewed my license.

After reconsidering, the small sample adjustments based on N and k should not have an effect on invalid test statistics as all covariances are multiplied by the same value. So I can rule this out as the source of the difference.

Nevertheless, either my understanding or something in Julia is off:

I cannot match t-stats for small = FALSE and small = TRUE when applying the small sample correction (N-k)/(N-1) * G / (G-1) by hand:

N <- 1000
k <- 2
G <- N_G1

boot_jl_small <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = TRUE)
boot_jl_nosmall <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", type = "rademacher", p_val_type = "two-tailed", small_sample_adjustment = FALSE)

cat("wildboottestjlr no small adj", boot_jl_nosmall$t_stat *  G / (G-1) * (N-1) / (N-k))
# wildboottestjlr no small adj 2.513543
cat("wildboottestjlr small adj", boot_jl_small$t_stat )
wildboottestjlr small adj 1.795845

s3alfisc commented 2 years ago

Anyways, the differences between t-stats and bootstrapped t-stats are very small, so maybe it's just numerical differences between R and Julia (e.g. due to different linear algebra libraries)?

cbind(sort(boot_r$t_boot),
    c(sort(boot_jl_small$t_boot * 998/999)))
             [,1]        [,2]
 [1,] -2.43926552 -2.43560387
 [2,] -1.82433044 -1.82159189
 [3,] -1.79674504 -1.79404790
 [4,] -1.49384805 -1.49160559
...

Beyond that: I must be doing something wrong with the bootcluster. Update will follow.

s3alfisc commented 2 years ago

I think I fixed a bug that led to the error with multiway clustering. p-values and t-stat under full enumeration are now almost exactly identical (difference 1e-03 for t-stats, much less for p-values).

droodman commented 2 years ago

Good.... I think that G / (G-1) * (N-1) / (N-k) needs to be square-rooted.

s3alfisc commented 2 years ago

Oh yes, that was a stupid mistake! Thanks for spotting it.

With the correct adjustment, I can now exactly match t-stats and bootstrap t-stats computed via fwildclusterboot and wildboottestjlr.

Btw, as we have discussed how the (N-1)/(N-k) small sample adjustment affects inference: for multi-way clustering, it of course matters, because it influences if the bootstrap covariance matrix is positive semi-definite or not.

s3alfisc / wildboottestjlr

Multi-way clustering and invalid test statistics #20