s3alfisc / wildboottestjlr

Fast Wild Cluster Bootstrap Inference for OLS/IV in R based on WildBootTests.jl & JuliaConnectoR

Unclear function argument #17

Open s3alfisc opened 2 years ago

s3alfisc commented 2 years ago

Hi, some function arguments are unclear to me:

droodman commented 2 years ago

Good questions.

s3alfisc commented 2 years ago

Hi David, thanks for your answers.

R does not distinguish between fweights and pweights at all. The documentation of lm() states

"Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers (w_i), that each response (y_i) is the mean of (w_i) unit-weight observations (including the case that there are (w_i) observations equal to (y_i) and the data have been summarized). However, in the latter case, notice that within-group variation is not used. Therefore, the sigma estimate and residual degrees of freedom may be suboptimal; in the case of replication weights, even wrong. Hence, standard errors and analysis of variance tables should be treated with care."

Which I translate as: if the weights are frequency weights, lm() will not adjust inference for it.
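To make the distinction concrete, here is a small numeric sketch (in Python with NumPy, purely for illustration; it is not wildboottestjlr code): with integer frequency weights, weighted least squares reproduces the coefficients of plain OLS on the row-expanded data, but the two fits imply different residual degrees of freedom, which is exactly where fweight-aware inference would differ from lm()'s treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)
w = rng.integers(1, 4, size=n)          # integer frequency weights

# weighted normal equations: (X'WX) b = X'Wy
W = np.diag(w.astype(float))
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# expand each row w_i times and run plain OLS on the expanded data
Xe = np.repeat(X, w, axis=0)
ye = np.repeat(y, w)
b_ols, *_ = np.linalg.lstsq(Xe, ye, rcond=None)

print(np.allclose(b_wls, b_ols))        # True: identical point estimates
print(n - k, len(ye) - k)               # different residual dof: n-k vs sum(w)-k
```

The point estimates agree exactly, so the fweight/pweight question only matters for the variance estimate, not for the coefficients.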

I have to admit that I am not fully sure that I understand how probability weights and frequency weights will produce different inferences under multi-way clustering.

Following 'fast and wild', the multi-way cluster-robust covariance matrix can be written as

V = m_1 V_1 + m_2 V_2 - m_12 V_12

where m_i = (N - 1) / (N - K) x (G_i - 1) / G_i for each i.

In consequence, V = (N - 1) / (N - K) x [(G_1 - 1) / G_1 V_1 + (G_2 - 1) / G_2 V_2 - (G_12 - 1) / G_12 V_12].

This holds for both the bootstrapped and the non-bootstrapped covariance matrix, and in consequence for the bootstrapped and non-bootstrapped t-statistics: both are just multiplied by the same factor A = (N - 1) / (N - K). As a result, neither the p-value nor the confidence interval should depend on A? So my intuition is that only the non-bootstrapped t-statistic will differ, for both one-way and multi-way clustering, but nothing else.
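The cancellation argument is easy to verify numerically. A minimal sketch (in Python, with a stand-in bootstrap distribution; not wildboottestjlr code): if the original and all bootstrap t-statistics are multiplied by the same positive factor A, the bootstrap p-value, i.e. the share of bootstrap draws at least as extreme as the original statistic, is exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
t_orig = 1.8
t_boot = rng.normal(size=9999)          # stand-in bootstrap distribution

def boot_pval(t, t_star):
    # two-sided bootstrap p-value: share of draws at least as extreme as t
    return np.mean(np.abs(t_star) >= np.abs(t))

A = (500 - 1) / (500 - 5)               # a common small-sample factor, A > 0
p1 = boot_pval(t_orig, t_boot)
p2 = boot_pval(A * t_orig, A * t_boot)
print(p1 == p2)                          # True: the common factor cancels
```

Since A > 0, |A t*| >= |A t| holds if and only if |t*| >= |t|, so the ranking of the bootstrap draws, and hence the p-value, is invariant.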

For similar reasons, I also thought that no 'fixed effect small sample corrections' were needed? What am I not seeing?

I'd love to investigate this with boottest in Stata, but I just learned that, apparently, my Stata license has expired...

s3alfisc commented 2 years ago

Here is a small experiment for the bootstrap with fixed effects and small sample adjustments with the most recent version of wildboottestjlr:

The non-bootstrapped test statistics differ, but the p-values do not, neither for one-way nor for multi-way clustering. The confidence intervals differ slightly, which I suppose is because you base your starting values on the non-bootstrapped t-stat (that's what I do in fwildclusterboot)?

You can try this on different data sets by changing the seed in create_data().

library(wildboottestjlr)
library(generics) 
library(fixest)

N <- 500
voters <- wildboottestjlr:::create_data(
  N = N,
  N_G1 = 40,
  icc1 = 0.5,
  N_G2 = 20,
  icc2 = 0.2,
  numb_fe1 = 10,
  numb_fe2 = 10,
  seed = 1456,
  weights = 1:N
)

lm_fit <- fixest::feols(proposition_vote ~ treatment + log_income | state,
                        data = voters)

boot_jl1 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", fe = "state", fedfadj = TRUE, rng = 1235)
boot_jl2 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", fedfadj = FALSE, rng = 1235)

generics::tidy(boot_jl1)
generics::tidy(boot_jl2)

#>   generics::tidy(boot_jl1)
#            term   estimate statistic   p.value    conf.low  conf.high
# 1 1*treatment = 0 0.04273208  1.555573 0.1501502 -0.01631727 0.09976101
# >   generics::tidy(boot_jl2)
#            term   estimate statistic   p.value    conf.low  conf.high
# 1 1*treatment = 0 0.04273208  1.555572 0.1501502 -0.01631728 0.09976107

  boot_jl3 <- wildboottestjlr::boottest(lm_fit, clustid = c("group_id1", "group_id2"), B = 99999, param = "treatment", fe = "state", rng = 1235)
  boot_jl4 <- wildboottestjlr::boottest(lm_fit, clustid = c("group_id1", "group_id2"), B = 99999, param = "treatment", rng = 1235)

  generics::tidy(boot_jl3)
  generics::tidy(boot_jl4)

#>   generics::tidy(boot_jl3)
#             term   estimate statistic    p.value     conf.low  conf.high
#1 1*treatment = 0 0.04273208   1.92312 0.08508509 -0.005345576 0.08975388
#>   generics::tidy(boot_jl4)
#            term   estimate statistic    p.value   conf.low  conf.high
#1 1*treatment = 0 0.04273208  1.923119 0.08508509 -0.0053456 0.08975396
s3alfisc commented 2 years ago

Btw, if I feed in fweights = TRUE, I get an error:

  lm_fit <- lm(proposition_vote ~ treatment + log_income,
               data = voters,
               weights = weights)

  boot_jl1 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment",  rng = 1235)
  boot_jl2 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment",  rng = 1235, fweights = TRUE)

Error: Evaluation in Julia failed.
Original Julia error message:
InexactError: Int64(272.4)
Stacktrace:
  [1] Int64
    @ .\float.jl:723 [inlined]
  [2] convert(#unused#::Type{Int64}, x::Float32)
    @ Base .\number.jl:7
  [3] setproperty!(x::WildBootTests.StrBootTest{Float32}, f::Symbol, v::Float32)
    @ Base .\Base.jl:34
  [4] Init!(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\init.jl:32
  [5] boottest!(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\WildBootTests.jl:18
  [6] plot(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\plot-CI.jl:49
  [7] _getplot(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\StrBoottest.jl:331
  [8] _wildboottest(T::DataType, R::Matrix{Float64}, r::Vector{Float64}; resp::Vector{Float64}, pre
In addition: Warning message:
Some names could not be expressed in the native encoding.
(Details see output of printing the returned object.) 
droodman commented 2 years ago

Oh yes you are right, it only affects the test statistic. I think it's useful to exactly match the test statistic from the estimation program, such as lm(), because it is reassuring to both the programmer and the user that everything is working.

At https://wiki.q-researchsoftware.com/wiki/Weights_in_R, I read that "In R, there is no standard way of addressing weights. While many R functions have a weights parameter, there is no consistency in how they are intepreted...Most commonly, weights in R are interpreted as frequency weights." Which is a bit strange, because, judging from the description you quoted, I agree that lm() is not interpreting weights as frequency weights.

s3alfisc commented 2 years ago

Thanks for the feedback! I think I will have to check how felm, fixest & ivregress handle weights - maybe I will have to contact the authors.

I agree that it is a great test to compare the estimation t-stat with boottest's internal t-stat. I currently only check this within my unit tests for fwildclusterboot, which only does the '(G-1)/G' adjustment and ignores N, k, and all fixed effects complications.

I think I should provide a function that allows the user to check for t-stat equality. I like this test in particular because the pre-processing of all available data is rather convoluted. E.g. if the user manipulates a variable in the data.frame used for estimation prior to running boottest, the program will not take this into account.
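The consistency check proposed here can be sketched in a few lines (Python for illustration; the function name and signature are hypothetical, not part of any of the packages discussed): compare the estimation t-stat with the bootstrap routine's internal non-bootstrapped t-stat and fail loudly on a mismatch.

```python
import math

def check_tstat_match(t_model, t_boottest, rtol=1e-6):
    """Hypothetical consistency check: flag when the estimation t-stat and the
    bootstrap routine's internal non-bootstrapped t-stat disagree, which would
    indicate a pre-processing mismatch (e.g. the data was modified between
    estimation and the bootstrap call)."""
    if not math.isclose(t_model, t_boottest, rel_tol=rtol):
        raise ValueError(
            f"t-stats differ: model {t_model} vs bootstrap {t_boottest}; "
            "the data may have changed between estimation and the bootstrap."
        )
    return True

print(check_tstat_match(3.796333, 3.796333))  # True
```

A deliberately loose tolerance would still catch the failure mode described above, where a variable in the estimation data.frame is manipulated before the bootstrap is run.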

droodman commented 2 years ago

Not certain but I think I fixed the crash in the latest commit. Fractional fweights are now accepted, FWIW.

s3alfisc commented 2 years ago

I have just checked it - after updating WildBootTests.jl, I still receive the error above.

s3alfisc commented 2 years ago

What does fedfadj do exactly? Does it only control whether the fixed effects parameters are counted as 'estimated parameters' when calculating k?

Anyway, the output of the models with and without fixed effects is almost identical:

library(wildboottestjlr)
# set a 'global' seed in the Julia session
set_julia_seed(rng = 12313452435)

data(voters)
library(fixest)
library(lfe)
lm_fit <- lm(proposition_vote ~ treatment + log_income + Q1_immigration, data = voters)
feols_fit <- feols(proposition_vote ~ treatment + log_income | Q1_immigration, data = voters)
felm_fit <- felm(proposition_vote ~ treatment + log_income | Q1_immigration, data = voters)

boot_lm <- boottest(lm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_feols <- boottest(feols_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)
boot_feols_fe <- boottest(feols_fit, clustid = "group_id1", fe = "Q1_immigration", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)
boot_felm_fe <- boottest(felm_fit, clustid = "group_id1", fe = "Q1_immigration", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)

generics::tidy(boot_lm)
generics::tidy(boot_feols)
generics::tidy(boot_feols_fe)
generics::tidy(boot_felm_fe)

# > generics::tidy(boot_lm)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796333 0.001001001 0.03610695 0.1168652
# > generics::tidy(boot_feols)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796333 0.001001001 0.03610695 0.1168652
# > generics::tidy(boot_feols_fe)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796325 0.001001001 0.03610691 0.1168654
# > generics::tidy(boot_felm_fe)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796325 0.001001001 0.03610691 0.1168654
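For a sense of the magnitudes involved, here is a back-of-the-envelope arithmetic sketch (the sample size and fixed-effect count are illustrative guesses, not taken from the voters data): if fedfadj toggles whether the fixed-effect dummies are counted in K, the only change is in the small-sample variance factor (N - 1) / (N - K).

```python
# Illustrative sizes only -- not the voters data.
N, k, n_fe = 300, 2, 10
factor_excl = (N - 1) / (N - k)           # FE dummies not counted in K
factor_incl = (N - 1) / (N - k - n_fe)    # FE dummies counted in K
print(round(factor_excl, 4), round(factor_incl, 4))  # 1.0034 1.0382
```

By the same cancellation argument as for the cluster factors, a common rescaling of the original and bootstrap statistics leaves the bootstrap p-value untouched, which is consistent with the identical p-values in the output above.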
droodman commented 2 years ago

Exactly right. I discovered that some of Stata's panel data commands, like xtreg, don't count the fixed effects in k when computing standard errors, so this gives a way of achieving exact matches.

droodman commented 2 years ago

I haven't followed all of the details above. Do you think you understand the small-sample adjustments made by the various R packages? Do they differ from Stata in a way that suggests the need for a new option in wildboottests(), in order to achieve exact matches?