s3alfisc / wildboottestjlr

Fast Wild Cluster Bootstrap Inference for OLS/IV in R based on WildBootTests.jl & JuliaConnectoR

Unclear function argument #17

Open s3alfisc opened 2 years ago

s3alfisc commented 2 years ago

Hi, some function arguments are unclear to me:

droodman commented 2 years ago

Good questions.

s3alfisc commented 2 years ago

Hi David, thanks for your answers.

R does not distinguish between fweights and pweights at all. The documentation of lm() states

"Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers (w_i), that each response (y_i) is the mean of (w_i) unit-weight observations (including the case that there are (w_i) observations equal to (y_i) and the data have been summarized). However, in the latter case, notice that within-group variation is not used. Therefore, the sigma estimate and residual degrees of freedom may be suboptimal; in the case of replication weights, even wrong. Hence, standard errors and analysis of variance tables should be treated with care."

Which I translate as: if the weights are frequency weights, lm() will not adjust inference for it.
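To make the distinction concrete, here is a small numeric sketch (in Python with NumPy, purely for illustration; it is not wildboottestjlr code): with integer frequency weights, weighted least squares reproduces the coefficients of plain OLS on the row-expanded data, but the two fits imply different residual degrees of freedom, which is exactly where fweight-aware inference would differ from lm()'s treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)
w = rng.integers(1, 4, size=n)          # integer frequency weights

# weighted normal equations: (X'WX) b = X'Wy
W = np.diag(w.astype(float))
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# expand each row w_i times and run plain OLS on the expanded data
Xe = np.repeat(X, w, axis=0)
ye = np.repeat(y, w)
b_ols, *_ = np.linalg.lstsq(Xe, ye, rcond=None)

print(np.allclose(b_wls, b_ols))        # True: identical point estimates
print(n - k, len(ye) - k)               # different residual dof: n-k vs sum(w)-k
```

The point estimates agree exactly, so the fweight/pweight question only matters for the variance estimate, not for the coefficients.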

I have to admit that I am not fully sure that I understand how probability weights and frequency weights will produce different inferences under multi-way clustering.

Following 'fast and wild', the multi-way cluster-robust covariance matrix can be written as

V = m_1 V_1 + m_2 V_2 - m_12 V_12

where m_i = (N - 1) / (N - K) x (G_i - 1) / G_i for each i.

In consequence, V = (N - 1) / (N - K) x [(G_1 - 1) / G_1 V_1 + (G_2 - 1) / G_2 V_2 - (G_12 - 1) / G_12 V_12].

This holds for both the bootstrapped and the non-bootstrapped covariance matrix, and in consequence for the bootstrapped and non-bootstrapped t-statistics: both are just multiplied by the same factor A = (N - 1) / (N - K). As a result, neither the p-value nor the confidence interval should depend on A? So my intuition is that only the non-bootstrapped t-statistic will differ, for both one-way and multi-way clustering, but nothing else.
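The cancellation argument is easy to verify numerically. A minimal sketch (in Python, with a stand-in bootstrap distribution; not wildboottestjlr code): if the original and all bootstrap t-statistics are multiplied by the same positive factor A, the bootstrap p-value, i.e. the share of bootstrap draws at least as extreme as the original statistic, is exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
t_orig = 1.8
t_boot = rng.normal(size=9999)          # stand-in bootstrap distribution

def boot_pval(t, t_star):
    # two-sided bootstrap p-value: share of draws at least as extreme as t
    return np.mean(np.abs(t_star) >= np.abs(t))

A = (500 - 1) / (500 - 5)               # a common small-sample factor, A > 0
p1 = boot_pval(t_orig, t_boot)
p2 = boot_pval(A * t_orig, A * t_boot)
print(p1 == p2)                          # True: the common factor cancels
```

Since A > 0, |A t*| >= |A t| holds if and only if |t*| >= |t|, so the ranking of the bootstrap draws, and hence the p-value, is invariant.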

For similar reasons, I also thought that no 'fixed effect small sample corrections' were needed? What am I not seeing?

I'd love to investigate this with boottest in Stata, but I just learned that, apparently, my Stata license has expired...

s3alfisc commented 2 years ago

Here is a small experiment for the bootstrap with fixed effects and small sample adjustments with the most recent version of wildboottestjlr:

The non-bootstrapped test statistics differ, but the p-values do not, neither for one-way nor for multi-way clustering. The confidence intervals differ slightly, which I suppose is because you base your starting values on the non-bootstrapped t-stat (that's what I do in fwildclusterboot)?

You can try this on different data sets by changing the seed in create_data().

library(wildboottestjlr)
library(generics) 
library(fixest)

N <- 500
voters <- wildboottestjlr:::create_data(
  N = N,
  N_G1 = 40,
  icc1 = 0.5,
  N_G2 = 20,
  icc2 = 0.2,
  numb_fe1 = 10,
  numb_fe2 = 10,
  seed = 1456,
  weights = 1:N
)

lm_fit <- fixest::feols(proposition_vote ~ treatment + log_income | state,
                        data = voters)

boot_jl1 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", fe = "state", fedfadj = TRUE, rng = 1235)
boot_jl2 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment", fedfadj = FALSE, rng = 1235)

generics::tidy(boot_jl1)
generics::tidy(boot_jl2)

#>   generics::tidy(boot_jl1)
#            term   estimate statistic   p.value    conf.low  conf.high
# 1 1*treatment = 0 0.04273208  1.555573 0.1501502 -0.01631727 0.09976101
# >   generics::tidy(boot_jl2)
#            term   estimate statistic   p.value    conf.low  conf.high
# 1 1*treatment = 0 0.04273208  1.555572 0.1501502 -0.01631728 0.09976107

  boot_jl3 <- wildboottestjlr::boottest(lm_fit, clustid = c("group_id1", "group_id2"), B = 99999, param = "treatment", fe = "state", rng = 1235)
  boot_jl4 <- wildboottestjlr::boottest(lm_fit, clustid = c("group_id1", "group_id2"), B = 99999, param = "treatment", rng = 1235)

  generics::tidy(boot_jl3)
  generics::tidy(boot_jl4)

#>   generics::tidy(boot_jl3)
#             term   estimate statistic    p.value     conf.low  conf.high
#1 1*treatment = 0 0.04273208   1.92312 0.08508509 -0.005345576 0.08975388
#>   generics::tidy(boot_jl4)
#            term   estimate statistic    p.value   conf.low  conf.high
#1 1*treatment = 0 0.04273208  1.923119 0.08508509 -0.0053456 0.08975396
s3alfisc commented 2 years ago

Btw, if I feed in fweights = TRUE, I get an error:

  lm_fit <- lm(proposition_vote ~ treatment + log_income,
               data = voters,
               weights = weights)

  boot_jl1 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment",  rng = 1235)
  boot_jl2 <- wildboottestjlr::boottest(lm_fit, clustid = "group_id1", B = 99999, param = "treatment",  rng = 1235, fweights = TRUE)

Error: Evaluation in Julia failed.
Original Julia error message:
InexactError: Int64(272.4)
Stacktrace:
  [1] Int64
    @ .\float.jl:723 [inlined]
  [2] convert(#unused#::Type{Int64}, x::Float32)
    @ Base .\number.jl:7
  [3] setproperty!(x::WildBootTests.StrBootTest{Float32}, f::Symbol, v::Float32)
    @ Base .\Base.jl:34
  [4] Init!(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\init.jl:32
  [5] boottest!(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\WildBootTests.jl:18
  [6] plot(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\plot-CI.jl:49
  [7] _getplot(o::WildBootTests.StrBootTest{Float32})
    @ WildBootTests C:\Users\alexa\.julia\packages\WildBootTests\gAEql\src\StrBoottest.jl:331
  [8] _wildboottest(T::DataType, R::Matrix{Float64}, r::Vector{Float64}; resp::Vector{Float64}, pre
In addition: Warning message:
Some names could not be expressed in the native encoding.
(Details see output of printing the returned object.) 
droodman commented 2 years ago

Oh yes you are right, it only affects the test statistic. I think it's useful to exactly match the test statistic from the estimation program, such as lm(), because it is reassuring to both the programmer and the user that everything is working.

At https://wiki.q-researchsoftware.com/wiki/Weights_in_R, I read that "In R, there is no standard way of addressing weights. While many R functions have a weights parameter, there is no consistency in how they are intepreted...Most commonly, weights in R are interpreted as frequency weights." Which is a bit strange, because, judging from the description you quoted, I agree that lm() is not interpreting weights as frequency weights.

s3alfisc commented 2 years ago

Thanks for the feedback! I think I will have to check how felm, fixest & ivregress handle weights - maybe I will have to contact the authors.

I agree that it is a great test to compare the estimation t-stat with boottest's internal t-stat. I currently only check this within my unit tests for fwildclusterboot, which only does the '(G-1)/G' adjustment and ignores N, k, and all fixed effects complications.

I think I should provide a function that allows the user to check for t-stat equality. I like this test in particular because the pre-processing of all available data is rather convoluted. E.g. if the user manipulates a variable in the data.frame used for estimation prior to running boottest, the program will not take this into account.
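The consistency check proposed here can be sketched in a few lines (Python for illustration; the function name and signature are hypothetical, not part of any of the packages discussed): compare the estimation t-stat with the bootstrap routine's internal non-bootstrapped t-stat and fail loudly on a mismatch.

```python
import math

def check_tstat_match(t_model, t_boottest, rtol=1e-6):
    """Hypothetical consistency check: flag when the estimation t-stat and the
    bootstrap routine's internal non-bootstrapped t-stat disagree, which would
    indicate a pre-processing mismatch (e.g. the data was modified between
    estimation and the bootstrap call)."""
    if not math.isclose(t_model, t_boottest, rel_tol=rtol):
        raise ValueError(
            f"t-stats differ: model {t_model} vs bootstrap {t_boottest}; "
            "the data may have changed between estimation and the bootstrap."
        )
    return True

print(check_tstat_match(3.796333, 3.796333))  # True
```

A deliberately loose tolerance would still catch the failure mode described above, where a variable in the estimation data.frame is manipulated before the bootstrap is run.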

droodman commented 2 years ago

Not certain but I think I fixed the crash in the latest commit. Fractional fweights are now accepted, FWIW.

s3alfisc commented 2 years ago

I have just checked it - after updating WildBootTests.jl, I still receive the error above.

s3alfisc commented 2 years ago

What does fedfadj do exactly? Does it only control whether the fixed effects parameters are counted as 'estimated parameters' when calculating k?

Anyway, the output of the models with and without fixed effects is almost identical:

library(wildboottestjlr)
# set a 'global' seed in the Julia session
set_julia_seed(rng = 12313452435)

data(voters)
library(fixest)
library(lfe)
lm_fit <- lm(proposition_vote ~ treatment + log_income + Q1_immigration, data = voters)
feols_fit <- feols(proposition_vote ~ treatment + log_income | Q1_immigration, data = voters)
felm_fit <- felm(proposition_vote ~ treatment + log_income | Q1_immigration, data = voters)

boot_lm <- boottest(lm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_feols <- boottest(feols_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)
boot_feols_fe <- boottest(feols_fit, clustid = "group_id1", fe = "Q1_immigration", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)
boot_felm_fe <- boottest(felm_fit, clustid = "group_id1", fe = "Q1_immigration", B = 999, param = "treatment", rng = 7651427, fedfadj = TRUE)

generics::tidy(boot_lm)
generics::tidy(boot_feols)
generics::tidy(boot_feols_fe)
generics::tidy(boot_felm_fe)

# > generics::tidy(boot_lm)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796333 0.001001001 0.03610695 0.1168652
# > generics::tidy(boot_feols)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796333 0.001001001 0.03610695 0.1168652
# > generics::tidy(boot_feols_fe)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796325 0.001001001 0.03610691 0.1168654
# > generics::tidy(boot_felm_fe)
# term   estimate statistic     p.value   conf.low conf.high
# 1 1*treatment = 0 0.07580763  3.796325 0.001001001 0.03610691 0.1168654
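For a sense of the magnitudes involved, here is a back-of-the-envelope arithmetic sketch (the sample size and fixed-effect count are illustrative guesses, not taken from the voters data): if fedfadj toggles whether the fixed-effect dummies are counted in K, the only change is in the small-sample variance factor (N - 1) / (N - K).

```python
# Illustrative sizes only -- not the voters data.
N, k, n_fe = 300, 2, 10
factor_excl = (N - 1) / (N - k)           # FE dummies not counted in K
factor_incl = (N - 1) / (N - k - n_fe)    # FE dummies counted in K
print(round(factor_excl, 4), round(factor_incl, 4))  # 1.0034 1.0382
```

By the same cancellation argument as for the cluster factors, a common rescaling of the original and bootstrap statistics leaves the bootstrap p-value untouched, which is consistent with the identical p-values in the output above.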
droodman commented 2 years ago

Exactly right. I discovered that some of Stata's panel data commands, like xtreg, don't count the fixed effects in k when computing standard errors, so this gives a way of achieving exact matches.

droodman commented 2 years ago

I haven't followed all of the details above. Do you think you understand the small-sample adjustments made by the various R packages? Do they differ from Stata in a way that suggests the need for a new option in wildboottests(), in order to achieve exact matches?