s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

boottest fails when a fixed effect variable is of type "date" #115

Closed 95goo closed 1 year ago

95goo commented 1 year ago

Hi, thank you very much for this amazing package; it's been great to dig into. Below is some sample code. I am clustering by c and also using it as a fixed effect in my model. Note that b is also a categorical variable.

model <- feols(Y ~ X | b + c, cluster = "c")

I run boottest as follows:

boottest(model, clustid = "c", param = "X", B = 9999)

The error I receive is:

"error in solve.default(crossprod(weights_sq * X)) : system is computationally singular: reciprocal condition number"

Any advice at all will be much appreciated. Thank you very much.

s3alfisc commented 1 year ago

Hi, thanks for the feedback and the nice words!

Unfortunately (or fortunately 😄) I cannot reproduce this error. Which version of the package are you running? What happens if you install the most up-to-date version from CRAN? If you still run into these issues, could you provide data so that I can reproduce them?

Here is the code that I run:

library(fwildclusterboot)
library(fixest)
data(voters)

sapply(
  voters[,
         c(
           "proposition_vote", 
           "treatment", 
           "Q1_immigration", 
           "Q2_defense"
           )], 
  class
)

# proposition_vote 
# "integer" 
# treatment 
# "integer" 
# Q1_immigration 
# "factor" 
# Q2_defense 
# "factor" 

fit <- feols(
  proposition_vote ~ treatment |
    Q1_immigration + Q2_defense, 
  data = voters
)

boot <- boottest(
  fit, 
  param = "treatment", 
  clustid = "Q1_immigration", 
  B = 999
)
summary(boot)
# boottest.fixest(object = fit, param = "treatment", B = 999, clustid = "Q1_immigration")
# 
# Hypothesis: 1*treatment = 0
# Observations: 300
# Bootstr. Type: rademacher
# Clustering: 1-way
# Confidence Sets: 95%
# Number of Clusters: 10
# 
# term estimate statistic p.value conf.low conf.high
# 1 1*treatment = 0    0.077     1.462   0.108   -0.003     0.213

boot <- boottest(
  fit, 
  param = "treatment", 
  clustid = "Q2_defense", 
  B = 999
)
summary(boot)
# boottest.fixest(object = fit, param = "treatment", B = 999, clustid = "Q2_defense")
# 
# Hypothesis: 1*treatment = 0
# Observations: 300
# Bootstr. Type: rademacher
# Clustering: 1-way
# Confidence Sets: 95%
# Number of Clusters: 10
# 
# term estimate statistic p.value conf.low conf.high
# 1 1*treatment = 0    0.077     2.758   0.033     0.01     0.159

95goo commented 1 year ago

Thanks very much. I updated the package as you recommended. I then tried running with only one fixed effect (a categorical variable) as a test:

model <- feols(data = reg_data, Y ~ X | c, cluster = "c")
boottest(model, clustid = "c", param = "X", B = 200)

This is the error I receive: "Error in 1 | c : operations are possible for numeric, logical, or complex types". If you have any further recommendations, please let me know; otherwise I will provide sample data. The issue in boottest begins when I add " | c", where c is the categorical variable I am clustering on, to the feols model. It runs fine without it.

I also note that the bootstrap function runs fine when I add year fixed effects to the feols model:

model <- feols(data = reg_data, Y ~ X | year, cluster = "c")

However, there is an issue when I use week fixed effects instead of year:

model <- feols(data = reg_data, Y ~ X | week, cluster = "c")

In that scenario, I get the following error: "Error in Ops.Date(1, week) : | not defined for "Date" objects"

I appreciate your advice :)

s3alfisc commented 1 year ago

What is the exact type of your date and year variables? I.e., what do you get when you run sapply(data, class)?
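As a minimal sketch of that check, assuming a small simulated data frame (the name `reg_data` and its columns mirror the ones used in this thread, but all values are made up):

```r
# Simulated stand-in for the reporter's data; every value here is made up.
set.seed(1)
n <- 20
reg_data <- data.frame(
  Y    = rnorm(n),
  X    = rnorm(n),
  c    = sample(letters[1:4], n, replace = TRUE),
  year = rep(c(2019, 2020), each = n / 2),
  week = as.Date("2020-01-06") + 7 * sample(0:3, n, replace = TRUE)
)

# Report the class of every column; a Date column shows up as "Date".
col_classes <- sapply(reg_data, class)
print(col_classes)
```

In this sketch `year` is reported as numeric and `week` as Date, which is the distinction the question is after.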

s3alfisc commented 1 year ago

I am asking as it looks like you are providing a date object?

95goo commented 1 year ago

The year is formatted as 'numeric' while the week is formatted as 'date' object. Is there an issue with this?

s3alfisc commented 1 year ago

Great, maybe the date type causes the problem here - I will check this later today. What happens if you convert the date to a plain factor?

s3alfisc commented 1 year ago

In your first example, were either b or c date variables?

95goo commented 1 year ago

Thank you very much, sir. Converting the week variable to a plain factor using "factor" makes the code below work for me!!

model <- feols(data = reg_data, Y ~ X | week, cluster = "c")

hooray!!!

In my first example, however, b and c were not date variables. =( I get the following error for the following code, where c is a categorical variable (10 distinct values):

model <- feols(data = reg_data, Y ~ X | c, cluster = "c")
boottest(model, clustid = "c", param = "X", B = 200)

This is the error I receive: "Error in 1 | c : operations are possible for numeric, logical, or complex types"

95goo commented 1 year ago

HOWEVER, changing variable "c" from character to factor using the "factor" function... the boottest function works... I do not totally understand this and would love to learn more about why this is the case. See below for the code used and the added line.

reg_data <- reg_data %>% mutate(c = factor(c))
model <- feols(data = reg_data, Y ~ X | c, cluster = "c")
boottest(model, clustid = "c", param = "X", B = 200)
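A minimal sketch of the same workaround on the package's `voters` data, assuming the fixest and fwildclusterboot packages are installed. The Date column `week` and its factor copy `week_f` are made up for illustration and are not part of the dataset:

```r
library(fixest)
library(fwildclusterboot)

data(voters)

# Hypothetical Date column, added purely for illustration.
set.seed(123)
voters$week <- as.Date(
  sample(1:7, nrow(voters), replace = TRUE),
  origin = "1970-01-01"
)

# Convert the Date column to a plain factor before using it as a fixed effect.
voters$week_f <- factor(voters$week)

fit <- feols(proposition_vote ~ treatment | week_f, data = voters)

# Cluster on a variable that is not a Date, as in the examples above.
boot <- boottest(
  fit,
  param = "treatment",
  clustid = "Q1_immigration",
  B = 999
)
summary(boot)
```

Converting in the data frame, rather than inside the formula, keeps the model call itself unchanged.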

s3alfisc commented 1 year ago

I cannot reproduce the error that you observe with character variables. Can you provide me with an example using simulated data where the error occurs? All fixed effects are transformed into factors internally, hence it should not matter whether you provide c as a character or a factor.

Note that in general, it is dangerous to call variables "c", because c is also a base function. Lots of things that could go wrong there. ChatGPT says the following: "It is generally not recommended to use the name "c" for a variable in R because "c" is a commonly used base function in R for combining or concatenating objects. If you assign a value to the variable "c", you will overwrite the default behavior of the base function, leading to potential confusion and errors in your code."
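A small base R sketch of the naming concern (the value "treated" is made up). Calling `c(...)` still finds the base function, because R skips non-function objects when it resolves a call, but passing the bare name `c` where a function is expected picks up the variable instead:

```r
c <- "treated"     # a variable that shares its name with base::c

v <- c(1, 2, 3)    # still works: call lookup skips non-function objects

# Here `c` is evaluated as an ordinary argument, so do.call() receives the
# string "treated" and looks for a function of that name.
res <- tryCatch(do.call(c, list(1, 2)), error = function(e) e)
inherits(res, "error")

rm(c)              # remove the variable again
```

So the function itself is not overwritten, but code that passes `c` around by name can break in ways that are hard to spot.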

With the date variable, you have indeed discovered a bug - something fails in the fixed effects preprocessing pipeline. I'll try to fix that asap =)

s3alfisc commented 1 year ago

It looks like Formula and expand.model.frame do not handle date variables in the second part of the formula.

        suppressWarnings(
          expand.model.frame(
            model =
              manipulate_object(object),
            extras = clustid_fml,
            na.expand = FALSE,
            envir = call_env
          )
        )

The error I receive is:

Error in Ops.Date(1, date) : | not defined for "Date" objects
Called from: Ops.Date(1, date)

`sandwich` fails in the same context as well:

library(sandwich)
library(fixest) # provides feols()
library(fwildclusterboot)

data(voters)
date <- sample(1:7, nrow(voters), TRUE)
voters$date <- as.Date(date,origin = "1970-01-01")

sapply(voters, class)

fit <- feols(proposition_vote ~ treatment | date, data = voters)
sandwich::vcovCL(fit, ~date)
# Error in Ops.Date(treatment, date) : | not defined for "Date" objects

fwildclusterboot::boottest(fit, param = "treatment", clustid = "group_id", B = 999)
# Error in Ops.Date(1, date) : | not defined for "Date" objects

Tagging @zeileis here for awareness (in case you are not aware already).

For now, I will label this as "won't fix".

zeileis commented 1 year ago

I think Formula is not involved here, is it? The error message would indicate that standard formula processing (as opposed to Formula) is used. To expand.model.frame the model specification looks like a standard formula and hence treats | in the basic way and not for separating a model part. I would recommend to be explicit and use ... | factor(date) instead. Then expand.model.frame() seems to work again.
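This behaviour can be checked in base R without any modelling package. A minimal sketch, assuming a made-up four-row data frame `df`: in a standard formula, `x | d` is a single term that is evaluated with the logical OR operator. That operator is not defined for Date or character values, while for a factor it returns NA with a warning.

```r
df <- data.frame(
  y = c(1.2, 0.7, 3.1, 2.4),
  x = c(0, 1, 0, 1),
  d = as.Date(c("2020-01-06", "2020-01-06", "2020-01-13", "2020-01-13"))
)

# Date: Ops.Date rejects `|`, so building the model frame fails.
res_date <- tryCatch(
  model.frame(y ~ x | d, data = df, na.action = NULL),
  error = function(e) e
)

# Character: `|` only accepts numeric, logical, or complex operands.
res_chr <- tryCatch(1 | "a", error = function(e) e)

# Factor: `|` is "not meaningful" and yields NA with a warning,
# so the model frame is still built.
res_factor <- suppressWarnings(
  model.frame(y ~ x | factor(d), data = df, na.action = NULL)
)

inherits(res_date, "error")
inherits(res_chr, "error")
nrow(res_factor)
```

The two failing cases correspond to the two error messages reported earlier in this thread, one for the Date fixed effect and one for the character variable.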

s3alfisc commented 1 year ago

Thanks for your feedback, Achim. Indeed, you are right that Formula is not involved =) I'll close this issue, as I don't think I will fix this in the near future. Hope this is ok with you, @95goo?