lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

compare coefficients between different models #480

Closed hahoangnhan closed 3 months ago

hahoangnhan commented 3 months ago

Hi,

This is not a package issue, but I would like to understand the results properly. Please correct me if I have missed a fixest function that conveniently runs this test.

I am comparing three coefficients between two models. Method 1: I manually extracted the coefficients, standard errors, and numbers of observations and conducted Welch's t-test. Method 2: I simply used the base t.test function.

The two methods gave different results. I think Method 1 is more credible because my manual code uses the se and nobs from the regression outputs, while t.test only takes the two vectors of coefficients as input.

I hope to hear your guidance and suggestions.

Many thanks, HHN

> # Extract the relevant coefficients and standard errors for q1
> coef_q1 <- sapply(reg_future_breach_q1, function(x) coef(x)[["cyber_score1_std"]])
> se_q1 <- sapply(reg_future_breach_q1, function(x) x$se[["cyber_score1_std"]])
> n_q1 <- sapply(reg_future_breach_q1, function(x) x$nobs)
> 
> # Extract the relevant coefficients and standard errors for q4
> coef_q4 <- sapply(reg_future_breach_q4, function(x) coef(x)[["cyber_score1_std"]])
> se_q4 <- sapply(reg_future_breach_q4, function(x) x$se[["cyber_score1_std"]])
> n_q4 <- sapply(reg_future_breach_q4, function(x) x$nobs)
> 
> # Compute the coefficient differences and their standard errors
> diff_coef <- coef_q4 - coef_q1
> 
> # Calculate the standard errors with sample sizes incorporated
> diff_se <- sqrt((se_q4^2 / n_q4) + (se_q1^2 / n_q1)) 
> # Compute the degrees of freedom
> df = (se_q4^2/n_q4 + se_q1^2/n_q1)^2 / ((se_q4^2/n_q4)^2/(n_q4 - 1) + (se_q1^2/n_q1)^2/(n_q1 - 1))
> 
> # Perform the hypothesis tests
> test_results <- data.frame(
+   mean_diff = diff_coef,
+   t_stat = diff_coef / diff_se,
+   p_value = 2 * pt(-abs(diff_coef / diff_se), df = df))
> test_results
                       mean_diff     t_stat p_value
lhs: breach_lead_1 -0.0009619172 -109.90140       0
lhs: breach_lead_2 -0.0004093131  -98.69278       0
lhs: breach_lead_3 -0.0009076730 -146.08656       0

> t.test(x = coef_q4, y = coef_q1, var.equal = FALSE)

    Welch Two Sample t-test

data:  coef_q4 and coef_q1
t = -1.6384, df = 3.7716, p-value = 0.181
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0020782776  0.0005590088
sample estimates:
    mean of x     mean of y 
-8.148892e-04 -5.525474e-05 

kylebutts commented 3 months ago

I think using t.test is not the correct choice. I believe it treats the three coefficient estimates in each vector as three observations and calculates the standard error from their spread, ignoring the standard errors from the regressions.

The hypotheses function in the marginaleffects package is your friend here.
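
To illustrate the point about t.test, here is a minimal sketch with hypothetical coefficient vectors (the values are made up, not taken from the models in this thread). The Welch statistic can be reproduced from the means and sample variances of the three numbers alone, so the regression standard errors never enter the calculation.

```r
# Hypothetical coefficient estimates (made-up values, for illustration only)
coef_a <- c(-0.0012, -0.0005, -0.0008)
coef_b <- c(-0.0001,  0.0000, -0.0001)

# What t.test() does: treat each vector as a sample of 3 observations
welch <- t.test(x = coef_a, y = coef_b, var.equal = FALSE)

# Reproduce the statistic by hand from the sample variances of the vectors
n_a <- length(coef_a)
n_b <- length(coef_b)
se_diff <- sqrt(var(coef_a) / n_a + var(coef_b) / n_b)
t_manual <- (mean(coef_a) - mean(coef_b)) / se_diff

# The two statistics agree; no regression standard error was used anywhere
all.equal(unname(welch$statistic), t_manual)
```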

hahoangnhan commented 3 months ago

Hi @kylebutts, thanks for your response. Do my manual hypothesis tests work better than the base function t.test in this case? I extracted all the details from the two sets of fixest models (reg_future_breach_q1 and reg_future_breach_q4) and then followed the standard formula to conduct the test.

kylebutts commented 3 months ago

It depends if you believe the coefficients are independent of one another. If, for example, they are the same observations but different outcome variables, then the coefficients are likely correlated across models.

I think what you're looking for can be done with vcovSUR
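
A small numeric sketch of why independence matters, using made-up numbers: the variance of a difference of two estimates is Var(b1) + Var(b2) - 2 Cov(b1, b2), so ignoring a positive covariance overstates the standard error of the difference. The value of `cov12` below is purely hypothetical; in practice it would have to come from a joint covariance estimate across the two models.

```r
# Hypothetical standard errors of the same coefficient in two models
se1 <- 0.03
se2 <- 0.04

# Hypothetical covariance between the two estimates (zero under independence)
cov12 <- 0.0006

# Standard error of the difference, with and without the covariance term
se_diff_indep <- sqrt(se1^2 + se2^2)
se_diff_corr  <- sqrt(se1^2 + se2^2 - 2 * cov12)

c(independent = se_diff_indep, correlated = se_diff_corr)
```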

hahoangnhan commented 3 months ago

reg_future_breach_q1 and reg_future_breach_q4 have the same specification but are estimated on different samples, so I simply want to compare some coefficients between the two sets of models. I have addressed serial correlation by clustering the standard errors.

kylebutts commented 3 months ago

If you think the samples are independent, then what you have above is correct. Note that you are running separate tests (not a single joint test). If you want the latter, you are probably best off using vcovSUR with marginaleffects::hypotheses, as in the README.
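
For the independent-samples case, a minimal sketch of the per-coefficient comparison follows. One caveat on the manual code earlier in the thread: the se reported by a regression is already the standard error of the coefficient estimate, so the usual form of this test does not divide it by the number of observations again; doing so shrinks the denominator and inflates the t-statistics. The helper name `compare_coefs` and the input numbers are hypothetical, and the normal approximation is an assumption suited to large samples.

```r
# Compare one coefficient across two models estimated on independent samples.
# compare_coefs is a hypothetical helper, not a fixest function.
compare_coefs <- function(b1, se1, b2, se2) {
  diff <- b2 - b1
  # se1 and se2 are already standard errors of the estimates,
  # so their squares add directly under independence (no division by n)
  se_diff <- sqrt(se1^2 + se2^2)
  z <- diff / se_diff
  data.frame(
    diff    = diff,
    se_diff = se_diff,
    z       = z,
    p_value = 2 * pnorm(-abs(z))  # two-sided, normal approximation
  )
}

# Made-up inputs, for illustration only
res <- compare_coefs(b1 = 0.10, se1 = 0.03, b2 = 0.20, se2 = 0.04)
res
```

With real models, the inputs would be the coefficient and its clustered standard error from each regression, one call per outcome.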

hahoangnhan commented 3 months ago

Many thanks @kylebutts! I'll try vcovSUR together with marginaleffects::hypotheses; they seem to offer several interesting analyses.