ycroissant / plm

Panel Data Econometrics with R
GNU General Public License v2.0
49 stars 13 forks source link

HT model - intercept mismatch with respect to Stata #36

Closed sondalex closed 1 year ago

sondalex commented 1 year ago

Hi,

I have noticed that Hausman-Taylor model (1981) estimated intercept coefficient are different from those in Stata xthtaylor command. Is this behaviour intentional ?

Reproducible Example:

library(broom) 
library(plm) # v2.6-2

ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) +bluecol + ind + union + sex + black + ed |
  bluecol + south + smsa + ind + sex + black | 
  wks + married + union + exp + I(exp ^ 2), 
  data = Wages, index = 595, random.method = "ht", 
  model = "random", inst.method = "baltagi"
)

print(tidy(ht), n=13)
# A tibble: 13 × 5
   term         estimate std.error statistic  p.value
   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)  3.38     0.372         9.09  1.00e-19
 2 wks          0.000837 0.000600      1.40  1.63e- 1
 3 south        0.00744  0.0320        0.233 8.16e- 1
 4 smsa        -0.0418   0.0190       -2.21  2.73e- 2
 5 married     -0.0299   0.0190       -1.57  1.16e- 1
 6 exp          0.113    0.00247      45.8   0       
 7 I(exp^2)    -0.000419 0.0000546    -7.67  1.70e-14
 8 bluecol     -0.0207   0.0138       -1.50  1.33e- 1
 9 ind          0.0136   0.0152        0.893 3.72e- 1
10 union        0.0328   0.0149        2.20  2.79e- 2
11 sex         -0.131    0.127        -1.03  3.01e- 1
12 black       -0.286    0.156        -1.84  6.65e- 2
13 ed           0.138    0.0212        6.49  8.47e-11

Intercept ≃ 3.38

And in Stata (v17):

 xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endo(ed wks ms union exp exp2) constant(ed, fem, blk)

/*
------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
TVexogenous  |
       south |   .0074398    .031955     0.23   0.816    -.0551908    .0700705
        smsa |  -.0418334   .0189581    -2.21   0.027    -.0789906   -.0046761
         occ |  -.0207047   .0137809    -1.50   0.133    -.0477149    .0063055
         ind |   .0136039   .0152374     0.89   0.372    -.0162608    .0434686
TVendogenous |
         wks |   .0008374   .0005997     1.40   0.163    -.0003381    .0020129
          ms |  -.0298508     .01898    -1.57   0.116    -.0670508    .0073493
       union |   .0327714   .0149084     2.20   0.028     .0035514    .0619914
         exp |   .1131328    .002471    45.79   0.000     .1082898    .1179758
        exp2 |  -.0004189   .0000546    -7.67   0.000    -.0005259   -.0003119
TIexogenous  |
         fem |  -.1309236    .126659    -1.03   0.301    -.3791707    .1173234
         blk |  -.2857479   .1557019    -1.84   0.066    -.5909179    .0194221
TIendogenous |
          ed |    .137944   .0212485     6.49   0.000     .0962977    .1795902
             |
       _cons |   2.912726   .2836522    10.27   0.000     2.356778    3.468674
*/

Intercept ≃ 2.91

tappek commented 1 year ago

The R code you provide gives different results compared to the output you supplied on my side, so not reproducible for me. I use plm 2.6-2 and broom 1.0.1.

I do get the same results as Stata gives. Also have a look at the 2nd plm vignette where the model is estimated as well (https://cran.r-project.org/web/packages/plm/vignettes/B_plmFunction.html, section Instrumental variable estimators).

Could you please double check in a clean R session?

library(broom) # v1.0.1
library(plm)   # v2.6-2
data(Wages, package = "plm")
ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) +bluecol + ind + union + sex + black + ed |
            bluecol + south + smsa + ind + sex + black | 
            wks + married + union + exp + I(exp ^ 2), 
          data = Wages, index = 595, random.method = "ht", 
          model = "random", inst.method = "baltagi"
)

print(tidy(ht), n=13)
#> # A tibble: 13 × 5
#>    term         estimate std.error statistic  p.value
#>    <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#>  1 (Intercept)  2.91     0.284        10.3   9.76e-25
#>  2 wks          0.000837 0.000600      1.40  1.63e- 1
#>  3 southyes     0.00744  0.0320        0.233 8.16e- 1
#>  4 smsayes     -0.0418   0.0190       -2.21  2.73e- 2
#>  5 marriedyes  -0.0299   0.0190       -1.57  1.16e- 1
#>  6 exp          0.113    0.00247      45.8   0       
#>  7 I(exp^2)    -0.000419 0.0000546    -7.67  1.70e-14
#>  8 bluecolyes  -0.0207   0.0138       -1.50  1.33e- 1
#>  9 ind          0.0136   0.0152        0.893 3.72e- 1
#> 10 unionyes     0.0328   0.0149        2.20  2.79e- 2
#> 11 sexfemale   -0.131    0.127        -1.03  3.01e- 1
#> 12 blackyes    -0.286    0.156        -1.84  6.65e- 2
#> 13 ed           0.138    0.0212        6.49  8.47e-11
sondalex commented 1 year ago

Thank you for your help.

I did a mistake converting all columns to numeric and created the reproducible example from this "dirty" session. Sorry for raising an unnecessary issue.

tappek commented 1 year ago

No worries - thank you for checking and confirming!