ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/
102 stars 12 forks source link

PSs are different if the covariates are specified in different orders on the right-hand side of the formula when method = "cbps" #16

Closed danielebottigliengo closed 3 years ago

danielebottigliengo commented 3 years ago

Hi!

Thanks for the great package!

I have noticed that when method = "cbps" is used in the weightit function different PSs values are returned if the covariates on the right-hand side of the formula are specified in different orders. However, if PSs are estimated "outside" of the weightit function using CBPS::CBPS they look identical. Here's a reproducible example using lalonde dataset.

library(WeightIt)
library(CBPS)
#> Loading required package: MASS
#> Loading required package: MatchIt
#> Loading required package: nnet
#> Loading required package: numDeriv
#> Loading required package: glmnet
#> Loading required package: Matrix
#> Loaded glmnet 4.0-2
#> CBPS: Covariate Balancing Propensity Score
#> Version: 0.21
#> Authors: Christian Fong [aut, cre],
#>   Marc Ratkovic [aut],
#>   Kosuke Imai [aut],
#>   Chad Hazlett [ctb],
#>   Xiaolin Yang [ctb],
#>   Sida Peng [ctb]

data("lalonde")

# 1) PS estimation using CBPS function ---------------------------------
ps_1 <- CBPS(
  treat ~ age + educ + race + married + nodegree + re74 + re75,
  data = lalonde, ATT = 0
)

ps_2 <- CBPS(
  treat ~ educ + race + nodegree + re74 + re75 + married + age,
  data = lalonde, ATT = 0
)

head(data.frame(
  ps1 = ps_1$fitted.values,
  ps2 = ps_2$fitted.values
))
#>         ps1       ps2
#> 1 0.6283220 0.6283220
#> 2 0.2217905 0.2217905
#> 3 0.6629598 0.6629598
#> 4 0.7595289 0.7595289
#> 5 0.6939144 0.6939144
#> 6 0.6906981 0.6906981

# The PSs are identical

# 2) PS estimation using weightit --------------------------------------
wt_1 <- weightit(
  treat ~ age + educ + race + married + nodegree + re74 + re75,
  data = lalonde,
  method = "cbps",
  estimand = "ATE"
)

wt_2 <- weightit(
  treat ~ educ + race + nodegree + re74 + re75 + married + age,
  data = lalonde,
  method = "cbps",
  estimand = "ATE"
)

head(data.frame(
  ps1 = wt_1$ps,
  ps2 = wt_2$ps
))
#>         ps1       ps2
#> 1 0.6897437 0.5851259
#> 2 0.2157126 0.2275898
#> 3 0.6599305 0.7606147
#> 4 0.7763387 0.7580954
#> 5 0.6329184 0.6942782
#> 6 0.6930129 0.6907725

# The PSs are not identical

Created on 2020-12-28 by the reprex package (v0.3.0)

Session info ``` r devtools::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 4.0.3 (2020-10-10) #> os Windows 10 x64 #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate Italian_Italy.1252 #> ctype Italian_Italy.1252 #> tz Europe/Berlin #> date 2020-12-28 #> #> - Packages ------------------------------------------------------------------- #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0) #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3) #> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3) #> CBPS * 0.21 2019-08-21 [1] CRAN (R 4.0.0) #> cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.3) #> codetools 0.2-16 2018-12-24 [2] CRAN (R 4.0.3) #> colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.3) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0) #> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0) #> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2) #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3) #> dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.2) #> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0) #> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0) #> foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.3) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2) #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3) #> ggplot2 3.3.2 2020-06-19 [1] CRAN (R 4.0.2) #> glmnet * 4.0-2 2020-06-16 [1] CRAN (R 4.0.2) #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0) #> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0) #> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2) #> iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.3) #> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2) #> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.3) #> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3) #> MASS * 7.3-53 2020-09-09 [2] CRAN (R 4.0.3) #> MatchIt * 4.1.0 2020-12-15 [1] CRAN (R 4.0.3) #> Matrix * 1.2-18 2019-11-27 [2] CRAN (R 4.0.3) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0) #> nnet * 7.3-14 2020-04-26 [2] CRAN (R 4.0.3) #> numDeriv * 2016.8-1.1 2019-06-06 [1] CRAN (R 4.0.0) #> pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.3) #> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0) #> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0) #> processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.3) #> ps 1.5.0 2020-12-05 [1] CRAN (R 4.0.3) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0) #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3) #> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2) #> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2) #> rlang 0.4.9 2020-11-26 [1] CRAN (R 4.0.3) #> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.3) #> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3) #> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0) #> shape 1.4.5 2020-09-13 [1] CRAN (R 4.0.2) #> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0) #> survival 3.2-7 2020-09-28 [2] CRAN (R 4.0.3) #> testthat 3.0.1 2020-12-17 [1] CRAN (R 4.0.3) #> tibble 3.0.4 2020-10-12 [1] CRAN (R 4.0.3) #> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0) #> usethis 2.0.0 2020-12-10 [1] CRAN (R 4.0.3) #> vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.3) #> WeightIt * 0.10.2 2020-08-21 [1] Github (ngreifer/WeightIt@7811149) #> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3) #> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0) #> #> [1] C:/Users/Daniele/Documents/R/win-library/4.0 #> [2] C:/Program Files/R/R-4.0.3/library ```

I also tried using method = "ps" and method = "gbm", but I didn't notice any differences in the PSs values. I wasn't expecting different PSs values when covariates are specified in different orders, but maybe there is something that I am missing.

Thank you very much!

ngreifer commented 3 years ago

Thank you so much for letting me know about this! It should e fixed now.

The problem was due to WegithIt splitting the race factor variable into three categories and supplying all of them to CBPS(), which meant the design matrix was not full rank, leading to erratic behavior. You can actually replicate this using CBPS() in the following way:

lalonde_split <- cobalt::splitfactor(lalonde, drop.first = FALSE)

ps_1 <- CBPS(
  treat ~ age + educ + race_black + race_hispan + race_white + married + nodegree + re74 + re75,
  data = lalonde_split, ATT = 0
)

ps_2 <- CBPS(
  treat ~ educ + race_black + race_hispan + race_white + nodegree + re74 + re75 + married + age,
  data = lalonde_split, ATT = 0
)

head(data.frame(
  ps1 = ps_1$fitted.values,
  ps2 = ps_2$fitted.values
))

I've fixed weightit() so that it ensures the matrix is full rank, yielding the same propensity scores every time.

danielebottigliengo commented 3 years ago

Thank you so much @ngreifer for your quick answer!