ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/

Error with missing data when using "gbm" as method #8

Closed: fakeacct3000 closed this issue 4 years ago

fakeacct3000 commented 4 years ago

I'm having problems with cobalt and weightit (using gbm) because of error messages about missing data. I have no issues with twang on the same data, so I tried weightit with the lalonde_mis dataset, and I trigger the same error there. How does one get weightit to work with missing data in the covariates? I've only tried the gbm method, which I understood from the manual to support missing data.

library(WeightIt)
data("lalonde_mis", package = "cobalt")

weightit_0 <- weightit(treat ~ age + educ + race + married + nodegree, # + re74 + re75,
                       data = lalonde_mis, method = "gbm")

Missing values are present in the covariates. See ?weightit for information on how these are handled.
Error in if (is.factor(x) || is.character(x) || all_the_same(x)) return(x) else if (is_binary(x)) { :
  missing value where TRUE/FALSE needed

I get this error whether or not I remove the continuous variables with missing data (as above, where re74 and re75 are commented out).

ngreifer commented 4 years ago

Thank you for letting me know about this. Can you tell me which version of WeightIt you're using?

fakeacct3000 commented 4 years ago

0.7.1 (just installed today). The cobalt package appears to be the most recent as well (3.9.0).

In case it's relevant, I had thought the errors on my own data were related to the data structure, since I had used a data_frame object when fitting the model with twang. But I get the same error even when using the original data, which was imported from Stata through haven. I had also already removed missing values from the treatment variable, so my data are missing values only on the covariates.

From: Noah Greifer notifications@github.com Sent: 06 December 2019 07:12 To: ngreifer/WeightIt WeightIt@noreply.github.com Subject: Re: [ngreifer/WeightIt] Error with missing data when using "gbm" as method (#8)


ngreifer commented 4 years ago

Thanks for letting me know. I figured out the bug, and I'll let you know when a fixed development version of WeightIt is ready. I'll also be publishing an updated version to CRAN soon, with some new features.

Although this is no excuse for the bugs, it has been demonstrated that applying GBM directly to data with missing values yields far poorer performance than applying GBM within multiply imputed datasets. See Penning de Vries et al. (2018) for a simulation study demonstrating this; Coffman et al. (2018) found the same phenomenon with continuous treatments. In other words, you should not rely on the built-in missing-data handling of twang or WeightIt to address missingness automatically; instead, you should use multiple imputation, which is the overwhelmingly recommended approach. MatchThem, a new R package I have worked on, is a wrapper for WeightIt for multiply imputed datasets, and it makes using WeightIt with multiply imputed data really easy.
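A minimal sketch of the multiple-imputation workflow described above, assuming the mice, MatchThem, WeightIt, gbm, and cobalt packages are installed in reasonably current versions. It imputes lalonde_mis with mice, estimates GBM weights within each imputed dataset using MatchThem's weightthem(), and checks balance with cobalt's bal.tab(). The number of imputations (m = 5), the seed, the ATT estimand, and the object names are illustrative choices, not settings taken from this thread.

```r
# Sketch only: GBM propensity score weighting within multiply imputed datasets.
# Assumes mice, MatchThem, WeightIt, gbm, and cobalt are installed.
library(mice)
library(MatchThem)
library(cobalt)

data("lalonde_mis", package = "cobalt")

# Impute the missing covariate values 5 times using mice's default methods.
# (m = 5 and the seed are illustrative choices.)
imp <- mice(lalonde_mis, m = 5, seed = 1234, printFlag = FALSE)

# Estimate GBM weights separately within each imputed dataset
# ("within" approach); the arguments are passed on to WeightIt.
w_imp <- weightthem(treat ~ age + educ + race + married + nodegree + re74 + re75,
                    datasets = imp, approach = "within",
                    method = "gbm", estimand = "ATT")

# Summarize covariate balance across the imputed datasets.
bal.tab(w_imp)
```

With the "within" approach, the weights are estimated in each completed dataset, so effect estimates are then computed per imputation and pooled, rather than averaging propensity scores across imputations.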

fakeacct3000 commented 4 years ago

That's great news, thanks! I do have the imputation model built, but it takes a while to run because of the sampling weights and the multilevel structure, so I've been working on other parts of the analysis before rerunning the MI with all of the outcome variables, etc., included.

I will definitely check out the MatchThem package, as that would be ideal.