Reproducibility of the BART p-Score

lurui0421-tech commented 2 weeks ago

We have identified bugs in the implementation of BART for estimating propensity scores with the matchit function, particularly in how the seed is handled for reproducibility. How should we create reproducible results with distance = "bart" for propensity score matching?

ngreifer commented 2 weeks ago

Hi, sorry for the issue! Reproducibility with BART is a bit of an annoying issue, but there are solutions. The main problem is that setting a seed the usual way with set.seed() doesn't work under multi-threading, which BART uses by default. You have two options: supply a seed directly to the function that estimates the BART PS, or request single threading. This is explained in the documentation for dbarts::bart2(). For the former, use matchit(., distance.options = list(seed = 1234)). For the latter, use matchit(., distance.options = list(n.threads = 1)) and set a seed as usual using set.seed(). In a future update, I'll make sure this is clarified in the documentation.
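As a minimal sketch of the two options, assuming the dbarts package is installed and using the lalonde dataset that ships with MatchIt (the formula, the seed value, and the object names are illustrative choices, not part of the original report):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Illustrative propensity score model; any formula would do here.
ps_formula <- treat ~ age + educ + race + married + nodegree + re74 + re75

# Option 1: pass the seed to dbarts::bart2() through distance.options.
# Multi-threading stays on, so set.seed() alone would not be enough.
fit_seed_a <- matchit(ps_formula, data = lalonde, distance = "bart",
                      distance.options = list(seed = 1234))
fit_seed_b <- matchit(ps_formula, data = lalonde, distance = "bart",
                      distance.options = list(seed = 1234))

# Option 2: request a single thread and seed R's RNG as usual.
set.seed(1234)
fit_single_a <- matchit(ps_formula, data = lalonde, distance = "bart",
                        distance.options = list(n.threads = 1))
set.seed(1234)
fit_single_b <- matchit(ps_formula, data = lalonde, distance = "bart",
                        distance.options = list(n.threads = 1))

# Compare the estimated propensity scores between repeated runs.
same_with_seed   <- isTRUE(all.equal(fit_seed_a$distance, fit_seed_b$distance))
same_with_thread <- isTRUE(all.equal(fit_single_a$distance, fit_single_b$distance))
```

If both approaches behave as described in the dbarts::bart2() help file, same_with_seed and same_with_thread should both be TRUE. Option 1 keeps the speed benefit of multiple threads; option 2 is slower but relies only on the familiar set.seed() workflow.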

lurui0421-tech commented 2 weeks ago

Thanks for your help! This package includes a variety of algorithms for distance and matching, but there is no clear guidance on how to achieve reproducible results across different methods. Could we document the reproducibility aspects, as they are crucial for much of the work and research?

ngreifer commented 2 weeks ago

Thanks for the suggestion. I agree we can do a better job of documenting reproducibility. Any potential reproducibility issue within MatchIt is clearly documented, as there are very few random components in the matching methods implemented internally. One exception is when using m.order = "random", for which we do document the need to set a seed for reproducible results. For all other cases, any reproducibility issue comes from a package outside of MatchIt; for example, the issue you asked about had to do with the dbarts package and is very clearly documented in the help file for dbarts::bart2(). Similarly, any propensity score estimation methods that involve randomness will have the reproducibility issues documented in the respective package documentation. Genetic matching is another matching method where reproducibility could be an issue, but that is implemented in the Matching package and called by MatchIt functions. Since we can't control all these reproducibility issues outside MatchIt, we haven't documented them. I do agree we could do some better work to at least highlight where a reproducibility issue might arise (i.e., by indicating which matching or propensity score methods involve a random component).
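For the m.order = "random" case mentioned above, a short sketch of what setting a seed looks like in practice, again assuming the lalonde dataset bundled with MatchIt and an illustrative formula and seed value:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Nearest-neighbor matching with a random matching order.
# The order is drawn from R's RNG, so seed it before each call.
set.seed(1234)
match_a <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                   data = lalonde, method = "nearest", m.order = "random")

set.seed(1234)
match_b <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                   data = lalonde, method = "nearest", m.order = "random")

# With the same seed, both runs should pair the same units.
same_matches <- identical(match_a$match.matrix, match_b$match.matrix)
```

Here the default propensity score model (logistic regression) has no random component, so the matching order is the only source of run-to-run variation.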

kosukeimai / MatchIt

Reproducibility of the BART p-Score #202