Closed: nchemine closed this issue 3 years ago
So funny you asked; both of these features are available in the development version of MatchIt on my GitHub. With nearest neighbor matching, when an argument to exact is specified, matching takes place separately within each level of the exact matching variables, which will speed up execution. I have plans to allow this to be parallelized, too, but I haven't implemented that yet. Also, if you match separately within subgroups (i.e., using different calls to matchit()), you can now use a special rbind() method to bind the several match.data() outputs together into a single output for effect estimation. You do still have to assess balance within each matchit object separately, though, as these cannot be combined. You can be clever and use the rbind() output with cobalt if you retain the unmatched units in the match.data() calls.
To install the development version from my GitHub, you can run devtools::install_github("ngreifer/MatchIt").
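For anyone wanting to try the subgroup workflow described above, here is a minimal sketch. It assumes the lalonde example data that ships with MatchIt, and the choice of married as the subgroup variable, the covariates, and the object names are illustrative stand-ins, not anything from this thread. The cobalt step is optional and only runs if cobalt is installed.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Match separately within each subgroup (illustrative: split on married)
subsets <- split(lalonde, lalonde$married)
fits <- lapply(subsets, function(d) {
  matchit(treat ~ age + educ + re74 + re75, data = d, method = "nearest")
})

# Balance still has to be assessed on each matchit object separately
balance <- lapply(fits, summary)

# Bind the match.data() outputs into one matched sample; unname() keeps the
# list names from being treated as argument names by rbind()
md_list <- lapply(fits, match.data)
md <- do.call(rbind, unname(md_list))

# Effect estimation on the combined matched sample, using the matching weights
fit <- lm(re78 ~ treat, data = md, weights = weights)

# Optional: retain unmatched units so cobalt can compare the samples
# before and after matching
md_all <- do.call(
  rbind,
  unname(lapply(fits, match.data, drop.unmatched = FALSE))
)
if (requireNamespace("cobalt", quietly = TRUE)) {
  print(cobalt::bal.tab(treat ~ age + educ + re74 + re75, data = md_all,
                        weights = "weights", method = "matching", un = TRUE))
}
```

Standard errors for the effect estimate should still account for the matching (for example, with cluster-robust standard errors on subclass); the lm() call here only shows where the combined data and weights go.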
Hello Noah,
This is amazing!! Thank you so much! I tried it this morning and it worked brilliantly. I am just wondering what the implications are of using this development version. I am doing professional biomedical data analysis and need to be sure the results are correct. Is there anything I should be aware of? When do you plan to release this feature in the "publicly available" version? Thank you very much again and kind regards, Nathalie
Hi Noah, is this new feature available as of MatchIt 4.2.0? I'm trying optimal pair matching with exact matching on some selected variables and Mahalanobis distance matching on other variables.
I can't get it to work with the full dataset, which has about 500k rows (of which about 150k are in the treated group); I keep running out of memory. Do I have to use a specific syntax for the matching to take place separately in each level of the exact matching variables? I'm currently using something like this:
match1 <- matchit(
  experiment_group ~ .,
  method = "optimal",
  mahvars = ~ age + children + comedy + drama + ents + factual + learning + music + news + sport,
  exact = c("gender", "acorn_category", "hf"),
  data = db
)
Yes, that is a feature in 4.2.0, but only for nearest neighbor matching. Exact matching with optimal matching is handled by the optmatch package, which may not be able to handle such a large dataset. The ability to handle large datasets is an advantage of nearest neighbor over optimal matching. By setting verbose = TRUE with method = "nearest" you can also track the progress within each category of the exact variables. In general, nearest neighbor and optimal matching yield similar results, so you aren't losing anything by using nearest neighbor.
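As a rough sketch of what switching to nearest neighbor matching could look like, the call below uses the lalonde example data bundled with MatchIt rather than the dataset from the question. The covariates and the choice of married and nodegree as exact matching variables are illustrative assumptions.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Nearest neighbor matching on the Mahalanobis distance of mahvars,
# carried out separately within each combination of the exact variables
m <- matchit(
  treat ~ age + educ + re74 + re75,
  data = lalonde,
  method = "nearest",
  exact = ~ married + nodegree,
  mahvars = ~ age + educ + re74 + re75,
  verbose = TRUE  # print progress while matching
)

# Matched sample; every pair shares the same values of the exact variables
md <- match.data(m)
```

If a stratum of the exact variables has fewer control units than treated units, some treated units will go unmatched, so it is worth checking summary(m) for the matched sample sizes.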
Thank you so much for such a prompt reply, I'll try nearest!
I am using MatchIt on big data (2 million records), so it does not run in one go. I need to split up my dataset into subsets (based on the values of my exact variables) and run MatchIt on each subset to make it work. It would be really great if MatchIt could do that automatically (or allow users to specify that); it would be an immense improvement in computing efficiency!
Also, I am looking for a way to recombine all my different MatchIt outputs (each run on a subset of my data) and do not know how to do that. Any help would be greatly appreciated!!
Thank you for your great work!