markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

Fixes for issue 147 #148

Closed adamrauh closed 6 years ago

adamrauh commented 6 years ago

This has some changes to makedist.R and match_on.R to address the problems discussed here, plus some new tests in test.match_on.R.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.1%) to 60.864% when pulling 3ef123dfc6da580983ad0d9f6506896ff53a470b on adamrauh:master into e5090a643e14fc5ac28d357b318768d4666aa36b on markmfredrickson:master.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.1%) to 60.864% when pulling 3ef123dfc6da580983ad0d9f6506896ff53a470b on adamrauh:master into e5090a643e14fc5ac28d357b318768d4666aa36b on markmfredrickson:master.

benthestatistician commented 6 years ago

Looks good to me. @josherrickson ?

josherrickson commented 6 years ago

One note is the ordering in the ISM:

> data("nuclearplants")
> np <- nuclearplants[1:6,]
> (c1 <- caliper(match_on(pr ~ cost, data = np, method = "euclidean"), width = 100))
       control
treated   H   I   J   K
      A   0   0 Inf   0
      B Inf Inf   0 Inf
> np$cost[2] <- NA # Plant "I"
> (c2 <- caliper(match_on(pr ~ cost, data = np, method = "euclidean"), width = 100))
       control
treated   H   J   K   I
      A   0 Inf   0 Inf
      B Inf   0 Inf Inf
> fullmatch(pr ~ cost, within = c2, data = np)
   H    I    A    J    B    K 
 1.1 <NA>  1.1  1.2  1.2  1.1 

This resolves itself well within the actual match, but the ISM produced by caliper (or by match_on, with or without a within= argument) moves the NA'd column to the end. I don't immediately have a concern here (adding c1 + c2 works fine), but if this is easy to address, perhaps we should do so now so it doesn't cause any issues down the road. @benthestatistician feel free to override if you aren't concerned; other than this issue this looks ready for merging. (We could also pull this in just to move ahead with the CRAN submission and leave an open issue to investigate further.)

adamrauh commented 6 years ago

@josherrickson I don't think a fix for that would be too tricky, but I do think it might require saving some additional bookkeeping information, in addition to the dropped units. Depending on the direction of some of the other longer-term ideas that have been tossed around, that kind of code might eventually be taken out, anyway.

josherrickson commented 6 years ago

The original issue where we tackled ordering is #14. In it, @markmfredrickson originally created an order characteristic, then rolled it back to just using the data argument.

benthestatistician commented 6 years ago

@josherrickson's comment will prompt some docs updates (which @adamrauh has agreed to handle, in the process of closing out #147). But no further code changes necessary. Thanks for this contribution Adam!

markmfredrickson commented 6 years ago

I'm a little late to this discussion, but I wanted to add I'm not bothered by differences in ISM ordering. Thanks for all this great work.