Closed by ngreifer 2 years ago
Odd that you didn't get the same results, as we're testing against a version from Paul's book: see tests/testthat/test.rank.mahal.R.
Rather, we're testing against something close to what was in his book (as noted in inline comments there). I don't recall the nature of the small change that I made, but perhaps you'll see it when you compare our reference implementation to yours.
Thank you so much, Ben, this was really helpful and I was able to resolve my issue (which was a coding error on my part).
One thing I noticed is that your function actually produces the squares of the distance values, whereas your other distance functions do not, so you may want to correct that if not intentional.
No, thank you for the tip, Noah -- that's a plausible misfire, and something we'll want to correct.
(@josherrickson would you be willing to look at this? Presumably a matter of taking square-roots at the very end, plus adjusting docs & NEWS.)
Thanks for the report @ngreifer. Looks like you're right; Rosenbaum seems to refer to the D^2 directly as Mahalanobis distance (DOS, pg 170), hence the code in DOS that we are testing against is in fact returning the squared distance. Working on a fix in both our version of Rosenbaum's code used for testing, and the actual rank mahal function.
(@benthestatistician looks like your tweaks to Rosenbaum's code were minor and unrelated to this: https://github.com/markmfredrickson/optmatch/commit/30fbded5b43e3569c5e4b68db7db9e79b2988bdb)
Fixed here: f6bc70d2
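For readers following the squared-versus-unsquared point: base R's stats::mahalanobis() also returns the squared distance, so code built on it needs a square root at the end to report the distance itself. A tiny illustration with made-up numbers (an identity covariance, so the result reduces to the Euclidean case), not taken from optmatch:

```r
# One point at (3, 4), measured from the origin with identity covariance.
x <- matrix(c(3, 4), nrow = 1)

# stats::mahalanobis() returns D^2, the squared distance.
d2 <- mahalanobis(x, center = c(0, 0), cov = diag(2))
d2        # 25

# Taking the square root at the very end gives the distance itself.
sqrt(d2)  # 5
```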
Sorry to bring up a bug report right after you submitted to CRAN, but I found another strange behavior with the rank Mahalanobis distance. When a factor variable is supplied as a covariate to match_on(), it seems to be converted to a numeric variable before the Mahalanobis distance is computed. This seems to be undesirable behavior unless the variable is an ordered factor, but it happens with unordered factors, too. Using the compute_smahal() function from your tests:
data("lalonde", package = "MatchIt")

# Using factor race
form1 <- treat ~ age + race
X1 <- model.matrix(form1, data = lalonde)[, -1]
d1a <- compute_smahal(lalonde$treat, X1)
d1b <- as.matrix(optmatch::match_on(form1, data = lalonde,
                                    method = "rank_mahalanobis"))
all.equal(d1a, d1b,
          check.attributes = FALSE)
#> [1] "Mean relative difference: 0.03273117"

# Using numeric race
form2 <- treat ~ age + as.numeric(race)
X2 <- model.matrix(form2, data = lalonde)[, -1]
d2a <- compute_smahal(lalonde$treat, X2)
d2b <- as.matrix(optmatch::match_on(form2, data = lalonde,
                                    method = "rank_mahalanobis"))
all.equal(d2a, d2b,
          check.attributes = FALSE)
#> [1] TRUE
Created on 2022-05-16 by the reprex package (v2.0.1)
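To make the difference between the two encodings concrete, here is a small sketch on a made-up data frame (the values and the name `df` are hypothetical, not from lalonde). It only shows what each encoding hands to a distance function. It does not show what match_on() does internally.

```r
# Toy data: one numeric covariate and one unordered factor.
df <- data.frame(
  age  = c(23, 35, 41, 29),
  race = factor(c("black", "hispan", "white", "black"))
)

# model.matrix() expands the factor into indicator (dummy) columns,
# dropping the first level ("black") as the reference category.
X_dummy <- model.matrix(~ age + race, data = df)[, -1]
colnames(X_dummy)  # "age" "racehispan" "racewhite"

# as.numeric() instead maps the levels to integer codes 1, 2, 3, which
# imposes an ordering and equal spacing that an unordered factor lacks.
X_codes <- cbind(age = df$age, race = as.numeric(df$race))
X_codes[, "race"]  # 1 2 3 1
```

With integer codes, "white" is treated as twice as far from "black" as "hispan" is, which is why the two encodings can yield different distances.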
Happy to have your help in flagging such issues at any time, Noah. This is now #220.
Hi optmatch team,
I'm wondering how the rank Mahalanobis distance is calculated in match_on(). I have tried to examine the source code, but it involves several nested functions in R and C, and in the end I can't comprehend how the distance is computed. I assume you're using the definition from Rosenbaum's book, but when I try to implement that manually, I don't get the same result. Would it be possible to see an R implementation that yields the same result as the implementation you have in C so that I can get a better sense for how it is computed? Or, if you're using a formula other than the one described by Rosenbaum, could I see that formula? Thanks!
Noah
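For orientation, a rank-based Mahalanobis distance along the lines described in Rosenbaum's book can be sketched in plain R as below. This is an illustrative sketch only: `rank_mahal_sketch` and its `squared` argument are hypothetical names, the steps are modeled on the book's approach as I understand it, and the result is not guaranteed to match optmatch's compiled implementation or the reference function in tests/testthat/test.rank.mahal.R.

```r
# Hypothetical helper, not part of optmatch.
# z: 0/1 treatment indicator; X: numeric covariate matrix (one row per unit).
rank_mahal_sketch <- function(z, X, squared = FALSE) {
  X <- as.matrix(X)
  n <- nrow(X)
  if (is.null(rownames(X))) rownames(X) <- seq_len(n)

  # Step 1: replace each covariate by its ranks (ties get average ranks).
  R <- apply(X, 2, rank)
  dimnames(R) <- dimnames(X)

  # Step 2: covariance of the ranks, rescaled so that every variable has
  # the variance of untied ranks; this keeps heavily tied variables from
  # being up-weighted.
  cv  <- cov(R)
  rat <- sqrt(var(seq_len(n)) / diag(cv))
  D   <- diag(rat, nrow = length(rat))
  icov <- MASS::ginv(D %*% cv %*% D)

  # Step 3: squared distance from each treated unit to every control.
  Rt <- R[z == 1, , drop = FALSE]
  Rc <- R[z == 0, , drop = FALSE]
  out <- matrix(NA_real_, nrow(Rt), nrow(Rc),
                dimnames = list(rownames(Rt), rownames(Rc)))
  for (i in seq_len(nrow(Rt))) {
    out[i, ] <- mahalanobis(Rc, Rt[i, ], icov, inverted = TRUE)
  }

  # Step 4: stats::mahalanobis() returns squared distances, so take the
  # square root unless the squared version is requested. pmax() guards
  # against tiny negative values from floating-point rounding.
  if (squared) out else sqrt(pmax(out, 0))
}

# Usage on simulated data: rows are treated units, columns are controls.
set.seed(1)
z <- c(1, 0, 1, 0, 0, 1, 0, 0)
X <- cbind(a = rnorm(8), b = runif(8))
d <- rank_mahal_sketch(z, X)
dim(d)  # 3 5
```

Passing `squared = TRUE` returns the D^2 values, which can be handy when comparing against code that reports the squared distance.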