markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch

`compute_rank_mahalanobis` ignores index argument #128

Closed benthestatistician closed 7 years ago

benthestatistician commented 7 years ago

...in the event that there's an exact matching constraint in force. The function presently computes and returns a distance for every treatment/control combination, even when it is given a list of the specific treatment-control pairings for which a distance is needed.
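To make the intended behavior concrete, here is a minimal base-R sketch of computing distances only for a supplied list of treatment/control pairs. The helper name `mahal_for_pairs` and the two-column `index` layout are assumptions made for illustration; they are not optmatch's internal API.

```r
# Hypothetical helper: Mahalanobis distances for the listed pairs only.
# `index` is assumed to be a two-column matrix of row numbers into X:
# column 1 = treated unit, column 2 = control unit.
mahal_for_pairs <- function(X, index, S = cov(X)) {
  diffs <- X[index[, 1], , drop = FALSE] - X[index[, 2], , drop = FALSE]
  S_inv <- solve(S)
  # Row-wise quadratic form d' S^{-1} d, one value per requested pair
  sqrt(rowSums((diffs %*% S_inv) * diffs))
}

set.seed(1)
X <- matrix(rnorm(20), ncol = 2)
index <- cbind(c(1, 1, 2), c(6, 7, 8))  # three permitted pairs only
dists <- mahal_for_pairs(X, index)
length(dists)                           # one distance per row of `index`
```

The point of the sketch is only that the result has one entry per requested pair, rather than one per cell of the full treatment-by-control table.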

to do's:

benthestatistician commented 7 years ago

In the example in #122, we have

> length(within@rows)
[1] 4889857

whereas compute_rank_mahalanobis has given a result of length

> length(dists)
[1] 22887076

matching the product of the overall number of rows and columns.

> within@dimension
[1] 2252 10163
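The arithmetic behind that observation can be checked directly. The snippet below only restates the numbers reported above; the variable names are made up for the illustration.

```r
dims <- c(2252, 10163)     # within@dimension: rows x columns
n_returned <- 22887076     # length(dists) reported above
n_requested <- 4889857     # length(within@rows) reported above

prod(dims) == n_returned   # TRUE: one distance for every possible pair
n_requested / n_returned   # share of pairs the exact match actually permits
```

So roughly a fifth of the distances being computed were actually requested.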

benthestatistician commented 7 years ago

@jwbowers since you've caught the rank_mahalanobis bug, would you be willing to help out w/ writing of test cases?

I figured I'd start by writing out some tests where you do a rank_mahalanobis calculation by pre-computing the ranks in R, then applying ordinary Mahalanobis. That led me to try the following, which does not work, but which is more likely to represent a misunderstanding on my part of what rank_mahalanobis is trying to do than a problem with the code. I.e., maybe you can help me improve it w/o much effort.

test_that("compute_rank.mahal results match ordinary Mahalanobis's", {
  nr <- 10L
  z <- integer(nr)
  z[sample(1:nr, nr / 2L)] <- 1L

  X <- as.matrix(1L:nr)
  df <- data.frame(z = z, X)
  expect_equivalent(match_on(z ~ ., data = df, method = "rank_mahalanobis"),
                    match_on(z ~ ., data = df, method = "mahalanobis"))
})

(Once I get something like this going, should be easy to write test cases for the combo of exact matching and rank Mahalanobis.)

benthestatistician commented 7 years ago

In tests/testthat/test.rank.mahal.R, compute_rank_mahalanobis is tested against the rank Mahalanobis function that Rosenbaum has posted to his site. I confirmed that their results match up in this example also, even as they differ from those of compute_mahalanobis.

The likely cause of the discrepancy is a difference in how the covariance is computed. optmatch has always calculated the covariances needed for Mahalanobis distance computations by pooling covariances in the treatment and control groups. In defining rank-based Mahalanobis distance, Rosenbaum uses an unpooled covariance. In any case, there's no indication of anything amiss other than the problem with compute_rank_mahalanobis noted at the top of this ticket. We'll need another strategy for test cases.
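A small base-R sketch of why pooled and unpooled covariances would make the earlier test fail, even though its single covariate, 1:10, is already equal to its own ranks. The fixed treatment assignment and the degrees-of-freedom weighting used for pooling are assumptions of the sketch, and it ignores any other details of either implementation.

```r
x <- 1:10                       # already equal to its own ranks
z <- rep(c(1L, 0L), each = 5)   # fixed assignment, for reproducibility

# Unpooled: variance over the whole sample
v_unpooled <- var(x)

# Pooled: within-group variances weighted by degrees of freedom
# (this particular weighting is an assumption of the sketch)
n1 <- sum(z == 1)
n0 <- sum(z == 0)
v_pooled <- ((n1 - 1) * var(x[z == 1]) + (n0 - 1) * var(x[z == 0])) /
  (n1 + n0 - 2)

# With a single covariate the distance reduces to |x_t - x_c| / sd
raw <- abs(outer(x[z == 1], x[z == 0], "-"))
d_unpooled <- raw / sqrt(v_unpooled)
d_pooled <- raw / sqrt(v_pooled)

isTRUE(all.equal(d_unpooled, d_pooled))  # FALSE: the two scalings disagree
d_pooled[1, 1] / d_unpooled[1, 1]        # the constant factor between them
```

In this one-covariate setting the two sets of distances differ only by a constant factor, so they would rank candidate matches identically while still failing an equality test.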