benthestatistician closed this issue 7 years ago.
In the example in #122, we have
> length(within@rows)
[1] 4889857
whereas compute_rank_mahalanobis has given a result of length
> length(dists)
[1] 22887076
matching the product of the overall number of rows and columns.
> within@dimension
[1] 2252 10163
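The arithmetic behind the mismatch, as a minimal base-R sketch: without an exact-matching constraint, a distance is needed for every row-column (treatment-control) pair, so the count is the product of the two dimensions. With the constraint, only pairs that share a stratum need one. The helpers `n_pairs_all` and `n_pairs_within` and the toy data are hypothetical, for illustration only; they are not optmatch functions.

```r
# Hypothetical helpers (not part of optmatch) for counting the
# treatment-control pairs that need a distance.

# Every treated x control pair, with no exact-matching constraint.
n_pairs_all <- function(z) {
  sum(z == 1) * sum(z == 0)
}

# Only pairs that share a stratum, as an exact-matching constraint requires.
n_pairs_within <- function(z, stratum) {
  tab <- table(factor(stratum), factor(z, levels = c(0, 1)))
  sum(tab[, "1"] * tab[, "0"])
}

# The dimensions reported above: 2252 rows by 10163 columns.
2252 * 10163  # 22887076, the length of dists

# Toy example with three strata.
z <- c(1, 0, 0, 1, 0, 1)
stratum <- c("a", "a", "a", "b", "b", "c")
n_pairs_all(z)              # 9
n_pairs_within(z, stratum)  # 3: two pairs in "a", one in "b", none in "c"
```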
@jwbowers, since you've caught the rank_mahalanobis bug, would you be willing to help out with writing test cases?
I figured I'd start by writing some tests that do a rank_mahalanobis calculation by pre-computing the ranks in R and then applying ordinary Mahalanobis distance. That led me to try the following. It does not work, but the failure more likely reflects a misunderstanding on my part of what rank_mahalanobis is trying to do than a problem with the code, so maybe you can help me fix it without much effort.
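library(testthat)
library(optmatch)

test_that("compute_rank.mahal results match ordinary Mahalanobis's", {
  nr <- 10L
  # Treatment indicator with half of the units treated.
  z <- integer(nr)
  z[sample(1:nr, nr / 2L)] <- 1L
  # A single covariate whose ranks equal its values.
  X <- as.matrix(1L:nr)
  df <- data.frame(z = z, X)
  expect_equivalent(
    match_on(z ~ ., data = df, method = "rank_mahalanobis"),
    match_on(z ~ ., data = df, method = "mahalanobis")
  )
})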
(Once I get something like this working, it should be easy to write test cases for the combination of exact matching and rank Mahalanobis.)
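A minimal base-R sketch of the "pre-compute the ranks, then apply ordinary Mahalanobis" idea, following Rosenbaum's description of rank-based Mahalanobis distance as we recall it: rank each covariate over all units, take the unpooled covariance of the ranks, and rescale it so each diagonal entry equals the variance of untied ranks. `rank_mahal_sketch` is a hypothetical name, not optmatch's `compute_rank_mahalanobis`. It uses `solve()` and so assumes the rescaled covariance is nonsingular, whereas Rosenbaum's version uses a generalized inverse. It returns squared distances.

```r
# Hypothetical helper, sketched from Rosenbaum's description of rank-based
# Mahalanobis distance; not optmatch's compute_rank_mahalanobis.
rank_mahal_sketch <- function(z, X) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Replace each covariate by its ranks over all units (ties get average ranks).
  R <- matrix(apply(X, 2, rank), nrow = n)
  # Unpooled covariance of the ranks: treated and controls together.
  cv <- cov(R)
  # Rescale so every diagonal entry equals the variance of untied ranks 1..n,
  # which keeps heavily tied covariates from being up-weighted.
  rat <- sqrt(var(seq_len(n)) / diag(cv))
  D <- diag(rat, nrow = length(rat))
  cv <- D %*% cv %*% D
  # Assumption: cv is nonsingular (Rosenbaum uses a generalized inverse).
  icov <- solve(cv)
  treated <- which(z == 1)
  control <- which(z == 0)
  out <- matrix(NA_real_, length(treated), length(control),
                dimnames = list(treated, control))
  for (i in seq_along(treated)) {
    # Squared distances from the i-th treated unit to every control.
    out[i, ] <- mahalanobis(R[control, , drop = FALSE], R[treated[i], ],
                            icov, inverted = TRUE)
  }
  out
}

z <- rep(c(1, 0), 5)
X <- matrix(1:10)
d <- rank_mahal_sketch(z, X)
# Ranks are unchanged by monotone transformations, so this matches d.
d_exp <- rank_mahal_sketch(z, exp(X))
```

Because only ranks enter the calculation, any monotone transformation of a covariate leaves the distances unchanged, which gives a cheap invariance check for test cases.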
In tests/testthat/test.rank.mahal.R, compute_rank_mahalanobis is tested against the rank Mahalanobis function that Rosenbaum has posted on his site. I confirmed that their results also match in this example, even though both differ from those of compute_mahalanobis.
The difference is likely caused by how the covariance is computed. optmatch has always calculated the covariances needed for Mahalanobis distance computations by pooling the treatment- and control-group covariances. In defining rank-based Mahalanobis, Rosenbaum uses an unpooled covariance. Anyway, there's no indication of anything amiss other than the problem with compute_rank_mahalanobis noted at the top of this ticket. We'll need another strategy for test cases.
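A small base-R illustration of why pooling matters, even with a single covariate whose ranks equal its values (as with `X <- as.matrix(1L:nr)` in the test above). The pooled within-group variance and the overall variance differ, so for one covariate the two squared distances differ by a constant factor. The `pooled_cov` helper is hypothetical, and its pooling formula is an assumed common convention, not necessarily the one optmatch uses. A fixed alternating `z` is used so the numbers are easy to check by hand.

```r
# Hypothetical helper: pooled within-group covariance using one common
# convention (weights n_t - 1 and n_c - 1). This is an assumption, not
# copied from optmatch's code.
pooled_cov <- function(z, X) {
  X <- as.matrix(X)
  nt <- sum(z == 1)
  nc <- sum(z == 0)
  ((nt - 1) * cov(X[z == 1, , drop = FALSE]) +
     (nc - 1) * cov(X[z == 0, , drop = FALSE])) / (nt + nc - 2)
}

z <- rep(c(1, 0), 5)  # units 1, 3, 5, 7, 9 are treated
X <- matrix(1:10)     # one covariate; its ranks equal its values

unpooled <- cov(X)          # variance over all units: 55/6, about 9.17
pooled <- pooled_cov(z, X)  # each group has variance 10, so this is 10

# Squared Mahalanobis distance between unit 1 (treated) and unit 2 (control).
d_unpooled <- mahalanobis(X[1, ], X[2, ], unpooled)  # 6/55, about 0.109
d_pooled <- mahalanobis(X[1, ], X[2, ], pooled)      # 1/10
```

With the random `sample()` assignment used in the test, the pooled variance changes from draw to draw, while the unpooled one does not.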
...in the event that an exact-matching constraint is in force. The function currently computes and returns a distance for every treatment/control combination, even when it is given a list of the specific treatment-control pairs for which a distance is needed.
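A minimal base-R sketch of the behavior described here: enumerate only the permitted treatment-control pairs (in this case, pairs sharing an exact-matching stratum) and compute a distance for each, rather than for the full cross product. Both helpers are hypothetical and do not reflect optmatch's internals. `icov` is assumed to be an already-inverted covariance matrix; for rank Mahalanobis it would come from ranks computed over all units.

```r
# Hypothetical helper (not an optmatch function): list the treatment-control
# pairs that share an exact-matching stratum, as a two-column index matrix.
within_stratum_pairs <- function(z, stratum) {
  treated <- integer(0)
  control <- integer(0)
  for (i in split(seq_along(z), stratum)) {
    t_i <- i[z[i] == 1]
    c_i <- i[z[i] == 0]
    # Every treated unit in the stratum paired with every control in it.
    treated <- c(treated, rep(t_i, each = length(c_i)))
    control <- c(control, rep(c_i, times = length(t_i)))
  }
  cbind(treated = treated, control = control)
}

# Hypothetical helper: squared Mahalanobis distances for the requested pairs
# only, given an already-inverted covariance matrix icov.
mahal_for_pairs <- function(X, pairs, icov) {
  X <- as.matrix(X)
  diffs <- X[pairs[, "treated"], , drop = FALSE] -
    X[pairs[, "control"], , drop = FALSE]
  # Row-wise quadratic form diff' icov diff, one value per pair.
  rowSums((diffs %*% icov) * diffs)
}

z <- c(1, 0, 0, 1, 0, 1)
stratum <- c("a", "a", "a", "b", "b", "c")
X <- matrix(c(1, 2, 4, 8, 16, 32))
pairs <- within_stratum_pairs(z, stratum)  # 3 pairs, not 3 x 3 = 9
d <- mahal_for_pairs(X, pairs, icov = diag(1))
```

The amount of work then scales with the number of permitted pairs instead of the product of the group sizes.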
To do: