markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch

`match_on.numeric` to maintain dimension #189

Closed: benthestatistician closed this issue 4 years ago

benthestatistician commented 4 years ago

Example of the problem:

( dat <- data.frame(z=c(0,0,1), x=0:2, w=c(NA_real_, 1,2)) )
##   z x  w
## 1 0 0 NA
## 2 0 1  1
## 3 1 2  2
( within0 <- match_on(setNames(dat$w, nm=rownames(dat)), caliper=1, z=dat$z, data=dat) )
## within0
##        control
## treated 2
##       3 1

But observations with NA for w should be retained, as they are by, e.g., match_on.formula():

match_on(z~w, method="euclidean", caliper=1, data=dat)
##       control
## treated 2   1
##      3 1 Inf

This becomes a problem when the result of a match_on.numeric() call is used as the within= argument to another match_on() call:

match_on(setNames(dat$x, nm=rownames(dat)), z=dat$z, data=dat, within=within0)
## Error in makedist(z, x, f, within) : 
##   Row and column names of within must match those of the data.
match_on(setNames(dat$x, nm=rownames(dat)), caliper=1, z=dat$z, data=dat, within=within0)
## Error in ismOpHandler("+", e1, e2) : non-conformable matrices
match_on(z~x, data=dat, within=within0)
## Error in makedist(z, data, compute_mahalanobis, within) :
##   Row and column names of within must match those of the data.
josherrickson commented 4 years ago

Adding names to z fixes this:

> ( dat  <- data.frame(z=c(0,0,1), x=0:2, w=c(NA_real_, 1,2)) )
  z x  w
1 0 0 NA
2 0 1  1
3 1 2  2
> ( within0  <- match_on(setNames(dat$w, nm=rownames(dat)),caliper=1, z=dat$z, data=dat) )
       control
treated 2
      3 1
> ( within0  <- match_on(setNames(dat$w, nm=rownames(dat)),caliper=1,
+                        z=setNames(dat$z, nm=rownames(dat)), data=dat) )
       control
treated 2   1
      3 1 Inf

The current documentation only requires names for x.

Options I see:
1) require names for z
2) copy over names from x to z if z's names are missing
3) fix this to work without names for z

(I think doing 3 without effectively just doing 2 would be a heavy lift.)

benthestatistician commented 4 years ago

I agree that option 2, copying names over from x to z, is the most promising. In fact, as of 9fbfe5d, that's already being done when the user supplies an exclude argument. So it would just be a matter of doing in general what's already done in that special case.
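A minimal sketch of option 2 in plain R, for illustration only. The helper `copy_x_names_to_z()` is hypothetical, not optmatch's actual internals: it checks that x and z have the same length, then gives z the names of x only when z has none and x does.

```r
# Hypothetical helper (not optmatch's real code): copy the names of the
# numeric score x onto the treatment indicator z when z is unnamed.
copy_x_names_to_z <- function(x, z) {
  # Option 2 presumes x and z line up element by element
  if (length(x) != length(z)) {
    stop("x and z must have the same length")
  }
  # Fill in names only if z has none and x has some to give
  if (is.null(names(z)) && !is.null(names(x))) {
    names(z) <- names(x)
  }
  z
}

dat <- data.frame(z = c(0, 0, 1), x = 0:2, w = c(NA_real_, 1, 2))
w <- setNames(dat$w, nm = rownames(dat))
z <- copy_x_names_to_z(w, dat$z)
print(names(z))
# [1] "1" "2" "3"
```

The sketch leaves an already-named z untouched; it does not attempt to reconcile a z whose names differ from those of x.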

josherrickson commented 4 years ago

One additional wrinkle: if z is named, but with names different from those of x, how do you want to handle it? I was thinking:

Note that we already force length(x) == length(z).

josherrickson commented 4 years ago

I ended up just using the guidelines above since I was working on it; let me know if you think anything should change.

benthestatistician commented 4 years ago

Those rules make sense to me.