markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

binary ISM ops should flag incompatible dimnames #190

Closed benthestatistician closed 4 years ago

benthestatistician commented 4 years ago

Should the following be an error or a warning? One of the two, I think -- in some way we ought to give informative feedback in this and similar errors.

 m <- matrix(c(1,Inf, 2, 3), nrow = 2, ncol = 2)
 A <- as.InfinitySparseMatrix(m)
 q <- matrix(c(1, 2, Inf, 4), nrow = 2, ncol = 2)
 colnames(A) <- paste("C", 1:2, sep = "")
 rownames(A) <- paste("T", 1:2, sep = "")
 colnames(q) <- paste("C", 1:2, sep = "")
 rownames(q) <- paste("t", 1:2, sep = "")

 Aq = A + q
 Aq
##        control
## treated  C1  C2
##      T1 Inf Inf
##      T2 Inf Inf
josherrickson commented 4 years ago

Per discussion with Ben:

1) Error if the dimnames are not in complete agreement. (Order irrelevant; I believe we already handle sorting in ISM ops. 2) If no dimnames, allow to proceed.

benthestatistician commented 4 years ago

In-the-wild example of trouble this creates:

  1. User fits psmA using data dat; then distA = match_on(psmA, data=dat). At this point dat has informative row names.
  2. User then applies dplyr::mutate() to add columns to dat, overwriting dat with the result of this. (From the user's perspective, it's the same data set, just with some new columns added on.)
  3. User now fits psmB, again using data dat; distB = match_on(psmB, data=dat).
  4. When user does fullmatch(psmA + psmB), problem is declared to be infeasible.

What's happened here is that at step 2, mutate silently dropped the informative row names from step 1, replacing them w/ as.character(1L:nrow(dat)); then at step 4, the ISM psmA + psmB forbids all matches, due to incompatible dimnames.

As things currently stand, very difficult to diagnose.

josherrickson commented 4 years ago

How do you want to address when one ISM has dimnames but the other does not? Proceed with warning similar to lacking data argument?

benthestatistician commented 4 years ago

Proceeding with a warning seems right to me.

josherrickson commented 4 years ago

This is completed. See below for examples of warnings and errors; feel free to offer amendments to text.

>  m <- matrix(c(1,Inf, 2, 3), nrow = 2, ncol = 2)
>  A <- as.InfinitySparseMatrix(m)
>  q <- matrix(c(1, 2, Inf, 4), nrow = 2, ncol = 2)
> A+q
       control
treated [,1] [,2]
   [1,]    2  Inf
   [2,]  Inf    7
>  colnames(A) <- paste("C", 1:2, sep = "")
>  rownames(A) <- paste("T", 1:2, sep = "")
> A+q
       control
treated  C1  C2
     T1   2 Inf
     T2 Inf   7
Warning message:
In ismOpHandler("+", e1, e2) :
  One matrix has dimnames and the other does not. Proper ordering is not guaranteed.
>  colnames(q) <- paste("C", 1:2, sep = "")
>  rownames(q) <- paste("t", 1:2, sep = "")
> A+q
Error in ismOpHandler("+", e1, e2) : 
  dimension names between matrices must be in agreement:
    extra rows in first matrix: T1, T2
    extra rows in second matrix: t1, t2