markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

node price merge #222

Closed josherrickson closed 5 months ago

josherrickson commented 2 years ago

An issue thread to discuss the merge of node price and master branches.

josherrickson commented 2 years ago

The prepareMatching function is not, as far as I can tell, ever called in solve_reg_fm_prob. Should it be? The documentation suggests it should be:

https://github.com/markmfredrickson/optmatch/blob/2c3ab994f898bedc301988346ed9d6b059ed2511/R/solve_reg_fm_prob.R#L21

There's also an issue that prepareMatching returns the wrong column names (treatment, control, distance) compared to what edgelist.data.frame is expecting (i, j, dist).

josherrickson commented 2 years ago

Currently, the edgelist.data.frame function fails due to only extracting node names from the i column, not the j column.

https://github.com/markmfredrickson/optmatch/blob/2c3ab994f898bedc301988346ed9d6b059ed2511/R/edgelist.R#L108-L116

The prepareMatching function creates a column of only directed edges, so this causes an empty result:

> v <- c(1, Inf, 2,
+ 2, 1, Inf,
+ 3, 2, 1)
> m <- matrix(v, nrow = 3, ncol = 3)
> colnames(m) <- c("A", "B", "C")
> rownames(m) <- c("D", "E", "F")
> pm <- prepareMatching(m)
> pm
  control treated distance
1       A       D        1
2       A       F        2
3       B       D        2
4       B       E        1
5       C       D        3
6       C       E        2
7       C       F        1
> edgelist(pm)
Error in edgelist(pm) : 
  setequal(colnames(x), c("i", "j", "dist")) is not TRUE

The j column is all NA due to the node names only being extracted from the i column, and subsequently complete.cases is called.

josherrickson commented 2 years ago

(@benthestatistician I'm assigning you for feedback, not expecting you to make any relevant modifications.)

benthestatistician commented 2 years ago
  1. Re prepareMatching(), I set out in this branch to replace it with edgelist(). Pretty sure its mention in the docs is a vestige of docs I must have created for defunct function SubDivStrat() in the midst of the transition.
  2. Re edgelist(), I believe name extraction from levels(x$i) but not levels(x$j) was deliberate in the sense that I had an expectation that those two should be equal.
    The expectation ought to have been enforced with a stopifnot(); why wasn't it? Perhaps I didn't get around to it b/c I couldn't decide between identical() versus setequal(). My current inclination would be to go with identical(), at least until that shows itself to be limiting in some way.
benthestatistician commented 2 years ago

NB: We (I) should maybe attend to #161 before we merge the proj1-nodeprices branch into master, or have it supercede the master branch. But merging (what we want from) the master branch into this one will still be helpful and should proceed.

benthestatistician commented 5 months ago

@josherrickson, I suspect this issue can be closed -- whatever outstanding problems there were would have been addressed during the hasty merge we made to restore optmatch to cran. Would you be willing to look over the issue, record for posterity any comments that seem relevant and close it if appropriate?

josherrickson commented 5 months ago

No real comments - Given that the failing test is obviously no longer failing, I see no need to keep this open.