markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

infeasibility recovery via max flow #200

Open benthestatistician opened 3 years ago

benthestatistician commented 3 years ago

The process we call "infeasibility recovery" would ideally be maximizing the mean.controls parameter (i.e. seeking out the smallest feasible omit.fraction, as opposed to just settling for some feasible value of mean.controls as is currently done. I.e., it should be mapped to a max-flow problem, as distinct from min-cost flow.

To do's:

benthestatistician commented 3 years ago

Notes:

benthestatistician commented 1 month ago

The LEMON project's max flow finder is documented under the "Preflow" heading. An interface to this function is currently available via rlemon::MaxFlow(). I note this @adamrauh's reference, as well as for other future contributors considering helping us at moving this sub-project along.

I have a couple questions, for @josherrickson or someone else better situated than me to parse possibilities of our interaction with LEMON.

  1. Our purposes only call for the value of the max flow, not the flow itself. It happens that LEMON calculates the max flow value as part of a first phase that's separate from a second phase in which the flow is determined. If I'm reading LEMON's "Preflow" page correctly, it would seem that by initiating the Preflow algorithm via init() and startFirstPhase() as opposed to run() or runMinCut(), you can get it to skip the second phase and just return the max flow value. The rlemon package interfaces with these routines with code in src/max_flow.h#L62-L104
    I would be interested in @josherrickson's assessment of whether creating a first phase only version of this algorithm looks like a little job or a big one. Either for: (a) he himself (who knows LEMON pretty well), (b) @nullsatz (who knows Rcpp quite well); or (c) other contributors with limited familiarity with either LEMON or Rcpp. Looking at the code linked to above, I'm tempted to speculate that this would amount only to a light modification; but perhaps I'm being optimistic.
  2. In rlemon, max flow and min-cost flow both seem to begin by calling ListDigraph to set up the problem: compare code snippet above to, say, the NetworkSimplexRunner code. I wonder if there might potentially be further efficiencies to be realized by designing a specialized version of the call to LEMON that set up one and the same Digraph for a max flow problem, first, and then for a min-cost-flow problem that would come afterwards if the max flow value was found to be sufficiently large (otherwise just returning the desultory max flow value).
  3. Building on the last question, I wonder if it would also be feasible to design the specialized call to LEMON in such a way that the specifics of the min-cost flow problem to be solved at the second step depend on how the max flow value compares with other parameters that would be passed down from R. More specifically, would this be something that would seem to call for detailed knowledge of LEMON, detailed knowledge of Rcpp, both of these, or neither?
josherrickson commented 1 month ago

Adding custom algorithms to rlemon would be pretty easy, from the R side.

  1. Add the Rcpp function (e.g. the linked src/maxflow.h from above).
  2. Ensure it's exported in R/RcppExports.R (I believe this will be done automatically, but is easy enough to do manually if neeeded.)
  3. Add support for the algorithm in the R function, e.g. R/maxflow.R#L37-L41
  4. Adjust for differing input/output.

The C code would be (to me at least) the challenging part. I can take a crack at it, but I'd be more than happy to let @nullsatz go at it. I can help with the R side of course.