markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

subproblemSuccess() to pull its answers from MCFSolutions@subproblems table #178

Open benthestatistician opened 5 years ago

benthestatistician commented 5 years ago

..when that table is available. Its current logic can serve as a fallback for cases where it isn't.

benthestatistician commented 2 years ago

As things currently stand [proj1-nodeprices ce38fbc], there are a few conditions where we don't record subproblem info, e.g. solve_reg_fm_prob.R#L87-L94:

    if (floor(min.cpt) > ceiling(max.cpt) |     #inconsistent max/min
        ceiling(1/min.cpt) < floor(1/max.cpt) | #controls per treatment
        !rfeas |  !cfeas  # either no controls or no treatments
        )
  {
    ans <- rep(NA_integer_, nrows + ncols)
    names(ans) <- c(rownames, colnames)
    return(list(cells=ans, err=0, MCFSolution=NULL))

We'll have to address this in order to address the current issue. It seems like it could come in handly also to have captured partial info for the nodes table, if this can be done without ruining semantics of columns we won't have reliable info for when we haven't located a meaningful subproblem (e.g. price, upstream_not_down). Nothing from such a subproblem need be added to the arcs table.

benthestatistician commented 3 months ago

@adamrauh reports via email that subproblems tables are showing feasible=FALSE for subproblems that are actually feasible:

data(nuclearplants)
mm <- match_on(pr ~ cost + t1 + t2, data=nuclearplants)
# not infeasible as given
f <- fullmatch(mm, data=nuclearplants, max.controls=3)
attr(f, "MCFSolutions")@subproblems

Object of class "SubProbInfo"
  groups flipped      hashed_dist  resolution lagrangian_value dual_value feasible   exceedance
1          FALSE 623.423420488867 0.001066667          34.2503   34.22637    FALSE -0.003563345

which appears to be a bug.

It seems that the "solutions==1L" at https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/solve_reg_fm_prob.R#L130 was meant to be a "solution==-1L". This based on my recollection of things, buttressed by how I see fmatch() dealing with infeasibility, e.g. here: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L139-L151

@adamrauh

adamrauh commented 3 months ago

It appears that something wonky might be happening here re: tracking feasibility. Using the following as a problem that should be infeasible:

data(nuclearplants)
mm <- match_on(pr ~ cost + t1 + t2, data=nuclearplants)
# infeasible as given
options("optmatch_verbose_messaging" = TRUE)
f <- fullmatch(mm, data=nuclearplants, max.controls=2)

generates a warning about infeasibility: Warning message: In fullmatch.matrix(mm, data = nuclearplants, max.controls = 2) : The problem is infeasible with the given constraints; some units were omitted to allow a match.

There are two attempts to solve from within .fullmatch_with_recovery():

https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fullmatch.R#L569

and

https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fullmatch.R#L587

If I take a look at how the temp objects for each attempt look, there are no -1s:

Browse[1]> temp$solution
  [1] 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
 [66] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
[131] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
[196] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Browse[1]> temp$solution
  [1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 [66] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
[131] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
[196] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

so it doesn't look like that condition is getting tripped in fmatch: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L139-L151

Not sure what the solution is yet, but just documenting for now @benthestatistician

benthestatistician commented 3 months ago

In this case, the original f.m. problem was infeasible but the two f.m. problems that you call out, producing tmp2 and feeding a call to return(), are both feasible. As a result of the fact of their having adjusted the max.controls and omit.fraction parameters, respectively. Things appear to me to be as they should.

adamrauh commented 3 months ago

Ah, ok -- thanks. In that case, I think the solution might be just as straightforward as you initially expected. I think perhaps we want !any(temp$solution==-1L), as I understand -1 is returned when the problem is infeasible, and we want to check for feasibility.

With the change implemented here:

data(nuclearplants)
mm <- match_on(pr ~ cost + t1 + t2, data=nuclearplants)
# not infeasible as given
f <- fullmatch(mm, data=nuclearplants, max.controls=3)
attr(f, "MCFSolutions")@subproblems
Object of class "SubProbInfo"
  groups flipped      hashed_dist  resolution lagrangian_value dual_value feasible   exceedance
1          FALSE 623.423420488867 0.001066667          34.2503   34.22637     TRUE -0.003563345

as expected

benthestatistician commented 3 months ago

Nice!

Could you also add tests asserting of a known infeasible subproblem that it's recorded as such, and also of a known feasible problem that it's recorded as such? And also of a problem with some feasible and some infeasible subproblems that they are recorded as such? (E.g., start by breaking the nuclearplants example into 2 exact matching classes based on value of pt.)

As to !any(temp$solution==-1L) for feasiblity: if this is based on a read-through of fmatch.R to confirm how infeasibility is recorded there, then yes I agree. If not, perhaps it's better to stay closer to my intention in writing that line of code, which was to record feasiblity via any(temp$solution==1L) (with "solution" not "solutions"). As I may have decided on that after a review of fmatch(). If that turns out not to pass tests, of course, then !any(temp$solution==-1L) is a good next thing to try.

benthestatistician commented 3 months ago

The following two lines of fmatch() lead me to expect that infeasibility will in these cases manifest itself with temp$solution all equal to 0L, with no 1Ls. With the LEMON solver: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L238-L240 and with the RELAX-IV solver: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L256-L258 My conclusion would be that any(temp$solution==1L) should be used as the test of feasibility.

benthestatistician commented 3 months ago

... in addition, fmatch() itself should be using either the appropriate one of these feasibility indicators to populate the feasibility entry in its subproblems table, but instead I see: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L319-L325

This might be addressed in tandem with filling out an MCFSolutions object to return with infeasible problems for which the solver winds up being bypassed: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L139-L150 Potentially this should be broken out as its own minor issue.

adamrauh commented 3 months ago

The following two lines of fmatch() lead me to expect that infeasibility will in these cases manifest itself with temp$solution all equal to 0L, with no 1Ls. With the LEMON solver:

https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L238-L240

and with the RELAX-IV solver: https://github.com/markmfredrickson/optmatch/blob/4d9e3423adc095e591f3d5824d35c0e2c676d6e1/R/fmatch.R#L256-L258

My conclusion would be that any(temp$solution==1L) should be used as the test of feasibility.

Just flagging that I've amended an old commit here to reflect this logic. Of course, this doesn't deal with the potential MCFSolutions problem, so just a small patch for now.

adamrauh commented 2 months ago

I've documented some relevant updates over here. But, to summarize: Streamlining the UI to always include an explicit tracking of feasibility via the subproblems table is doable , but will require either the creation of a new class, or modifications to the expectations of existing ones. The linked updates should nonetheless move the needle on this in the right direction, with some tweaks/simplifications to feasibility tracking.