markmfredrickson / optmatch

Functions for optimal matching in R
https://markmfredrickson.github.io/optmatch
Other
47 stars 14 forks source link

When given hinting info, only call solver on unsolved subproblems #167

Open benthestatistician opened 5 years ago

benthestatistician commented 5 years ago

This will require per-subproblem assessment of whether the subproblem solved in the hint and the current subproblem are the same in terms of

for the first of these we can compare distance hashes. For the second, it seems most expedient use defined update.fullmatch() & update.pairmatch() methods to verify at the problem level that subproblem-defining parameters have not changed. (Assuming we take that approach, this problem becomes subordinate to #76.)

benthestatistician commented 5 years ago

(I'm posting this b/c it's not compatible with a shortcut I'm taking at the moment, namely passing down subproblem-specific node price information but not other MCF problem info. So the issue serves as a reminder to cycle back and take care of this later.)

benthestatistician commented 5 years ago

(In 4dbf103, I took a step or two in the direction of a different approach, namely to assess similarly of match restrictions at the subproblem level though comparison of bookkeeping arcs of new and old problem. But along the way I realized that it's likely to be expensive to inspect the hint info to decide if the problem that had been solved was similar in relevant respects; so I reverted that commit (in [issue54-hinting 4dbf103] ) before publishing it.)

benthestatistician commented 5 years ago

As of f9c44f4 / version 0.9.11.9001, the MCFSolutions returned by fullmatch.matrix() stores Lagrangian and dual functional values associated with each solved subproblem, i.e. what's needed to associate a regret with each subproblem. A second stab at the problem with a lower tolerance value might decide whether to use the existing solution vs build a new one based on a comparison of that regret bound to a subproblem-specific tolerance.

benthestatistician commented 5 years ago

This issue begs the question of what it means for a problem to be "solved". Ordinarily the regret bound will be positive, even if it's very small.

The sensible thing would be to compare that bound to some sort of tolerance, which the user could manipulate. But we may want to move away from requesting tolerances of the user, instead asking for grid widths or something similar. Perhaps update.fullmatch() and update.pairmatch(), could accept tol parameters, while fullmatch() and pairmatch() just ignored those. Then the update methods would have those specifications available, perhaps for use within a loop.

On that approach, this issue becomes part of #76 . Also there are potentially relevant notes about tol under #85.

benthestatistician commented 3 years ago

Whether to re-do a problem that was nearly solved previously will come into focus pending the solution to #85 sketched in this comment.

Regarding the matter of how to determine expediently whether the current subproblem is identical to a subproblem described in a hint, I think we need to just add structure to the SubProbInfo tables, as described in this comment to #161. Once these things are done, and the SubProbInfo table receives the other updates discussed in that later comment, this issue won't look so hard.

benthestatistician commented 3 months ago

In mapping out upcoming MEERKT work, @markmfredrickson and I were led to the conclusion that the functionality requested in this issue should be a priority to implement. So I've raised this to the top of the board in the node prices project view, and I think I'd like to put @adamrauh in charge of it. This is open to discussion, but likely enough that for now I've assigned the issue to him.

One reason progress was slow on this issue is that it's tied up with the matter of whether we should continue to have tol= as part of the basic way of specifying an optmatch problem, vs another parameter for specifying a grid width, a multiplier to be applied to distances before rounding. I had trouble reaching a firm decision on this point. Returning to it after a spell, I think the right answer will be to introduce a new user-facing arg for specifying a grid width while retaining but perhaps depracating the tol= argument. When given a tol=argument, fullmatch() and pairmatch() would first solve the problem after a potentially arbitrary discretization of distances, then use a duality-based regret calculation, per #164, to decide whether the requested tolerance had been observed; if not, then they'd decrease the grid width and try again. This will call for taking care of a number of outstanding issues in the node prices project, potentially also obviating or simplifying some of them (such as #85). This particular issue is an example of one that gets simpler, I think, if problems are specified with grid widths rather than by regret tolerances.

If any of Mark, @josherrickson and/or @nullsatz would like to hear more about these plans before work gets going, please do speak up so we can include you in the meeting for that discussion.