This paper is nice: https://optirisk-systems.com/wp-content/uploads/2018/06/2-Stage-LP.pdf
Section 4.5 is the box idea. The full algorithm is Section 3 of https://arxiv.org/pdf/math/0106151.pdf. The trust-region (TR) variant cut down the number of iterations by a lot:
but the timing seemed weird. If you plot it, the implementation is clearly O(N^2), whereas the level-set method is O(N).
The culprit seems to be that they used the multi-cut formulation:
But it's not immediately obvious to me why the multicut is needed. Jeff's paper goes into more detail on acceptance/rejection of new points and major/minor iterations with cut selection. It seems like we could get away with a much simpler algorithm, but perhaps I need to dig into the details more.
How about this:
- Oscillation is okay, because the state can only move by 5% of the domain per step, so nearby points should still help the value function.
- No change in problem complexity.
- Opt-in and yet automatic, because it uses information the user already provides.
- Only a single parameter, for which 5% is a reasonable choice.
- It'll still converge, because it only takes 20 steps or so to move across the complete domain.
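A minimal sketch of the clamping idea for a single scalar state, assuming the user-provided bounds on the state are available (`regularized_bounds` and `rho` are my names for illustration, not the SDDP.jl API):

```julia
# Sketch only: restrict the next forward-pass state to a box of width
# rho * (upper - lower) around the point visited on the previous pass.
# `lower` and `upper` are the bounds the user already provides on the state.
function regularized_bounds(x_prev, lower, upper; rho = 0.05)
    width = rho * (upper - lower)
    return max(lower, x_prev - width), min(upper, x_prev + width)
end

# Example: a state in [0, 200] that visited 120 on the last pass can only
# move within [110, 130] on this pass.
lb, ub = regularized_bounds(120.0, 0.0, 200.0)
```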
I'm not convinced this is useful, but we can use it as follows:
SDDP.train(model; forward_pass = SDDP.RegularizedForwardPass())
@jarandh: you'll need to update to the unreleased version of SDDP.jl with:
import Pkg
Pkg.pkg"add SDDP#master"
Ok. Sorry, I got it wrong. I think I got too excited and didn't see the deterministic part 😅 In reality this has nothing to do with the quadratic regularization from Asamov. But I noticed that you mentioned Asamov regularization on another issue; I will add this information there.
I played around a little with this last night. I remember why I never liked this: there are too many tunable parameters, and people only present nice results in papers once they've run a hyper-parameter tuning. You have to pick the relative scaling of each state, the initial penalty, and a sequence for the penalty parameter. Here's a related paper from Alessandro: https://www.maxwell.vrac.puc-rio.br/54737/54737.PDF. They still guess "good" values for the scaling parameters with domain knowledge.
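For concreteness, the penalty term being discussed looks something like the sketch below (my notation, not exactly Asamov's); the point is how many things you have to choose:

```julia
# Hypothetical quadratic regularization added to a stage objective.
# You must pick: w[i], the relative scaling of each state variable;
# an initial penalty; and a schedule rho_k for how the penalty changes
# over iterations k. None of these are obvious a priori.
quad_penalty(x, x_prev, w, rho_k) =
    rho_k * sum(w[i] * (x[i] - x_prev[i])^2 for i in eachindex(x))
```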
I also read through the paper mentioned in #623, and their non-convex regularization shows nice results, but only after tuning. And even then, the improvement from nothing -> quadratic regularization is bigger than the improvement from quadratic regularization -> non-convex regularization.
I also wondered if, instead of a quadratic penalty in the objective, we just did a box-constrained trust region. If you hit the same side of the box twice in a row, you make it bigger. If you hit opposite sides, you make it smaller. Now we need no scaling, just bounds, and there are no penalty costs. The downside is that you don't get an interior solution; you still have bang-bang, just in a smaller region.
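A rough sketch of that update rule for a single scalar state (hypothetical; `update_box` and the doubling/halving factors are my choices, nothing implemented):

```julia
# Hypothetical adaptive box: grow when we hit the same side twice in a row,
# shrink when we hit opposite sides, otherwise leave the radius alone.
function update_box(x, center, radius, last_side)
    side = x >= center + radius ? :upper :
           x <= center - radius ? :lower : :interior
    if side != :interior && side == last_side
        radius *= 2.0      # same side twice in a row: enlarge the box
    elseif side != :interior && last_side != :interior
        radius /= 2.0      # opposite sides: shrink the box
    end
    return x, radius, side # recentre the box on the new point
end
```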
Closes #623