odow / SDDP.jl

A JuMP extension for Stochastic Dual Dynamic Programming
https://sddp.dev

Add regularization for deterministic first-stage #624

Closed odow closed 1 year ago

odow commented 1 year ago

I played around a little with this last night. I remember why I never liked this. There are too many tunable parameters, and people only present nice results in papers once they've run a hyper-parameter tuning. You have to pick the relative scaling of each state, the initial penalty, and a sequence of values for the penalty parameter. Here's a related paper from Alessandro: https://www.maxwell.vrac.puc-rio.br/54737/54737.PDF. They still guess "good" values for the scaling parameters with domain knowledge.
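For context, the kind of penalty I mean modifies the stage objective to something like this (my notation, not from the paper), where the incumbent $\bar{x}$, the per-state scalings $w_i$, and the penalty schedule $\rho_k$ are all things you have to choose:

$$
\min_x \; c^\top x + \theta + \rho_k \sum_i w_i (x_i - \bar{x}_i)^2
$$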

I also read through the paper mentioned in #623. Their non-convex regularization shows nice results, but only after tuning. And even then, the improvement from nothing -> quadReg is bigger than the improvement from quadReg -> non-convex reg.

I also wondered if, instead of a quadratic penalty in the objective, we just did a box-constrained trust region. If you hit the same side of the box twice in a row, you make it bigger. If you hit opposite sides, you make it smaller. Now we need no scaling, just bounds, and there are no penalty costs. The downside is that you don't get an interior solution; you still have bang-bang behavior, just in a smaller region.
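A minimal sketch of the expand/shrink rule I have in mind (names and the grow/shrink factors are placeholders, not anything in SDDP.jl):

```julia
# Hypothetical trust-region box update for a single state variable.
# `x` is the trial point from the forward pass, `centre` and `radius`
# define the current box [centre - radius, centre + radius], and
# `last_side` records which side of the box the previous trial hit.
function update_box(x, centre, radius, last_side)
    side = x >= centre + radius ? :upper : (x <= centre - radius ? :lower : :interior)
    if side == :interior
        return centre, radius, side   # didn't hit the box: leave it alone
    elseif side == last_side
        return x, 2 * radius, side    # same side twice in a row: grow the box
    else
        return x, radius / 2, side    # opposite side: shrink the box
    end
end
```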

Closes #623

odow commented 1 year ago

This paper is nice: https://optirisk-systems.com/wp-content/uploads/2018/06/2-Stage-LP.pdf

Section 4.5 is the box idea. The full algorithm is Section 3 of https://arxiv.org/pdf/math/0106151.pdf. The trust region (TR) cut down the number of iterations by a lot:

(figure: iteration counts with and without the trust region)

but the timing seemed weird. If you plot it, the implementation is clearly O(N^2) while the level set is O(N):

(figure: solve times)

The culprit seems to be that they used the multi-cut formulation:

(screenshot: the multi-cut formulation from the paper)

But it's not immediately obvious to me why the multicut is needed. Jeff's paper goes into more detail on acceptance/rejection of new points and major/minor iterations with cut selection. It seems like we could get away with a much simpler algorithm, but perhaps I need to dig into the details more.
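For reference, my understanding of the difference (my notation, not copied from the paper): a single-cut master keeps one aggregated cut per iteration, whereas a multi-cut master keeps one epigraph variable and one cut per scenario per iteration, so it grows with the number of scenarios $S$ as well as the iteration count $k$:

$$
\text{single-cut: } \theta \ge \sum_{s=1}^{S} p_s \left( \alpha_s^j + (\beta_s^j)^\top x \right), \quad j = 1,\dots,k
$$

$$
\text{multi-cut: } \theta_s \ge \alpha_s^j + (\beta_s^j)^\top x, \quad s = 1,\dots,S, \; j = 1,\dots,k
$$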

odow commented 1 year ago

How about this:

- Oscillation is okay, because the state can only move by 5% of the domain per iteration, so nearby points should still help the value function.
- No change in problem complexity.
- Opt-in, and yet automatic, because it uses information the user already provides.
- Only a single parameter, for which 5% is a reasonable choice.
- It'll still converge, because it only takes 20 steps or so to move across the complete domain.
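A minimal sketch of what I mean, assuming each state has finite bounds (the names here are placeholders, not the eventual SDDP.jl API):

```julia
# Hypothetical helper: restrict the next forward-pass trial point to move by
# at most `fraction` of each state's domain from the previous trial `x_prev`.
function trial_box(x_prev, lower, upper; fraction = 0.05)
    radius = fraction .* (upper .- lower)
    lo = max.(lower, x_prev .- radius)
    hi = min.(upper, x_prev .+ radius)
    return lo, hi
end

# Example: a state with domain [0, 200] that was previously at 150 can move
# only within [140, 160] on the next forward pass.
lo, hi = trial_box([150.0], [0.0], [200.0])
```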

odow commented 1 year ago

I'm not convinced this is useful, but we can use it as follows:

```julia
SDDP.train(model; forward_pass = SDDP.RegularizedForwardPass())
```
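If you want a different fraction of the domain, I believe the keyword is `rho` (check the docstring of `SDDP.RegularizedForwardPass` to confirm):

```julia
# Assuming the keyword argument is `rho`, the fraction of each state's domain
# that the trial point may move per iteration; confirm against the docstring.
SDDP.train(model; forward_pass = SDDP.RegularizedForwardPass(rho = 0.05))
```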

@jarandh: you'll need to update to the unreleased version of SDDP.jl with:

```julia
import Pkg
Pkg.pkg"add SDDP#master"
```
Thuener commented 3 weeks ago

Ok. Sorry, I got it wrong. I think I got too excited and didn't see the deterministic part 😅 In reality, this has nothing to do with the quadratic regularization from Asamov. But I noticed that you mentioned Asamov's regularization in another issue, so I will add this information there.