odow / SDDP.jl

A JuMP extension for Stochastic Dual Dynamic Programming
https://sddp.dev
Other
304 stars 62 forks source link

Getting out of local optimal SDDiP #777

Closed Thuener closed 2 weeks ago

Thuener commented 3 weeks ago

Hi Oscar,

now with SDDiP, we actually have a heuristic to solve the model. We don't proved convergence anymore. I can see that on several occasions, the model is stuck on a local optimal. I have been using BanditDuality to help the model to get out of those local optimal. Using different cuts helps get out of the local optimal. However, StrengthenedConicDuality can make much slower progress.

Any ideas on how to get out of the local optimal? Is there any way to use different duality_handlers depending on the bound stalling?

PS: I also get an issue with BanditDuality when trying to start back the train, it gets stuck with just one type of duality. Example:

     1883S -3.526655e+04  2.575658e+03  5.779006e+04    750405   1
      1885S -6.197976e+04  2.575544e+03  5.828998e+04    751967   1
      1887S -3.414856e+04  2.568755e+03  5.844276e+04    752881   1
      1889S -3.993974e+04  2.568373e+03  5.876231e+04    753719   1
      1890  -5.145811e+04  2.568373e+03  5.880455e+04    754762   1
      1891S -1.481099e+04  2.568067e+03  5.883862e+04    754869   1
      1893S -2.585710e+04  2.568097e+03  5.899596e+04    755583   1
      1895S -2.168459e+04  2.567502e+03  5.911727e+04    755921   1
      1897S -2.168718e+04  2.562634e+03  5.917999e+04    756171   1
      1899S -2.104896e+04  2.559852e+03  5.930590e+04    756765   1
      1900  -6.025239e+04  2.559852e+03  5.935892e+04    758124   1
      1901S -2.302431e+04  2.559217e+03  5.946921e+04    758411   1
      1903S -1.432304e+04  2.558576e+03  5.951825e+04    758661   1
      1905S -3.676079e+04  2.557187e+03  5.980848e+04    759567   1
      1909S -4.680980e+04  2.555810e+03  6.017820e+04    760987   1
-------------------------------------------------------------------
status         : time_limit
total time (s) : 6.017820e+04
total solves   : 760987
best bound     :  2.555810e+03
simulation ci  : -1.533718e+05 ± 1.864513e+04
numeric issues : 0
-------------------------------------------------------------------

Starting back training as I don't think we achieved convergence:

-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1  -2.247751e+04  2.555810e+03  1.210600e+01    761414   1
         2  -3.003325e+04  2.555810e+03  3.303300e+01    762025   1
         3  -1.146033e+05  2.555810e+03  1.077620e+02    764276   1
         4  -2.030632e+04  2.555810e+03  1.140230e+02    764455   1
         7  -6.857285e+04  2.555810e+03  1.704500e+02    766032   1
         9  -4.955000e+04  2.555810e+03  2.123530e+02    767198   1
        10  -3.825554e+04  2.555810e+03  2.452390e+02    768069   1
        13  -4.923447e+04  2.555810e+03  2.830820e+02    769098   1
        16  -2.347561e+04  2.555810e+03  3.140410e+02    769911   1
        19  -1.949708e+04  2.555810e+03  3.547390e+02    770996   1
        20  -4.641142e+04  2.555810e+03  3.917980e+02    771967   1
        22  -2.252014e+04  2.555810e+03  4.338440e+02    773057   1
        25  -1.398725e+04  2.555810e+03  4.714310e+02    773994   1
        26  -4.205176e+04  2.555810e+03  5.093740e+02    774985   1
        29  -1.976783e+04  2.555810e+03  5.408240e+02    775806   1
        33  -3.568327e+04  2.555810e+03  5.897890e+02    777094   1
        35  -1.578067e+04  2.555810e+03  6.226700e+02    777944   1
        36  -7.880938e+04  2.555810e+03  6.910770e+02    779639   1
        39  -3.428028e+04  2.555810e+03  7.372160e+02    780640   1
        41  -4.941172e+04  2.555810e+03  7.725960e+02    781538   1
        46  -3.661219e+04  2.555810e+03  8.222400e+02    782745   1
        49  -2.477594e+04  2.555810e+03  8.750570e+02    783990   1
        53  -4.000302e+04  2.555810e+03  9.189290e+02    785098   1
        56  -1.498087e+04  2.555810e+03  9.498160e+02    785839   1
        59  -3.442730e+04  2.555810e+03  9.898650e+02    786836   1
        61  -4.222634e+04  2.555810e+03  1.045653e+03    788134   1
        62  -5.144041e+04  2.555810e+03  1.095805e+03    789329   1
        65  -6.342666e+04  2.555810e+03  1.180221e+03    791394   1
        69  -1.599701e+04  2.555810e+03  1.212788e+03    792194   1
        70  -4.505639e+04  2.555810e+03  1.249560e+03    793073   1
        73  -2.727797e+04  2.555810e+03  1.289537e+03    793978   1
        76  -4.387685e+04  2.555810e+03  1.331148e+03    794955   1
        80  -1.189576e+05  2.555810e+03  1.475348e+03    798267   1

Then I'm stuck in local optimal and the StrengthenedConicDuality is never used again.

Thuener commented 3 weeks ago

I have changed the BanditDuality code for this strategy to change the duality handler if the bound stops moving. It would be something like this:

mutable struct _Stab{T}
    handler::T
end

mutable struct StabDuality <: SDDP.AbstractDualityHandler
    dual_handlers::Vector{_Stab}
    last_dual_handlers_index::Int

    function StabDuality(args::SDDP.AbstractDualityHandler...)
        return new(_Stab[_Stab(arg) for arg in args], 1)
    end
end

function Base.show(io::IO, handler::StabDuality)
    print(io, "StabDuality with multiple dual_handlers:")
    for dual_handler in handler.dual_handlers
        print(io, "\n * ", dual_handler.handler)
    end
    return
end

function StabDuality()
    return StabDuality(SDDP.ContinuousConicDuality(), SDDP.StrengthenedConicDuality(), SDDP.LagrangianDuality())
end

function SDDP.prepare_backward_pass(
    node::SDDP.Node,
    handler::StabDuality,
    options::SDDP.Options,
)
    if length(options.log) > 2 && isapprox(options.log[end-1].bound, options.log[end].bound; atol = 1e-6)
        handler.last_dual_handlers_index = handler.last_dual_handlers_index % length(handler.dual_handlers) +1
    end
    return SDDP.prepare_backward_pass(node, handler.dual_handlers[handler.last_dual_handlers_index].handler, options)
end

function SDDP.get_dual_solution(node::SDDP.Node, handler::StabDuality)
    return SDDP.get_dual_solution(node, handler.dual_handlers[handler.last_dual_handlers_index].handler)
end

function SDDP.duality_log_key(handler::StabDuality)
    return SDDP.duality_log_key(handler.dual_handlers[handler.last_dual_handlers_index].handler)
end
Thuener commented 3 weeks ago

For the case when restarting the train gets stuck...

The issue with the BanditDuality seems to be on this line: https://github.com/odow/SDDP.jl/blob/d2495d6a9a0886048d560df7e29c7b220656594c/src/plugins/duality_handlers.jl#L380

ContinuousConicDuality doesn't improve the bound; thus const_bound is true, then the train is stuck with ContinuousConicDuality. Maybe some rand choice on the arms?

odow commented 3 weeks ago

Any ideas on how to get out of the local optimal?

Nope.

Is there any way to use different duality_handlers depending on the bound stalling?

Nope. We'd need to write a new duality handler.

PRs to improve Bandit welcome :smile: I vaguely remember issues related to the const_bound. But it was a while ago...

Thuener commented 3 weeks ago

I will create a PR in the future for this. Right now, I'm just testing it. Sometimes when changing the dual_handler I'm getting:

    Termination status : OPTIMAL
    Primal status      : FEASIBLE_POINT
    Dual status        : NO_SOLUTION.

Any ideas on why?

Answering my own question... It is because we can't change the dual_handler just for some nodes, thus we have to change it only on the beginning of the backwards.

Thuener commented 3 weeks ago

I achieved very good results by updating the BanditDuality with a simple solution.

      1738  -5.260732e+04 -3.734935e+02  5.905173e+04    717890   1
      1739S -1.735529e+04 -3.737250e+02  5.918178e+04    718209   1
      1741S -1.561158e+04 -3.737984e+02  5.927766e+04    719023   1
      1743S -1.144220e+04 -3.738277e+02  5.931456e+04    719313   1
      1744S -2.994307e+04 -3.785066e+02  5.957528e+04    719952   1
      1746S -1.733834e+04 -3.786535e+02  5.967713e+04    720326   1
      1747  -4.087047e+04 -3.786535e+02  5.971645e+04    721469   1
      1748S -5.050328e+04 -3.788257e+02  6.018140e+04    722632   1
-------------------------------------------------------------------
status         : time_limit
total time (s) : 6.018140e+04
total solves   : 722632
best bound     : -3.788257e+02
simulation ci  : -1.380501e+05 ± 1.543136e+04
numeric issues : 0
-------------------------------------------------------------------

It does BanditDuality typically, but if it gets 2 (parameter convergence_min_logs) iterations with the same bound it changes for the next arm. This udpate gives more diversity and helps with the issue when you read the cuts and stay stuck with only one arm.


mutable struct BanditDuality <: AbstractDualityHandler
    arms::Vector{_BanditArm}
    last_arm_index::Int
    logs_seen::Int
    update_min_logs::Int
    convergence_min_logs::Int

    function BanditDuality(args::AbstractDualityHandler...)
        return new(_BanditArm[_BanditArm(arg, Float64[]) for arg in args], 1, 1, 10, 2)
    end
end

function prepare_backward_pass(
    node::Node,
    handler::BanditDuality,
    options::Options,
)
    # Just change once at each backward iteration
    if length(options.log) > handler.logs_seen
        _update_rewards(handler, options.log)
        handler.logs_seen = length(options.log)

        # Check for bound convergence stalling and change the handler if that is the case
        if length(options.log) > handler.convergence_min_logs && 
                isapprox(options.log[end].bound, options.log[end-handler.convergence_min_logs].bound; atol = 1e-6)
            index = (handler.last_arm_index % length(handler.arms)) +1 # next handler
            @info "Change dual handler from $(handler.last_arm_index) to $index"
            handler.last_arm_index = index
            arm = handler.arms[index]
        else
            arm = _choose_best_arm(handler)
        end
    else
        arm = handler.arms[handler.last_arm_index]
    end

    return prepare_backward_pass(node, arm.handler, options)
end
odow commented 3 weeks ago

iterations with the same bound it changes for the next arm

Oh that's a good idea

odow commented 3 weeks ago

I'm going to bed, but I sent you and invite to join as a collaborator. It should give you write access to edit my existing PR.

odow commented 3 weeks ago

Does #779 close this issue?

Thuener commented 2 weeks ago

Sorry, I will check, give me some days.

Thuener commented 2 weeks ago

Yes, I can confirm that the PR solved the issue with bound stalling. Thanks @odow!