odow / SDDP.jl

Stochastic Dual Dynamic Programming in Julia
https://sddp.dev

the notation `lower_bound` #676

Closed: WalterMadelim closed this issue 10 months ago

WalterMadelim commented 10 months ago

Hello, odow. I find the notation lower_bound in your code somewhat obscure.

This kwarg name first appears in Constructing-the-model in An-introduction-to-SDDP.jl, and is later emphasized in Choosing-an-initial-bound.

Unfortunately, though, as a reader, I cannot infer that this lower_bound is meant to be an initial lower bound for the cost-to-go term in a NODE_SUBPROBLEM (did you call this $\theta$, or "Approximate-Q", a bellman_function in your code?) rather than the lower bound for the WHOLE multi-stage model.

Take the code from Choosing-an-initial-bound for an example:

model = SDDP.LinearPolicyGraph(
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Gurobi.Optimizer,
) do subproblem, t
    ...
end

The lower_bound appears in a context where all the other terms relate to the whole SDDP model. Thus I might think: is this lower_bound the initial lower bound that will rise steadily and finally meet the confidence interval of the upper bound after training?

On the contrary, I now believe this lower_bound to be stagewise, and here is the clue:

using SDDP, Gurobi
model = SDDP.LinearPolicyGraph(
    stages = 3,
    sense = :Min,
    lower_bound = 3.299999, # 3.299999 > 0.0, the original bound
    optimizer = Gurobi.Optimizer,
) do subproblem, t
    @variable(subproblem, x >= 0, SDDP.State, initial_value = 2)
    @variable(subproblem, u >= 0)
    @variable(subproblem, v >= 0)
    @constraint(subproblem, x.out == x.in - u)
    @constraint(subproblem, u + v == 1.5)
    @stageobjective(subproblem, t * v)
end

for t in 1:3
    θ = model.nodes[t].bellman_function.global_theta.theta
    lower_bound(θ) |> println  # 3.299999 for t = 1 and t = 2; 0.0 for t = 3 (no cost-to-go at the final stage)
end

(By the way, what's the difference between bf.global_theta and bf.local_thetas, where bf = model.nodes[t].bellman_function?) Moreover, we immediately run the following:

SDDP.train(model)
simulations = SDDP.simulate(
    model,
    1,  # Number of replications
    [:x, :u, :v],
);
sum(t * simulations[1][t][:v] for t in 1:3) |> println # 4.699999333333333, which equals SDDP.calculate_bound(model)

The value in the last line would be 3.5 if lower_bound were kept at its original value of 0.0, indicating that we obtain an inferior solution by lifting lower_bound from 0.0 to 3.299999 (because sense = :Min).

This lower_bound reminds me of the $L_n$ in Eq. (4.2) of the famous SDDiP paper, which I first read last year; it was used when constructing the integer optimality cut. It seems that $L_n$ must be chosen carefully there.

But I am not clear whether this is the case in your package. Is this lower_bound used to construct cuts (apart from providing an initial lower bound for $\theta$)?

You refer to this in your doc Choosing-an-initial-bound:

The bound should be as large as possible (since a tighter bound helps with convergence and the numerical issues discussed above), but if chosen too large, it may cut off the feasible region and lead to a sub-optimal solution.

What is meant by "large" and "small"? And what does the first half of the sentence mean? Could you give a clearer explanation?

odow commented 10 months ago

this lower_bound is meant to be an initial lower bound for the cost-to-go term ... in a NODE_SUBPROBLEM rather than the lower bound for the WHOLE Multi-stage model.

It's both.

The lower bound is an initial bound on the cost-to-go variable in each node of the graph.

It is a bit confusing, because we use the same bound for every node in the graph. Therefore, it is the minimum of the lower bounds for each subproblem. This is usually (but not always) the lower bound of the first stage problem.

I believe this lower_bound to be stagewise

Correct. The only valid bound in your example is lower_bound = 0 because v = 0 is a feasible point.
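
To make this concrete, here is a small brute-force sketch of my own (an illustration, not SDDP.jl internals) that computes the exact cost-to-go of each node in the example above by grid search over the release u:

# Exact cost-to-go V(t, x) of the deterministic example above: stock x,
# release u, u + v == 1.5, stage cost t * v. A 0.01 grid is exact here
# because the optimal policy uses multiples of 0.5.
function V(t, x)
    t > 3 && return 0.0                # no cost beyond the final stage
    best = Inf
    for u in 0.0:0.01:min(x, 1.5)      # the release u is limited by the stock x
        best = min(best, t * (1.5 - u) + V(t + 1, x - u))
    end
    return best
end

V(1, 2.0)  # 3.5: the optimal value of the whole model
V(2, 2.0)  # 2.0: the largest valid bound for node 1's θ
V(3, 2.0)  # 0.0: the largest valid bound for node 2's θ (v = 0 is feasible)

Because the same lower_bound is attached to the cost-to-go variable of every node, the only value valid for all of them is min(2.0, 0.0, 0.0) = 0.0, which is why lifting it to 3.299999 cuts off the feasible region and degrades the policy.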

Thus I might think: is this lower_bound the initial lower bound that will rise steadily and finally meet the confidence interval of the upper bound after training?

Yes

did you call this $\theta$, or "Approximate-Q", a bellman_function in your code? By the way, what's the difference between bf.global_theta and bf.local_thetas, where bf = model.nodes[t].bellman_function?

I wouldn't recommend looking too deeply into this part of the code. We use a non-standard value function that I've never fully described in a paper: https://github.com/odow/SDDP.jl/blob/e957848668544acb584e1714b4116710edff79a0/src/visualization/value_functions.jl#L9-L34

Is this lower_bound used to construct cuts (apart from providing an initial lower bound for $\theta$)?

No. Read: https://sddp.dev/stable/explanation/theory_intro/#Preliminaries:-Kelley's-cutting-plane-algorithm

What is meant by "large", "small"?

The words are vague because it is problem dependent.

And what does the first half of the sentence mean? Could you give a clearer explanation?

A better bound for M in Kelley's cutting plane algorithm will require fewer iterations. See again https://sddp.dev/stable/explanation/theory_intro/#Preliminaries:-Kelley's-cutting-plane-algorithm
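
For intuition, here is a minimal sketch of my own of Kelley's method in JuMP (assuming the HiGHS solver; an illustration, not SDDP.jl's implementation), where the initial bound M plays the same role as lower_bound:

using JuMP, HiGHS

f(x) = (x - 1)^2 + 1    # convex objective; true minimum is 1 at x = 1
df(x) = 2 * (x - 1)     # derivative, used to build cuts

# Run Kelley's cutting-plane method with initial lower bound M and return
# the number of iterations needed to close the gap to `atol`.
function kelley(M; atol = 1e-6, limit = 100)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, -10 <= x <= 10)
    @variable(model, θ >= M)   # θ under-approximates f, like the cost-to-go variable
    @objective(model, Min, θ)
    upper = Inf
    for k in 1:limit
        optimize!(model)
        lower, xk = objective_value(model), value(x)
        upper = min(upper, f(xk))   # best point evaluated so far
        upper - lower < atol && return k
        @constraint(model, θ >= f(xk) + df(xk) * (x - xk))  # add a cut at xk
    end
    return limit
end

kelley(-1e6)  # a loose bound typically needs more iterations ...
kelley(1.0)   # ... than the tightest valid bound M = 1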

in the famous SDDiP paper

https://sddp.dev/stable/guides/add_integrality/#Does-SDDP.jl-implement-the-SDDiP-algorithm?

WalterMadelim commented 10 months ago

It's both.

Yes, you're wise; I didn't reckon with this possibility 🥲.

About the Lagrangian duality handler:

# =========================== LagrangianDuality ============================== #

"""
    LagrangianDuality(;
        method::LocalImprovementSearch.AbstractSearchMethod =
            LocalImprovementSearch.BFGS(100),
    )
Obtain dual variables in the backward pass using Lagrangian duality.
## Arguments
 * `method`: the `LocalImprovementSearch` method for maximizing the Lagrangian
   dual problem.

Given the problem..., we solve the problem `max L(λ)`, where:

L(λ) = min Cᵢ(x̄, u, w) + θᵢ - λ' h(x̄)
        st (x̄, x′, u) in Xᵢ(w) ∩ S

and where `h(x̄) = x̄ - x`.
"""

Question: It seems that the LocalImprovementSearch.BFGS(100) method is used to solve the max-min dual problem. Can this method guarantee that the dual problem is solved to global optimality (i.e., that it provides a $\lambda^*$)? You've mentioned that this problem is difficult to solve: does that mean it takes too long, or that the method actually cannot guarantee a globally optimal $\lambda^*$? I remember that only a globally optimal dual variable can construct tight cuts, which ensure that the whole algorithm converges (i.e., really closes the gap between the ub (CI) and the lb).

Moreover, when the number of nodes in the graph is large, or there is uncertainty, we are not aware of another algorithm that can claim to find a globally optimal policy. (from Convergence)

Do you have any opinions on the cutting-edge status of multi-stage stochastic programming research? Or any future plans in this field? I've also read some papers in this field, and will keep on researching. Maybe we can exchange some ideas? Thanks. 🙂

odow commented 10 months ago

It seems that the LocalImprovementSearch.BFGS(100) method is used to solve the max-min dual problem. Can this method guarantee that the dual problem is solved to global optimality?

No, and it doesn't need to. Any feasible dual can make progress: by weak duality, any dual-feasible λ yields a valid (if possibly loose) cut.

Do you have any opinions on the cutting-edge status of multi-stage stochastic programming research?

I think we should view multistage stochastic programming as a heuristic for computing good (but not optimal) policies and show that it can deliver value. That means less focus on new theoretical approaches, and more focus on using what SDDP.jl can already do.

  1. We should try to train convex approximations of non-convex models. See:
  2. We should find applications that are not t-stage linear graphs.

I would ignore SDDiP.

Or any future plans in this field?

https://github.com/odow/SDDP.jl/issues/599 But this is an engineering challenge, not a research question.

WalterMadelim commented 10 months ago

more focus on using what SDDP.jl can already do.

Yes, I've seen many attractive features offered by your package, including multi-threaded parallel computing, objective states, etc.

I've read the paper in the 3rd link, even before I read your documentation. I'll check the 2nd link, and I'll keep on reading your code for some time.

But this is an engineering challenge, not a research question.

That was a wake-up call (LOL), since I'm honestly a student majoring in Electrical Engineering (I nearly forgot, whatever). I should be reading plenty of IEEE papers (like the 1st link) as my vocation 🥲.

WalterMadelim commented 10 months ago

@odow Hello again. Did you specify the file "pglib_opf_case5_pjm.m" in your Formulation section? It appears in the 2nd line of the code block:

filename = joinpath(@__DIR__, "pglib_opf_case5_pjm.m")

(This is with respect to the 2nd link in your last reply.)

Here is the result, with an error:

julia> function build_model(model_type)
           filename = joinpath(@__DIR__, "pglib_opf_case5_pjm.m")
           data = PowerModels.parse_file(filename)
           return SDDP.PolicyGraph(
               SDDP.UnicyclicGraph(0.95);
               sense = :Min,
               lower_bound = 0.0,
               optimizer = Ipopt.Optimizer,
           ) do sp, t
               power_model = PowerModels.instantiate_model(
                   data,
                   model_type,
                   PowerModels.build_opf;
                   jump_model = sp,
               )
               # Now add hydro power models. Assume that generator 5 is hydro, and the
               # rest are thermal.
               pg = power_model.var[:it][:pm][:nw][0][:pg][5]
               sp[:pg] = pg
               @variable(sp, x >= 0, SDDP.State, initial_value = 10.0)
               @variable(sp, deficit >= 0)
               @constraint(sp, balance, x.out == x.in - pg + deficit)
               @stageobjective(sp, objective_function(sp) + 1e6 * deficit)
               SDDP.parameterize(sp, [0, 2, 5]) do ω
                   return SDDP.set_normalized_rhs(balance, ω)
               end
               return
           end
       end
build_model (generic function with 1 method)

julia> 

julia> convex = build_model(PowerModels.DCPPowerModel)
ERROR: SystemError: opening file "K:\\de8\\pglib_opf_case5_pjm.m": No such file or directory
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base .\error.jl:176
  [2] #systemerror#82
    @ .\error.jl:175 [inlined]
  [3] systemerror
    @ .\error.jl:175 [inlined]
  [4] open(fname::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
    @ Base .\iostream.jl:293
  [5] open
    @ .\iostream.jl:275 [inlined]
  [6] open(f::PowerModels.var"#37#38"{Bool, Bool, String}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base .\io.jl:393
  [7] open
    @ .\io.jl:392 [inlined]
  [8] #parse_file#36
    @ K:\judepot\packages\PowerModels\4b7PA\src\io\common.jl:9 [inlined]
  [9] parse_file
    @ K:\judepot\packages\PowerModels\4b7PA\src\io\common.jl:8 [inlined]
 [10] build_model(model_type::Type)
    @ Main .\REPL[5]:3
 [11] top-level scope
    @ REPL[6]:1

WalterMadelim commented 10 months ago

Oh, I found it 🥲. But it's in docs/src.
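
(A possible workaround, sketched under the assumption that the file lives under docs/src in the repository; the exact raw URL below is a guess, not something stated in the thread:)

# Fetch the case file next to the script before calling build_model.
# NOTE: the URL path is an assumption based on the comment above.
import Downloads
Downloads.download(
    "https://raw.githubusercontent.com/odow/SDDP.jl/master/docs/src/pglib_opf_case5_pjm.m",
    joinpath(@__DIR__, "pglib_opf_case5_pjm.m"),
)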

odow commented 10 months ago

I'll add a better link: https://github.com/odow/SDDP.jl/pull/683