odow / SDDP.jl

A JuMP extension for Stochastic Dual Dynamic Programming
https://sddp.dev

Comparing DH vs HD problem #628

Closed hghayoom closed 1 year ago

hghayoom commented 1 year ago

Hi Oscar, I am working with your amazing tool and I want to see some examples comparing DH and HD problems. I found this Inventory Problem, which compares the two approaches, but it is a little different from what I expected: I need some examples without a discount factor of 0.95. I know that in your dissertation for MOO you compared the results of DH and HD, but I cannot see how you calculated the DH results. I think the new version of this model is here. I'm wondering if there are other examples for DH, because a lot of the time we are not aware of the uncertainties that might happen to us.

Thanks

odow commented 1 year ago

The MOO model is actually: https://github.com/odow/MOO. That model doesn't use SDDP. The decision-hazard was a two-stage stochastic program, in which we first compute the stocking rate, and then observe the price. The hazard-decision solves a different instance of the model for each price.

There is no tooling in SDDP.jl for explicitly modeling decision-hazard problems. Instead, you need to model the problem in which the decision is a state variable that is decided in the previous stage.

So in the HD example, the order_quantity is a control variable: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L21

and in the DH example, the order_quantity is a state variable: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L68 and we use the incoming value in the subproblem: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L84
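
If a compact, self-contained sketch helps, the difference looks roughly like this (this is not the code from inventory_management.jl; the stage count, costs, and demand support below are made up, so treat it as illustrative only). The only change between the two models is whether order is a plain control variable or a SDDP.State whose incoming value appears in the balance constraint:

using SDDP, HiGHS

# Hazard-decision (HD): `order` is a control variable, chosen after the
# stage's demand has been observed.
hd = SDDP.LinearPolicyGraph(;
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, inventory >= 0, SDDP.State, initial_value = 0)
    @variable(sp, order >= 0)        # control: decided knowing demand
    @variable(sp, lost_demand >= 0)  # slack so the balance is always feasible
    @variable(sp, demand)
    @constraint(sp, inventory.out == inventory.in + order - demand + lost_demand)
    SDDP.parameterize(ω -> JuMP.fix(demand, ω), sp, [1.0, 2.0, 3.0])
    @stageobjective(sp, 2 * order + inventory.out + 10 * lost_demand)
end

# Decision-hazard (DH): `order` is a state variable, so `order.out` is chosen
# before the next stage's demand is revealed, and the balance uses `order.in`.
dh = SDDP.LinearPolicyGraph(;
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, inventory >= 0, SDDP.State, initial_value = 0)
    @variable(sp, order >= 0, SDDP.State, initial_value = 0)
    @variable(sp, lost_demand >= 0)
    @variable(sp, demand)
    @constraint(sp, inventory.out == inventory.in + order.in - demand + lost_demand)
    SDDP.parameterize(ω -> JuMP.fix(demand, ω), sp, [1.0, 2.0, 3.0])
    @stageobjective(sp, 2 * order.out + inventory.out + 10 * lost_demand)
end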

Take a look at

https://ieeexplore.ieee.org/abstract/document/7859406

https://www.researchgate.net/profile/Alexandre-Street/publication/328228810_On_the_Solution_of_Decision-Hazard_Multistage_Stochastic_Hydrothermal_Scheduling_Problems/links/5bbfc748a6fdcc2c91f6b0e5/On-the-Solution-of-Decision-Hazard-Multistage-Stochastic-Hydrothermal-Scheduling-Problems.pdf

hghayoom commented 1 year ago

Thanks for your response. It was very insightful. I have some other questions about this inventory management example. I am using Visual Studio Code and I run the model in debug mode, so I can see the final variables in the variables window.

  1. After running this example I see 500 results (because we simulate it 500 times), but within each simulation there are different numbers of stages (for example, one simulation has only one stage and another contains 3 stages). Are there any convergence criteria that lead to different numbers of stages? (see the picture)

  2. In the DH problem, I cannot save the outputs in the results of the simulation: results = SDDP.simulate(model, 500, [:state, :order_quantity, :lost_demand, :disposed_units, :backordered_units, :held_units, :demand]). Therefore, I moved the definition of these variables to the beginning, and then I could save them. I moved these: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L73 to here: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L68

  3. In the DH problem, for odd stages (e.g. 1, 3, 5, ...) the demand and other parameters are zero. I think the reason is that we define a decision node at the beginning of the DH policy graph here: https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L55 so we are somehow expanding the stages, and every two consecutive nodes represent one time step. Results of even nodes (like 2):

Results of odd nodes (like 3):

  4. I will try to convert one of your basic examples from HD to DH and see if it is correct.
odow commented 1 year ago

Are there any convergence criteria that lead to different numbers of stages? (see the picture)

If your graph is cyclic, then there will be different numbers of stages. This ensures that the sample mean of the cost for each scenario is an unbiased estimator for the objective value of the problem.

You can simulate to a fixed depth with:

SDDP.simulate(
    model,
    500;
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 5,
        terminate_on_dummy_leaf = false,
    ),
)

In the DH problem, I cannot save the outputs in the results of the simulation

Try

SDDP.simulate(model, 500; skip_undefined_variables = true)

In the DH problem, for odd stages (e.g. 1, 3, 5, ...) the demand and other parameters are zero. I thought the reason was that a decision node is defined at the beginning of the DH policy graph, and we are somehow expanding the stages so that every two consecutive nodes represent one time step.

This follows from the definition of our graph:

https://github.com/odow/SDDP.jl/blob/09c8b6da7389b5b866e26ccf3a611a677ad5c8a7/papers/policy_graph/inventory_management.jl#L53-L61

Take a look at the :node_index fields of each simulation. They'll go decision, recourse, decision, ...
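
For example, something like this (an illustrative snippet; it assumes results is the output of SDDP.simulate(model, 500)):

# Assuming `results = SDDP.simulate(model, 500)`, the visited nodes of the
# first replication should alternate between the decision and recourse nodes:
node_sequence = [stage[:node_index] for stage in results[1]]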

You don't actually need the two nodes though. You could do this (untested, so there may be typos, but you get the idea):

using SDDP, HiGHS, Distributions

function infinite_lin_DH()
    graph = SDDP.LinearGraph(1)
    SDDP.add_edge(graph, 1 => 1, 0.95)
    return SDDP.PolicyGraph(
        graph;
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do subproblem, node
        @variable(subproblem, -10 <= state <= 10, SDDP.State, initial_value = 0)
        @variable(subproblem, 0 <= order_quantity, SDDP.State, initial_value = 0)
        @variables(subproblem, begin
            lost_demand >= 0
            disposed_units >= 0
            backordered_units >= 0
            demand
        end)
        @constraints(subproblem, begin
            state.out == state.in + order_quantity.in - demand + lost_demand - disposed_units
            backordered_units >= -state.out
        end)
    Pg = rand(Distributions.truncated(Distributions.Normal(5, 2), 0, 10), 50)
        sort!(Pg)
        SDDP.parameterize(ω -> JuMP.fix(demand, ω), subproblem, Pg)
        @stageobjective(
            subproblem,
            20 * order_quantity.out +   # Order cost
            2 * state.out +             # Holding cost ch.
            10 * backordered_units +    # Backorder cost cb.
            10 * disposed_units +       # Disposal cost cd.
            100 * lost_demand           # Lost demand cost cl.
        )
    end
end
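
A rough usage sketch for that single-node version (also untested; the iteration limit and the list of recorded variables are arbitrary) would be:

model = infinite_lin_DH()
SDDP.train(model; iteration_limit = 50)
results = SDDP.simulate(model, 100, [:order_quantity, :state])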

DH vs HD is a modeling choice. It doesn't change anything about the algorithm or require special structures.

hghayoom commented 1 year ago

Thank you, Oscar, for the illustrations; they help me a lot in shaping my code. I wrote a code that compares the HD and DH problems over four stages. I need to learn them to use in my work. The HD problem works properly, but for the DH problem I get the infeasibility issue that you described here, and I don't know why I should get this error. I have defined positive and negative slack variables (+Borow and -s) so that the constraint should always be feasible, but I don't know why this happens.

using SDDP, HiGHS, Test, GLPK, Distributions, Random, Statistics

function finite_lin_DH()

    graph2 = SDDP.Graph(
        :root_node,
        # we have 4 time steps, and for each one we need a decision node and a recourse node
        [:decision1, :recourse1,:decision2, :recourse2,:decision3, :recourse3,:decision4, :recourse4],
        [
            (:root_node => :decision1, 1.0),
            (:decision1 => :recourse1, 1.0),
            (:recourse1 => :decision2, 1.0),
            (:decision2 => :recourse2, 1.0),
            (:recourse2 => :decision3, 1.0),
            (:decision3 => :recourse3, 1.0),
            (:recourse3 => :decision4, 1.0),
            (:decision4 => :recourse4, 1.0),
            # (:recourse4 => :decision1, 0.95),
        ],
    )
    model = SDDP.PolicyGraph(
        graph2,
        # bellman_function = SDDP.BellmanFunction(lower_bound = 0),
        # optimizer = GLPK.Optimizer,
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do sp, node
        @variable(sp, 0 <= x <= 8, SDDP.State, initial_value = 7)  # reservoir water level in each step
        @variable(sp, 0 <= p, SDDP.State, initial_value = 2)       # producing power with cost

        @variables(sp, begin
            y >= 0      # using reservoir water
            # p >= 0    # power generation
            s >= 0      # spill water
            ξ           # input
            Borow >= 0  # borrow water if the water level is negative
        end)

        if node in [:decision1, :decision2, :decision3, :decision4]
            @constraint(sp, x.out == x.in)
            @stageobjective(sp, 20 * p.out)
        else
            @constraints(sp, begin
                p.in + y == 9
                x.out == x.in - y + ξ - s + Borow
            end)
            Pg = rand([2, 3], 4)
            # Pg = rand(Distributions.TruncatedNormal(5, 2, 0, 10), 4)
            # sort!(Pg)
            SDDP.parameterize(sp, Pg) do ω
                JuMP.fix(ξ, ω)
            end
            @stageobjective(sp, 200 * s + 500 * Borow)
        end
    end
    return model
end

function finite_lin_HD()

    model = SDDP.PolicyGraph(
        # this time we define a line graph
        SDDP.LinearGraph(4),
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do sp, node
        @variable(sp, 0 <= x <= 8, SDDP.State, initial_value = 7)  # reservoir water level in each step
        # @variable(sp, 0 <= p, SDDP.State, initial_value = 2)     # producing power with cost

        @variables(sp, begin
            y >= 0      # using reservoir water
            p >= 0      # power generation
            s >= 0      # spill water
            ξ           # input
            Borow >= 0  # borrow water if the water level is negative
        end)

        @constraints(sp, begin
            p + y == 9
            x.out == x.in - y + ξ - s + Borow
        end)
        Pg = rand([2, 3], 4)
        # Pg = rand(Distributions.TruncatedNormal(5, 2, 0, 10), 4)
        # sort!(Pg)
        SDDP.parameterize(sp, Pg) do ω
            JuMP.fix(ξ, ω)
        end
        @stageobjective(sp, 200 * s + 500 * Borow + 20 * p)
    end
    return model
end


Random.seed!(1234)
begin
    model = finite_lin_DH()
    SDDP.train(model; iteration_limit = 30, print_level = 1)
    results = SDDP.simulate(model, 100, [:y, :p, :ξ, :x, :s, :Borow])
    objectives = [sum(s[:stage_objective] for s in simulation) for simulation in results]
    sample_mean = round(Statistics.mean(objectives); digits = 2)
    sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(100); digits = 2)
    println("DH Confidence_interval = $(sample_mean) ± $(sample_ci)")
end

Random.seed!(1234)
begin
    model = finite_lin_HD()
    SDDP.train(model; iteration_limit = 30, print_level = 1)
    results = SDDP.simulate(model, 100, [:y, :p, :ξ, :x, :s, :Borow])
    objectives = [sum(s[:stage_objective] for s in simulation) for simulation in results]
    sample_mean = round(Statistics.mean(objectives); digits = 2)
    sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(100); digits = 2)
    println("HD Confidence_interval = $(sample_mean) ± $(sample_ci)")
end
odow commented 1 year ago

Here's how I would write your code:

julia> using SDDP, HiGHS, Random, Statistics

julia> function finite_lin_DH()
           return SDDP.LinearPolicyGraph(;
               stages = 4,
               sense = :Min,
               lower_bound = 0.0,
               optimizer = HiGHS.Optimizer,
           ) do sp, node
               @variables(sp, begin
                   0 <= x <= 8, SDDP.State, (initial_value = 7)
                   p >= 0, SDDP.State, (initial_value = 2)
                   y >= 0
                   s >= 0
                   ξ
                   borrow >= 0
               end)
               @constraints(sp, begin
                   # This constraint is wrong, because p.in could be > 9 already.
                   # p.in + y == 9
                   p.in + y >= 9
                   x.out == x.in - y + ξ - s + borrow
               end)
               SDDP.parameterize(sp, rand([2, 3], 4)) do ω
                   JuMP.fix(ξ, ω)
               end
               @stageobjective(sp, 200s + 500borrow + 20p.out)
           end
       end
finite_lin_DH (generic function with 1 method)

julia> function finite_lin_HD()
           model = SDDP.LinearPolicyGraph(
               stages = 4,
               sense = :Min,
               lower_bound = 0.0,
               optimizer = HiGHS.Optimizer,
           ) do sp, node
               @variables(sp, begin
                   0 <= x <= 8, SDDP.State, (initial_value = 7)
                   p >= 0
                   y >= 0
                   s >= 0
                   ξ
                   borrow >= 0
               end)
               @constraints(sp, begin
                   p + y == 9
                   x.out == x.in - y + ξ - s + borrow
               end)
               SDDP.parameterize(sp, rand([2, 3],4)) do ω
                   JuMP.fix(ξ, ω)
               end
               @stageobjective(sp, 200s + 500borrow + 20p)
           end
           return model
       end
finite_lin_HD (generic function with 1 method)

julia> function train_and_compute_ci(model; replications = 100)
           SDDP.train(model; iteration_limit = 30, print_level = 0)
           results = SDDP.simulate(model, replications)
           objectives = [
               sum(s[:stage_objective] for s in simulation) for simulation in results
           ]
           sample_mean = round(Statistics.mean(objectives); digits = 2)
           sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(replications); digits = 2)
           println("Confidence_interval = $(sample_mean) ± $(sample_ci)")
           return
       end
train_and_compute_ci (generic function with 1 method)

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> begin
           model = finite_lin_DH()
           train_and_compute_ci(model)
       end
Confidence_interval = 356.0 ± 2.67

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> begin
           model = finite_lin_HD()
           train_and_compute_ci(model)
       end
Confidence_interval = 379.6 ± 3.15

This shows that being able to adapt your decision to the realization is worth ~$24.

hghayoom commented 1 year ago

Thanks, Oscar. That is wonderful. Understanding both DH and HD is very important because we can calculate the EVPI. I suggest you add this example to your examples. I learned a lot from this.

Thanks again