A few points here:
I think that the best way to deal with steady-state is simply to start the next iteration using the ending state values from the current iteration ... since the ending state values should reach a steady-state distribution over time.
Is there a proof that it reaches the optimal steady-state distribution? I think I can construct a counter-example:
using SDDP, GLPK

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, 0.99)

model = SDDP.PolicyGraph(
    graph,
    upper_bound = 100,
    optimizer = GLPK.Optimizer,
) do sp, t
    @variable(sp, x >= 0, SDDP.State, initial_value = 10)
    @variable(sp, 0 <= u <= (t == 1 ? 10 : 1))
    @constraint(sp, x.out == x.in - u)
    @stageobjective(sp, (t == 1 ? 1 : 10) * u)
end
The initial first-stage solution is going to be u=10, x.out=0, for a return of 10. But from then on, x is trapped at 0 and you can never recover. The real optimal solution is to release u=1 for the first 10 stages and earn 10 * sum(0.99^i for i in 0:9) = 95.6.
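A quick sanity check of those numbers (plain Julia; this just reproduces the arithmetic quoted above, it is not a solve of the model):

greedy_return = 1 * 10                          # release u = 10 in stage 1, after which x is stuck at 0
patient_return = 10 * sum(0.99^i for i in 0:9)  # release u = 1 per stage, discounted by the 0.99 cycle probability
println(greedy_return, " vs ", round(patient_return, digits = 1))  # prints "10 vs 95.6"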
removes one value from the vector, it's not really clear if the list would ever be more than two items (the ending state, and the initial state), this would over-train using the initial state (unless I misunderstand the code).
Let's walk through what happens to starting_states. After the first iteration terminates due to the cycle, the ending state is pushed, so it has one element. At the start of the next iteration, options.initial_state will be added to starting_states if it's sufficiently different. There are now two elements, and splice! removes one of the two elements to use as the incoming state.
Case 1: splice! removes the previous starting point. Once the new ending state is pushed at the end of the iteration, starting_states now has two elements. options.initial_state won't be added to starting_states because it's already there. There are now two elements, and it's the same as above.
Case 2: splice! removes the initial state. Once the new ending state is pushed, starting_states now has two elements. options.initial_state will be added to starting_states because it's not there. There are now three elements, and the list grows.
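To make the Case 1 / Case 2 bookkeeping concrete, here is a minimal, self-contained sketch in plain Julia (not the SDDP.jl internals; the scalar state, the delta, and the random ending states are placeholders for illustration):

delta = 0.1
initial_state = 0.0
starting_states = Float64[]     # the list kept for the cycle node
push!(starting_states, 3.0)     # ending state of the first iteration: one element

for iteration in 2:5
    # Start of the iteration: add the initial state if it is sufficiently different.
    if all(abs(s - initial_state) > delta for s in starting_states)
        push!(starting_states, initial_state)
    end
    # splice! removes one element at random to use as the incoming state.
    incoming = splice!(starting_states, rand(1:length(starting_states)))
    # End of the iteration: push the new ending state if it is sufficiently different.
    new_ending = 3.0 + randn()
    if all(abs(s - new_ending) > delta for s in starting_states)
        push!(starting_states, new_ending)
    end
    println("iteration $iteration: started from $incoming, list now has ", length(starting_states), " element(s)")
end

Whether the list grows over time depends on which element splice! happens to remove, which is exactly what the two cases above describe.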
when I'm running a historical simulation
You mean using HistoricalSamplingScheme? You have control over which nodes you visit, so I don't see what this has to do with it. If you want to run a simulation with InSampleMonteCarlo, you should be able to do that.
I think that a sufficient condition for the model to be able to reach a steady state is that there is a positive probability that you can transition from one state to any other state (similar to an irreducible / ergodic Markov chain), but the necessary conditions are probably less restrictive. Your example has an absorbing state, which is quite extreme if we are thinking about modelling an infinite horizon.
For my infinite horizon model, I had terminate_on_cycle=true, so we would only ever get one more ending state. That's why I was confused: I didn't realise that they were being collated every time you go around the loop in a single iteration.
Indeed, I meant HistoricalSamplingScheme. If I'm using that, I can never enter this part of the code:
if terminated_due_to_cycle
    # Get the last node in the scenario.
    final_node_index = scenario_path[end][1]
    # We terminated due to a cycle. Here is the list of possible starting
    # states for that node:
    starting_states = options.starting_states[final_node_index]
    # We also need the incoming state variable to the final node, which is
    # the outgoing state value of the last node:
    incoming_state_value = sampled_states[end]
    # If this incoming state value is more than δ away from another state,
    # add it to the list.
    if distance(starting_states, incoming_state_value) >
       options.cycle_discretization_delta
        push!(starting_states, incoming_state_value)
    end
end
This is because terminated_due_to_cycle will always be false. I would like to force this to be true. (I want to be able to control the noise, and have the state wrap around in the training automatically.)
which is quite extreme, if we are thinking about modelling infinite horizon.
Do you have a way of looking at a model and checking if it meets the necessary conditions? That seems pretty hard, even if you have domain knowledge of the problem.
I didn't realise that they were being collated every time you go around the loop in a single iteration.
Each iteration is only one time around the loop.
I want to be able to control the noise, and have the state wrap around in the training automatically.
When training or simulating? If you're simulating, just do one hugely long simulation, then chop it up into years after the fact.
If you're training, we won't do that because you can't prove convergence. Did you try the defaults where we do forward passes of different lengths?
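For the simulation side, a sketch of the one-long-simulation approach (assuming a 52-node weekly model with a state variable named volume; the depth and variable name are placeholders):

sims = SDDP.simulate(
    model,
    1,            # a single, very long replication
    [:volume];
    sampling_scheme = SDDP.InSampleMonteCarlo(
        max_depth = 52 * 100,             # roughly 100 years of weekly stages
        terminate_on_dummy_leaf = false,  # keep going to max_depth instead of stopping at the dummy leaf
    ),
)
sim = sims[1]
# Chop the single trajectory into 52-week years after the fact.
years = [sim[(52 * (y - 1) + 1):(52 * y)] for y in 1:div(length(sim), 52)]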
Thanks for clarifying. I re-read the bit about the starting states, and I now understand what it does, but it seems that the list will only grow if you happen to select the initial state, and if the new point is close to a previously generated point the list will shrink. Do you have a proof of convergence for this? Is it sufficient to simply occasionally restart the method from the initial state vector (e.g. perform 100 iterations carrying the final states forward to the next iteration, and then for iteration 101 reset to the starting levels)?
The reason I don't like the terminate_on_dummy method is that, at the beginning of the algorithm, if the probability of looping is near 1 there are potentially very long forward passes that learn very little about the shape of the value-to-go. The terminate_on_loop approach is much more effective in this regard.
I'm training with the HistoricalSamplingScheme, and all I really need is to be able to return the terminate_on_cycle=true flag so that the forward pass updates the starting states (just like it does for Monte Carlo). The reason this is useful for me is that I'm creating custom forward passes that are a combination of Monte Carlo and historical, which in an infinite horizon setting should have each iteration starting in a different place; currently they all start from the same state.
(For my simulations, I perform a single simulation that is sliced up.)
Do you have a proof of convergence for this?
We didn't include the algorithm in the paper, so we never proved it. But: assume the state space is closed and convex. Given some discretization distance \delta, there exists a finite number of points that can be added to the list of starting states. And if the value function is Lipschitz, then there is some epsilon error in the final value function. Since we keep adding the initial state back, we will sample the route from the initial state an infinite number of times, and so on.
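A rough way to formalise that hand-wave (my sketch, not from the paper; assume the state space \mathcal{X} \subset \mathbb{R}^n is compact and the value function V is L-Lipschitz):

N(\delta) = O\big( (\mathrm{diam}(\mathcal{X}) / \delta)^n \big)
    % at most finitely many starting states can be pairwise more than \delta apart
\|x - x'\| \le \delta \implies |V(x) - V(x')| \le L \delta
    % so resolving starting states only to within \delta costs at most an epsilon = L\delta error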
Is it sufficient to simply occasionally restart the method from the initial state vector
You can also use this rollout_limit argument to limit the number of long passes early on:
https://github.com/odow/SDDP.jl/blob/092eeb32171f4ec82d57efa6e0e5650f303ca144/src/plugins/sampling_schemes.jl#L34-L36
For example, to extend the length by 52 weeks every 100 iterations, do:
rollout_limit = i -> 52 * ceil(Int, i / 100)
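For context, a sketch of where that argument goes (the model and iteration limit here are placeholders):

SDDP.train(
    model;
    iteration_limit = 1_000,
    sampling_scheme = SDDP.InSampleMonteCarlo(
        # Forward passes grow by 52 nodes every 100 iterations.
        rollout_limit = i -> 52 * ceil(Int, i / 100),
    ),
)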
You could probably also write a plugin that swapped out different sampling schemes at different iterations.
It's not too much work. Here's an example of a trivial plug-in: https://github.com/odow/SDDP.jl/blob/092eeb32171f4ec82d57efa6e0e5650f303ca144/src/plugins/sampling_schemes.jl#L393-L430
You could also write a wrapper around the Historical one to allow you to return the terminated_on_cycle flag.
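Untested sketch of that wrapper idea (the type name is made up, and the sample_scenario signature mirrors the swapper sketch below, so it may need adjusting to your version of SDDP.jl):

struct HistoricalWithCycle{S<:SDDP.AbstractSamplingScheme} <: SDDP.AbstractSamplingScheme
    inner::S    # e.g. an SDDP.Historical sampling scheme
end

function SDDP.sample_scenario(graph::SDDP.PolicyGraph, s::HistoricalWithCycle; kwargs...)
    scenario_path, _ = SDDP.sample_scenario(graph, s.inner; kwargs...)
    # Report the pass as having terminated due to a cycle so that the forward
    # pass stores the ending state as a possible starting state.
    return scenario_path, true
end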
Thanks for those suggestions, they sound like they'll achieve what I need. I'll probably come back for help when I can't figure out how to actually implement them.
Totally un-tested, but maybe something like this:
using SDDP
import SDDP: AbstractSamplingScheme, PolicyGraph, sample_scenario

mutable struct SamplingSchemeSwapper{F} <: AbstractSamplingScheme
    f::F
    counter::Int
    SamplingSchemeSwapper(f::Function) = new{typeof(f)}(f, 0)
end

function sample_scenario(g::PolicyGraph, s::SamplingSchemeSwapper; kwargs...)
    s.counter += 1
    return sample_scenario(g, s.f(s.counter); kwargs...)
end

sampling_scheme = SamplingSchemeSwapper() do iteration
    return SDDP.InSampleMonteCarlo(terminate_on_cycle = iteration < 100)
end
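and then pass it to train in the usual way (the iteration limit is a placeholder):

SDDP.train(model; iteration_limit = 200, sampling_scheme = sampling_scheme)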
Did you make any progress on this?
Sorry; I needed to engage my brain a bit before embarking on this - and that hasn't happened, yet.
I'm closing this, since you've given me a few ways to do this. I just need to go through them and work out the right approach for my needs.
Thanks.
I'm still working on this...
What is the expected behaviour of terminate_on_cycle? It appears to terminate after a cycle is detected and the node (node 1) is sampled a second time. This results in twice as many cuts being present at node 1 as at other nodes. (I have a linear network with 52 nodes and an additional arc from node 52 back to node 1. The behaviour I'm describing is for the InSampleMonteCarlo sampling scheme.)
It also makes it unclear (to me) whether the correct initial state is being stored for node 1 for the subsequent forward pass.
Thanks.
This results in twice as many cuts being present at node 1 as at other nodes
I think that makes sense. You need a cut for x0 and a cut for x52.
Typically (without the cycle) there would be no cuts stored in week 52. With the cycle we get n cuts stored in stage 52 and an extra n cuts in stage 1. I'm not sure that makes sense.
In my view, the cycle should only be directly affecting the end of horizon value function, not week 1's value function.
I made this simple example to try to make sense of what is happening:
using JuMP, SDDP, Random, GLPK

include("sddp_modifications.jl")

graph = SDDP.LinearGraph(4)
SDDP.add_edge(graph, 4 => 1, 0.5)

function subproblem_builder(subproblem::Model, node::Int)
    # State variables
    @variable(subproblem, volume, SDDP.State, initial_value = 4.0)
    # Random variables
    @variable(subproblem, inflow)
    Ω = [1.0]
    P = [1.0]
    SDDP.parameterize(subproblem, Ω, P) do ω
        return JuMP.fix(inflow, ω)
    end
    # Transition function and constraints
    if node == 1
        @constraint(subproblem, volume.out == volume.in + inflow - 4.0)
    else
        @constraint(subproblem, volume.out == volume.in + inflow)
    end
    # Stage-objective
    @stageobjective(subproblem, volume.in)
    return subproblem
end

model = SDDP.PolicyGraph(
    subproblem_builder,
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = GLPK.Optimizer,
)

SDDP.train(model; iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(terminate_on_cycle = true),
)

SDDP.train(model; iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(max_depth = 4),
)
There are four stages, and each stage has an inflow of 1, but in the first stage 4 units of volume are removed. The objective is the sum of the incoming volumes. The cycle should have volume.in = [4, 1, 2, 3] across the four stages, and just repeat this. However, here is the result from training with terminate_on_cycle=true:
------------------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson, 2017-21
Problem
Nodes : 4
State variables : 1
Scenarios : Inf
Solver : serial mode
Numerical stability report
Non-zero Matrix range [1e+00, 1e+00]
Non-zero Objective range [1e+00, 1e+00]
Non-zero Bounds range [0e+00, 0e+00]
Non-zero RHS range [4e+00, 4e+00]
No problems detected
Iteration Simulation Bound Time (s) Proc. ID # Solves
1 1.000000e+01 1.200000e+01 1.999855e-03 1 9
2 1.000000e+01 1.600000e+01 3.999949e-03 1 18
3 1.400000e+01 1.800000e+01 1.499987e-02 1 29
4 1.400000e+01 1.900000e+01 1.799989e-02 1 40
5 2.000000e+00 1.900000e+01 2.099991e-02 1 51
6 -2.000000e+00 1.900000e+01 2.300000e-02 1 60
7 -1.000000e+00 1.900000e+01 2.499986e-02 1 71
8 1.000000e+01 1.950000e+01 2.799988e-02 1 80
9 1.000000e+01 1.975000e+01 2.900004e-02 1 89
10 -2.000000e+00 1.975000e+01 3.099990e-02 1 98
Terminating training with status: iteration_limit
I would have expected the simulation to be 10 every time. Have I missed something?
Yeah, I'm not sure. I haven't played with this much, so something probably needs fixing.
Perhaps we should return scenario_path[1:end-1] here:
https://github.com/odow/SDDP.jl/blob/5687376e0ed92009a42e2c140edb24bb0c371740/src/plugins/sampling_schemes.jl#L272
I dug through the code a bit and managed to fix the issue. My fix is slightly different from what you suggested (which didn't quite work). I return the full sequence of nodes in the cycle, but do not add the final stage objective to the total:
# Cumulate the stage_objective.
if !terminated_due_to_cycle || depth < length(scenario_path)
    cumulative_value += subproblem_results.stage_objective
end
However, the wrong incoming state was being appended, so I changed it to end-1:
# We also need the incoming state variable to the final node, which is
# the outgoing state value of the last node:
incoming_state_value = sampled_states[end-1]
These changes have resulted in the expected output:
Iteration Simulation Bound Time (s) Proc. ID # Solves
1 1.000000e+01 1.150000e+01 3.000021e-03 1 11
2 1.000000e+01 1.575000e+01 3.999949e-03 1 22
3 1.000000e+01 1.787500e+01 6.999969e-03 1 33
4 1.000000e+01 1.893750e+01 1.399994e-02 1 44
5 1.000000e+01 1.946875e+01 1.600003e-02 1 55
6 1.000000e+01 1.973438e+01 1.799989e-02 1 66
7 1.000000e+01 1.986719e+01 1.999998e-02 1 77
8 1.000000e+01 1.993359e+01 2.199984e-02 1 88
9 1.000000e+01 1.996680e+01 2.399993e-02 1 99
10 1.000000e+01 1.998340e+01 2.600002e-02 1 110
Terminating training with status: iteration_limit
------------------------------------------------------------------------------
These changes have been made to the forward pass function. I think the proper fix would be to not evaluate the repeated node in the forward pass; however, when I tried this, I didn't have a reference to the final node for which I needed to set the starting state.
Actually, ignore my last fix. This is the real fix.
# First up, sample a scenario. Note that if a cycle is detected, this will
# return the cycle node as well.
TimerOutputs.@timeit SDDP_TIMER "sample_scenario" begin
    scenario_path, terminated_due_to_cycle =
        sample_scenario(model, options.sampling_scheme)
end
if terminated_due_to_cycle
    final_node = scenario_path[end]
    scenario_path = scenario_path[1:end-1]
end
# Get the last node in the scenario.
final_node_index = final_node[1]
# We terminated due to a cycle. Here is the list of possible starting
# states for that node:
starting_states = options.starting_states[final_node_index]
This fixes both the starting state issue and the extra cuts that were being inserted at node 1.
@adow031 worked around this by adding a new sampling scheme and a new forward pass: https://github.com/EPOC-NZ/JADE.jl/issues/19.
I've got an infinite horizon model with terminate_on_cycle=true and I've been looking at this section of the code.
I think that the best way to deal with steady-state is simply to start the next iteration using the ending state values from the current iteration, but this seems to be doing something a bit more complicated (and perhaps unnecessary, since the ending state values should reach a steady-state distribution over time, meaning you'll be sampling starting levels with the right distribution).
Also, since each iteration adds at most one item to the starting_states vector and the splice! removes one value from the vector, it's not really clear if the list would ever be more than two items (the ending state, and the initial state); this would over-train using the initial state (unless I misunderstand the code).
Also, I would like to be able to set the terminate_on_cycle=true flag (which is needed for the ending states to be stored) when I'm running a historical simulation. Is this possible?
Thanks, Tony.