A few points here:
I think that the best way to deal with steady-state is simply to start the next iteration using the ending state values from the current iteration ... since the ending state values should reach a steady-state distribution over time.
Is there a proof that it reaches the optimal steady-state distribution? I think I can construct a counter-example:
using SDDP, GLPK

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, 0.99)

model = SDDP.PolicyGraph(
    graph,
    upper_bound = 100,
    optimizer = GLPK.Optimizer,
) do sp, t
    @variable(sp, x >= 0, SDDP.State, initial_value = 10)
    @variable(sp, 0 <= u <= (t == 1 ? 10 : 1))
    @constraint(sp, x.out == x.in - u)
    @stageobjective(sp, (t == 1 ? 1 : 10) * u)
end
The initial first-stage solution is going to be u=10, x.out=0, for a return of 10. But from then on, x is trapped at 0 and you can never recover. The real optimal solution is to release u=1 for the first 10 stages and earn 10 * sum(0.99^i for i in 0:9) = 95.6.
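A quick sanity check of those numbers (plain Julia; this just reproduces the arithmetic quoted above, it is not a solve of the model):

greedy_return = 1 * 10                          # release u = 10 in stage 1, after which x is stuck at 0
patient_return = 10 * sum(0.99^i for i in 0:9)  # release u = 1 per stage, discounted by the 0.99 cycle probability
println(greedy_return, " vs ", round(patient_return, digits = 1))  # prints "10 vs 95.6"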
removes one value from the vector, it's not really clear if the list would ever be more than two items (the ending state, and the initial state), this would over-train using the initial state (unless I misunderstand the code).
Let's walk through what happens to starting_states. After the first iteration terminates due to the cycle, the ending state is pushed, so it has one element. At the start of the next iteration, options.initial_state will be added to starting_states if it's sufficiently different. There are now two elements, and splice! removes one of the two elements to use as the incoming state.
Case 1: splice! removes the previous starting point. Once the new ending state is pushed at the end of the iteration, starting_states now has two elements. options.initial_state won't be added to starting_states because it's already there. There are now two elements, and it's the same as above.
Case 2: splice! removes the initial state. Once the new ending state is pushed, starting_states now has two elements. options.initial_state will be added to starting_states because it's not there. There are now three elements, and the list grows.
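To make the Case 1 / Case 2 bookkeeping concrete, here is a minimal, self-contained sketch in plain Julia (not the SDDP.jl internals; the scalar state, the delta, and the random ending states are placeholders for illustration):

delta = 0.1
initial_state = 0.0
starting_states = Float64[]     # the list kept for the cycle node
push!(starting_states, 3.0)     # ending state of the first iteration: one element

for iteration in 2:5
    # Start of the iteration: add the initial state if it is sufficiently different.
    if all(abs(s - initial_state) > delta for s in starting_states)
        push!(starting_states, initial_state)
    end
    # splice! removes one element at random to use as the incoming state.
    incoming = splice!(starting_states, rand(1:length(starting_states)))
    # End of the iteration: push the new ending state if it is sufficiently different.
    new_ending = 3.0 + randn()
    if all(abs(s - new_ending) > delta for s in starting_states)
        push!(starting_states, new_ending)
    end
    println("iteration $iteration: started from $incoming, list now has ", length(starting_states), " element(s)")
end

Whether the list grows over time depends on which element splice! happens to remove, which is exactly what the two cases above describe.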
when I'm running a historical simulation
You mean using HistoricalSamplingScheme? You have control over which nodes you visit, so I don't see what this has to do with it. If you want to run a simulation with InSampleMonteCarlo, you should be able to do that.
I think that a sufficient condition for the model to be able to reach a steady state is that there is a positive probability that you can transition from one state to any other state (similar to an irreducible / ergodic Markov chain), but the necessary conditions are probably less restrictive. Your example has an absorbing state, which is quite extreme if we are thinking about modelling an infinite horizon.
For my infinite horizon model, I had terminate_on_cycle=true, so we would only ever get one more ending state. That's why I was confused: I didn't realise that they were being collated every time you go around the loop in a single iteration.
Indeed, I meant HistoricalSamplingScheme. If I'm using that, I can never enter this part of the code:
if terminated_due_to_cycle
    # Get the last node in the scenario.
    final_node_index = scenario_path[end][1]
    # We terminated due to a cycle. Here is the list of possible starting
    # states for that node:
    starting_states = options.starting_states[final_node_index]
    # We also need the incoming state variable to the final node, which is
    # the outgoing state value of the last node:
    incoming_state_value = sampled_states[end]
    # If this incoming state value is more than δ away from another state,
    # add it to the list.
    if distance(starting_states, incoming_state_value) >
       options.cycle_discretization_delta
        push!(starting_states, incoming_state_value)
    end
end
This is because terminated_due_to_cycle will always be false. I would like to force this to be true. (I want to be able to control the noise, and have the state wrap around in the training automatically.)
which is quite extreme, if we are thinking about modelling infinite horizon.
Do you have a way of looking at a model and checking if it meets the necessary conditions? That seems pretty hard, even if you have domain knowledge of the problem.
I didn't realise that they were being collated every time you go around the loop in a single iteration.
Each iteration is only one time around the loop.
I want to be able to control the noise, and have the state wrap around in the training automatically.
When training or simulating? If you're simulating, just do one hugely long simulation, then chop it up into years after the fact.
If you're training, we won't do that because you can't prove convergence. Did you try the defaults where we do forward passes of different lengths?
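For the simulation side, a sketch of the one-long-simulation approach (assuming a 52-node weekly model with a state variable named volume; the depth and variable name are placeholders):

sims = SDDP.simulate(
    model,
    1,            # a single, very long replication
    [:volume];
    sampling_scheme = SDDP.InSampleMonteCarlo(
        max_depth = 52 * 100,             # roughly 100 years of weekly stages
        terminate_on_dummy_leaf = false,  # keep going to max_depth instead of stopping at the dummy leaf
    ),
)
sim = sims[1]
# Chop the single trajectory into 52-week years after the fact.
years = [sim[(52 * (y - 1) + 1):(52 * y)] for y in 1:div(length(sim), 52)]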
Thanks for clarifying. I re-read the bit about the starting states, and I now understand what it does, but it seems that the list will only grow if you happen to select the initial state, and if the new point is close to a previously generated point the list will shrink. Do you have a proof of convergence for this? Is it sufficient to simply occasionally restart the method from the initial state vector (e.g. perform 100 iterations carrying the final states forward to the next iteration, and then for iteration 101 reset to the starting levels)?
The reason I don't like the terminate_on_dummy method is that, at the beginning of the algorithm, if the probability of looping is near 1 there are potentially very long forward passes that learn very little about the shape of the value-to-go. The terminate_on_loop approach is much more effective in this regard.
I'm training with the HistoricalSamplingScheme, and all I really need is to be able to return the terminate_on_cycle=true flag so that the forward pass updates the starting states (just like it does for Monte Carlo). The reason this is useful for me is that I'm creating custom forward passes that are a combination of Monte Carlo and historical, which in an infinite horizon setting should have each iteration starting in a different place; currently they all start from the same state.
(For my simulations, I perform a single simulation that is sliced up.)
Do you have a proof of convergence for this?
We didn't include the algorithm in the paper, so we never proved it. But: assume the state space is closed and convex. Given some discretization distance \delta, there exists a finite number of points that can be added to the list of starting states. And if the value function is Lipschitz, then there is some epsilon error in the final value function. Since we keep adding the initial state back, we will sample the route from the initial state an infinite number of times, and so on.
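A rough way to formalise that hand-wave (my sketch, not from the paper; assume the state space \mathcal{X} \subset \mathbb{R}^n is compact and the value function V is L-Lipschitz):

N(\delta) = O\big( (\mathrm{diam}(\mathcal{X}) / \delta)^n \big)
    % at most finitely many starting states can be pairwise more than \delta apart
\|x - x'\| \le \delta \implies |V(x) - V(x')| \le L \delta
    % so resolving starting states only to within \delta costs at most an epsilon = L\delta error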
Is it sufficient to simply occasionally restart the method from the initial state vector
You can also use this rollout_limit argument to limit the number of long passes early on:
https://github.com/odow/SDDP.jl/blob/092eeb32171f4ec82d57efa6e0e5650f303ca144/src/plugins/sampling_schemes.jl#L34-L36
For example, to extend the length by 52 weeks every 100 iterations, do:
rollout_limit = i -> 52 * ceil(Int, i / 100)
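For context, a sketch of where that argument goes (the model and iteration limit here are placeholders):

SDDP.train(
    model;
    iteration_limit = 1_000,
    sampling_scheme = SDDP.InSampleMonteCarlo(
        # Forward passes grow by 52 nodes every 100 iterations.
        rollout_limit = i -> 52 * ceil(Int, i / 100),
    ),
)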
You could probably also write a plugin that swapped out different sampling schemes at different iterations.
It's not too much work. Here's an example of a trivial plug-in: https://github.com/odow/SDDP.jl/blob/092eeb32171f4ec82d57efa6e0e5650f303ca144/src/plugins/sampling_schemes.jl#L393-L430
You could also write a wrapper around the Historical one to allow you to return the terminated_on_cycle flag.
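Untested sketch of that wrapper idea (the type name is made up, and the sample_scenario signature mirrors the swapper sketch below, so it may need adjusting to your version of SDDP.jl):

struct HistoricalWithCycle{S<:SDDP.AbstractSamplingScheme} <: SDDP.AbstractSamplingScheme
    inner::S    # e.g. an SDDP.Historical sampling scheme
end

function SDDP.sample_scenario(graph::SDDP.PolicyGraph, s::HistoricalWithCycle; kwargs...)
    scenario_path, _ = SDDP.sample_scenario(graph, s.inner; kwargs...)
    # Report the pass as having terminated due to a cycle so that the forward
    # pass stores the ending state as a possible starting state.
    return scenario_path, true
end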
Thanks for those suggestions, they sound like they'll achieve what I need. I'll probably come back for help when I can't figure out how to actually implement them.
Totally un-tested, but maybe something like this:
using SDDP
import SDDP: AbstractSamplingScheme, PolicyGraph, sample_scenario

mutable struct SamplingSchemeSwapper{F} <: AbstractSamplingScheme
    f::F
    counter::Int
    SamplingSchemeSwapper(f::Function) = new{typeof(f)}(f, 0)
end

function sample_scenario(g::PolicyGraph, s::SamplingSchemeSwapper; kwargs...)
    s.counter += 1
    return sample_scenario(g, s.f(s.counter); kwargs...)
end

sampling_scheme = SamplingSchemeSwapper() do iteration
    return SDDP.InSampleMonteCarlo(terminate_on_cycle = iteration < 100)
end
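and then pass it to train in the usual way (the iteration limit is a placeholder):

SDDP.train(model; iteration_limit = 200, sampling_scheme = sampling_scheme)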
Did you make any progress on this?
Sorry; I needed to engage my brain a bit before embarking on this - and that hasn't happened, yet.
I'm closing this, since you've given me a few ways to do this. I just need to go through them and work out the right approach for my needs.
Thanks.
I'm still working on this...
What is the expected behaviour of terminate_on_cycle? It appears to terminate after a cycle is detected and the node (node 1) is sampled a second time. This results in twice as many cuts being present at node 1 as at other nodes. (I have a linear network with 52 nodes and an additional arc from node 52 back to node 1. The behaviour I'm describing is for the InSampleMonteCarlo sampling scheme.)
It also makes it unclear (to me) whether the correct initial state is being stored for node 1 for the subsequent forward pass.
Thanks.
This results in twice as many cuts being present at node 1 as at other nodes
I think that makes sense. You need a cut for x0 and a cut for x52.
Typically (without the cycle) there would be no cuts stored in week 52. With the cycle we get n cuts stored in stage 52 and an extra n cuts in stage 1. I'm not sure that makes sense.
In my view, the cycle should only be directly affecting the end of horizon value function, not week 1's value function.
I made this simple example to try to make sense of what is happening:
using JuMP, SDDP, Random, GLPK

include("sddp_modifications.jl")

graph = SDDP.LinearGraph(4)
SDDP.add_edge(graph, 4 => 1, 0.5)

function subproblem_builder(subproblem::Model, node::Int)
    # State variables
    @variable(subproblem, volume, SDDP.State, initial_value = 4.0)
    # Random variables
    @variable(subproblem, inflow)
    Ω = [1.0]
    P = [1.0]
    SDDP.parameterize(subproblem, Ω, P) do ω
        return JuMP.fix(inflow, ω)
    end
    # Transition function and constraints
    if node == 1
        @constraint(subproblem, volume.out == volume.in + inflow - 4.0)
    else
        @constraint(subproblem, volume.out == volume.in + inflow)
    end
    # Stage-objective
    @stageobjective(subproblem, volume.in)
    return subproblem
end

model = SDDP.PolicyGraph(
    subproblem_builder,
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = GLPK.Optimizer,
)

SDDP.train(model; iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(terminate_on_cycle = true),
)

SDDP.train(model; iteration_limit = 10,
    sampling_scheme = SDDP.InSampleMonteCarlo(max_depth = 4),
)
There are four stages, and each stage has an inflow of 1, but in the first stage 4 units of volume are removed. The objective is the sum of the incoming volumes. The cycle should have volume.in = [4, 1, 2, 3] across the four stages, and just repeat this. However, here is the result from training with terminate_on_cycle=true:
------------------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson, 2017-21
Problem
Nodes : 4
State variables : 1
Scenarios : Inf
Solver : serial mode
Numerical stability report
Non-zero Matrix range [1e+00, 1e+00]
Non-zero Objective range [1e+00, 1e+00]
Non-zero Bounds range [0e+00, 0e+00]
Non-zero RHS range [4e+00, 4e+00]
No problems detected
Iteration Simulation Bound Time (s) Proc. ID # Solves
1 1.000000e+01 1.200000e+01 1.999855e-03 1 9
2 1.000000e+01 1.600000e+01 3.999949e-03 1 18
3 1.400000e+01 1.800000e+01 1.499987e-02 1 29
4 1.400000e+01 1.900000e+01 1.799989e-02 1 40
5 2.000000e+00 1.900000e+01 2.099991e-02 1 51
6 -2.000000e+00 1.900000e+01 2.300000e-02 1 60
7 -1.000000e+00 1.900000e+01 2.499986e-02 1 71
8 1.000000e+01 1.950000e+01 2.799988e-02 1 80
9 1.000000e+01 1.975000e+01 2.900004e-02 1 89
10 -2.000000e+00 1.975000e+01 3.099990e-02 1 98
Terminating training with status: iteration_limit
I would have expected the simulation to be 10 every time. Have I missed something?
Yeah, I'm not sure. I haven't played with this much, so something probably needs fixing.
Perhaps we should return scenario_path[1:end-1] here:
https://github.com/odow/SDDP.jl/blob/5687376e0ed92009a42e2c140edb24bb0c371740/src/plugins/sampling_schemes.jl#L272
I dug through the code a bit and managed to fix the issue. My fix is slightly different from what you suggested (which didn't quite work). I return the full sequence of nodes in the cycle, but do not add the final stage objective to the total:
# Cumulate the stage_objective.
if !terminated_due_to_cycle || depth < length(scenario_path)
    cumulative_value += subproblem_results.stage_objective
end
However, the wrong incoming state was being appended, so I changed it to end-1:
# We also need the incoming state variable to the final node, which is
# the outgoing state value of the last node:
incoming_state_value = sampled_states[end-1]
These changes have resulted in the expected output:
Iteration Simulation Bound Time (s) Proc. ID # Solves
1 1.000000e+01 1.150000e+01 3.000021e-03 1 11
2 1.000000e+01 1.575000e+01 3.999949e-03 1 22
3 1.000000e+01 1.787500e+01 6.999969e-03 1 33
4 1.000000e+01 1.893750e+01 1.399994e-02 1 44
5 1.000000e+01 1.946875e+01 1.600003e-02 1 55
6 1.000000e+01 1.973438e+01 1.799989e-02 1 66
7 1.000000e+01 1.986719e+01 1.999998e-02 1 77
8 1.000000e+01 1.993359e+01 2.199984e-02 1 88
9 1.000000e+01 1.996680e+01 2.399993e-02 1 99
10 1.000000e+01 1.998340e+01 2.600002e-02 1 110
Terminating training with status: iteration_limit
------------------------------------------------------------------------------
These changes have been made to the forward pass function. I think the proper fix would be to not evaluate the repeated node in the forward pass; however, when I tried this, I didn't have a reference to the final node for which I needed to set the starting state.
Actually, ignore my last fix. This is the real fix.
# First up, sample a scenario. Note that if a cycle is detected, this will
# return the cycle node as well.
TimerOutputs.@timeit SDDP_TIMER "sample_scenario" begin
    scenario_path, terminated_due_to_cycle =
        sample_scenario(model, options.sampling_scheme)
end
if terminated_due_to_cycle
    final_node = scenario_path[end]
    scenario_path = scenario_path[1:end-1]
end
# Get the last node in the scenario.
final_node_index = final_node[1]
# We terminated due to a cycle. Here is the list of possible starting
# states for that node:
starting_states = options.starting_states[final_node_index]
This fixes both the starting state issue and the extra cuts that were being inserted at node 1.
@adow031 worked around this by adding a new sampling scheme and a new forward pass: https://github.com/EPOC-NZ/JADE.jl/issues/19.
I've got an infinite horizon model with terminate_on_cycle=true and I've been looking at this section of the code.
I think that the best way to deal with steady-state is simply to start the next iteration using the ending state values from the current iteration, but this seems to be doing something a bit more complicated (and perhaps unnecessary, since the ending state values should reach a steady-state distribution over time, meaning you'll be sampling starting levels with the right distribution).
Also, since each iteration adds at most one item to the starting_states vector and the splice! removes one value from the vector, it's not really clear if the list would ever be more than two items (the ending state, and the initial state); this would over-train using the initial state (unless I misunderstand the code).
Also, I would like to be able to set the terminate_on_cycle=true flag (which is needed for the ending states to be stored) when I'm running a historical simulation. Is this possible?
Thanks, Tony.