odow / SDDP.jl

A JuMP extension for Stochastic Dual Dynamic Programming
https://sddp.dev

Add examples of Stochastic Optimal Control #429

Closed · odow closed this issue 1 year ago

odow commented 3 years ago

For the INFORMS talk

odow commented 3 years ago

Ideally, one of @pulsipher's examples from InfiniteOpt: https://github.com/pulsipher/InfiniteOpt.jl/tree/master/docs/src/examples

odow commented 3 years ago

If anyone reading this has nice examples of an optimal control type problem they want to share, please reach out.

pulsipher commented 3 years ago

The stochastic optimal economic control problem I recently setup with @azev77 in https://github.com/pulsipher/InfiniteOpt.jl/issues/131#issuecomment-892964519 (look at the conversation starting from this comment) might be a good fit. Naturally, the difference here is that you would want to implement its discrete version.

odow commented 3 years ago

Thanks for the pointer!

Here's the SDDP.jl equivalent

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    ρ = 0.020,  # discount rate
    k = 90.0,   # utility bliss point
    T = 10.0,   # life horizon
    r = 0.05,   # interest rate
    dt = 0.1,   # time step size
    B0 = 100.0, # endowment
    μ = 0.0,    # Brownian motion
    σ = 0.1,
)
    env = Gurobi.Env()
    model = SDDP.LinearPolicyGraph(
        stages = round(Int, T / dt),
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t
        set_silent(sp)
        @variable(sp, B >= 0, SDDP.State, initial_value = B0)
        @variable(sp, y, SDDP.State, initial_value = 10)
        @variable(sp, 0 <= c <= 99)
        @constraint(sp, B.out == B.in + dt * (r * B.in + y.out - c))
        # The income dynamics are written with a placeholder coefficient on
        # y.in; SDDP.parameterize below overwrites that coefficient with the
        # sampled shock so that y.out == (1 + dt * μ + σ * ω) * y.in.
        @constraint(sp, con, y.out == y.in + dt * μ * y.in + σ * y.in)
        SDDP.parameterize(sp, rand(Distributions.Normal(0, 1), 10)) do ω
            set_normalized_coefficient(con, y.in, -(1 + dt * μ + σ * ω))
        end
        @stageobjective(sp, dt * (exp(-ρ * t * dt) * -(c - k)^2))
    end
    SDDP.train(
        model;
        iteration_limit = 100,
    )
    sims = SDDP.simulate(model, 100, [:B, :c, :y]);
    function plot_simulations(sims)
        plot = Plots.plot()
        plot_simulation!(
            sims[1];
            labels = ["B: wealth balance" "c: consumption" "y"],
        )
        for s in sims[2:end]
            plot_simulation!(s, labels = false)
        end
        return plot
    end
    function plot_simulation!(sim; kwargs...)
        Plots.plot!(
            dt:dt:T,
            [
                map(data -> data[:B].out, sim),
                map(data -> data[:c], sim),
                map(data -> data[:y].out, sim),
            ],
            color = [:blue :red :green];
            kwargs...
        )
    end
    return model, plot_simulations(sims)
end

model, plt = main()
(figure attached)
azev77 commented 3 years ago

@odow thanks!

  1. I currently don't have Gurobi, so I can't replicate this.
  2. In my discrete-time analog to the continuous-time model, the decision variables are c[0], c[1], ..., c[T] & B[0], B[1], ..., B[T], B[T+1].
  3. I don't have B >= 0 (as you do).
  4. The basic theory is that households try to smooth consumption by borrowing (B[t] < 0) & saving (B[t] > 0). Hence we expect consumption to be smoother than income/wealth (in an unconstrained world), but still stochastic.
  5. The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

Here is my discrete time (non-stochastic, y_t=0) code:

using JuMP, Ipopt, Plots    # InfiniteOpt, 
ρ  = 0.020  # discount rate
k  = 100.0  # utility bliss point
T  = 10.0   # life horizon
r  = 0.05   # interest rate
B0 = 100.0  # endowment

time_c = 0.0:1.0:T
time_B = 0.0:1.0:T+1

u(c; k=k) = -(c - k)^2.0  # utility function
d(t; ρ=ρ) = exp(-ρ*t)     # discount function

m = Model(Ipopt.Optimizer)
register(m, :u, 1, u; autodiff = true)
register(m, :d, 1, d; autodiff = true)
@variable(m, c[time_c]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
@variable(m, B[time_B])
fix(B[0.0], B0)
fix(B[T+1], 0.0)
@NLobjective(m, Max, sum(d(t)*u(c[t]) for t in time_c) )
#@NLobjective(m, Max, sum(exp(-ρ*t)*(-1.0)*(c[t]-k)^2.0 for t in 0.0:1.0:10.0) )

for i in 2:length(time_B)
    t_1, t = time_B[i - 1], time_B[i]
    @constraint(m, B[t] == B[t_1] + 1.0 * (r * B[t_1] - c[t_1]))
end
optimize!(m)
objective_value(m)
c_opt, B_opt = value.(c), value.(B)
scatter(time_B,   B_opt.data, lab = "B: wealth balance");
scatter!(time_c,  c_opt.data, lab = "c: consumption")
odow commented 3 years ago

The theory also says that if households are credit constrained, B >= 0 (as you do), then consumption will be less smooth.

But in this example we are not credit constrained.

To clarify my question in the other issue: in this solution, does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

(figure attached)

I'm guessing from the solution that the answer is "we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

SDDP.jl does better than this by taking into account that the agent at time t cannot know the future of times t+1 and so on.

azev77 commented 3 years ago

Can you re-solve the problem w/o B >= 0?

I think this problem actually has a closed form solution. I will take a look when I get home

odow commented 3 years ago

Can you re-solve the problem w/o B >= 0?

It will be the same answer because the wealth constraint is never binding.

I think this problem actually has a closed form solution

This makes me much more sure that

"we come up with a different consumption curve for each sample of y, and given fixed y, the consumption curve is linear."

is true. I think we're talking about different problems here with a different set of assumptions.

azev77 commented 3 years ago

Can you re-write using Ipopt.jl so I can solve on my computer?

odow commented 3 years ago

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.
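
For concreteness, here is a minimal toy sketch (a two-stage placeholder problem, not the consumption model above) of where the optimizer keyword goes; it assumes Ipopt and SDDP are installed:

using SDDP
import Ipopt

# Toy two-stage model: pass `Ipopt.Optimizer` directly instead of the
# closure `() -> Gurobi.Optimizer(env)` used with Gurobi above.
model = SDDP.LinearPolicyGraph(
    stages = 2,
    sense = :Max,
    upper_bound = 0.0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, x >= 0, SDDP.State, initial_value = 1.0)
    @variable(sp, 0 <= u <= 1)
    @constraint(sp, x.out == x.in - u)
    @stageobjective(sp, -(u - 0.5)^2)  # concave, so the upper bound of 0 is valid
end

SDDP.train(model; iteration_limit = 5)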

odow commented 3 years ago

I don't think we need to solve problems, though; we need to answer the conceptual question:

does a particular c(t) get to see y(t') for all t' > t? i.e., can c(t) see the future, or is it non-anticipative?

azev77 commented 3 years ago

At t=0: observe B0, y0. Choose c0.

Next, y1 is realized. At t=1: observe B1, y1. Choose c1.

Next, y2 is realized.

You choose c_t after observing y_t, before y_{t+1}.

odow commented 3 years ago

You choose c_t after observing y_t, before y_{t+1}.

Okay, we're on the same page then. But that's not what your discrete-time Ipopt code above does?

azev77 commented 3 years ago

My discrete-time Ipopt code is non-stochastic (y_t = 0 for all time periods), so it's perfect foresight.

odow commented 3 years ago

My discrete time Ipopt code is non-stochastic

Yeah haha I'm getting confused with the code across the two issues. I meant the code that produced the figures with the straight lines.

azev77 commented 3 years ago

Below is my matrix of the 4 types of intertemporal optimization problems we solve in Econ. I'm gonna write a blog post about this at some point.

The only solution so far I don't understand is @pulsipher's stochastic continuous-time one. It sounds like you think it solves a different model.

Take a look here: https://github.com/JuliaPOMDP/POMDPs.jl/discussions/351
Can SDDP.jl solve an infinite-horizon version of the above problem? (Btw, where is the Bellman equation? Where are the iteration-based solvers, VFI/PFI, etc.?)

(figure attached)

odow commented 3 years ago

SDDP.jl can solve the discrete-time deterministic and stochastic cases, for both the finite horizon and the infinite horizon cases.

For the infinite case, you just need to replace the LinearPolicyGraph with (this will make more sense once you read the theory below):

    graph = SDDP.LinearGraph(1)
    SDDP.add_edge(graph, 1 => 1, 1 - ρ)
    model = SDDP.PolicyGraph(
        graph,
        sense = :Max,
        upper_bound = 0,
        optimizer = () -> Gurobi.Optimizer(env),
    ) do sp, t

Here's a tutorial on the theory, which outlines the Bellman recursion, etc.: https://odow.github.io/SDDP.jl/stable/tutorial/theory/21_theory_intro/

It's not clear to me that your notation for the stochastic cases is correct.

It's another one of those "MDP solvers are all alike in different ways". It uses linear programming duality to construct the value function. You could say that it is value function approximation à la Approximate Dynamic Programming. You could also say that it is Q-learning in the POMDP world, but we converge to a provably optimal solution and it takes into account more than one step into the future.

There are plotting tools to visualize the resulting value function: https://odow.github.io/SDDP.jl/stable/tutorial/basic/05_plotting/#Plotting-the-value-function
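
For example, a minimal sketch (assuming the trained `model` from the consumption example above, whose state variables are `B` and `y`):

# Build the approximate value function at the first stage/node of the policy graph.
V = SDDP.ValueFunction(model[1])

# Evaluate it at a point in the state space; this returns the objective value
# and a dictionary of dual (marginal) values, one per state variable.
objective, duals = SDDP.evaluate(V; B = 100.0, y = 10.0)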

(figure attached)

The best thing about SDDP is that the subproblems are JuMP models, so you're free to add as many variables and constraints as you like. We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

azev77 commented 3 years ago

Shouldn't c(t) be indexed by the history of the stochastic process up to time t?

Yes. Economists (myself included) are often a bit sloppy & omit these things to make the notation more digestible. Here is an example from my research, where s^t is the history of the shock (figure attached).

azev77 commented 3 years ago

We regularly solve problems with thousands of nodes in the policy graph, 100-200 state variable dimensions, 10^50 scenarios, and subproblems with thousands of variables and constraints.

If what you're saying is true (& works for problems w/ non-linear reward functions like log), then packages like yours can be a game-changer for economists. Economists typically avoid problems w/ more than 5 state variables.

Check out VFIToolkit.m.

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them. I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

odow commented 3 years ago

As it currently stands the docs in packages like SDDP & POMDPs appear far too foreign for us to invest time in trying to use them. I'd love to contribute some examples w/ standard econ notation that show how to exploit your package.

This is what @bernardokp has been telling me for some time... https://github.com/odow/SDDP.jl/issues/428

There are some issues with nonlinear functions: everything must be convex (or concave, if maximizing). So you could have log rewards if maximizing, but not log costs (subtracting a log cost would make the objective convex rather than concave).

There are some crappy finance/inventory problems, but not enough: https://github.com/odow/SDDP.jl/blob/master/examples/inventory_management.jl https://github.com/odow/SDDP.jl/blob/master/docs/src/examples/asset_management_stagewise.jl

@lorenzoreus has some finance examples.

Happy to accept PRs with examples and/or documentation. But I don't have much interest (or domain expertise) to write them myself.

odow commented 3 years ago

Here's a model that @jd-lara had been running

(figure attached)

We can also find risk-averse policies, so swap out that E operator for your choice of convex risk measure. https://odow.github.io/SDDP.jl/stable/guides/add_a_risk_measure/ https://odow.github.io/SDDP.jl/stable/tutorial/theory/22_risk/
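
For example, a minimal sketch (assuming the `model` from the example above; `SDDP.EAVaR` is one of the risk measures described in the linked guide):

SDDP.train(
    model;
    # Replace the expectation with a convex combination of expectation and
    # average value-at-risk (lambda weights the expectation term).
    risk_measure = SDDP.EAVaR(lambda = 0.5, beta = 0.1),
    iteration_limit = 100,
)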

We also have the ability to solve problems where the randomness is a hidden Markov model (e.g., bear and bull markets with random returns, where the returns are observed but the bear/bull state is hidden).

I should tidy the examples/tutorials to clarify all of this.

azev77 commented 3 years ago

Is it possible to show me how to solve the example you did above (the discrete-time version of my model) w/ an open-source solver? Except as an infinite-horizon version (so I can compare w/ closed-form solutions more easily).

PS: wouldn't you agree the world would be a much better place if more people from different fields cooperated & submitted bug reports & PRs to the same packages? (I sure do...)

odow commented 3 years ago

Is it possible to show me how to solve the example you did above

https://github.com/odow/SDDP.jl/issues/429#issuecomment-900610201

Replace optimizer = () -> Gurobi.Optimizer(env) with optimizer = Ipopt.Optimizer. But it might not solve properly. Ipopt struggles with these sorts of cutting plane problems. You really need a good quality QP solver => Gurobi or CPLEX.

Even though this problem is small, it still requires on the order of 10^4 QP solves. So having a good solver is critical. Get Gurobi. They have free licenses for academics.

wouldn't you agree the world would be a much better place if more people from different fields

Yes.

I'll say this publicly: I don't believe the stochastic programming community has done a good job of publicizing the technology we have.

There were three main issues:

Here are some good primers on the industrial problem: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8442754&casa_token=tGCBXzSNn_AAAAAA:LiZBvAxDyPAr0F3NPpycCWqJioDKTNZ8-3g20w77OlhF0F8s5sC79Dhdrr47D0QHjEN6zfUXMw&tag=1 https://www.psr-inc.com/softwares-en/ https://blog.sintef.com/sintefenergy/hydropower/learning-from-hydrothermal-scheduling-models-in-brazil-and-norway/

azev77 commented 3 years ago

Got it to work w/ Ipopt. It runs, but no good...

odow commented 3 years ago

Yeah it's probably very slow and might crash/warn about numerical instability?

azev77 commented 3 years ago

It stops at the iteration_limit = 100 but is not optimal by then & I don't wanna wait forever.

I just worked out the closed-form solution for the problem above (infinite-horizon version w/ AR(1) income). This would be an awesome example bc it's so simple & we can compare w/ the closed form. PS: I just realized the finite-horizon version of this problem is even on Wikipedia...

(figure attached)

If I had Gurobi, I'd do it on my own. @odow Would it be possible for you to solve this infinite-horizon problem w/ SDDP.jl & compare w/ the closed form solution (like I did here) w/ @pulsipher's help?

Also, can SDDP.jl solve for the policy function so we can compare it w/ the closed form?

@pulsipher I messed up in explaining the stochastic problem to you before. c[t] is lazy/sloppy notation for optimal consumption, which is in fact a function of the entire history of the state variable up to time t...

PS: It would be interesting to see how your package solves this model where income follows an AR(1) process. Economists usually approx the AR(1) w/ a Markov chain & then do VFI or PFI...

PS: I'd be happy to submit a PR w/ an example (like I did w/ InfiniteOpt) if you provide the code/output. In the example I'd like to plot:

odow commented 3 years ago

This notation looks better. Nice to see that c_t isn't linear, and is a function of the history of the random walk.

I'll see if I have time later to implement it. It would make a very nice tutorial.

azev77 commented 3 years ago

There are a few typos in my notes above, please hold-off until I fix them...

azev77 commented 3 years ago

Here are the corrected solutions, recursive & time-series (figure attached).

I coded them here:

using Distributions, Plots

# Define parameters 
β=0.98; # discount factor
R=1/β;  # gross interest rate
r=R-1;  # net interest rate
b = 0.01  # utility parameter. c ∈ (0,1/b)   
dt = 1    # infinite horizon. time step. 
B0 = 100.0 # endowment
u(c; b=b) = c - (b/2)*c^2       # utility function
discount(t;β=β) = β^t # discount function

λ=0.9; # income persistence
ȳ=0.0; # long-run mean income 
σ_ε = 0.5; # income shock vol. 
y0 = 0.0; # starting income 
θ = 1/(1+r-λ)
dist_ε = Normal(0.0,σ_ε)

μ_B(B,y,c; r=r) = B +r*B +y -c 
μ_y(y,ε; λ=λ,ȳ=ȳ) = (1-λ)*ȳ +λ*y +ε

# Closed Form 
c_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = r*B +r*θ*y +(1-λ)*θ*ȳ
B_pol_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ) = B +(1-λ)*θ*y - (1-λ)*θ*ȳ
val_cf(B,y;r=r,θ=θ,λ=λ,ȳ=ȳ,σ_ε=σ_ε) = ((1+r)/r) * (c_pol_cf(B,y)) +
    -(b/2)*((1+r)/r) * (c_pol_cf(B,y))^2 +
    -(b/2)*(1+r)*((θ*σ_ε)^2); 
#
c0 = c_pol_cf(B0,y0)

# Simulation 
T_sim = 200
ε_sim = rand(dist_ε, T_sim)
#
y_sim, B_sim, c_sim = [], [], []
push!(y_sim, y0)
push!(B_sim, B0)

# Recursive solutions
for tt in 1:T_sim
    y_new = μ_y(y_sim[tt],ε_sim[tt])        # y_t+1 =f(yt,ε_t+1)
    c_new = c_pol_cf(B_sim[tt], y_sim[tt])  # c_t =f(Bt,yt)
    B_new = B_pol_cf(B_sim[tt], y_sim[tt])  # B_t+1 =f(Bt,yt)
    #B_new = μ_B(B_sim[tt], y_sim[tt], c_new) # same as above.
    #
    push!(y_sim, y_new)
    push!(c_sim, c_new) 
    push!(B_sim, B_new) 
end

# Time-series solutions 
y_sim_TS, B_sim_TS, c_sim_TS = [], [], []
push!(y_sim_TS, y0)
push!(B_sim_TS, B0)
push!(c_sim_TS, c0)
for tt in 1: (T_sim-1)
    push!(y_sim_TS, (λ^tt)*y0 + (1-λ^tt)*ȳ +sum(i->λ^(tt-i) * ε_sim[i], 1:tt))
    push!(c_sim_TS, c0 + r*θ*sum(ε_sim[1:tt]))
    push!(B_sim_TS, B0 +(1-λ)*θ*sum(y_sim[1:tt]) -tt*(1-λ)*θ*ȳ)
end 

# compare Recursive vs TS 
y_sim_TS ≈ y_sim[1:end-1]
c_sim_TS ≈ c_sim
B_sim_TS ≈ B_sim[1:end-1]

plot(); 
plot!(y_sim_TS, lab=""); plot!(y_sim, lab="y");
plot!(c_sim_TS, lab=""); plot!(c_sim, lab="c");
plot!(B_sim_TS, lab=""); plot!(B_sim, lab="B")

# stats 
[std(x) for x in [y_sim, y_sim + r*B_sim, B_sim, c_sim] ]

#
plot(0:150,u, lab="return function");
plot!([1/b], seriestype = :vline, lab="bliss point")

plot();
plot!(0:200, B -> c_pol_cf(B,-2), lab="c Policy, y=-2") ;
plot!(0:200, B -> c_pol_cf(B,0), lab="c Policy, y=0") ;
plot!(0:200, B -> c_pol_cf(B,2), lab="c Policy, y=2") 

plot();
plot!(0:200, B -> B_pol_cf(B,-2), lab="B Policy, y=-2") ;
plot!(0:200, B -> B_pol_cf(B,0), lab="B Policy, y=0") ;
plot!(0:200, B -> B_pol_cf(B,2), lab="B Policy, y=2") 

plot();
plot!(0:2000, B -> val_cf(B,-2), lab="Value, y=-2") ;
plot!(0:2000, B -> val_cf(B,0), lab="Value, y=0") ;
plot!(0:2000, B -> val_cf(B,2), lab="Value, y=2") 

Plots attached: simulation; policy function (consumption); value function (quadratic).

PS: it's easier to see that consumption is fluctuating if you zoom in (consumption is just less volatile, "smoother", when people can borrow/save, as econ theory predicts).

odow commented 3 years ago

This needs some local changes to SDDP.jl (to add the plotting callback), so you won't be able to run the code. But the blue line is us, and the orange dashed line is the closed-form solution. For now I just used β = 0.9 to make it easier, but it takes surprisingly many iterations before we start to reliably reproduce the optimal policy. Part of the problem is that y is unbounded over the infinite horizon. It'd be a lot easier if the horizon were finite or y were bounded. (Training animation attached.)

using SDDP
import Distributions
import Gurobi
import Plots

function main(
    β = 0.9,
    b = 0.01,
    B0 = 100.0,
    λ = 0.9,
    ȳ = 0.0,
    σ_ε = 0.5,
    y0 = 0.0,
    SAMPLE_SPACE_SIZE = 100,
)
    r = 1 / β - 1
    θ = 1 / (1 + r - λ)
    Ω = rand(Distributions.Normal(0, σ_ε), SAMPLE_SPACE_SIZE)
    model = SDDP.PolicyGraph(
        # Node 0 is the root; node 1 loops back to itself with probability β
        # (the discount factor), which models the infinite horizon.
        SDDP.Graph(0, [1], [(0 => 1, 1.0), (1 => 1, β)]),
        sense = :Max,
        upper_bound = 1000,
        optimizer = Gurobi.Optimizer,
    ) do sp, t
        set_silent(sp)
        @variables(sp, begin
            B >= 0, (SDDP.State, initial_value = B0)
            y, (SDDP.State, initial_value = y0)
            0 <= c <= 1 / b, (SDDP.State, initial_value = 0)
            ε       # income shock; fixed to a sampled value by SDDP.parameterize below
            δ >= 0  # slack on the wealth equation, heavily penalized in the objective
        end)
        @constraints(sp, begin
            B.out == (1 + r) * B.in + y.in - c.out + δ
            y.out == (1 - λ) * ȳ + λ * y.in + ε
        end)
        SDDP.parameterize(ω -> fix(ε, ω), sp, Ω)
        @stageobjective(sp, c.out - (b / 2) * c.out^2 - 10_000 * δ)
    end
    function closed_form_solution(ε)
        T = length(ε)
        y = zeros(T)
        c = zeros(T)
        B = zeros(T)
        c0 = r * B0 + r * θ * y0 + (1 - λ) * θ * ȳ
        for t = 1:T
            y[t] = (1 - λ^t) * ȳ + λ^t * y0 + sum(λ^(t - j) * ε[j] for j=1:t)
            c[t] = c0 + r * θ * sum(ε[1:t])
            B[t] = B0 + (1 - λ) * θ * (y0 + sum(y[1:t-1])) - t * (1 - λ) * θ * ȳ
        end
        return B, y, c
    end
    iteration = 1
    function forward_pass_callback(trajectory)
        ε = map(x -> x[2], trajectory.scenario_path)
        Tmax = length(ε)
        cf_B, cf_y, cf_c = closed_form_solution(ε)
        plot_B = Plots.plot(
            ylabel = "B",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 200),
        )
        plot_y = Plots.plot(
            title = "Iteration $(iteration)",
            ylabel = "y",
            legend = false,
            xlim = (0, 20),
            ylim = (-3, 3),
        )
        plot_c = Plots.plot(
            ylabel = "c",
            legend = false,
            xlim = (0, 20),
            ylim = (0, 100),
        )
        Plots.plot!(plot_B, 1:Tmax, map(x -> x[:B], trajectory.sampled_states))
        Plots.plot!(plot_B, 1:Tmax, cf_B, style = :dash)
        Plots.plot!(plot_y, 1:Tmax, map(x -> x[:y], trajectory.sampled_states))
        Plots.plot!(plot_y, 1:Tmax, cf_y, style = :dash)
        Plots.plot!(plot_c, 1:Tmax, map(x -> x[:c], trajectory.sampled_states))
        Plots.plot!(plot_c, 1:Tmax, cf_c, style = :dash)
        plt = Plots.plot(
            plot_B,
            plot_y,
            plot_c,
            layout = (1, 3),
        )
        Plots.savefig(plt, "iterations/iteration_$(1000 + iteration).png")
        iteration += 1
        return
    end
    SDDP.train(
        model;
        iteration_limit = 500,
        forward_pass_callback = forward_pass_callback,
    )
    run(`convert iterations/*.png training.gif`)
    return
end

main()
azev77 commented 3 years ago

I’ll take a closer look when I come home. Are you able to plot the policy & value functions? Like I did: As a function of B, for say 3 values of y. I don’t need to see each iteration, just the solution

azev77 commented 2 years ago

Hey @odow I'm trying to compare intertemporal optimization solvers in Julia here.

I coded up the (correct) finite horizon solution w/ JuMP-Ipopt:

#Finite Horizon Discrete Time w/ jump
if 1==1
    using JuMP, Ipopt, Plots    # InfiniteOpt, 
    δ=0.10; ρ=0.05; 

    # Need: z > ρ + δ
    z= ρ + δ +.10    # Case: k_T+1 =0 liquidate capital 
    #z= ρ + δ +.50    # Case: k_T+1 >0 burn some capital 

    I_SS = (z-ρ-δ)/(ρ+δ)
    K_SS = I_SS/δ

    T = 90.0   # life horizon
    k0 = 1.0 # endowment

    time_i = 0.0:1.0:T
    time_k = 0.0:1.0:T+1

    u(k,i;z=z) = z*k -i -(1/2)*(i)^2  # utility function
    d(t; ρ=ρ) = (1+ρ)^(-t)     # discount function

    m = Model(Ipopt.Optimizer)

    register(m, :u, 2, u; autodiff = true)
    register(m, :d, 1, d; autodiff = true)

    @variable(m, i[time_i]) #@variable(m, 1e-4 <= c[0.0:1.0:10.0]  <= 99.0) 
    @variable(m, k[time_k])

    # LOM 
    for j in 2:length(time_k)
        t_1, t = time_k[j - 1], time_k[j]
        @constraint(m, k[t] == i[t_1] + (1-δ)*k[t_1])
    end

    # Can't sell more than all your capital.
    # Equiv to K_t >=0
    for j in 1:length(time_i)
        t = time_i[j]
        @constraint(m, i[t] >= -(1-δ)*k[t])
    end

    fix(k[0.0], k0)
    # @constraint(m, i[T] == max(-1, -(1-δ)*k[T]) )
    @constraint(m, i[T] >= -1 )

    @NLobjective(m, Max, sum(d(t)*u(k[t],i[t]) for t in time_i) )

    optimize!(m)
    objective_value(m)
    i_opt, k_opt = value.(i), value.(k)

    scatter(time_k,   k_opt.data, lab = "k");
    scatter!(time_i,  i_opt.data, lab = "i")
    plot!([0.0],  seriestype = :hline, lab="", color="red")
    plot!([-(1-δ)*k_opt.data[end-1] -1],  seriestype = :hline, lab="I_T TC", color="grey", l=:dash)
    plot!([K_SS I_SS],  seriestype = :hline, lab="", color="grey", l=:dash)
    plot!([T+1],  seriestype = :vline, lab="", color="grey", l=:dash)
end 
  1. How can I solve the same finite-horizon problem w/ SDDP.jl + Ipopt?
  2. How can I solve the infinite-horizon version w/ SDDP.jl + Ipopt? PS: the infinite-horizon terminal (transversality) condition is lim_{t→∞} e^(-ρt) k_t (1 + i_t) = 0.
odow commented 1 year ago

Closing for the same reason as https://github.com/odow/SDDP.jl/issues/528#issuecomment-1350146308.

SDDP.jl isn't really intended as a solution method for these types of problems.