Closed: slwu89 closed this issue 1 year ago.
I wonder if it is sufficient to provide some docs on how to build the distribution using Distributions.jl, and then sample N points into a vector? It's pretty common for people to want Normal etc.
Yes, this request/question is fully possible with the existing framework. I'm in a case where I need to sample from many independent distributions, something like the situation in my comment: if I were using Distributions.jl, I'd use something like Product([Binomial(20,0.5), Binomial(10,0.5), Binomial(50,0.25)]). It gets a bit clunky to form the explicit product of supports and probabilities as suggested in https://odow.github.io/SDDP.jl/stable/guides/add_multidimensional_noise/.
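To make the comparison concrete, here is a hedged sketch of one way to form the explicit product of supports and probabilities that the multidimensional-noise guide describes, using Distributions.jl. The name `explicit_product` is illustrative only; it is not a function in SDDP.jl or Distributions.jl.

```julia
using Distributions

# Illustrative helper (not a library function): enumerate every joint
# outcome of independent discrete distributions and its probability.
function explicit_product(dists)
    # Iterators.product over each distribution's finite support gives
    # every combination; `vec(collect(...))` flattens it to a vector.
    Ω = vec(collect(Iterators.product((support(d) for d in dists)...)))
    # Independence: the joint probability is the product of marginals.
    P = [prod(pdf(d, x) for (d, x) in zip(dists, ω)) for ω in Ω]
    return Ω, P
end

Ω, P = explicit_product([Binomial(3, 0.5), Binomial(2, 0.25)])
length(Ω)    # (3 + 1) * (2 + 1) = 12 joint outcomes
sum(P)       # ≈ 1.0
```

This works for small products, but as the thread goes on to discuss, the number of joint outcomes grows multiplicatively in the number of distributions.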
I suppose sampling N points into a vector would be fine, but N would have to be quite large as the number of distributions in the product distributions grows.
What I had in mind was documenting:
julia> function sample_distributions(; distributions, number_of_samples::Int)
           Ω = [rand.(distributions) for _ in 1:number_of_samples]
           P = fill(1 / number_of_samples, number_of_samples)
           return Ω, P
       end
sample_distributions (generic function with 1 method)
julia> using Distributions
julia> distributions = [Binomial(20,0.5), Binomial(10,0.5), Binomial(50,0.25)]
3-element Vector{Binomial{Float64}}:
Binomial{Float64}(n=20, p=0.5)
Binomial{Float64}(n=10, p=0.5)
Binomial{Float64}(n=50, p=0.25)
julia> number_of_samples = 20
20
julia> Ω, P = sample_distributions(; distributions, number_of_samples);
julia> Ω
20-element Vector{Vector{Int64}}:
[12, 5, 14]
[11, 3, 12]
[12, 3, 7]
[6, 4, 10]
[12, 3, 16]
[9, 8, 14]
[9, 3, 9]
[12, 5, 11]
[10, 6, 14]
[11, 6, 13]
[14, 6, 14]
[9, 7, 10]
[11, 4, 13]
[15, 4, 16]
[7, 7, 16]
[12, 4, 14]
[12, 6, 12]
[12, 5, 9]
[8, 5, 8]
[11, 7, 13]
julia> P
20-element Vector{Float64}:
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
> I suppose sampling N points into a vector would be fine, but N would have to be quite large as the number of distributions in the product distributions grows.
Yip. This is a fundamental limitation of the method. If you have a large uncertainty space, appropriately choosing which samples to train with is part of the art. You might not want to use unbiased independent sampling from each distribution.
Ah ok, I missed that this was a fundamental limitation. Can I ask why that is? I glanced over the "policy graph" paper and at the algorithm proposed; it seems that effectively sampling large uncertainty spaces would just require potentially excessively large numbers of forward/backward passes, becoming computationally infeasible, but that is not a fundamental limit of the method?
> effectively sampling large uncertainty spaces would just require potentially excessively large numbers of forward/backward passes, becoming computationally infeasible
This is correct. I guess it's not a mathematical limitation, but it is certainly a computational one.
To a rough approximation, doubling the number of scenarios at least doubles the amount of work required per iteration, and it may also increase the number of iterations required to reach the same level of accuracy. My rule of thumb for the number of samples: O(10^1) is easy, O(10^2) requires quite a bit more work, O(10^3) is the edge of what SDDP.jl can do, and O(10^4) is too much. But it's all model-dependent: it depends on the number of nodes in the graph and the number of state variables.
It's also not really a SDDP-related limitation, but a statistical one. Approximating high-dimensional distributions as a finite discrete set of points is tricky...
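To make the growth concrete, here is a small sketch (not from the thread) counting the joint outcomes that an explicit product of supports would contain for the three binomials discussed above:

```julia
using Distributions

# The three distributions from the earlier comment.
dists = [Binomial(20, 0.5), Binomial(10, 0.5), Binomial(50, 0.25)]

# Each Binomial(n, p) has a support of n + 1 points; the joint support
# of the independent product is the product of the marginal sizes.
n_outcomes = prod(length(support(d)) for d in dists)
# 21 * 11 * 51 = 11781 joint outcomes from just three modest distributions
```

Even three small discrete distributions produce over ten thousand joint outcomes, which by the rule of thumb above is already beyond what SDDP.jl handles comfortably; hence the appeal of sampling N points instead of enumerating the full product.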
Yes, agreed of course. Crude monte carlo will definitely be infeasible for those "large" uncertainty spaces. The suggested documentation you have is great. If you were interested in incorporating Distributions.jl, I'd be happy to help. Thanks for the discussion!
My only hesitation about adding Distributions is that it adds a lot of package dependencies: https://juliahub.com/ui/Packages/Distributions/xILW0/0.25.93?page=1.
I've tried quite hard to keep our level of dependencies under control. But we can and should add a Distributions example to the documentation.
If you want to have a go at adding the docs:
1. Add a new markdown file here: https://github.com/odow/SDDP.jl/tree/master/docs/src/guides
2. Add the name of the file here: https://github.com/odow/SDDP.jl/blob/e6bf8449eb262e2b0765541e7d6e23cef0dff0f2/docs/make.jl#L141-L154
3. Add Distributions to this Project.toml: https://github.com/odow/SDDP.jl/blob/master/docs/Project.toml
Ok, fair enough, that's a heavy amount of dependencies to slow down CI for something that doesn't extend actual functionality! Thanks for the tips on the docs, I'll have a go at adding to the docs tomorrow or the day after.
> that's a heavy amount of dependencies to slow down CI
It's not really the CI time. It's that every user would need to install Distributions, and we'd have to pay the compilation and latency costs of `using Distributions` in the start-up of SDDP.jl.
Now I'm sure a lot of people will actually need to use Distributions for their models, but we don't want to force the costs on people who don't use the functionality.
Hello @odow,
I am wondering if you would consider a feature to make it easier to parameterize the noise terms using types from Distributions.jl, barring any fundamental incompatibilities with the structure of SDDP.jl. I'm aware that the restriction is that the distributions be finite and discrete, but the API of Distributions.jl could check that easily (e.g. with `minimum` and `maximum`, or by checking against "approved" types), and the type hierarchy can ensure that users only plug in discrete distributions. It would in particular make sampling multi-variate noise easier.
If this is something possible and in scope for SDDP, I'd be happy to work on a PR to implement such functionality. A cursory glance at the source code seemed to indicate most of the relevant methods and types are in user_interface.jl; are there other places I should look too?
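A hedged sketch of the validation this comment describes: accept only discrete distributions with finite support (checked via `minimum`/`maximum`), then flatten them to the explicit (Ω, P) pair that SDDP.jl works with. The name `discretize` is illustrative only, not a proposed SDDP.jl API.

```julia
using Distributions

# Illustrative validation helper (hypothetical, not part of SDDP.jl):
# the DiscreteUnivariateDistribution type restriction rules out
# continuous distributions, and the minimum/maximum check rules out
# discrete distributions with infinite support (e.g. Poisson).
function discretize(d::DiscreteUnivariateDistribution)
    isfinite(minimum(d)) && isfinite(maximum(d)) ||
        error("distribution must have finite support")
    Ω = collect(support(d))       # all points of the finite support
    return Ω, pdf.(Ref(d), Ω)     # matching probability for each point
end

Ω, P = discretize(Binomial(10, 0.5))
length(Ω)    # 11 support points: 0, 1, ..., 10
sum(P)       # ≈ 1.0
```

Dispatching on `DiscreteUnivariateDistribution` means a call like `discretize(Normal(0, 1))` fails at method selection, while `discretize(Poisson(3))` fails the finite-support check at runtime.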