Can a policy graph model MDPs?

bernardokp commented 2 years ago

Is it true that MDP = Policy graphs without squiggling arrows, where transition probabilities are given by the p_{ij}'s?

If that is true, which I think it is, then I strongly suggest you solve an MDP right of the bat in the tutorial. Many people can relate to MDPs, and without the squiggling arrows, the whole setup will be way easier to understand. My guess/opinion is that this is a necessary step to fully understand policy graphs.

The way I see it, they are a unifying representation that accommodates several frameworks. But need to relate to it, they need to clearly see their framework/approach represented there to be convinced to use it (I am already convinced).

Best, Bernardo

joaquimg commented 2 years ago

That's good money! Perhaps a good replacement for the Poincaré Conjecture in the Millenium Prize Problems.

Strictly speaking, general MDP allows decision-dependent transition probabilities. Is that well defined in policy graphs? I can imagine that working with some messy inefficient disjunctive programming and sddip-like stuff.

However, I agree that an MDP (with some hypothesis) would be great for talking about policy graphs now that it is a Buzz word.

odow commented 2 years ago

Stationary MDPs
- A single box with a squiggly line
- There is only one decision rule
- But we don't quite generalize the decision-dependent transition probabilities. My conjecture is that decision-dependent transition probabilities are just poorly modeled problems with latent state/random variables that obscure the true problem. Of course, incorporating those latent variables probably makes the problem non-convex, which is why decision-dependent is hard to solve.
Non-stationary MDPs
- Exactly as you describe.

The hardest conceptual jump for MDP people is going from x' = P(x | a) to an explicit model of the uncertainty, where the transition function is deterministic given the realization of the random variable.

bernardokp commented 2 years ago

Thanks, both for the answers, I agree with all that was said. @joaquimg started hinting at different classes of MDPs, and @odow explained how several classes can be represented at SDDP.jl. Which proves my point:

The tutorial should include a non-stationary MDP, a very simple one, and then the linear policy graph example. In this way, users would understand that the package is more than an sddp solver (which it is as well): it is a powerful modeling framework for dynamic problems, which can accommodate Stochastic Programming 1.0 (linear policy graphs) and MDPs 1.0 (non-stationary case). And then you would add all the other sections, with more advanced stuff. The el niño/la niña example and the dairy farm one are too complex for a first reader because they are sort of a combination of the two basic frameworks I mentioned above (even more complex).

Anyway, this is based on my intuition plus the fact that I have been working with people outside stochastic programming.

FInally, you destroyed my brilliant title, which fortunately was appreciated by my friend @joaquimg.

odow commented 1 year ago

Closing in favor of https://github.com/odow/SDDP.jl/issues/496

odow / SDDP.jl

Can a policy graph model MDPs? #505