moves-rwth / storm

A Modern Probabilistic Model Checker
https://www.stormchecker.org
GNU General Public License v3.0

Simulating paths on an MDP for an optimal scheduler #66

Closed: vanhecke1 closed this issue 4 years ago

vanhecke1 commented 4 years ago

Hi,

I would like to know whether there is any support for simulating the paths of the DTMC induced by an optimal policy. Here is a minimal PRISM model:

    mdp

    const double ps=0.9;

    module example
        s : [0 .. 4] init 0;
        [] s = 0 -> ps : (s'=1) + (1 - ps) : (s'=0);
        [] s = 0 -> ps : (s'=2) + (1 - ps) : (s'=0);
        [] s = 1 -> ps : (s'=3) + (1 - ps) : (s'=1);
        [] s = 2 -> ps : (s'=4) + (1 - ps) : (s'=1);
        [] s = 3 -> true;
        [] s = 4 -> true;
    endmodule

rewards "progress" s = 0: 0; s = 1: 1; s = 2: 1; s = 3: 2; endrewards

Suppose that I have a time-bounded specification "Pmin=? [F<=5 (s = 3)]" or a simple reachability specification "Pmin=? [F (s = 3)]". Is it possible to obtain or simulate the paths of an optimal scheduler using Storm?

Thanks,

sjunges commented 4 years ago

Hi vanhecke1,

I think you are asking multiple things at the same time. Given a PRISM model, Storm can construct the underlying MDP, compute the optimal policy, compute the DTMC induced by that policy, and (often) output this DTMC (or information about the policy). Generally, support for reachability is easier than for bounded reachability, because optimal policies for step-bounded objectives are not memoryless.

From the python bindings, all of these steps can be scripted.
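For instance, a minimal sketch for the unbounded-reachability case, assuming the model above is saved as "example.prism" (a placeholder path) and a reasonably recent stormpy:

```python
import stormpy

# Parse the PRISM program and the reachability property.
program = stormpy.parse_prism_program("example.prism")
properties = stormpy.parse_properties('Pmin=? [F (s = 3)]', program)

# Build the MDP and model-check it, asking for an optimal scheduler.
model = stormpy.build_model(program, properties)
result = stormpy.model_checking(model, properties[0], extract_scheduler=True)
print("Pmin =", result.at(model.initial_states[0]))

# For unbounded reachability the optimal scheduler is memoryless and
# deterministic, so it can be applied to the MDP to obtain the induced DTMC.
scheduler = result.scheduler
assert scheduler.memoryless and scheduler.deterministic
dtmc = model.apply_scheduler(scheduler)
print("Induced DTMC has", dtmc.nr_states, "states")
```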

From the command line, there is currently no option to sample paths, but this could be a feature we support soon if there is demand for it (although I think that simulating is better done from the python interface).
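In the meantime, hand-rolling a simulation from python is straightforward. A minimal sketch that samples one path through the induced DTMC from the snippet above by rolling its transition probabilities (the same loop simulates an arbitrary policy on the MDP if you choose the action index yourself instead of taking the single DTMC action):

```python
import random

def sample_path(dtmc, max_steps=20):
    """Sample one path through a DTMC by rolling its transition probabilities."""
    state = dtmc.initial_states[0]
    path = [state]
    for _ in range(max_steps):
        # A DTMC has exactly one action per state; its transitions form
        # the successor distribution of the current state.
        transitions = dtmc.states[state].actions[0].transitions
        successors = [(t.column, t.value()) for t in transitions]
        columns, probs = zip(*successors)
        state = random.choices(columns, weights=probs)[0]
        path.append(state)
    return path

print(sample_path(dtmc))
```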

Feel free to continue this thread!

Best regards, Sebastian

P.S. Is there a particular reason for the rewards in your model? Your question does not seem to involve rewards.

vanhecke1 commented 4 years ago

Hi Sebastian,

I have installed the python interface, so I can use it if that is the easier route.

Both of your options would be interesting to me. Is it possible to simulate an arbitrary policy for an MDP, or can I only simulate the optimal policy?

Also, without building the full model of the MDP, can I supply a policy to Storm and simulate the behavior of the system?

And finally, if there are multiple PRISM variables in my model, can I access each of them instead of the single numerical state index that results from Storm's transformation of the PRISM model?

Thanks a lot!

sjunges commented 4 years ago

Yes, I understand these needs. I think that some better examples of how to interact with the models via Storm would indeed be helpful. They make more sense from within python, however, as a command-line interface is unlikely to be flexible enough.
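Regarding your last question: if the model is built with state valuations enabled, every state index can be mapped back to the values of the PRISM variables. A sketch along these lines (API names as in the stormpy documentation; details may differ between versions):

```python
import stormpy

program = stormpy.parse_prism_program("example.prism")
properties = stormpy.parse_properties('Pmin=? [F (s = 3)]', program)

# Ask the builder to keep the PRISM variable valuation of every state.
options = stormpy.BuilderOptions([p.raw_formula for p in properties])
options.set_build_state_valuations()
model = stormpy.build_sparse_model_with_options(program, options)

# Map each numerical state index back to its PRISM variables.
valuations = model.state_valuations
for state in model.states:
    print(state.id, valuations.get_string(state.id))
```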

I therefore close this thread for now, and will address the two issues created above within the python interface.

sjunges commented 4 years ago

Notice that significant progress was made in https://github.com/moves-rwth/stormpy/issues/14, allowing you to sample paths.
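For reference, the simulator added there can be used roughly as follows (a sketch; the exact interface, in particular the return value of step(), has changed across stormpy versions, so check the current stormpy documentation):

```python
import random
import stormpy
import stormpy.simulator

program = stormpy.parse_prism_program("example.prism")
model = stormpy.build_model(program)

# Create a simulator over the explicit MDP; the seed makes runs reproducible.
simulator = stormpy.simulator.create_simulator(model, seed=42)
simulator.restart()
path = []
while not simulator.is_done() and len(path) < 20:
    # An MDP simulator requires resolving the nondeterminism; here we
    # pick one of the available actions uniformly at random.
    action = random.randint(0, simulator.nr_available_actions() - 1)
    observation = simulator.step(action)
    path.append(observation)
print(path)
```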