If we want to obtain optimal policies (for baselines or validation) through value iteration, we need a function that returns the state transition probability for a given state and action (or all transition probabilities in a given state). In our case the only uncertainty comes from the attacker's behaviour. If the attacker action is assumed to be known (because the attacker acts first), then one can iterate over all possible defender actions to get the possible next states.
This does not fit into the Gym/PettingZoo API, so we are free to implement it as we please. Here is one way of doing it:
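As a minimal sketch of the idea: since the attacker is the only source of stochasticity, the transition probability for a defender action can be obtained by marginalizing over the attacker's action distribution. The helper names below (`transition_probabilities`, `attacker_probs`, `step_fn`) are hypothetical placeholders, and the sketch assumes a deterministic joint transition function.

```python
from collections import defaultdict

def transition_probabilities(state, defender_action, attacker_probs, step_fn):
    """Return {next_state: probability} for a given state and defender action.

    Assumptions (not from any particular library):
    - attacker_probs(state) returns {attacker_action: probability},
      i.e. the attacker's (known or modeled) behaviour is the only
      source of randomness.
    - step_fn(state, attacker_action, defender_action) is a
      deterministic transition returning the next state.
    """
    probs = defaultdict(float)
    for attacker_action, p in attacker_probs(state).items():
        next_state = step_fn(state, attacker_action, defender_action)
        # Several attacker actions may lead to the same next state,
        # so accumulate the probability mass.
        probs[next_state] += p
    return dict(probs)
```

A function like this is all that standard value iteration needs: for each state, iterate over the defender's actions, query the resulting next-state distribution, and back up the expected values.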
https://github.com/alessiodm/drl-zh/blob/main/01_MDPs.