If we want to obtain optimal policies (for baselines or validation) through value iteration, we need a function that returns the state transition probability for a given state and action (or all transition probabilities in a given state). In our case the only uncertainty comes from the attacker's behaviour. If the attacker action is assumed to be known (because the attacker acts first), then one can iterate over all possible defender actions to get the possible next states.
This does not fit into the Gym/PettingZoo API, so we are free to implement it as we please. Here is one way of doing it:
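As a minimal sketch of the idea: since the attacker is the only source of stochasticity, the transition probability for a defender action can be obtained by marginalizing over the attacker's action distribution. The helper names below (`transition_probabilities`, `attacker_probs`, `step_fn`) are hypothetical placeholders, and the sketch assumes a deterministic joint transition function.

```python
from collections import defaultdict

def transition_probabilities(state, defender_action, attacker_probs, step_fn):
    """Return {next_state: probability} for a given state and defender action.

    Assumptions (not from any particular library):
    - attacker_probs(state) returns {attacker_action: probability},
      i.e. the attacker's (known or modeled) behaviour is the only
      source of randomness.
    - step_fn(state, attacker_action, defender_action) is a
      deterministic transition returning the next state.
    """
    probs = defaultdict(float)
    for attacker_action, p in attacker_probs(state).items():
        next_state = step_fn(state, attacker_action, defender_action)
        # Several attacker actions may lead to the same next state,
        # so accumulate the probability mass.
        probs[next_state] += p
    return dict(probs)
```

A function like this is all that standard value iteration needs: for each state, iterate over the defender's actions, query the resulting next-state distribution, and back up the expected values.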
https://github.com/alessiodm/drl-zh/blob/main/01_MDPs.