mal-lang / mal-simulator

Apache License 2.0

Add a function to obtain transition probabilities in a given state #23

Open kasanari opened 1 month ago

kasanari commented 1 month ago

To obtain optimal policies (for baselines or validation) through value iteration, we need a function that returns the state transition probabilities for a given state and action (or all transition probabilities for a given state). In our case, the only uncertainty comes from the attacker's behaviour. If the attacker's action is assumed to be known (because the attacker acts first), each defender action leads to a single next state, so one can iterate over all possible defender actions to get the possible next states.

This does not fit into the Gym/PettingZoo API, so we are free to implement it as we please. Here is one way of doing it:

https://github.com/alessiodm/drl-zh/blob/main/01_MDPs.
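To make the request concrete, here is a minimal sketch of what such a function could look like and how value iteration would consume it. Everything here is an assumption for illustration: the `transitions(state, action)` signature, the `(probability, next_state, reward)` tuple layout, and the toy two-state MDP are hypothetical and not part of the current mal-simulator API.

```python
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable
# Hypothetical layout: (probability, next_state, reward)
Transition = tuple[float, State, float]


def q_value(transitions_sa: list[Transition], values: dict, gamma: float) -> float:
    """Expected return of one action, given its list of transitions."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions_sa)


def value_iteration(
    states: Iterable[State],
    actions: Callable[[State], Iterable[Action]],
    transitions: Callable[[State, Action], list[Transition]],
    gamma: float = 0.9,
    tol: float = 1e-8,
) -> tuple[dict, dict]:
    """Return (values, greedy policy) for a finite MDP."""
    states = list(states)
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            qs = [q_value(transitions(s, a), values, gamma) for a in actions(s)]
            best = max(qs, default=0.0)  # states without actions keep value 0
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {}
    for s in states:
        acts = list(actions(s))
        if acts:
            policy[s] = max(
                acts, key=lambda a: q_value(transitions(s, a), values, gamma)
            )
    return values, policy


# Toy MDP (made up): the defender either waits or pays a small cost to patch.
def toy_actions(state: State) -> list[Action]:
    return ["wait", "patch"] if state == "clean" else []


def toy_transitions(state: State, action: Action) -> list[Transition]:
    if state != "clean":
        return []  # "breached" is terminal
    if action == "patch":
        return [(1.0, "clean", -1.0)]
    # "wait": the attacker succeeds half of the time
    return [(0.5, "breached", -20.0), (0.5, "clean", 0.0)]


values, policy = value_iteration(["clean", "breached"], toy_actions, toy_transitions)
```

When the attacker's action is known in advance, `transitions` would return a single entry with probability 1 for each defender action, and the same loop still applies.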

kasanari commented 1 month ago

If we were to assume the attacker did not act first, then we would need agent classes to provide a probability distribution over their actions.
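A rough sketch of that idea, under stated assumptions: the `action_distribution` method, the `UniformAttacker` class, and the `step` function below are hypothetical names invented for this example, not existing mal-simulator classes. The next-state distribution is obtained by pushing the attacker's action distribution through a deterministic step function.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Protocol


class StochasticAgent(Protocol):
    """Hypothetical interface: an agent that exposes its action distribution."""

    def action_distribution(self, state: Hashable) -> dict[Hashable, float]: ...


class UniformAttacker:
    """Picks uniformly among the actions available in a state."""

    def __init__(self, available: Callable[[Hashable], Iterable[Hashable]]):
        self._available = available

    def action_distribution(self, state: Hashable) -> dict[Hashable, float]:
        acts = list(self._available(state))
        return {a: 1.0 / len(acts) for a in acts} if acts else {}


def next_state_distribution(
    state: Hashable,
    defender_action: Hashable,
    attacker: StochasticAgent,
    step: Callable[[Hashable, Hashable, Hashable], Hashable],
) -> dict[Hashable, float]:
    """Marginalise over attacker actions to get P(next_state | state, defender_action)."""
    dist: dict[Hashable, float] = defaultdict(float)
    for attacker_action, prob in attacker.action_distribution(state).items():
        dist[step(state, defender_action, attacker_action)] += prob
    return dict(dist)


# Toy dynamics (made up): the state counts how far the attacker has advanced.
def toy_step(state: int, defender_action: str, attacker_action: str) -> int:
    if attacker_action == "advance" and defender_action != "block":
        return state + 1
    return state


attacker = UniformAttacker(lambda state: ["advance", "idle"])
dist_wait = next_state_distribution(0, "wait", attacker, toy_step)
dist_block = next_state_distribution(0, "block", attacker, toy_step)
```

With this shape, any attacker whose policy can be written as a distribution over actions would plug into the same transition function.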