YanickZengaffinen opened 1 month ago
I tried to keep this one as simple as possible (initial_state: 0, final_state: None). Here MAL actually achieves a reward of 243.2, which is pretty close to the maximum of 246. The MDP is probably too simple / the chance of visiting s1 is too high. For reference, this is the final model that is learned:
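For context, here is a minimal sketch of what a tabular spec for such a tiny 2-state MDP could look like (this is not the learned model from the figure above; the array names and all numbers are purely illustrative placeholders):

```python
import numpy as np

# Hypothetical tabular spec of a tiny 2-state, 2-action MDP.
# transitions[s, a, s'] = P(s' | s, a); rewards[s, a] = immediate reward.
# The actual probabilities/rewards of the MDP above live in the repo; these
# numbers only show the shape of the data.
transitions = np.array([
    [[0.9, 0.1],   # s0, a0
     [0.1, 0.9]],  # s0, a1
    [[0.5, 0.5],   # s1, a0
     [0.8, 0.2]],  # s1, a1
])
rewards = np.array([
    [1.0, 0.0],  # rewards for (s0, a0), (s0, a1)
    [0.0, 2.0],  # rewards for (s1, a0), (s1, a1)
])

initial_state = 0   # matches initial_state: 0 above
final_state = None  # matches final_state: None (no terminal state)

assert np.allclose(transitions.sum(axis=-1), 1.0)  # each row must be a distribution
```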
Here I tried to avoid self-loops (initial_state: 0, final_state: None). On this MDP, MAL achieved a reward of 223 and, as you can see, it actually discovered the best loop:
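A quick sanity check for the "no self-loops" property, assuming the same hypothetical `transitions[s, a, s']` layout as the sketch above, might look like this:

```python
import numpy as np

def has_self_loop(transitions: np.ndarray, tol: float = 0.0) -> bool:
    """Return True if any action can keep the agent in its current state.

    transitions[s, a, s'] is the transition probability table; with tol=0.0
    even tiny self-transition probabilities are flagged.
    """
    n_states = transitions.shape[0]
    # diag[s, a] = P(s | s, a), i.e. the probability of staying in state s
    diag = transitions[np.arange(n_states), :, np.arange(n_states)]
    return bool((diag > tol).any())
```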
Even on this 4-state MDP (initial_state: 0, final_state: None), MAL is learning something (it achieves a reward of 267 out of 485) with the following model:
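As a side note, for MDPs this small the maximum achievable expected reward can be computed directly with finite-horizon value iteration. A sketch, again assuming the hypothetical `transitions`/`rewards` tables from above and that the 485 corresponds to an undiscounted return over a fixed horizon (which may not match how that number was actually obtained):

```python
import numpy as np

def max_return(transitions: np.ndarray, rewards: np.ndarray,
               initial_state: int, horizon: int) -> float:
    """Maximum expected undiscounted return over `horizon` steps.

    transitions[s, a, s'] and rewards[s, a] are tabular; this is standard
    finite-horizon value iteration (backward induction).
    """
    n_states = transitions.shape[0]
    values = np.zeros(n_states)
    for _ in range(horizon):
        # Q[s, a] = r(s, a) + sum_{s'} P(s' | s, a) * V(s')
        q = rewards + transitions @ values
        values = q.max(axis=1)
    return float(values[initial_state])
```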
Here is a run on an MDP that is only ergodic but not deterministic (initial_state: 1, final_state: None). MAL achieves a reward of 9.8 and the final model is:
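"Ergodic but not deterministic" is meant in the sense that every state stays reachable from every other state, while individual transitions are stochastic. Two crude checks on the hypothetical `transitions[s, a, s']` table could look like this (the second check only verifies that all states communicate under uniformly random actions, which is a necessary condition for ergodicity, not the full property):

```python
import numpy as np

def is_deterministic(transitions: np.ndarray) -> bool:
    """True if every (state, action) pair leads to exactly one next state."""
    return bool(np.isclose(transitions.max(axis=-1), 1.0).all())

def all_states_communicate(transitions: np.ndarray) -> bool:
    """True if every state is reachable from every other state when actions
    are chosen uniformly at random (necessary for the MDP to be ergodic)."""
    p = transitions.mean(axis=1)                  # state-to-state matrix under the uniform policy
    reach = (p > 0).astype(float)                 # adjacency matrix of positive-probability edges
    n = p.shape[0]
    closure = np.linalg.matrix_power(reach + np.eye(n), n) > 0  # transitive closure
    return bool(closure.all())
```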
Branch: https://github.com/nilscrm/stackelberg-ml/tree/more-mdps