nilscrm / stackelberg-ml


Example: Ergodicity + Deterministic #34

Open YanickZengaffinen opened 1 month ago

YanickZengaffinen commented 1 month ago
  1. Come up with example
  2. Implement it
  3. Test it
  4. If MAL fails on it but PAL succeeds => put it on the poster

Branch: https://github.com/nilscrm/stackelberg-ml/tree/more-mdps
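
To sanity-check a candidate example before testing it, something like the sketch below could help. It is not code from the branch: the representation `P[s][a] = {next_state: probability}` and all function names are assumptions made for illustration. It also only checks that every state is reachable from every other state, which is a necessary condition for ergodicity but not a sufficient one.

```python
from collections import deque

# Assumed (hypothetical) representation: P[s][a] is a dict {next_state: probability}.

def is_deterministic(P, tol=1e-9):
    """True if every (state, action) pair has exactly one successor with non-zero probability."""
    for actions in P:
        for dist in actions:
            support = [p for p in dist.values() if p > tol]
            if len(support) != 1 or abs(support[0] - 1.0) > tol:
                return False
    return True

def reachable_from(P, start, tol=1e-9):
    """Set of states reachable from `start` under some sequence of actions (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for dist in P[s]:
            for t, p in dist.items():
                if p > tol and t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen

def is_communicating(P, tol=1e-9):
    """True if every state can reach every other state (necessary for ergodicity)."""
    n = len(P)
    return all(len(reachable_from(P, s, tol)) == n for s in range(n))

# Made-up 2-state, 2-action MDP: action 0 switches state, action 1 stays.
P_example = [
    [{1: 1.0}, {0: 1.0}],
    [{0: 1.0}, {1: 1.0}],
]
print(is_deterministic(P_example), is_communicating(P_example))
```

An MDP with an absorbing state would pass `is_deterministic` but fail `is_communicating`, which is the kind of example this issue wants to avoid.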

YanickZengaffinen commented 1 month ago

I tried to keep this one as simple as possible (initial_state: 0, final_state: None): [image] Here MAL actually achieves a reward of 243.2, which is close to the maximum of 246. The example is probably too simple, or the chance of visiting s1 is too high. For reference, the final learned model: [image]

YanickZengaffinen commented 1 month ago

Here I tried to avoid self-loops (initial_state: 0, final_state: None): [image] On this MDP, MAL achieved a reward of 223. As you can see, it actually discovered the best loop: [image]

YanickZengaffinen commented 1 month ago

Even in this 4-state MDP (initial_state: 0, final_state: None): [image] MAL is learning something (it achieves a reward of 267 out of 485), with the following model: [image]
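
To put the runs on a common scale, the reported rewards can be expressed as a fraction of the maximum. This is only arithmetic on the numbers quoted in the comments above; the run labels and the helper name are made up for illustration, and the self-loop-free example is left out because no maximum was reported for it.

```python
# (MAL reward, maximum reward) as reported in the comments above.
# The labels are illustrative, not names used in the repo.
runs = {
    "first example": (243.2, 246.0),
    "4-state example": (267.0, 485.0),
}

def fraction_of_max(achieved, maximum):
    """Share of the maximum reward that was achieved."""
    return achieved / maximum

for name, (achieved, maximum) in runs.items():
    print(f"{name}: {fraction_of_max(achieved, maximum):.1%}")
# first example: 98.9%
# 4-state example: 55.1%
```

On these numbers MAL falls well short of the maximum only in the 4-state example.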

YanickZengaffinen commented 1 month ago

Here is a run on an MDP that is ergodic but not deterministic (initial_state: 1, final_state: None): [image] MAL achieves a reward of 9.8, and the final model is: [image]