nilscrm / stackelberg-ml


Make simple MDP and use Code from model based RL approach #3

Closed nilscrm closed 7 months ago

nilscrm commented 7 months ago

We want to create a very simple one-player MDP with very few states. Then try to run the code of this paper to solve it (Project website, Code).

YanickZengaffinen commented 7 months ago

We can use the minigrid collection that Kacper suggested.

https://github.com/Farama-Foundation/Minigrid/blob/master/minigrid/minigrid_env.py

Their environments inherit from gym.Env, but we might need to wrap them somehow because Gerstgrasser uses Ray with rllib.env environments (those inherit from gym.Env too though, so it shouldn't be too much work).
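
A minimal sketch of what the wrapping could look like, assuming the current Minigrid and Ray RLlib APIs (the registered name `minigrid-empty` and the choice of env id and wrapper are illustrative):

```python
import gymnasium as gym
from minigrid.wrappers import FlatObsWrapper
from ray.tune.registry import register_env

# Register the Minigrid env under a custom name so RLlib can construct it.
# FlatObsWrapper flattens Minigrid's dict observation into a single array,
# which RLlib's default models can consume directly.
def make_minigrid(env_config):
    return FlatObsWrapper(gym.make("MiniGrid-Empty-5x5-v0"))

register_env("minigrid-empty", make_minigrid)
```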

nilscrm commented 7 months ago

I was thinking of something more like 3 or 4 states with 2 actions each, or something like that. It will probably be hard enough to query the world model or policy even in that case.

YanickZengaffinen commented 7 months ago

Fair point, maybe more like this then: https://github.com/yigitunallar/reinforcement-learning-on-simple-grid-world-game/tree/master

We can also start with a 1D gridworld where we just want to move towards a target (generated randomly at the start), something like the sketch below. Then we scale the world up, make it 2D, and see how well the approach scales.
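
A sketch of such a 1D environment, assuming the Gymnasium API (the class name `LineWorld` and the reward values are illustrative, not anything from the paper's code):

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class LineWorld(gym.Env):
    """1D gridworld: the agent starts at cell 0 and must reach a
    randomly placed target. Actions: 0 = move left, 1 = move right."""

    def __init__(self, size=5):
        self.size = size
        self.action_space = spaces.Discrete(2)
        # Observation: (agent position, target position)
        self.observation_space = spaces.MultiDiscrete([size, size])

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        # Target is placed anywhere except the start cell.
        self.target = int(self.np_random.integers(1, self.size))
        return np.array([self.pos, self.target]), {}

    def step(self, action):
        self.pos = int(np.clip(self.pos + (1 if action == 1 else -1),
                               0, self.size - 1))
        done = self.pos == self.target
        reward = 1.0 if done else -0.1  # small step cost favors short paths
        return np.array([self.pos, self.target]), reward, done, False, {}
```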

nilscrm commented 7 months ago

What about a simple MDP like this? It gives us the maximal chance of having something to show at the milestone, and we can always make it more complicated after that (and we don't need to understand any library).
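
The attached diagram isn't reproduced here; below is a generic sketch of the kind of tabular MDP being proposed, with 3 states and 2 actions given as explicit tables (all transition probabilities and rewards are illustrative):

```python
import numpy as np

n_states, n_actions = 3, 2

# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0, 0], P[0, 0, 1] = 0.9, 0.1   # action 0 mostly stays in state 0
P[0, 1, 1] = 1.0                     # action 1 moves to state 1
P[1, 0, 0] = 1.0                     # action 0 moves back
P[1, 1, 2] = 1.0                     # action 1 reaches the goal
P[2, :, 2] = 1.0                     # state 2 is absorbing

R = np.zeros((n_states, n_actions))
R[1, 1] = 1.0                        # reward for reaching the goal

# Sanity check: each (s, a) row must be a probability distribution.
assert np.allclose(P.sum(axis=2), 1.0)
```

An MDP in this explicit table form is trivial to query exactly, which fits the point above about querying the world model or policy.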

YanickZengaffinen commented 7 months ago

Okay, yeah, we can start with this. I think once we've implemented one game, it should be rather straightforward to implement others too.

YanickZengaffinen commented 7 months ago

Completed in commit https://github.com/nilscrm/stackelberg-ml/commit/1fbf51b797f3e9788e7f9509a91547d1ffc52bf0 for the MDP specified above by Nils.