We can use the minigrid collection that Kacper suggested.
https://github.com/Farama-Foundation/Minigrid/blob/master/minigrid/minigrid_env.py
Their environments inherit from gym.Env, but we might need to wrap them somehow because Gerstgrasser uses Ray with rllib.env environments (those inherit from gym.Env too though, so it shouldn't be too much work).
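Rough sketch of what the wrapping could look like (just an assumption on my part: RLlib's `register_env` hook with a standard Minigrid env id, plus Minigrid's `FlatObsWrapper` since RLlib's default models may not handle the dict observations out of the box):

```python
# Sketch: registering a Minigrid env with RLlib via a factory function.
# Env id and wrapper choice are placeholder assumptions, not tested here.
import gymnasium as gym
from minigrid.wrappers import FlatObsWrapper
from ray.tune.registry import register_env

def make_minigrid(config):
    # FlatObsWrapper flattens the dict observation into a single array,
    # which is easier for RLlib's default models to consume.
    return FlatObsWrapper(gym.make("MiniGrid-Empty-5x5-v0"))

register_env("minigrid-empty-5x5", make_minigrid)
# Then pass env="minigrid-empty-5x5" in the RLlib algorithm config.
```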
I was thinking more along the lines of 3 or 4 states with 2 actions each. It will probably be hard enough to query the world model or policy even in that case.
Fair point, maybe more like this then: https://github.com/yigitunallar/reinforcement-learning-on-simple-grid-world-game/tree/master
We can also start with a 1D gridworld where we just want to move towards a target (generated randomly at the start), and then scale the world up, make it 2D, and see how well the approach scales.
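Something along these lines for the 1D version (a minimal sketch using the gymnasium-style `gym.Env` API; the world size and reward scheme are arbitrary choices, not fixed decisions):

```python
# Sketch of a 1D gridworld: the agent starts at a random cell and must
# reach a randomly placed target. Actions: 0 = left, 1 = right.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class GridWorld1D(gym.Env):
    def __init__(self, size=8):
        self.size = size
        self.action_space = spaces.Discrete(2)
        # Observation: (agent position, target position)
        self.observation_space = spaces.MultiDiscrete([size, size])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.agent = int(self.np_random.integers(self.size))
        self.target = int(self.np_random.integers(self.size))
        while self.target == self.agent:  # avoid trivial episodes
            self.target = int(self.np_random.integers(self.size))
        return np.array([self.agent, self.target]), {}

    def step(self, action):
        # Move left or right, clipped to the grid boundaries.
        step = 1 if action == 1 else -1
        self.agent = int(np.clip(self.agent + step, 0, self.size - 1))
        terminated = self.agent == self.target
        reward = 1.0 if terminated else -0.01  # small step penalty
        return np.array([self.agent, self.target]), reward, terminated, False, {}
```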
What about a simple MDP like this? It gives us the maximal chance of having something to show at the milestone, and we can always make it more complicated after that (plus we don't need to understand any library).
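For reference, an MDP of that size can be written down directly as transition and reward tables, no library needed. The numbers below are made up for illustration, not the exact MDP from the diagram:

```python
# Toy tabular MDP: 3 states, 2 actions. P[s, a, s'] are transition
# probabilities, R[s, a] are expected rewards. Values are illustrative only.
import numpy as np

n_states, n_actions = 3, 2
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.9, 0.1, 0.0]   # state 0, action 0: mostly stay
P[0, 1] = [0.0, 1.0, 0.0]   # state 0, action 1: go to state 1
P[1, 0] = [1.0, 0.0, 0.0]   # state 1, action 0: back to state 0
P[1, 1] = [0.0, 0.1, 0.9]   # state 1, action 1: usually reach state 2
P[2, :] = [0.0, 0.0, 1.0]   # state 2 is absorbing
R = np.array([[0.0, 0.0],
              [0.0, 1.0],   # reward for pushing towards the goal state
              [0.0, 0.0]])

assert np.allclose(P.sum(axis=-1), 1.0)  # rows must be valid distributions
```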
Okay, yeah, we can start with this. I think once we've implemented one game, it should be rather straightforward to implement others too.
Completed in commit https://github.com/nilscrm/stackelberg-ml/commit/1fbf51b797f3e9788e7f9509a91547d1ffc52bf0 for the MDP specified above by Nils.
We want to create a very simple one-player MDP with very few states, then try to run the code of this paper to solve it (Project website, Code).