Consider adding a no-op action to all environments.

At the moment, the environments' action space consists only of the four cardinal directions. This complicates analytically solving some of the environments, such as lava land when the mouse spawns surrounded by lava, and cases where the mouse spawns on the same square as the cheese (though we usually try to avoid the latter).

Consider adding a no-op action, which would simplify these corner cases. The maze-solving code already supports the possibility of no-op actions.
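To make the lava corner case concrete, here is a minimal sketch in plain Python (not the jaxgmg implementation). The action encoding, the `step` helper, and the grid layout are all hypothetical and chosen only for illustration.

```python
# Hypothetical action encoding; the actual ordering in the environments may differ.
ACTIONS = {
    0: (-1, 0),  # up
    1: (0, -1),  # left
    2: (1, 0),   # down
    3: (0, 1),   # right
    4: (0, 0),   # no-op: stay in place
}


def step(pos, action, walls):
    """Apply `action` to `pos`; stay put if the target is a wall or off-grid."""
    dr, dc = ACTIONS[action]
    r, c = pos[0] + dr, pos[1] + dc
    in_bounds = 0 <= r < len(walls) and 0 <= c < len(walls[0])
    if not in_bounds or walls[r][c]:
        return pos
    return (r, c)


def safe_actions(pos, walls, lava, actions):
    """Return the actions whose resulting square is not lava."""
    safe = []
    for a in actions:
        r, c = step(pos, a, walls)
        if not lava[r][c]:
            safe.append(a)
    return safe


# A 3x3 level with no walls: the mouse starts on the centre square,
# which is safe, but all four neighbouring squares are lava.
walls = [[False] * 3 for _ in range(3)]
lava = [
    [False, True, False],
    [True, False, True],
    [False, True, False],
]
start = (1, 1)

cardinal_only = safe_actions(start, walls, lava, [0, 1, 2, 3])
with_noop = safe_actions(start, walls, lava, [0, 1, 2, 3, 4])
```

With only the cardinal actions, `cardinal_only` is empty, so every move enters lava. With the no-op included, `with_noop` contains just the no-op action, which gives a solver a well-defined optimal choice in this state.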

Aside from changing the environments and level solvers themselves, some other changes would be required, for example to policy heatmap plotting (thankfully the diamond plots can still work, with the central square used to represent the no-op action) and to some of the environment demos, such as interactive mode.

The main negative side effect would be that existing baselines would no longer be compatible with the new environments, because the architecture type signature would change. This also means old checkpoints would no longer be loadable.
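A rough illustration of the checkpoint problem, assuming the change shows up as one extra output in the policy's final layer. The parameter layout and the `init_policy_head`, `shapes`, and `load_checkpoint` helpers are hypothetical, not the project's actual architecture or checkpointing code.

```python
def init_policy_head(hidden_dim, num_actions):
    """Hypothetical final layer: weights (hidden_dim, num_actions) and a bias."""
    return {
        "w": [[0.0] * num_actions for _ in range(hidden_dim)],
        "b": [0.0] * num_actions,
    }


def shapes(params):
    """Return the shape of each parameter array in the head."""
    return {
        "w": (len(params["w"]), len(params["w"][0])),
        "b": (len(params["b"]),),
    }


def load_checkpoint(template, checkpoint):
    """Accept `checkpoint` only if its shapes match the `template` parameters."""
    if shapes(template) != shapes(checkpoint):
        raise ValueError(
            f"shape mismatch: expected {shapes(template)}, got {shapes(checkpoint)}"
        )
    return checkpoint


old_head = init_policy_head(hidden_dim=8, num_actions=4)  # cardinal directions only
new_head = init_policy_head(hidden_dim=8, num_actions=5)  # cardinal directions + no-op

try:
    load_checkpoint(new_head, old_head)
    loaded = True
except ValueError:
    loaded = False
```

Under this assumption, `loaded` ends up `False`: a checkpoint trained with four actions does not fit a network that expects five.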
