reconnaissanceblindchess / reconchess

ReconChess python implementation
BSD 3-Clause "New" or "Revised" License

Environment for Reinforcement Learning Algorithms? #23

Closed. acxz closed this issue 2 years ago.

acxz commented 2 years ago

It would be amazing if an environment (such as an openai-gym environment) could be provided that makes it easy to plug and play with reinforcement learning libraries.

acxz commented 2 years ago

I spent some time searching for one and it looks like open_spiel (a collection of environments for board games by DeepMind) has one for RBC: https://github.com/deepmind/open_spiel/blob/master/docs/games.md#reconnaissance-blind-chess
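
For reference, loading that environment looks roughly like the sketch below. This assumes `pyspiel` is installed and that the game is registered under the short name `rbc` per the games list above; treat it as a sketch rather than a tested recipe.

```python
import random
import pyspiel

# Assumption: open_spiel registers Reconnaissance Blind Chess as "rbc".
game = pyspiel.load_game("rbc")
state = game.new_initial_state()

# Random rollout: sense actions and move actions both show up as ordinary
# legal actions for the player to move.
while not state.is_terminal():
    action = random.choice(state.legal_actions())
    state.apply_action(action)

print("returns:", state.returns())
```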

daihuiao commented 2 years ago

Can you find any demo in open_spiel? I can't use it in my Python project.

daihuiao commented 2 years ago

btw, it is highly recommended to organize the code in the way of openai-gym; feature engineering wasted a lot of my time

acxz commented 2 years ago

> Can you find any demo in open_spiel? I can't use it in my Python project.

it would be appropriate to ask this at https://github.com/deepmind/open_spiel

acxz commented 2 years ago

> btw, it is highly recommended to organize the code in the way of openai-gym; feature engineering wasted a lot of my time

while this is off-topic, i'll answer in brief

there are limitations that openai-gym environments have, which make them hard to scale in production. Many RL libraries end up rolling their own extensions or their own environment solutions; for example, see RLlib's Environments. I'm sure open_spiel had its own rationale, and again it would be appropriate to ask at their repo.

daihuiao commented 2 years ago

thank you for your reply

ginoperrotta commented 2 years ago

Thanks for the comments. I see two major issues with treating RBC as a reinforcement learning environment using an OpenAI-gym style API:

  1. As in any multiplayer game, your opponent's policy is part of the environment. RBC in general cannot be a single environment; RBC against a random player would be one, RBC against trout.py would be another... and so on.
  2. Each step of a player's turn involves different kinds of information (if/where the opponent captured, what pieces you sensed, what move resulted from your request) and actions (sense or move). RBC therefore does not have a clear observation space or a single action space, except perhaps the concatenation of all of these separate kinds. Gym environments must have relatively simple observation and action spaces.

This second point in particular is why feature engineering may be a significant part of a machine-learning agent for RBC. It's not time wasted, it's research!
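
As a concrete illustration of that second point, here is a minimal do-nothing bot sketched against the `Player` interface from this repo: each turn involves a sense decision and a move decision, each with its own action space and its own kind of observation.

```python
import random
from reconchess import Player


class SketchBot(Player):
    """Minimal sketch of the per-turn structure of an RBC agent."""

    def handle_game_start(self, color, board, opponent_name):
        self.color = color
        self.board = board

    def handle_opponent_move_result(self, captured_my_piece, capture_square):
        # Observation kind 1: whether/where the opponent captured a piece.
        pass

    def choose_sense(self, sense_actions, move_actions, seconds_left):
        # Action space 1: which square to sense.
        return random.choice(sense_actions)

    def handle_sense_result(self, sense_result):
        # Observation kind 2: the window of squares just sensed.
        pass

    def choose_move(self, move_actions, seconds_left):
        # Action space 2: which move to request.
        return random.choice(move_actions)

    def handle_move_result(self, requested_move, taken_move,
                           captured_opponent_piece, capture_square):
        # Observation kind 3: which move was actually executed.
        pass

    def handle_game_end(self, winner_color, win_reason, game_history):
        pass
```

A gym-style wrapper would have to fold all of these hooks into a single observation space and a single action space, which is exactly where the feature engineering effort goes.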

acxz commented 2 years ago

I would just like to point out that, while the above two points are valid concerns for vanilla OpenAI Gym environments, more recent RL environment frameworks, such as OpenSpiel, are able to handle both of them.

There is an RL environment for RBC out there that you can use to train RL agents in a plug-and-play fashion (linked above).

ginoperrotta commented 2 years ago

OpenSpiel is a great library for this sort of thing! However, it uses its own API, not the OpenAI-gym API, since it is made for multiplayer games rather than static RL environments. When player strategies are not specified, they default to random. You can train your own agent using those tools, but be aware that it is then training against a fixed opponent strategy! Typically, you would iteratively train an agent against previous versions of itself in order to advance from training against random actions to training against a strong opponent. In any case, this is not a single RL environment, and it is still not well suited to an OpenAI-gym style API.

acxz commented 2 years ago

> Typically, you would iteratively train an agent against previous versions of itself in order to advance from training against random actions to training against a strong opponent.

OpenSpiel does support self-play.
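
For example, a bare-bones self-play loop in open_spiel could look like the sketch below. The `policy` function and the episode count are placeholders for illustration; in practice you would query a learned model for both seats and periodically freeze copies of it into an opponent pool.

```python
import random
import pyspiel


def policy(state):
    # Placeholder policy; in self-play both colors would query the same
    # learned model here.
    return random.choice(state.legal_actions())


game = pyspiel.load_game("rbc")  # assumes "rbc" is the registered game name
for episode in range(10):
    state = game.new_initial_state()
    while not state.is_terminal():
        # The same policy acts for whichever player is to move, so the
        # opponent is never a fixed external strategy.
        state.apply_action(policy(state))
    # state.returns() gives the outcome for each player; feed it into
    # whatever learning update you use.
    print(episode, state.returns())
```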