hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.16k stars 725 forks

Reinforcement learning of tic-tac-toe is not possible. #1076

Closed loySoGxj closed 3 years ago

loySoGxj commented 3 years ago

aaa

Miffyli commented 3 years ago

Hey. Unfortunately we do not offer tech support for getting algorithms to work in custom environments, as the issue may lie in a myriad of things. These issues are reserved for stable-baselines-focused bugs and improvement ideas, so unless this issue can be pointed at a problem in stable-baselines, this is not the place to ask this question. You could ask for help on reddit.com/r/reinforcementlearning or Stack Overflow.

One thing right off the bat: standard RL algorithms do not support, or do not play well with, "invalid action masks" like we have here (you cannot put an x/o mark on a slot that has already been taken). Stable-baselines does not support this kind of thing.
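For context, the masking idea mentioned above can be sketched in plain Python. This is not a stable-baselines API; the board encoding and helper names are illustrative assumptions showing how an agent would restrict sampling to legal moves:

```python
import random

def valid_action_mask(board):
    """Boolean mask over the 9 cells: True where a mark can still be placed.

    `board` is a list of 9 entries: "" for empty, "x" or "o" otherwise
    (an assumed encoding for illustration, not a stable-baselines type).
    """
    return [cell == "" for cell in board]

def sample_valid_action(board, rng=random):
    """Sample uniformly among the cells the mask allows."""
    valid = [i for i, ok in enumerate(valid_action_mask(board)) if ok]
    if not valid:
        raise ValueError("no valid actions: board is full")
    return rng.choice(valid)

board = ["x", "", "o",
         "",  "x", "",
         "o", "",  ""]
mask = valid_action_mask(board)      # True only at empty cells
action = sample_valid_action(board)  # always lands on an empty cell
```

Without a mask like this, a vanilla policy will keep proposing occupied cells and learning stalls; masking the policy's logits (as done in, e.g., the MaskablePPO implementation in sb3-contrib) is the usual workaround.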

loySoGxj commented 3 years ago

nice