nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Can it be used for chess? #24

Closed Unimax closed 4 years ago

Unimax commented 4 years ago

Hi, I am new to RL and was wondering whether I can use this for a game of chess, either with https://github.com/genyrosk/gym-chess or with my own env based on python-chess if needed. I am confused mainly because chess is a two-player game, so after the AI's move someone has to move as the opponent, and also because the reward is 1 if the game is won and -1 otherwise.

Note: the goal is not to train a state-of-the-art chess AI.
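
For reference, this is roughly the kind of env I had in mind (a rough sketch on top of python-chess, with a random-move opponent and a placeholder FEN observation; not taken from this repo, gym-chess, or any existing project):

```python
# Rough sketch only: a single-agent wrapper around python-chess where the agent
# plays White and the opponent simply replies with a random legal move.
# The FEN-string observation and index-based action space are placeholders.
import random
import chess

class ChessVsRandomEnv:
    def reset(self):
        self.board = chess.Board()
        return self.board.fen()

    def step(self, action_index):
        # Agent (White) picks one of the currently legal moves by index.
        legal_moves = list(self.board.legal_moves)
        self.board.push(legal_moves[action_index])

        # Built-in opponent (Black) answers with a random legal move.
        if not self.board.is_game_over():
            self.board.push(random.choice(list(self.board.legal_moves)))

        done = self.board.is_game_over()
        reward = 0.0
        if done:
            result = self.board.result()  # "1-0", "0-1" or "1/2-1/2"
            reward = 1.0 if result == "1-0" else (-1.0 if result == "0-1" else 0.0)
        return self.board.fen(), reward, done, {}
```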

Thanks in advance for the help.

nikhilbarhate99 commented 4 years ago

Standard model-free RL algorithms like PPO cannot solve chess without significant reward engineering (and I am not sure it can be solved even with those changes), and they will definitely fail with a sparse reward of 1 or -1: it is highly unlikely that the agent will learn to win a game of chess by taking random actions. I would suggest working with algorithms that use MCTS (e.g. AlphaZero), which have previously been used to solve games like chess and Go.
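
Just to make "reward engineering" concrete: one common (and usually still insufficient) trick is to add a dense, shaped signal on top of the win/loss reward, e.g. the change in material balance after each move. A minimal sketch with python-chess; the piece values and the 0.1 scale here are arbitrary choices, not something this repo provides:

```python
# Sketch of a dense, shaped reward: change in material balance (agent plays White).
# Piece values and scaling are arbitrary illustrative choices.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_balance(board: chess.Board) -> int:
    """White material minus Black material."""
    balance = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        balance += value if piece.color == chess.WHITE else -value
    return balance

def shaped_reward(board_before: chess.Board, board_after: chess.Board) -> float:
    # Reward the agent for gaining material relative to the opponent.
    return 0.1 * (material_balance(board_after) - material_balance(board_before))
```

Even with a shaped signal like this, the agent mostly learns to grab material, not to win games, which is why tree-search based methods are the usual choice for chess.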