rickytan-AA opened 3 years ago
Doesn't seem like there are many good RL-based environments for chess.
That's fine, since it gives me a chance to experiment with value functions and with defining rewards for different states.
Should the reward be computed by the environment or by the agent?
The environment should at least return game outcomes (+1 for a win, -1 for a loss, 0 for a draw).
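A minimal sketch of an outcome-only reward, assuming the result arrives as a PGN-style result string ("1-0", "0-1", "1/2-1/2", or "*" for an unfinished game) and that the reward is reported from one player's point of view. The function name `outcome_reward` is hypothetical.

```python
from typing import Literal

# Hypothetical type for the side we are scoring for.
Color = Literal["white", "black"]


def outcome_reward(result: str, player: Color) -> float:
    """Return +1 for a win, -1 for a loss, 0 for a draw or an unfinished game.

    `result` uses PGN result notation; `player` is the side being scored.
    """
    if result in ("1/2-1/2", "*"):
        # Draws and games still in progress carry no reward.
        return 0.0
    if result == "1-0":
        white_score = 1.0
    elif result == "0-1":
        white_score = -1.0
    else:
        raise ValueError(f"unrecognised result string: {result!r}")
    # Black's view of the result is the negation of White's.
    return white_score if player == "white" else -white_score
```

For example, `outcome_reward("0-1", "black")` is `1.0`. Every non-terminal move would get a reward of 0 under this scheme, so the signal is sparse; any shaping on top of it could live on the agent side.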
I probably need to read the AlphaZero paper to understand how they defined their reward function.
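For reference, the AlphaZero papers train the value estimate toward the final game outcome z (+1, 0 or -1), seen from the perspective of the player to move in each position, rather than using per-move rewards. A sketch of building such targets for one finished game, assuming positions are recorded in order starting from the initial position (White to move) and the outcome is given from White's perspective. The helper name `value_targets` is hypothetical.

```python
from typing import List


def value_targets(num_positions: int, white_outcome: float) -> List[float]:
    """Label each recorded position with the final outcome seen by the side to move.

    Position 0 is the starting position (White to move), so even indices are
    White to move and odd indices are Black to move.
    """
    if white_outcome not in (-1.0, 0.0, 1.0):
        raise ValueError("outcome must be -1, 0 or +1")
    targets = []
    for t in range(num_positions):
        # Black's view of the result is the negation of White's.
        targets.append(white_outcome if t % 2 == 0 else -white_outcome)
    return targets
```

For example, `value_targets(3, 1.0)` returns `[1.0, -1.0, 1.0]`: a White win is +1 for the positions where White is to move and -1 for those where Black is to move.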
We need to implement:
- reset(): resets the board to the standard starting position
- step(a): applies action a and returns the new state and a done boolean
- step_seq(): takes a sequence of actions

Helpful links:
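The reset / step / step_seq interface can be sketched as a toy environment. This is a minimal sketch under heavy assumptions: the board is a dict from square names to piece letters (uppercase White, lowercase Black), an action is a hypothetical `(from_square, to_square)` tuple, moves are NOT checked for legality, and the game ends only when a king is captured (no check or checkmate detection). `step` also returns the outcome reward discussed above, as +1 for the side that captures the king. All names (`ToyChessEnv`, `Move`) are hypothetical.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]  # hypothetical action format, e.g. ("e2", "e4")
State = Dict[str, str]  # square name -> piece letter

BACK_RANK = "RNBQKBNR"
FILES = "abcdefgh"


class ToyChessEnv:
    """Sketch of reset/step/step_seq. Does NOT check move legality, check or mate."""

    def __init__(self) -> None:
        self.board: State = {}
        self.white_to_move = True
        self.done = False

    def reset(self) -> State:
        """Reset the board to the standard starting position and return the state."""
        self.board = {}
        for i, f in enumerate(FILES):
            self.board[f + "1"] = BACK_RANK[i]  # White pieces are uppercase
            self.board[f + "2"] = "P"
            self.board[f + "7"] = "p"  # Black pieces are lowercase
            self.board[f + "8"] = BACK_RANK[i].lower()
        self.white_to_move = True
        self.done = False
        return dict(self.board)

    def step(self, a: Move) -> Tuple[State, float, bool]:
        """Apply one move and return (new_state, reward, done)."""
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        src, dst = a
        piece = self.board.get(src)
        if piece is None or piece.isupper() != self.white_to_move:
            raise ValueError(f"no piece of the side to move on {src}")
        captured = self.board.get(dst)
        if captured is not None and captured.isupper() == piece.isupper():
            raise ValueError(f"cannot capture own piece on {dst}")
        del self.board[src]
        self.board[dst] = piece
        reward = 0.0
        if captured is not None and captured.lower() == "k":
            # Simplification: capturing the king ends the game.
            self.done = True
            reward = 1.0
        self.white_to_move = not self.white_to_move
        return dict(self.board), reward, self.done

    def step_seq(self, actions: List[Move]) -> Tuple[State, float, bool]:
        """Apply a sequence of moves, stopping early if the game ends."""
        state, reward, done = dict(self.board), 0.0, self.done
        for a in actions:
            state, reward, done = self.step(a)
            if done:
                break
        return state, reward, done
```

Usage: after `env = ToyChessEnv()` and `env.reset()`, calling `env.step(("e2", "e4"))` returns a state with a White pawn on e4, a reward of 0.0 and `done` set to False. A real implementation would replace the move handling with proper legality and game-end checks.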