[Internal change] Decompose the game logic and API

sotetsuk commented 8 months ago

I believe the Pgx API is sufficiently general, but the best API depends on the use case. I would like to separate each game's logic from the API layer, so that users can more easily adapt the games to their preferred API.

For example,

core.State -> core.EnvState, GameState
Each game implements
- step(game_state, action) -> game_state
- legal_action_mask(game_state)
- is_terminal(game_state)
- observe(game_state)
- terminal_value(game_state) ?
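
To make the proposed split concrete, here is a minimal sketch in plain Python. It is not Pgx code. The toy game (a Nim variant), the `EnvState` fields, and the `env_init` / `env_step` helpers are all hypothetical, and a real Pgx game would use JAX arrays rather than Python tuples. The game-level functions carry the names listed above; the thin wrapper is one possible API built on top of them.

```python
from typing import NamedTuple, Tuple


# --- Game logic (hypothetical toy Nim: take 1-3 stones, last taker wins) ---
class GameState(NamedTuple):
    stones: int = 7
    turn: int = 0  # index of the player to move (0 or 1)


def step(game_state: GameState, action: int) -> GameState:
    # Action k means "take k + 1 stones"; the turn passes to the other player.
    return GameState(game_state.stones - (action + 1), 1 - game_state.turn)


def legal_action_mask(game_state: GameState) -> Tuple[bool, ...]:
    # Taking k + 1 stones is legal only if that many stones remain.
    return tuple(game_state.stones >= k + 1 for k in range(3))


def is_terminal(game_state: GameState) -> bool:
    return game_state.stones == 0


def observe(game_state: GameState) -> Tuple[int, int]:
    return (game_state.stones, game_state.turn)


def terminal_value(game_state: GameState) -> float:
    # Value from player 0's perspective. The player who just moved
    # (1 - turn) took the last stone and wins.
    return 1.0 if (1 - game_state.turn) == 0 else -1.0


# --- API layer (one possible wrapper; field names are assumptions) ---
class EnvState(NamedTuple):
    game_state: GameState
    current_player: int
    legal_action_mask: Tuple[bool, ...]
    terminated: bool
    rewards: Tuple[float, float]


def env_init() -> EnvState:
    gs = GameState()
    return EnvState(gs, gs.turn, legal_action_mask(gs), False, (0.0, 0.0))


def env_step(state: EnvState, action: int) -> EnvState:
    gs = step(state.game_state, action)
    done = is_terminal(gs)
    if done:
        v = terminal_value(gs)
        rewards = (v, -v)
    else:
        rewards = (0.0, 0.0)
    return EnvState(gs, gs.turn, legal_action_mask(gs), done, rewards)
```

Because the wrapper only calls the five game functions, a different API (for example a single-agent or batched one) could be written against the same game logic without touching it.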

[x] go
- 1129
- 1130
- 1131
- 1132
- 1133
- 1134
[ ] 2048
[ ] animal shogi
[ ] backgammon
[ ] bridge bidding
[ ] chess
[x] connect four
- 1151
- 1152
- 1153
[ ] gardner chess
[ ] hex
[ ] kuhn poker
[ ] leduc hold'em
[ ] minatar
- [ ] asterix
- [ ] breakout
- [ ] freeway
- [ ] seaquest
- [ ] space invaders
[ ] othello
[ ] shogi
[ ] sparrow mahjong
[x] tic tac toe
- 1146
- 1148
- 1149

carlosgmartin commented 5 months ago

@sotetsuk How would this approach handle intermediate rewards?

sotetsuk commented 5 months ago

Sorry for the late response 🙏 It depends on the game. In the case of Go, it looks like https://github.com/sotetsuk/pgx/blob/main/pgx/_src/games/go.py#L129
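
As an illustration of "it depends on the game", the sketch below shows one hypothetical way a game could expose per-step rewards under this split. It does not reproduce the linked Go code. The toy rules, the `rewards` hook, and the `play` helper are assumptions made for the example.

```python
from typing import List, NamedTuple


class GameState(NamedTuple):
    score: int = 0       # running total
    last_gain: int = 0   # points gained by the most recent action
    moves_left: int = 3


def step(game_state: GameState, action: int) -> GameState:
    # Hypothetical rule: the action index is the number of points gained.
    return GameState(game_state.score + action, action, game_state.moves_left - 1)


def is_terminal(game_state: GameState) -> bool:
    return game_state.moves_left == 0


def rewards(game_state: GameState) -> float:
    # Intermediate reward: reported after every step, not only at the end.
    return float(game_state.last_gain)


def play(actions: List[int]) -> List[float]:
    # The API layer queries rewards() after each step and collects the results.
    gs, collected = GameState(), []
    for a in actions:
        if is_terminal(gs):
            break
        gs = step(gs, a)
        collected.append(rewards(gs))
    return collected
```

Under the same assumption, a game with only a final outcome could return zero from this hook until `is_terminal` holds, so the API layer would not need to distinguish the two kinds of game.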

Note that this is just an internal change and is not supposed to affect the current public API.

sotetsuk / pgx

[Internal change] Decompose the game logic and API #1127
