Open sotetsuk opened 8 months ago
@sotetsuk How would this approach handle intermediate rewards?
Sorry for the late response 🙏 It depends on the game. In the case of Go, it looks like https://github.com/sotetsuk/pgx/blob/main/pgx/_src/games/go.py#L129
Note that this is just an internal change and is not supposed to affect the current public API.
I believe that the API of Pgx is sufficiently general, but the optimal API varies depending on the use case. I would like to separate the functions implementing each game's logic from the API, so that users can more easily adapt them to their preferred API.
For example,
core.State -> core.EnvState, GameState
step(game_state, action) -> game_state
legal_action_mask(game_state)
is_terminal(game_state)
observe(game_state)
terminal_value(game_state)
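
To make the proposed split concrete, here is a rough sketch of how the pure game-logic functions listed above could sit under a thin API wrapper. This is not the actual Pgx implementation: the toy Nim game, the `EnvState` fields, and the `env_init` / `env_step` helpers are all assumptions made up for illustration, and it uses plain Python instead of JAX to stay dependency-free.

```python
from typing import NamedTuple, Tuple


class GameState(NamedTuple):
    """Pure game logic state for a toy Nim game (hypothetical example)."""
    stones: int = 5   # stones left in the pile
    turn: int = 0     # player to move (0 or 1)
    winner: int = -1  # -1 while the game is still running


# --- game-logic functions: they know nothing about the public API ---

def step(game_state: GameState, action: int) -> GameState:
    """Action a in {0, 1, 2} removes a + 1 stones; taking the last stone wins."""
    stones = game_state.stones - (action + 1)
    winner = game_state.turn if stones == 0 else -1
    return GameState(stones=stones, turn=1 - game_state.turn, winner=winner)


def legal_action_mask(game_state: GameState) -> Tuple[bool, ...]:
    return tuple(a + 1 <= game_state.stones for a in range(3))


def is_terminal(game_state: GameState) -> bool:
    return game_state.stones == 0


def observe(game_state: GameState) -> Tuple[int, int]:
    return (game_state.stones, game_state.turn)


def terminal_value(game_state: GameState) -> float:
    """Outcome from player 0's perspective (0.0 if there is no winner yet)."""
    if game_state.winner == -1:
        return 0.0
    return 1.0 if game_state.winner == 0 else -1.0


# --- API layer: one possible wrapper, assumed here for illustration ---

class EnvState(NamedTuple):
    game_state: GameState
    rewards: Tuple[float, float]
    terminated: bool
    legal_action_mask: Tuple[bool, ...]


def env_init() -> EnvState:
    gs = GameState()
    return EnvState(gs, (0.0, 0.0), False, legal_action_mask(gs))


def env_step(env_state: EnvState, action: int) -> EnvState:
    gs = step(env_state.game_state, action)
    done = is_terminal(gs)
    v = terminal_value(gs) if done else 0.0
    # Rewards are only given at the end in this toy game; a game with
    # intermediate rewards would compute them here from the game state.
    return EnvState(gs, (v, -v), done, legal_action_mask(gs))


state = env_init()
state = env_step(state, 2)  # player 0 takes 3 stones, 2 remain
state = env_step(state, 1)  # player 1 takes the last 2 stones and wins
print(state.terminated, state.rewards)  # True (-1.0, 1.0)
```

With this layering, a user who prefers a different API (for example, one that returns observations eagerly or reports rewards differently) would only need to rewrite the small wrapper and could reuse the game-logic functions unchanged.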