Closed: erinn-lee closed this issue 2 years ago.
I checked this while reviewing the branch being edited, and the multistep calculation appears to be correct.
The n-step reward list does contain rewards from the new episode that begins after the game resets at a terminal state. However, when target_q is calculated, the value after the terminal step is treated as 0 because it is multiplied by the done mask, so these entries do not affect the result and this appears to be fine.
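A minimal sketch of why the masking makes those post-terminal entries harmless, assuming the n-step target is accumulated backwards with a `(1 - done)` factor at each step. `n_step_target` and its parameters are hypothetical names for illustration; the repository's own target_q code may be structured differently.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q: float,
    gamma: float,
) -> float:
    """Masked n-step target for the first transition of a window (hypothetical helper).

    rewards[k] and dones[k] belong to the k-th transition in the window;
    next_q stands in for the bootstrap value of the state n steps ahead.
    """
    target = next_q
    # Walk the window backwards. At a terminal transition, (1 - done) is 0,
    # which discards everything accumulated after it: rewards from the reset
    # episode as well as the bootstrap value.
    for r, d in zip(reversed(rewards), reversed(dones)):
        target = r + gamma * target * (1.0 - float(d))
    return target


# The third reward belongs to the episode that started after the reset.
print(n_step_target([1.0, 1.0, 5.0], [False, True, False], next_q=100.0, gamma=0.9))
```

Replacing the post-terminal reward or `next_q` with any other value gives the same target (about 1.9 here), which is the property the comment above relies on.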
Describe the bug
Samples from the multistep agent contain garbage values for post-terminal states.
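To make the symptom concrete, here is a minimal sketch, assuming transitions are stored in a single flat, time-ordered buffer and n-step samples are cut with a plain sliding window that ignores episode boundaries. `n_step_windows` and `post_terminal_rewards` are hypothetical helpers, not the project's actual sampling code.

```python
from typing import List, Sequence, Tuple


def n_step_windows(
    rewards: Sequence[float], dones: Sequence[bool], n: int
) -> List[Tuple[List[float], List[bool]]]:
    """Cut a flat transition log into overlapping length-n windows.

    Episode boundaries are ignored, so a window that starts shortly before
    a terminal step also picks up transitions from the next (reset) episode.
    """
    return [
        (list(rewards[i : i + n]), list(dones[i : i + n]))
        for i in range(len(rewards) - n + 1)
    ]


def post_terminal_rewards(
    window_rewards: Sequence[float], window_dones: Sequence[bool]
) -> List[float]:
    """Return the rewards in a window that come after its first terminal step."""
    for k, done in enumerate(window_dones):
        if done:
            return list(window_rewards[k + 1 :])
    return []


# Episode 1 ends at index 2; rewards 10.0 and 20.0 belong to episode 2.
rewards = [1.0, 2.0, 3.0, 10.0, 20.0]
dones = [False, False, True, False, False]
for r, d in n_step_windows(rewards, dones, n=3):
    print(r, post_terminal_rewards(r, d))
```

The second and third windows carry rewards from the reset episode; those are the "garbage" post-terminal values this issue refers to.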