kakaoenterprise / JORLDY

Repository for Open Source Reinforcement Learning Framework JORLDY
Apache License 2.0

Non-episodic update of Multistep agent #171

Closed erinn-lee closed 2 years ago

erinn-lee commented 2 years ago

Describe the bug

Samples from the Multistep agent contain garbage values for post-terminal states.

kan-s0 commented 2 years ago

I checked this on the branch currently being edited, and the multistep calculation appears to work without any problem.

[Screenshot, 2022-04-16 11:14:52 AM]

The n-step reward list does contain rewards from the new episode that starts when the game resets after the terminal state. However, when target_q is calculated, the value after the terminal state becomes 0 because it is multiplied by the done mask, so the result seems okay.
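To make the masking argument concrete, here is a minimal NumPy sketch of an n-step target. It is not JORLDY's code: the function name `n_step_target`, its arguments, and the choice to mask post-terminal rewards as well as the bootstrap value are assumptions for illustration, and "multiplying by done" is read here as multiplying by `(1 - done)`.

```python
import numpy as np


def n_step_target(rewards, dones, next_q, gamma=0.99):
    """Hypothetical n-step target with done masking (illustrative only).

    rewards, dones: sequences of length n for steps t .. t+n-1.
    next_q: bootstrap value estimate for the state at step t+n.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    n = len(rewards)

    # alive[k] is 1 while no terminal occurred before step k, else 0.
    # The reward received on the terminal step itself is still counted.
    alive = np.concatenate(([1.0], np.cumprod(1.0 - dones)[:-1]))

    discounts = gamma ** np.arange(n)
    n_step_return = np.sum(discounts * alive * rewards)

    # Bootstrap only if the window contains no terminal at all.
    no_terminal = np.prod(1.0 - dones)
    return n_step_return + (gamma ** n) * no_terminal * next_q


# Terminal at the second step: the third reward (from the reset episode)
# and the bootstrap value are both masked out.
with_terminal = n_step_target([1, 1, 5], [0, 1, 0], next_q=10.0, gamma=0.5)

# No terminal in the window: all rewards and the bootstrap value are used.
without_terminal = n_step_target([1, 1, 1], [0, 0, 0], next_q=8.0, gamma=0.5)
```

With a mask like this, whatever the buffer stores after the terminal step does not reach the target. The thread does not say whether JORLDY masks the rewards in the same way.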