Closed pzhabc closed 7 months ago
`done_bool` is used in the Bellman equation calculation. Essentially, it indicates whether the state was terminal: a goal was reached or a collision occurred, so there were no following steps. While we do terminate the episode when `episode_timesteps` reaches the limit `max_ep`, the state itself was not terminal, and its value should be calculated (with the Bellman equation) assuming that there are following steps. Therefore, we set `done_bool` to 0 in this case instead of 1.
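
To make the effect concrete, here is a minimal sketch of how such a flag typically enters a one-step Bellman target. The helper names `done_flag` and `td_target`, the `discount` argument, and the numbers used are assumptions for illustration only; they are not taken from the repository's training loop, which is not shown in this thread.

```python
def done_flag(done, episode_timesteps, max_ep):
    """Mirror the line quoted in this thread: a time-limit cutoff yields 0."""
    return 0 if episode_timesteps + 1 == max_ep else int(done)


def td_target(reward, next_value, done_bool, discount=0.99):
    """One-step Bellman target; the bootstrap term is dropped when done_bool is 1."""
    return reward + (1 - done_bool) * discount * next_value


# Hypothetical numbers, chosen only to show the difference.
reward, next_value, discount = 1.0, 10.0, 0.5

# Episode cut off by the step limit: keep bootstrapping from the next state.
cutoff = td_target(reward, next_value, done_flag(True, 499, 500), discount)

# Goal reached or collision mid-episode: no following steps, so no bootstrap.
terminal = td_target(reward, next_value, done_flag(True, 10, 500), discount)
```

With `done_bool = 1` at the step limit, the target would collapse to the immediate reward, as if the state had no future value, even though the episode was stopped only by the limit.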
Okay, I get it. Thank you.
Thanks for sharing. I have a question. In your code, `done_bool = 0 if episode_timesteps + 1 == max_ep else int(done)`, why is `done_bool` not set to 1 when `episode_timesteps` reaches the maximum `max_ep`?