reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License
487 stars 97 forks

Some problems #87

Closed pzhabc closed 7 months ago

pzhabc commented 7 months ago

Thanks for sharing your work. I have a question about this line in your code: `done_bool = 0 if episode_timesteps + 1 == max_ep else int(done)`. When `episode_timesteps` reaches the maximum `max_ep`, why is `done_bool` not set to 1?

reiniscimurs commented 7 months ago

`done_bool` is used in the Bellman equation calculation. Essentially, it indicates whether the state was terminal - that is, whether a goal was reached or a collision occurred, so that there were no following steps. While we do terminate the episode when `episode_timesteps` reaches `max_ep`, the state itself is not terminal, and its value should be calculated (with the Bellman equation) assuming that there are following steps. Therefore, we set `done_bool` to 0 in this case instead of 1.
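To illustrate the point above, here is a minimal sketch (not the repository's exact code; the function name and values are illustrative) of how `done_bool` gates the bootstrapped term in the Bellman target. When the episode is merely truncated by the step limit, `done_bool = 0` keeps the discounted next-state value; at a true terminal state, `done_bool = 1` drops it:

```python
def bellman_target(reward, done_bool, gamma, next_q):
    """Compute the TD target used to train the critic.

    done_bool = 1 -> terminal state (goal or collision): target is just
                     the immediate reward, since no steps follow.
    done_bool = 0 -> non-terminal (including timeout truncation): we
                     still bootstrap from the next state's value.
    """
    return reward + (1.0 - done_bool) * gamma * next_q


# Episode cut off by max_ep: state is not terminal, future value is kept.
print(bellman_target(reward=1.0, done_bool=0, gamma=0.99, next_q=10.0))

# True terminal state (e.g. collision): future value is dropped.
print(bellman_target(reward=-100.0, done_bool=1, gamma=0.99, next_q=10.0))
```

Setting `done_bool = 1` on a timeout would wrongly teach the critic that states reached near the step limit have no future value, biasing value estimates for otherwise ordinary states.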

pzhabc commented 7 months ago

> `done_bool` is used in the Bellman equation calculation. Essentially, it shows if the state was terminal - if a goal was reached or collision occurred, i.e. there were no following steps. While we do terminate the episode when `episode_timesteps` reaches `max_ep`, the state itself was not terminal and its value should be calculated (with the Bellman equation) assuming that there are following steps. Therefore, we set `done_bool` to 0 in this case instead of 1.

Okay, I get it. Thank you.