reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Some problems #106

Closed Ethan0207 closed 2 months ago

Ethan0207 commented 5 months ago

Hello, it's me again. I have a question about this project: what is the area of the training environment?

reiniscimurs commented 5 months ago

You mean the size of it? One square is 1x1 meters, so you could calculate based on that.

Ethan0207 commented 5 months ago

OK, thank you for your reply.

Ethan0207 commented 5 months ago

Hello, I have another problem. In this project, what is the meaning of step, episode or epoch?

reiniscimurs commented 5 months ago

See: https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part3-training-13b2875c7b51

Ethan0207 commented 5 months ago

Hi, thank you for your reply. Maybe I haven't understood the meaning of an episode. Part 3 says it is a collection of subsequent steps until one of the termination conditions is reached. But when I train the agent, an episode seems to update only about every 20 minutes, even though the agent reaches a termination condition quickly. Could you explain this?

reiniscimurs commented 5 months ago

Hi,

That would not be an episode. An episode is exactly that: a collection of steps until either a crash, reaching a goal, or reaching the maximum episode step count. What you are observing is the end of an epoch and the start of the evaluation cycle. One epoch is a bunch of episodes between evaluations.
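To make the step/episode/epoch distinction concrete, here is a minimal sketch of that loop structure. This is not the repo's actual training code; `ToyEnv`, `run_training`, and all parameter values are illustrative assumptions, chosen only to show where each boundary falls.

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment: each step has a small chance
    of terminating the episode (a crash or a reached goal)."""
    def reset(self):
        return 0.0
    def step(self, action):
        done = random.random() < 0.05  # termination condition hit
        return 0.0, -1.0, done, {}

def run_training(env, max_ep=100, eval_freq=500, max_timesteps=2000):
    """Count steps, episodes, and epochs under the definitions above:
    - step: one env.step() call
    - episode: steps until termination or max_ep steps
    - epoch: the block of episodes between two evaluation cycles"""
    timestep, episodes, epochs = 0, 0, 0
    while timestep < max_timesteps:
        env.reset()
        done, ep_steps = False, 0
        # one episode: steps until crash/goal or max_ep steps
        while not done and ep_steps < max_ep and timestep < max_timesteps:
            _, _, done, _ = env.step(None)
            ep_steps += 1
            timestep += 1
            if timestep % eval_freq == 0:
                epochs += 1  # epoch boundary: evaluation would run here
        episodes += 1
    return timestep, episodes, epochs
```

So many short episodes fit inside a single epoch, which is why the evaluation printout appears far less often than episodes actually end.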

Ethan0207 commented 5 months ago

Hi, thank you very much, I see what you mean. Regarding that log line: does it mean this is the 62nd epoch and the average reward is -82.113001? But what is the meaning of the "0.900000", and how do we know how many episodes it has gone through?

reiniscimurs commented 5 months ago

That is the average reward during the 10 evaluation runs after epoch 62. 0.9 is the collision rate. This would be quite easy to tell from the code: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/train_velodyne_td3.py#L32-L37
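For illustration, the evaluation cycle described above can be sketched roughly like this. It mirrors the linked `evaluate()` in `TD3/train_velodyne_td3.py` only in spirit; the function signature, the `ToyEnv` class, and the collision threshold of `-90` are assumptions here, not the repo's exact code.

```python
class ToyEnv:
    """Hypothetical deterministic stand-in: 3 steps per episode,
    always ending in a collision (large negative reward)."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        reward = -100.0 if self.t == 3 else -1.0  # -100 marks a collision
        return 0.0, reward, self.t == 3, {}

def evaluate(env, policy, eval_episodes=10):
    """Run eval_episodes episodes with the current policy and report
    the average reward and the collision rate across those runs."""
    avg_reward, collisions = 0.0, 0
    for _ in range(eval_episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(policy(state))
            avg_reward += reward
            if reward < -90:  # assumed: collisions give a large negative reward
                collisions += 1
    return avg_reward / eval_episodes, collisions / eval_episodes
```

Under this sketch, a printed "0.900000" would mean 9 of the 10 evaluation episodes ended in a collision.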

I would suggest fully familiarizing yourself with the code; that will help you understand what is in this repo.

Ethan0207 commented 5 months ago

Thank you very much, I understand now.