reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.

some small questions about details #88

Closed peterhuang621 closed 5 months ago

peterhuang621 commented 10 months ago

We appreciate your amazing work. After going through most of the existing issues, we have solved most of the problems we encountered and run the experiment successfully. However, we still have a few small and simple questions:

  1. We saw your reply in another issue suggesting that if we stop the training, for example after about 70 epochs, and then restart it, the weights will be reset and lost. But when we checked the .pth files and the TensorBoard log, that does not seem to be the case. So if we continue training for another 50 epochs, the result should be roughly equivalent to 70+50 epochs, right?

  2. Even though we use "killall -9" and make sure everything is closed, the "test" step often gets stuck in the RViz GUI with nothing happening. If we reboot the system or log back in as the Linux user, the "test" step works properly. The killall command does not seem to finish its job completely.

  3. We trained for about 70 epochs over roughly 10 hours, and the result looks much like the training.gif you demonstrated. That is great, but we would love to see the robot manage a longer run, for example a one-minute journey. Roughly how many epochs would you suggest? Something like 200 epochs or more?

Thanks a lot if you have time to help. Have a good day!

reiniscimurs commented 10 months ago

Hi

  1. No, what my comment said is that the weights will be loaded from the checkpoint. What will not be loaded are the training hyperparameters. To fully resume the training, you would also need to save the hyperparameters and set them before restarting. If you are using the default settings, stopping the training at 70 epochs and restarting for another 50 will not be quite the same as running it for 120 epochs, because the exploration noise will be reset. Any other dynamic hyperparameters would influence the training in the same way. Here it probably does not matter much, since the model would still train quite well, but the training samples would be more random at the start of the resumed run. A sketch of saving and restoring such state is shown after this list.
  2. I do not know what you mean by the "test" step.
  3. What do you mean by a one-minute journey? If a model is trained, there is no limitation on how long the policy can be rolled out. You can make the episodes in your test scenario as long as you want. The number of epochs will not influence the length of the policy rollout; it can only influence the quality of the policy. See the rollout sketch below.
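
To illustrate point 1, here is a minimal sketch of persisting the dynamic training state (here only the exploration noise and an episode counter) next to the network checkpoint, so a resumed run continues with the same exploration schedule. The `network.save`/`network.load` calls mirror the TD3 class in this repository; the JSON file and the `save_training_state`/`load_training_state` helpers are illustrative assumptions, not existing repository code.

```python
import json
import os

def save_training_state(network, directory, filename, expl_noise, episode):
    # network.save() writes the actor/critic .pth files as usual;
    # the JSON file additionally stores the dynamic hyperparameters.
    network.save(filename, directory)
    with open(os.path.join(directory, filename + "_state.json"), "w") as f:
        json.dump({"expl_noise": expl_noise, "episode": episode}, f)

def load_training_state(network, directory, filename, default_noise=1.0):
    # Restore the weights and, if present, the saved exploration noise,
    # so training resumes where it stopped instead of with fresh noise.
    network.load(filename, directory)
    state_path = os.path.join(directory, filename + "_state.json")
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
        return state["expl_noise"], state["episode"]
    # Fall back to the defaults used for a fresh run.
    return default_noise, 0
```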
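
And for point 3, a minimal sketch of rolling out a trained policy for an arbitrary episode length, modeled on the repository's test script. The exact names (`env.reset`, `env.step`, `network.get_action`) and the action rescaling are assumptions here; the key point is that `max_steps` is independent of how many epochs the model was trained for.

```python
import numpy as np

def rollout(env, network, max_steps=1000):
    # Roll out the trained actor deterministically (no exploration noise)
    # for up to max_steps steps or until the episode terminates.
    state = env.reset()
    done = False
    steps = 0
    while not done and steps < max_steps:
        action = network.get_action(np.array(state))
        # Rescale the action to the robot's velocity limits, as in the test script.
        a_in = [(action[0] + 1) / 2, action[1]]
        state, reward, done, target = env.step(a_in)
        steps += 1
    return steps, target
```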