Hi,
That would be very difficult for me to answer with any certainty. It is possible that there is some overfitting. I have used models from different stages of convergence in real implementations, and some worked better than others. Letting the training run for a very long time definitely did not bring any benefit and mostly caused overfitting. Of course, in your case, it could also be any sort of sim2real-gap-related issue.
Dear IcenDy and Reinis,
I'm new to DRL and ROS, so I need some help with the problems I am running into while using this repo.
I ran this repo on the Melodic branch on an Ubuntu 18.04 LTS machine. I think I have set up the environment correctly, since I am able to start the training. However, the pioneer3dx robot did not navigate to the goals even after 12 hours of training; it was just spinning in the same location. Is this a normal situation? If not, how can I fix it? In addition, during training I noticed that my GPUs are not fully utilized. How can I devote more resources to the training?
Best regards
Hi,
You could look at issues https://github.com/reiniscimurs/DRL-robot-navigation/issues/19, https://github.com/reiniscimurs/DRL-robot-navigation/issues/42, https://github.com/reiniscimurs/DRL-robot-navigation/issues/49 and see if you can find some useful information there.
Most importantly, see if your network is running in real time and try testing out other seed values.
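As a rough sketch (exact variable names in train_velodyne_td3.py may differ), this is the kind of thing I mean by changing the seed, and it also lets you check that training actually runs on the GPU:

```python
import random

import numpy as np
import torch

# Try a different seed value if the robot gets stuck spinning in place
seed = 42  # illustrative value; the training script exposes this as a parameter
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# Make sure the networks and tensors actually live on the GPU,
# otherwise low GPU utilization is expected
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
```

Also note that with a single Gazebo instance the bottleneck is usually the simulation stepping itself, not the network updates, so less than full GPU utilization is not unusual in this setup.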
Thank you very much for your previous answer, and I am sorry for taking so long to reply. I would like to ask how to determine whether the TD3 network has converged based on the maximum value curve and the average value curve in tensorboard. Is it just near the inflection point where the average value curve drops to its minimum? I also wanted to ask, if I want to change the robot model, which hyperparameter values in the Python file need to be modified? Modifying parameters such as the collision distance always leads to non-convergence of the network.
What do you mean by "where the average value curve drops to the minimum"? You would expect the average value not to be at a minimum, but rather at some stable value.
I am sure there is some arbitrary value in some paper out there that defines what convergence looks like, but I cannot give a general answer for this implementation. Since I was more interested in real-world performance, I did not pay much attention to specific points after the inflection. Rather, I just saw that the values had stabilized and stopped training at an arbitrary point after that. In any case, training for a long time after the point of inflection did not help.
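If it helps, here is a minimal sketch of how one could check for such a plateau directly from the tensorboard logs (the scalar tag name is only illustrative and may differ from what the training script actually writes):

```python
import numpy as np
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the scalar curves that the training script writes out
ea = EventAccumulator("./runs")  # path to your tensorboard log directory
ea.Reload()

# "Av. Reward" is an illustrative tag name; check ea.Tags() for the real ones
rewards = np.array([s.value for s in ea.Scalars("Av. Reward")])

# Compare the mean of the last window against the window before it;
# if the difference is small, the curve has more or less settled
window = 50
recent, previous = rewards[-window:], rewards[-2 * window:-window]
if abs(recent.mean() - previous.mean()) < 0.05 * abs(previous.mean()):
    print("Average reward looks like it has plateaued")
```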
I have written a bit about how to change the robot model here: https://medium.com/@reinis_86651/using-turtlebot-in-deep-reinforcement-learning-4749946e1c15 You can take a look and get some insights there. There shouldn't be any specific hyperparameters to change (besides the collision distance that you mentioned, and the ground filter height) if you are using a proper package for a different robot. If you want to change just the visual aspects of the robot without changing the robot package, there will be a lot of parameters to adjust for your specific robot in the robot package file.
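To illustrate what I mean by those two parameters, here is a rough sketch of how they typically show up in the environment code (the names and values are illustrative, not necessarily the exact ones in velodyne_env.py):

```python
import numpy as np

# If the new robot has a different footprint, the distance at which a
# collision is registered needs to follow it (illustrative value)
COLLISION_DIST = 0.35  # meters, roughly robot radius plus a margin

# Velodyne returns below this height are treated as the ground plane
# and filtered out; a taller or shorter sensor mount changes this
GROUND_FILTER_HEIGHT = -0.2  # meters, relative to the sensor


def min_obstacle_distance(points):
    """Return the closest obstacle distance from an (N, 3) point cloud,
    ignoring returns that belong to the ground plane."""
    points = np.asarray(points)
    obstacles = points[points[:, 2] > GROUND_FILTER_HEIGHT]
    if obstacles.size == 0:
        return np.inf
    return float(np.min(np.linalg.norm(obstacles[:, :2], axis=1)))


def detect_collision(points):
    return min_obstacle_distance(points) < COLLISION_DIST
```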
I've trained a few times before. Both the average reward curve and the maximum value curve drop and then rise to a plateau, and I stop training when I think it has converged. I found that the trained model seems to be a bit overfitted (i.e., the obstacle avoidance is not so good after replacing the robot model with a differential-drive robot of similar size), and the loss curve during training reaches its minimum at the inflection point of the reward curve and then diverges once the reward curve has flattened out. I would like to ask whether this is caused by overfitting during training.
Sorry, I do not have a definite answer to that. It is also quite hard for me to make an educated guess here.
That's okay, I'll look into it. Thank you for your patience in answering.
Hello, I tried to repeat the training two more times. One run I stopped too early and the network's obstacle avoidance performance turned out to be poor; the other was a little better. I would like to ask how you determine whether training has converged. Is it by looking at the graphs in tensorboard, the live navigation in rviz, or the average rewards in the shell? I would appreciate your answer. If it is convenient, I would also like to ask you to show the tensorboard training curves of a run that successfully converged. Thank you very much for your help.
I would look at tensorboard and actual performance. If the performance seems good and the graphs have settled, I would stop the training. I do not have any more insights into what constitutes convergence than that.
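As a rough sketch of what I mean by checking actual performance (the env/network interface here is only illustrative, not the exact one from the repo):

```python
def evaluate(env, network, episodes=50, max_steps=500):
    """Run the trained policy without exploration noise and report how
    often it reaches the goal versus collides (illustrative interface)."""
    goals, collisions = 0, 0
    for _ in range(episodes):
        state, done = env.reset(), False
        for _ in range(max_steps):
            action = network.get_action(state)  # deterministic action
            state, reward, done, reached_goal = env.step(action)
            if done:
                if reached_goal:
                    goals += 1
                else:
                    collisions += 1
                break
    print(f"Goals reached: {goals}/{episodes}, collisions: {collisions}/{episodes}")
```

If the goal rate stays high and stable over several such evaluations while the tensorboard curves have settled, that is about as good a convergence signal as I can offer.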
If the model visibly does not perform well in the simulation, then the training is either unsuccessful or it has not completed training yet. If, however, it looks fine in the simulation, but you test it on a real robot and it does not perform well, you most likely have some sim-to-real problem.
Are you trying to test it in the real world environment and it does not perform well? Without knowing more details, I cannot make any estimations about issues you are running into.
Excellent work. Can I ask how long the TD3 network needs to be trained to converge? I got good results in simulation after 16 hours of training (32 GB RAM, i7-11700K, RTX 3080 Ti), but not such good results when deployed in the real environment. Is this due to overfitting?