reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Training convergence problem #103

Closed Ethan0207 closed 2 months ago

Ethan0207 commented 5 months ago

Hello, I have two problems.

  1. Could you give me some suggestions for solving the training convergence problem? Which parameters can be adjusted? These days I have been trying to train the agent, but without success. At first I didn't adjust anything: I trained the agent for 15 hours, but it only completed 67 epochs. When I tested it, the agent just turned in circles, and the loss curve looks strange (loss curve screenshot attached). Then I tried changing TIME_DELTA, but the problem persisted (screenshots attached). Finally, I changed the discount factor and trained the agent for 24 hours; it only completed 103 epochs. When I tested it, the agent again just turned in circles, and the loss is also strange (screenshots attached).

  2. How can a trained data set be used for testing? For example, if I train the agent 5 times, I have 5 sets of trained data named "1", "2", "3", "4", and "5". When I run the command "python3 test_velodyne_td3.py", which one is used for testing? If I want to test "3", how can I do that? (screenshot attached)

reiniscimurs commented 5 months ago

Hi,

  1. The first thing to try is changing the seed value for random initialization (a short sketch of how the seed affects initialization follows after point 2 below). See explanation here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/19

  2. Note that what is saved is model weights; we do not save any dataset in this repo. The model weights are loaded from the .pth file that stores the actor model weights: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/test_velodyne_td3.py#L41 By default, they will be loaded from whatever actor model is placed in the pytorch_models directory: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/test_velodyne_td3.py#L65 The runs directory is where all trained models are stored, but they are not loaded from this directory. The simplest way to load your trained model weights for testing is to copy the appropriate .pth file into the pytorch_models directory (see the loading sketch below).
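For context on point 1, here is a minimal sketch of how such a seed typically propagates to the weight initialization in a PyTorch training script. The variable name and values below are illustrative, not copied from the repo.

```python
import numpy as np
import torch

seed = 1  # illustrative: e.g. change the default 0 to 1 and retrain from scratch

# Seeding the RNGs changes which initial weights the actor/critic networks start
# from, which in turn changes how easily TD3 escapes poor local behaviors such
# as spinning in circles.
torch.manual_seed(seed)  # PyTorch RNG: layer weight initialization, noise sampling
np.random.seed(seed)     # NumPy RNG: e.g. exploration noise, random goal placement
```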
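For point 2, a rough sketch of what loading a specific checkpoint for testing amounts to. The function, path, and file name below are assumptions for illustration, not the repo's exact code; `actor` stands for the TD3 actor network constructed in test_velodyne_td3.py.

```python
import torch

def load_actor_weights(actor: torch.nn.Module, checkpoint: str) -> torch.nn.Module:
    """Load a specific saved actor checkpoint (e.g. the run you named "3") for testing."""
    state_dict = torch.load(checkpoint, map_location="cpu")
    actor.load_state_dict(state_dict)
    actor.eval()  # evaluation mode (disables dropout/batch-norm training behavior)
    return actor

# Hypothetical usage; adjust the path to whichever of your saved runs you want to test:
# actor = load_actor_weights(actor, "./pytorch_models/3_actor.pth")
```

Equivalently, copying or renaming the desired .pth file so it sits at the default path the test script reads from achieves the same thing without touching the code.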

Ethan0207 commented 5 months ago

Thank you for your reply. I will try it.

Ethan0207 commented 5 months ago

Hi, I have changed the seed value for random initialization from "0" to "1", and the training process now looks good: the agent can reach the target. So I have two questions. First, why do we change the seed value? What role does the seed play here? Second, why does the loss curve look so strange? How can I judge convergence and decide when to stop training?

(training curve screenshots attached)

reiniscimurs commented 5 months ago

Hi,

As I mentioned here, the seed value determines the random initialization of the model weights. The better the weights you start from, the easier it is for the model to learn.

The loss function in DRL is not the same as the loss function in supervised learning methods, and you should not read the curve the same way. See some explanation and a link here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1837966443

There is no specific convergence criterion, but I would look for the point where the max Q value has converged.
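One hedged way to make that judgement concrete: log the max Q value per evaluation epoch and treat training as converged once it stops moving over a window of epochs. The tag name, window size, and tolerance below are arbitrary illustrative choices, not part of the repo.

```python
from collections import deque

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./runs/q_tracking")  # illustrative log directory
recent_q = deque(maxlen=10)  # look at the last 10 evaluation epochs

def log_and_check_convergence(epoch: int, max_q: float, tol: float = 0.05) -> bool:
    """Log max Q for this epoch; return True once it varies by < tol over the window."""
    writer.add_scalar("max_Q", max_q, epoch)
    recent_q.append(max_q)
    if len(recent_q) < recent_q.maxlen:
        return False
    spread = max(recent_q) - min(recent_q)
    return spread < tol * abs(max(recent_q))
```

Even with a stable max Q, it is worth visually checking the robot's behavior during evaluation episodes before stopping training.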

Ethan0207 commented 5 months ago

Hi, thank you for your reply. So, for a curve like this, if the max Q value tends to be stable, can it be considered to have converged? (Q-value curve screenshot attached)

reiniscimurs commented 5 months ago

It seems like reasonable convergence. The max Q value should be around 120 after training. However, that is not a guarantee that the policy performs well; you should visually evaluate whether the robot performs to your satisfaction during the evaluation episodes.

Emptyth commented 5 months ago

Hi, may I ask whether you have solved the problem with the number of epochs? I trained the model for about 20 hours and got about 90+ epochs, and changing the seed value doesn't seem to help with this. Did you change any other parameter, or is this just the normal situation? The training result does show navigation ability, but the agent may still get into trouble in some situations.

Ethan0207 commented 5 months ago

Hi, I think it is probably just the normal situation, because I also trained the model for about 25 hours and got about 105 epochs. If you want to train more quickly, you might refer to https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part5-some-extra-stuff-b744852345ac.


Emptyth commented 5 months ago


Thanks for your reply. That helps me a lot.