Closed — Ethan0207 closed this issue 2 months ago.
Hi,
The first thing to try is changing the seed value for the random initialization. See the explanation here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/19
Note that what is saved are the model weights; we do not save any dataset in this repo. The weights are loaded from the .pth file that stores the actor model weights: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/test_velodyne_td3.py#L41
By default, the weights are loaded from whatever actor model is placed in the pytorch_models directory: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/test_velodyne_td3.py#L65
The runs directory is where all trained models are stored, but they are not loaded from there. The simplest way to load your trained model weights for testing is to copy the appropriate .pth file into the pytorch_models directory.
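As a sketch of the file naming involved, the actor checkpoint path is built from a directory and a base file name; the `_actor.pth` suffix here is an assumption based on how the repo's load call appears to construct the path, so verify it against your checkout:

```python
import os

# Hedged sketch: reconstruct the path that the test script would load the
# actor weights from, given a base file name such as "TD3_velodyne".
# The "_actor.pth" suffix is an assumed convention, not a verified constant.
def actor_checkpoint_path(directory, file_name):
    return os.path.join(directory, f"{file_name}_actor.pth")

print(actor_checkpoint_path("./pytorch_models", "TD3_velodyne"))
```

Whatever file sits at that path in pytorch_models is what the test script picks up.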
Thank you for your reply. I will try it.
Hi, I changed the seed value for the random initialization from "0" to "1". The training process seems good and the agent can reach the target. So I have two questions. First, why do we change the seed value? What role does the seed play here? Second, why does the loss curve look so strange, and how can I judge convergence so I know when to stop training?
Hi,
As I mentioned here, the seed value determines the random initialization of the model weights. The better the weights you start from, the easier it is for the model to learn.
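A minimal illustration of what the seed does, using Python's `random` module as a stand-in for `torch.manual_seed` (the actual repo seeds PyTorch; this just shows the mechanism):

```python
import random

# The seed fixes the pseudo-random stream, so the same seed always produces
# the same initial "weights", while a different seed gives a different
# starting point for learning.
def init_weights(seed, n=4):
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

w_seed0 = init_weights(0)
w_seed1 = init_weights(1)
assert init_weights(0) == w_seed0  # reproducible for the same seed
assert w_seed0 != w_seed1          # seed 1 starts from different weights
```

Changing the seed does not change the algorithm at all; it only changes where in weight space training starts, which can matter for whether the policy escapes a bad local optimum.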
The loss function in DRL is not the same as the loss function in supervised learning, and you should not read the curve the same way. See some explanation and a link here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1837966443
There is no specific convergence criterion, but I would look at the point where the max Q value has converged.
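One way to turn "the max Q value has converged" into a concrete stopping rule is a sliding-window check on the recorded per-epoch max Q; the window size and tolerance below are arbitrary illustrative choices, not values from the repo:

```python
# Hedged sketch: declare convergence once the last `window` max-Q readings
# stay within a `tol` band of each other. Tune window/tol for your run.
def has_converged(max_q_history, window=10, tol=5.0):
    if len(max_q_history) < window:
        return False  # not enough data yet
    recent = max_q_history[-window:]
    return max(recent) - min(recent) < tol
```

Even when this fires, it is only a hint to start evaluating; as noted below, a stable Q value does not guarantee a well-performing policy.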
Hi,
Thank you for your reply. So, for a curve like this, if the max Q value becomes stable, can it be considered to have converged?
It seems like reasonable convergence. The max Q value should be around 120 after training. However, that is not a guarantee of a well-performing policy; you should visually check whether the robot performs to your satisfaction during evaluation episodes.
Hi, may I ask whether you have solved the problem with the number of epochs? I trained the model for about 20 hours and got about 90+ epochs. Changing the seed value does not seem to help with this. Did you change any other parameters, or is this just the normal situation? The training result shows the model can navigate, but the agent may still get into trouble in some situations.
Hi, I think it is just the normal situation; I also trained the model for about 25 hours and got about 105 epochs. If you want to train more quickly, you can refer to https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part5-some-extra-stuff-b744852345ac.
Thanks for your reply. That helps me a lot.
Hello, I have two problems.
Could you give me some suggestions for solving the training convergence problem? Which parameters can be adjusted? These days I tried to train the agent, but I failed. First, I did not adjust anything: I trained the agent for 15 hours, but it only reached 67 epochs. When I tested it, the agent just turned in circles, and the loss looks strange.
Then I tried changing TIME_DELTA, but the problem persisted.
Finally, I changed the discount. I trained the agent for 24 hours and it only reached 103 epochs. When I tested it, the agent still just turned in circles, and the loss is also strange.
![1 12 10](https://github.com/reiniscimurs/DRL-robot-navigation/assets/138771150/15ec3c26-c011-4db6-8d5b-cf731893f9bd)
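For context on what changing the discount actually affects, here is a sketch of the standard TD3 critic target (this mirrors generic TD3 rather than being a verified copy of the repo's training loop; the numbers are illustrative):

```python
# Standard TD3 target for the critics:
#   target_Q = reward + discount * (1 - done) * min(Q1', Q2')
# where Q1', Q2' are the two target critics' estimates of the next state.
# A discount near 1 weights long-horizon reward more heavily, which changes
# how quickly (and whether) the Q values settle during training.
def td3_target(reward, done, q1_next, q2_next, discount=0.99):
    return reward + discount * (1.0 - done) * min(q1_next, q2_next)

# reward plus the discounted smaller target-critic estimate (approx. 48.52)
td3_target(1.0, 0.0, 50.0, 48.0)
```

So tweaking the discount shifts the scale and stability of the Q targets, but it will not by itself fix a policy that only spins in circles; that usually points at reward shaping, exploration, or too little training.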
How can a trained model be used for testing? For example, suppose I trained the agent 5 times and have 5 sets of trained weights named "1", "2", "3", "4", "5". When I run the command "python3 test_velodyne_td3.py", which one is used for the test? And if I want to test "3", what should I do?![1 12 11](https://github.com/reiniscimurs/DRL-robot-navigation/assets/138771150/e7191c9f-a829-49e4-8196-c2990cd45f35)
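As answered earlier in the thread, the test script loads whatever actor file sits in pytorch_models, so to test run "3" you copy that run's actor weights there under the name the script expects. A hypothetical helper (both file-name patterns below are assumptions about the repo's naming scheme, not verified constants):

```python
import os
import shutil

# Hedged sketch: stage one chosen run's actor weights for testing by copying
# them from the runs directory into pytorch_models under the expected name.
# "<run>_actor.pth" and "TD3_velodyne_actor.pth" are assumed names; check
# your own file names before using this.
def select_run_for_testing(runs_dir, run_name, models_dir,
                           expected_name="TD3_velodyne_actor.pth"):
    src = os.path.join(runs_dir, f"{run_name}_actor.pth")
    os.makedirs(models_dir, exist_ok=True)
    dst = os.path.join(models_dir, expected_name)
    shutil.copyfile(src, dst)
    return dst

# Usage: select_run_for_testing("./runs", "3", "./pytorch_models")
# then run: python3 test_velodyne_td3.py
```

The key point is that test_velodyne_td3.py has no "choose a run" option; selection happens by which file you place in pytorch_models.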