reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.

Robot moving in circles after setting multiple seeds and changing the learning rate #163

Open Seher-789 opened 3 weeks ago

Seher-789 commented 3 weeks ago

I have trained my robot many times, but it moves in circles during evaluation. I changed the seed from 0 to 42 with no result, and I also changed the learning rate from 1e-3 to 1e-4 as well as the policy noise, but still no result. Please suggest a solution.

[Screenshot from 2024-10-31 23-38-19]
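For context, a minimal sketch of what fully seeding a PyTorch training run involves; the variable names here are assumptions, not the repository's actual code. Changing a single seed value only removes run-to-run variation if it is fed to every random number generator in use:

```python
import random

import numpy as np
import torch

# Hypothetical values matching the thread: seed moved from 0 to 42,
# learning rate from 1e-3 to 1e-4.
seed = 42
lr = 1e-4

# Seeding only one library leaves the other RNGs unseeded, so results
# can still vary between runs; seed all three sources used by the stack.
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```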

reiniscimurs commented 3 weeks ago

Hi,

Please use the submission template when submitting the issue. This gives me the necessary information to start figuring out what the issue is.

How many epochs did you try training with each seed?

Seher-789 commented 3 weeks ago

Hi, thank you for the response. I trained for 75 epochs, but the reward was negative.

reiniscimurs commented 3 weeks ago

Again, please provide full information as in the template.

75 epochs for each seed? That seems excessive. If it does not converge in that time for any seed, there is either some change or a mistake in the implementation. Again, the template information is needed here.

Seher-789 commented 2 weeks ago

Thank you for the response. No, I trained the robot for 75 epochs with seed 2 and seed 0, but the rewards were negative. I also trained with all the other seed values for ten epochs, but the rewards were still negative, and once validation starts the robot moves in circles.

reiniscimurs commented 2 weeks ago

One last time, provide the information as set in the issue template. It is always very difficult to answer questions without knowing the full information.

TD3 can be a bit iffy with training and might simply not converge, but if no changes were made to the code I would expect it to work with at least some random seed. 10 epochs might still not be enough to evaluate the performance, though. If longer training does not help, you could try different learning rates.
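To illustrate where the learning rate enters the picture: TD3 updates actor and critic networks with their own optimizers, so lowering the rate slows both. The networks below are hypothetical stand-ins, not the classes from this repository:

```python
import torch

# Stand-in networks; the real actor and critic are deeper and sized to
# the robot's state and action dimensions.
actor = torch.nn.Linear(24, 2)   # e.g. state -> (linear, angular) velocity
critic = torch.nn.Linear(26, 1)  # e.g. state + action -> Q-value

lr = 1e-4  # candidate values to sweep across runs: 1e-3, 3e-4, 1e-4
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=lr)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=lr)
```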