reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.

model training process setup #121

Open zzt007 opened 3 months ago

zzt007 commented 3 months ago

Hi, thanks for your great work! I have done all the work mentioned in the readme and started training the model. However, the car moves very slowly in rviz and Gazebo and does not run as smoothly as in the example; it even needs to wait a minute before changing position. Is this because of my computer's poor performance?

  1. My computer and environment are as follows:

     • CPU: Intel i7-13700K
     • no NVIDIA graphics card
     • Windows WSL2 with Ubuntu 20.04 and ROS Noetic

  2. The following two screenshots were taken 1 minute apart: [screenshots]

  3. The terminal output looks like this: [screenshot]

reiniscimurs commented 3 months ago

Hi,

I have deployed this model on i5 and i3 CPUs without CUDA, so the training part, where the episode is collected, is not that resource intensive. If it takes a minute to execute a single step (or even a couple of steps), that does not seem normal to me. However, it does seem you are using a virtual machine, and it either might not have enough resources or is not configured correctly. I would suggest checking whether any other ROS application works there; if it does not run smoothly, you will know where the problem lies. In any case, this seems more like a hardware issue and I don't think I can help you much there.
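If it helps narrow things down, a quick check is to compare simulated time against wall-clock time; this tells you whether the simulator/VM itself is the bottleneck rather than the training code. This is only a rough sketch, not part of this repo, and assumes `use_sim_time` is true (as it normally is when Gazebo is running):

```python
#!/usr/bin/env python3
# Rough sketch (not part of this repo): estimate Gazebo's real-time factor
# by comparing simulated time with wall-clock time. A factor far below 1.0
# means the simulator/VM is the bottleneck, not the TD3 training loop.
import time

import rospy
from rosgraph_msgs.msg import Clock

rospy.init_node("rtf_check", anonymous=True)
rospy.wait_for_message("/clock", Clock)   # requires use_sim_time to be true

sim_start = rospy.get_time()              # simulated seconds
wall_start = time.time()                  # wall-clock seconds
time.sleep(10.0)                          # sample over 10 wall-clock seconds

rtf = (rospy.get_time() - sim_start) / (time.time() - wall_start)
print(f"approximate real-time factor: {rtf:.2f}")
```

Gazebo also displays a real-time factor in its GUI status bar, which should show roughly the same number.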

zzt007 commented 3 months ago

Thank you for your reply. Does that mean the program is running successfully? I want to debug this program to learn how the DRL code connects to the ROS and Gazebo simulation, and then work on my own project.

reiniscimurs commented 3 months ago

From what I can see, the software is launched properly.

zzt007 commented 3 months ago

Hi there, how can I judge whether the trained model has converged? This is my first contact with RL training, so which metrics do I need to check? A loss curve, as with a DL model, or the reward reaching a stable value? The following is my terminal output during training.

-- training start [screenshot]

-- epoch increase [screenshot]

I find that as the epochs increase, the average reward becomes negative. Looking forward to your reply.

reiniscimurs commented 3 months ago

Hi,

A better indicator would be the curves in TensorBoard. In your case, the evaluation at the beginning probably places the goals really close to the robot, so it randomly "collects" them. As training goes on, the goal distance increases and situations become more complex for the robot.

Loss is not a good indicator; you can read more on the topic here: https://github.com/reiniscimurs/DRL-robot-navigation/issues/89#issuecomment-1837966443

Generally, I would look for the convergence of the max Q value in TensorBoard.
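For reference, the training script writes its scalars to TensorBoard through `torch.utils.tensorboard.SummaryWriter`. The sketch below is only illustrative (the function and tag names are mine, not necessarily the repo's exact ones) and shows the kind of scalars worth watching when judging convergence:

```python
# Illustrative sketch only: the function and tag names are placeholders,
# not the repo's exact code. It shows the kind of scalars to inspect in
# TensorBoard when judging convergence.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./runs")

def log_evaluation(iteration, avg_reward, max_q, avg_q, critic_loss):
    # The max Q curve flattening out, together with the average evaluation
    # reward trending upward, is a better convergence signal than the loss.
    writer.add_scalar("eval/avg_reward", avg_reward, iteration)
    writer.add_scalar("train/max_Q", max_q, iteration)
    writer.add_scalar("train/avg_Q", avg_q, iteration)
    writer.add_scalar("train/critic_loss", critic_loss, iteration)
```

Then run `tensorboard --logdir ./runs` and watch whether the max Q curve levels off instead of growing without bound.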

zzt007 commented 3 months ago

Many thanks for your kind help and for sharing.