reiniscimurs / GDAE

Goal-driven autonomous exploration through deep reinforcement learning (ICRA 2022): a system that combines reactive and planned robot navigation in unknown environments

Global Navigation: Does the robot know the next waypoint by setting the reward function? #18

Open Yangzuodong opened 3 months ago

Yangzuodong commented 3 months ago

Dear Reinis Cimurs, I hope this message finds you well. First of all, thank you very much for the work you have done; it has helped me a lot. I am now trying to build a global navigation simulation. I removed the move_base code and used the trained TD3 network to drive the robot, and I added reward code to GDAM_env:

[Screenshots of the added reward code: GDAM_env_4, GDM_env_1, GDAM_env_3]

This code is written with reference to the TD3 training code in DRL_navigation. My understanding is that adding the reward function lets the robot know that reaching local_goal earns a positive reward of 100, so the robot will drive toward the selected POI using the trained network's output.
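Roughly, the reward I added has this shape (a simplified sketch with assumed shaping terms; only the +100 reward for reaching the goal is the value mentioned above):

```python
# Simplified sketch of the reward added to GDAM_env, following the structure
# of the reward in the DRL_navigation training code. The shaping terms and
# the collision penalty are illustrative assumptions.
GOAL_REWARD = 100.0
COLLISION_PENALTY = -100.0

def get_reward(target_reached, collision, action, min_laser):
    if target_reached:
        return GOAL_REWARD
    if collision:
        return COLLISION_PENALTY
    # Small dense shaping term: encourage forward motion, penalize turning
    # and proximity to obstacles (assumed form, not the exact original code).
    obstacle_term = 1.0 - min_laser if min_laser < 1.0 else 0.0
    return action[0] / 2.0 - abs(action[1]) / 2.0 - obstacle_term / 2.0
```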

In the original check_goal function, I also replaced the code where move_base publishes the navigation target: [screenshot: GDAM_2]

https://github.com/reiniscimurs/GDAE/assets/161119196/798dd7ab-9518-4275-86de-9d6a1218261d

However, the simulation video shows that the robot makes no attempt to reach the local_goal; it seems the robot does not know where to go. I am very confused, can you give me some guidance? Thank you for your time, and I appreciate your efforts in making this project open-source.

reiniscimurs commented 3 months ago

Hi,

Why are you adding the reward function to this code? Are you planning to continue training in this setting with an already pre-trained model?

Be aware that there are differences in how the state is represented here in GDAE and in the base DRL training repository, mainly in the robot state.

Yangzuodong commented 3 months ago

Hi, Thanks for your reply!

Firstly, I don't want to continue training; I want the robot to move entirely through TD3's actor network, without relying on move_base. But I don't know how the agent learns of the local_goal. After looking at the test_velodyne_td3.py file, I guessed that the robot knew the end point because the env is set up to give a reward for reaching it.

Secondly, I modified the state representation in GDAM to be consistent with that in TD3, including laser_state and robot_state.

Lastly, thank you very much for your patient reply!

reiniscimurs commented 3 months ago

Reward is not necessary for model deployment. It is only needed to train the model. The model knows the goal as it is part of the state that is given to the model.

There is no distinction between local and global goals. At each individual step the model receives a single target to go to, and it does not know whether that target is a global or a local goal. The step function reads the currently selected node and uses it as the current target, which is then passed to the TD3 model as part of the state. Node selection is done entirely through the heuristics function.
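As a rough sketch of what that means in code (helper names such as get_selected_node and goal_to_polar are placeholders, not the actual function names in the repository):

```python
import numpy as np
import torch

# Minimal deployment sketch: no reward is computed. The currently selected
# node (chosen by the heuristics) is encoded into the state at every step,
# and the TD3 actor maps that state to an action.
def run_step(env, actor, laser_state, odom):
    goal_x, goal_y = env.get_selected_node()               # current target node
    dist, angle = env.goal_to_polar(goal_x, goal_y, odom)  # goal in polar form
    state = np.concatenate([laser_state, [dist, angle]])   # goal is part of the state
    with torch.no_grad():
        state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        action = actor(state_t).squeeze(0).numpy()
    return env.step(action)                                # reward plays no role here
```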

Yangzuodong commented 3 months ago

Hi, thanks for your reply! So does the robot know the target through the Dist_to_goal parameter in the robot_state returned from the step function? I modified the code, and now it looks like this video:

https://github.com/reiniscimurs/GDAE/assets/161119196/24ac8c2b-e04b-48b2-abfc-5bc50dceb6ae

However, the motion path of the agent is a little strange; it seems to always want to move clockwise. This was not the case with test_velodyne_td3.

Thank you very much for your work and your patient response!

reiniscimurs commented 3 months ago

The goal is given in polar coordinates by distance and angle. This is explained in the tutorial: https://medium.com/@reinis_86651/deep-reinforcement-learning-in-mobile-robot-navigation-tutorial-part3-training-13b2875c7b51
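A minimal sketch of that polar representation (the yaw input and the angle wrapping are written out explicitly here; the actual implementation in the repository may differ):

```python
import math

def goal_to_polar(goal_x, goal_y, robot_x, robot_y, robot_yaw):
    """Express the goal as (distance, angle) relative to the robot pose.

    The angle is the bearing to the goal minus the robot heading, wrapped
    to [-pi, pi]. A sign error or a missing wrap in this term can make the
    robot circle instead of driving toward the goal.
    """
    dx = goal_x - robot_x
    dy = goal_y - robot_y
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - robot_yaw
    angle = (angle + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi]
    return distance, angle
```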

Yangzuodong commented 3 months ago

Thank you for solving my doubts, I wish you a happy life!