reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Apply the trained network obtained from p3dx to turtlebot #60

Closed Weizhi-T closed 1 year ago

Weizhi-T commented 1 year ago

Hi,

I have trained a pretty good network model on p3dx (about 90 average reward and 0 average collisions), then followed your method for changing the robot model from p3dx to turtlebot (waffle and burger). When I loaded the pretrained network on the turtlebots, they behaved strangely: they did not perform the task and just swung in place.

Then I trained the turtlebots from scratch. When I applied the network obtained from the turtlebots to p3dx, the same thing happened (it swung in place).

Do you have the same problem? Or could you explain the reason for this situation? Looking forward to your reply. Thanks!

reiniscimurs commented 1 year ago

Different robot models have different kinematics. If you train with a specific robot model, you implicitly learn the kinematics of that robot in the form of the expected reaction to the action taken in a given state. I would find it very strange if network weights trained with one robot were applicable to a different robot.

Another thing to consider is the different location and filtering of the Velodyne Puck data. If you do not update them, the sensor data will be either unexpected or wrong for the specific robot.

In conclusion, I would not expect this to work due to differences in robot characteristics.

Weizhi-T commented 1 year ago

Thank you so much for your reply.

I understand that different kinematics would lead to different motions. But all three robots (P3dx, Waffle, Burger) are two-wheel differential-drive robots. They share the same kinematic model (they also use the same control plugin, as shown in the urdf file). Therefore, I would expect one trained network to work for all of them, even if it is not optimal for each. In my case, the network worked perfectly for P3dx but did not work for the Turtlebots.


Anyway, after a few attempts (I published cmd_vel directly to the robots), I found that P3dx has the opposite positive rotation direction from the Turtlebots: clockwise is positive for P3dx, while anticlockwise is positive for the Turtlebots. Therefore, I added a minus sign in front of action[1], which is the rotation output. Then it worked! The Turtlebots perform as well as P3dx!
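
The sign flip described above can be sketched as a small wrapper around the policy output. This is only an illustration, not the repository's code: `adapt_action` and `angular_sign` are hypothetical names, and it assumes, as stated above, that action[1] is the rotation command and action[0] is the linear command.

```python
from typing import List, Sequence


def adapt_action(action: Sequence[float], angular_sign: float = 1.0) -> List[float]:
    """Return a copy of the action with the rotation command's sign adjusted.

    Hypothetical helper: assumes action[0] is the linear command and
    action[1] is the rotation command.
    """
    if angular_sign not in (1.0, -1.0):
        raise ValueError("angular_sign must be +1.0 or -1.0")
    return [action[0], angular_sign * action[1]]


# Policy trained on P3dx, deployed on a Turtlebot: invert the rotation command.
p3dx_action = [0.5, 0.25]
turtlebot_action = adapt_action(p3dx_action, angular_sign=-1.0)
print(turtlebot_action)  # [0.5, -0.25]
```

Keeping the sign as a parameter, rather than hard-coding the minus, would let the same script run on either robot.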

Finally, I still have one question. Where is the rotation direction defined? Is it in the Gazebo plugin, or somewhere else? Thank you again!

reiniscimurs commented 1 year ago

That is an interesting outcome. I am not aware of where the change happens. Most likely you would have to look either into the robot files themselves or at how the controller is called.
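
One way to check empirically which convention a given robot follows is to send a known rotation command and compare the yaw before and after. Below is a minimal sketch, assuming the yaw values (in radians, for example from odometry) have already been collected; the ROS side is left out and the function names are hypothetical. For reference, in a right-handed frame with z pointing up, as used by ROS REP 103, a positive yaw change is anticlockwise when viewed from above.

```python
import math


def wrapped_yaw_delta(yaw_before: float, yaw_after: float) -> float:
    """Signed yaw change yaw_after - yaw_before, wrapped to [-pi, pi]."""
    delta = yaw_after - yaw_before
    return math.atan2(math.sin(delta), math.cos(delta))


def rotation_sign(yaw_before: float, yaw_after: float, commanded_angular_z: float) -> int:
    """Return +1 if yaw moved in the commanded direction, -1 if it moved the opposite way."""
    delta = wrapped_yaw_delta(yaw_before, yaw_after)
    if delta == 0.0 or commanded_angular_z == 0.0:
        raise ValueError("need a non-zero command and a non-zero yaw change")
    return 1 if delta * commanded_angular_z > 0 else -1


# Robot commanded with a positive rotation, yaw decreased: the sign is inverted.
print(rotation_sign(0.0, -0.2, 0.5))  # -1
```

A result of -1 for one robot and +1 for another would point to the kind of mismatch described in this thread.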