reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in the ROS Gazebo simulator. Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.

Global goal point #127

Open YMUDRL opened 8 months ago

YMUDRL commented 8 months ago

Dear Reinis Cimurs, I recently read your paper "Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning". I think it is fantastic, and having watched your videos on YouTube, I can't wait to implement it.

My problem is that I want to use my robot cart for real-world experiments, but I cannot find where the global target point is set in your code. I also don't fully understand how exploration of the environment is realized by heading toward a global goal point rather than navigating from A to B in one go. Is the goal of exploring as much of the environment as possible achieved by visiting enough of the POIs generated in your code?

I am very touched by your serious and patient replies to other questions. I look forward to hearing from you, and I wish you a happy life!

reiniscimurs commented 8 months ago

Hi,

This repo is simply for training a robot motion policy through DRL. There is no global exploration in this repository. Global planning and exploration are described in the GDAE repository. POI generation and selection are also part of GDAE only, not of this repository.

YMUDRL commented 8 months ago

Thanks for your prompt reply. I had read your GDAE work before this and asked you questions there as well. May I ask: with PyTorch + Ubuntu 20.04 + ROS Noetic, can I run your GDAE.py? What environment would you recommend configuring for simulation and eventual transfer to a real cart? My cart is equipped with an on-board computer that only has a CPU.

YMUDRL commented 8 months ago
1. I would also like to ask about the selection of the global point: is it manually selected in RViz, or is it entered directly in the code? Many of the global points in your YouTube videos are far away from the starting point.

2. What do you think are the advantages of reinforcement learning compared to other autonomous exploration algorithms such as active SLAM? The only one I can think of is reducing the amount of computation on the on-board computer through an end-to-end approach. What other major advantages do you see?

Looking forward to your reply!

reiniscimurs commented 8 months ago

> Thanks for your prompt reply. I had read your GDAE work before this and asked you questions there as well. May I ask: with PyTorch + Ubuntu 20.04 + ROS Noetic, can I run your GDAE.py? What environment would you recommend configuring for simulation and eventual transfer to a real cart? My cart is equipped with an on-board computer that only has a CPU.

Yes, that setup would be able to run GDAM, but you would have to update the code accordingly. Training is what requires the most resources. The trained model is fairly lightweight, so there are no heavy hardware requirements for deployment. I have deployed GDAE on an Intel NUC with an i3 CPU, so it is possible to run it with only onboard CPU support.
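For what it's worth, deploying a trained PyTorch actor on a CPU-only machine usually only needs `map_location="cpu"` when loading the checkpoint. This is a minimal sketch, not the repo's actual loading code; the `Actor` import and the checkpoint filename below are placeholders:

```python
import torch

from td3 import Actor  # placeholder import: use the Actor class from your own training code

state_dim, action_dim = 24, 2          # 20 laser bins + 4 robot values; linear and angular velocity
actor = Actor(state_dim, action_dim)

# map_location="cpu" lets a checkpoint trained on a GPU run on a CPU-only onboard computer
actor.load_state_dict(torch.load("td3_actor.pth", map_location="cpu"))
actor.eval()

with torch.no_grad():
    state = torch.zeros(1, state_dim)  # replace with the real observation vector
    action = actor(state)              # a single forward pass is cheap enough for an onboard CPU
```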

Not sure what you mean by environment. If you mean the simulated world, then the one used in this repo works quite well.

reiniscimurs commented 8 months ago
> 1. I would also like to ask about the selection of the global point: is it manually selected in RViz, or is it entered directly in the code? Many of the global points in your YouTube videos are far away from the starting point.
>
> 2. What do you think are the advantages of reinforcement learning compared to other autonomous exploration algorithms such as active SLAM? The only one I can think of is reducing the amount of computation on the on-board computer through an end-to-end approach. What other major advantages do you see?
>
> Looking forward to your reply!

1. There are no global points in this repository; it trains purely the local navigation policy.
2. Some additional benefits are described in the paper.
YMUDRL commented 7 months ago

Dear Reinis Cimurs, I'd like to ask why you add robot_state to the input state, that is, why you also feed the action back in as part of the state. This seems to differ from traditional reinforcement learning algorithms:

`robot_state = [distance, theta, action[0], action[1]]`
`state = np.append(laser_state, robot_state)`
`reward = self.get_reward(target, collision, action, min_laser)`
`return state, reward, done, target`

reiniscimurs commented 7 months ago

Hi

Essentially, we add it there because of inertia. We want to give the model an idea of whether it is already moving or not. If a robot is already moving, the same new action might bring it farther than if the robot has stopped. While it did not bring a huge benefit, experimentally it showed better performance than leaving this information out.
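As a concrete illustration of how the previous action ends up in the observation, here is a minimal sketch built only from the variable names quoted above (it is not the exact code of this repository):

```python
import numpy as np

def build_state(laser_state, distance, theta, last_action):
    # laser_state: the binned laser readings (20 values in this setup)
    # distance, theta: polar coordinates of the goal relative to the robot
    # last_action: [linear_vel, angular_vel] applied in the previous step,
    #              included so the policy can tell whether the robot is already moving
    robot_state = [distance, theta, last_action[0], last_action[1]]
    return np.append(laser_state, robot_state)  # 20 laser values + 4 robot values = 24-dim state
```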

YMUDRL commented 7 months ago

Thanks for the reply, but I still don't understand what you mean by this:

> If a robot is already moving, the same new action might bring it farther than if the robot has stopped.

reiniscimurs commented 7 months ago

Let's say your robot's current velocity is 1 m/s. Now you want to apply an action of 1 m/s for 1 second. Since you have a constant velocity, after this 1 second your robot will have moved 1 meter. On the other hand, let's assume a different scenario where your robot's current velocity is 0 m/s and you apply an action of 1 m/s for 1 second. Since your robot has some mass, it will not be able to move at 1 m/s right away; there will be some ramp-up period until it reaches this speed. So after 1 second you will not have moved 1 m but some distance below that.

So even though the rest of the state is the same and even the action is the same, you get a slightly different outcome, because the internal robot state differs depending on whether the robot is in motion or not.
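To make the difference concrete, here is a rough back-of-the-envelope sketch assuming a simple constant-acceleration ramp-up; the 0.5 m/s² value is purely illustrative and not a measured property of any particular robot:

```python
v_cmd, dt = 1.0, 1.0               # commanded velocity (m/s) and action duration (s)

# Case 1: the robot is already moving at the commanded velocity
d_moving = v_cmd * dt              # 1.0 m covered

# Case 2: the robot starts from rest and ramps up at 0.5 m/s^2 (illustrative value)
a = 0.5
t_ramp = v_cmd / a                 # 2.0 s needed to reach 1 m/s, longer than the 1 s step
d_from_rest = 0.5 * a * dt ** 2    # only 0.25 m covered during the step

print(d_moving, d_from_rest)       # same action, same duration, different displacement
```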

YMUDRL commented 7 months ago

Thank you for your patience! I think I understand what you mean: the robot's own state should also be fed into the neural network as part of the state it learns from. But can it really learn from this, given that laser_state alone already contributes 20 input dimensions? Did you validate this experimentally?

reiniscimurs commented 7 months ago

I do not understand the question.

The state is represented by laser_state + robot_state. This is what the model is trained on and used in deployment.