reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in the ROS Gazebo simulator. Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Training convergence problem #86

Open peipei518 opened 10 months ago

peipei518 commented 10 months ago

Hello, I changed the vehicle model and started training, but I am running into some issues. My avg_reward is always an integer, while I remember that with the original model there were digits after the decimal point. The training also fails to converge after multiple attempts, and changing the seed value made no noticeable difference. What could be the reason for this?

[screenshot of the training log]

reiniscimurs commented 10 months ago

Hi, you are right that the average reward should be a float value rather than an integer. My guess is that one of your changes touches either the averaging during the evaluation step or the reward calculation itself.

You should check the returned values for the reward calculation and see if any of your changes affect anything.
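For reference, the step reward in the repo's velodyne_env.py is roughly the following (a paraphrase from memory, so verify against your local copy):

@staticmethod
def get_reward(target, collision, action, min_laser):
    if target:       # reached the goal
        return 100.0
    elif collision:  # hit an obstacle
        return -100.0
    else:
        # dense step reward: reward forward motion, penalize rotation
        # and proximity to obstacles closer than 1 m
        r3 = lambda x: 1 - x if x < 1 else 0.0
        return action[0] / 2 - abs(action[1]) / 2 - r3(min_laser) / 2

If your episodes only ever return the terminal +/-100 values and the dense step rewards come out to zero, the printed average will look like an integer.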

peipei518 commented 7 months ago

Thank you for your reply. I switched to another robot model, a four-wheel differential drive model. My changes: in train_velodyne_td3.py I swapped the robot model launch file and the robot model name, and I changed the command, lidar, and odometry topics to /cmd_vel, /scan, and /odom; the rest is unchanged. I am using the noetic version of the code, but since my robot has a 2D lidar, I copied the velodyne_env.py code from the melodic version and used that. After starting training, the average reward is still an integer instead of a floating point number, and after training overnight there is still no trend toward convergence. Where could the problem be? Is it that the code does not support a four-wheel differential model, or is it something else? I look forward to your reply!

peipei518 commented 7 months ago

[screenshots of the training output]

I added print("Current avg_reward:", avg_reward / count) to inspect avg_reward and found that it is a floating point number, but print("Average Reward over %i Evaluation Episodes, Epoch %i: %.6f, %.6f" % (eval_episodes, epoch, avg_reward, avg_col)) still prints integer values. I don't understand why. I hope you can help me figure it out. Thank you very much!

reiniscimurs commented 7 months ago

What is your reward function? I would guess you are outputting only the collision reward and every step reward is 0.

peipei518 commented 7 months ago

[screenshot of the training code]

I have not made any changes to the reward function; it is still the original function.

peipei518 commented 7 months ago

[screenshot of the reward function]

reiniscimurs commented 7 months ago

Even if the validation output is wrong, that should not be an issue by itself, as no training is done in those steps. But it might indicate that something is wrong with the training process.

I can see that you have updated the lidar FOV to 360 degrees, but the code is hardcoded to use a 180 degree FOV. In any case, it sounds like you have made a lot of adjustments to the code, which I would not be able to check. In theory the change of robot model should not be a problem, but do check your code for possible mistakes. The only thing I can suggest is to debug through the code, paying special attention to the reward function.

peipei518 commented 7 months ago

[graphs of average Q, loss, and max Q]

I modified the noetic version of the code to replace the 3D lidar data with 2D lidar data. That seemed to solve the floating point problem, but after three days of training there is still no convergence trend. Above are the graphs of my Q values and loss. Where should I look for the problem, and what should the correct trend of these three values be?

reiniscimurs commented 7 months ago

Sorry, I do not know what exact changes were made. I would not be able to tell you where to look for issues.

peipei518 commented 7 months ago

I'm very sorry that my description was not clear enough. Below are screenshots of the modifications I made; I hope to get your opinion.

change 1: I added the binning function from the melodic version of the code.
change 2: Because I'm using a 2D lidar, I commented out the velodyne_callback function.
change 3: In calculate_observation I added code to receive /scan lidar messages, and I replaced the observe_collision function with the calculate_observation function.
change 4: I added code for processing the lidar data here.

I haven't changed the rest of the code.

reiniscimurs commented 7 months ago

Can you post the full file please? I cannot tell from the screenshots whether the pause function has been deleted or just left out.

I would suggest not using calculate_observation over observe_collision. The former is just a slower and worse version of the latter. You simply need to pass in data.ranges in this case.
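Something along these lines, paraphrasing the noetic version's observe_collision (verify the exact signature in your file):

laser_state = np.array(data.ranges[:])
laser_state[laser_state == np.inf] = 10  # clamp out-of-range returns
# returns (done, collision, min_laser); terminal if anything is closer
# than the collision threshold
done, collision, min_laser = self.observe_collision(laser_state)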

peipei518 commented 7 months ago

I have sent the complete code file to your email reinis.cimurs@de.bosch.com, my email address is hanlaoerqq@gmail.com. My purpose is to change the robot model to a four-wheel differential model using 2D lidar data. However, since my coding foundation is still relatively weak, it is still difficult for me to modify the code, so I have a series of problems. I hope you can help me When you have time, help me find out where the problem is and give me some suggestions. Thank you very much!

reiniscimurs commented 7 months ago

When calculating the observation in step, your function will return the wrong laser reading. See:

laser_state = np.array(data.ranges[:])  # fresh 2D scan from /scan
v_state = []
v_state[:] = self.velodyne_data[:]      # copy of the (never updated) velodyne buffer
laser_state = [v_state]                 # overwrites the scan you just read

Your laser_state just copies values from self.velodyne_data. Since you never actually update those values, they stay at the initialization self.velodyne_data = np.ones(self.environment_dim) * 10, so laser_state is constant in your step function. You have implemented the change correctly in the reset function, but not in the step function, so your model cannot learn from the laser information.

Change the aforementioned lines to what you have in the reset function:

laser_state = np.array(data.ranges[:])     # use the fresh /scan reading
laser_state[laser_state == inf] = 10       # clamp out-of-range returns to 10 m
laser_state = binning(0, laser_state, 20)  # downsample to 20 bins
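Here binning is just expected to downsample the raw scan to the number of values the network expects. A minimal sketch of such a helper (the melodic branch has its own implementation, this is only the idea; taking the minimum per slice is the conservative choice for obstacle detection):

import numpy as np

def binning(lower_bound, data, quantity):
    # split the scan into `quantity` consecutive slices and keep the
    # minimum range in each slice
    width = int(len(data) / quantity)
    bins = []
    for low in range(lower_bound, lower_bound + quantity * width, width):
        bins.append(min(data[low:low + width]))
    return np.array(bins)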

peipei518 commented 6 months ago

Thank you for your suggestion; that seems to work. Now the robot will occasionally stop and turn when it encounters an obstacle, but most of the time it still hits the obstacle while turning. After training for 100 epochs, the average reward stays around -80 to -100. Should I adjust the reward function next, or do something else? Or is it because my lidar FOV is 360 degrees? You mentioned above that the code is hardcoded to use a 180 degree FOV. If the FOV is the cause, how should I modify the code to suit my robot while keeping my lidar's FOV unchanged?

reiniscimurs commented 6 months ago

Hi,

Please don't confuse radar with lidar.

For one, the gaps are only calculated for the range of -pi/2 to pi/2, i.e. a 180 degree FOV: https://github.com/reiniscimurs/DRL-robot-navigation/blob/main/TD3/velodyne_env.py#L91

This is probably the major change you need. But you should carefully go through the code, find everywhere the laser state is used, and debug whether the values you expect are actually there.
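The gaps are built roughly like this (paraphrased from the file linked above):

self.gaps = [[-np.pi / 2 - 0.03, -np.pi / 2 + np.pi / self.environment_dim]]
for m in range(self.environment_dim - 1):
    self.gaps.append(
        [self.gaps[m][1], self.gaps[m][1] + np.pi / self.environment_dim]
    )
self.gaps[-1][-1] += 0.03

An untested sketch of a 360 degree variant would start at -pi and step by 2 * np.pi / self.environment_dim instead:

self.gaps = [[-np.pi - 0.03, -np.pi + 2 * np.pi / self.environment_dim]]
for m in range(self.environment_dim - 1):
    self.gaps.append(
        [self.gaps[m][1], self.gaps[m][1] + 2 * np.pi / self.environment_dim]
    )
self.gaps[-1][-1] += 0.03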

peipei518 commented 6 months ago

Hi, I found that the gaps parameter only seems to be used in the velodyne_callback function, but since I changed to a 2D lidar, I commented that function out and use this code to obtain the lidar data:

while data is None:
    try:
        data = rospy.wait_for_message('/scan', LaserScan, timeout=0.5)
    except:
        pass
laser_state = np.array(data.ranges[:])
laser_state[laser_state == inf] = 10
laser_state = binning(0, laser_state, 20)

In addition, I would like to ask whether a network trained with the source code's 3D lidar can be used on my current 2D lidar model, because when I use one, the robot just stays in place.

reiniscimurs commented 6 months ago

This binning function should work.

Yes, a model trained with a 3D lidar can be used with a 2D laser. From the network's perspective, the input is a 1D vector of laser data, which both 3D and 2D lidars can provide. You just have to make sure the data is represented in the same way.
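Concretely, paraphrasing the repo's step function, the input vector is assembled like this:

# environment_dim laser bins plus four scalars: distance to goal,
# heading angle to goal, and the last linear and angular velocities
robot_state = [distance, theta, action[0], action[1]]
state = np.append(laser_state, robot_state)

As long as your 2D pipeline fills laser_state with the same number of bins in the same angular order, the trained weights do not care which sensor produced them.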

peipei518 commented 6 months ago

I tried using the multi_robot_scenario robot with its Hokuyo lidar adjusted to 360 degrees, and it trained successfully, which suggests there is nothing wrong with my code. But when I swapped in my four-wheel differential robot model, training always failed. And when I used the previously trained model to control the four-wheel differential robot, it just shook left and right in place instead of moving toward the goal point. Why is that?

reiniscimurs commented 6 months ago

Sorry, that is outside the scope of this repo, and I don't have enough information or experience with that platform to make a guess.