reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Problem with testing in another environment #148

Open sattlelite opened 5 months ago

sattlelite commented 5 months ago

Hello Mr. Reiniscimurs, I have run tests after training; below is the average reward curve produced by training my improved method. I also tested on the original map, and the robot basically reached the target, although it collided in a few cases. I then created a 30x30 map with some sparse, regular obstacles, replaced the TD3.world file, and modified velodyne_env.py so that goals would not be placed inside obstacles. The problem is that, even though everything runs properly, the robot stays in place and just rotates left and right by a small angle, never reaching its destination. The obvious explanation seems to be that the robot has never been trained in a situation like this, so does it need to be trained in the new environment until the positive reward is stable? I also wonder whether the new environment requires a larger TD3 network. If I increase the number of states, do I need many more network parameters? Looking forward to your reply. Thank you so much!

(image: 改进2-2)

sattlelite commented 5 months ago

Sorry, the top two curves in this graph were run with a real_time_rate of 2000, and the bottom one at the raw (unaccelerated) time rate.

reiniscimurs commented 5 months ago

Hi,

No, there is no explicit need to train the model in the new environment. The issue here could be either that it has encountered a local optimum, or that the input sensor values are outside of the trained range. For the latter, you could simply cap the sensor values at the maximum values seen in training, i.e. distance = min(distance, max_dist_in_training), and do the same with the laser readings. This should give you reasonable performance in a local planning setting. You can of course also re-train the model, but I am not sure you would need to increase the model size. The only thing that would realistically change is the distance to the goal, as the maximum distance would grow. And since it would take longer to actually reach the goal, it might be useful to tune the discount factor and learning rate accordingly.
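A minimal sketch of this capping idea, purely illustrative: the function name and the maximum values are assumptions and would need to match whatever your model actually saw during training, and the exact integration point depends on how you build the state in velodyne_env.py.

```python
import numpy as np

# Maximum values the model saw during training (assumed values; adjust to your setup)
MAX_LASER_RANGE = 10.0     # max range of the gap readings used during training
MAX_GOAL_DISTANCE = 10.0   # max robot-to-goal distance encountered during training

def cap_state(laser_state, distance):
    """Clip sensor readings and the goal distance to the range seen in training."""
    laser_state = np.minimum(np.asarray(laser_state), MAX_LASER_RANGE)
    distance = min(distance, MAX_GOAL_DISTANCE)
    return laser_state, distance
```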

sattlelite commented 5 months ago

Hello Mr. Reiniscimurs, I agree with you and plan to try this in the next experiment by training the model on the new map. But at the moment I have two questions:

1. Do you mean to cap at the maximum value only when getting the state? Do I need to modify this part of the code in "velodyne_env"? (screenshot: 2024-07-02 14-11-51)
2. I am having trouble opening the world file after modifying the Gazebo environment that I changed from the original one. I searched the Internet for a solution; some said Gazebo needed to load materials online and network issues were preventing it, so I installed the material pack locally, but it still reports an error. Here is the output (screenshot: 2024-07-02 13-52-28), and my file is here.

reiniscimurs commented 5 months ago
  1. I am not sure I understand the question, but if you do not modify anything in the laser sensor itself, then this should work just fine in larger environments as well.
  2. This is outside the scope of this repo, so I cannot advise you on this issue. I have no way to check your world file at the moment.
sattlelite commented 5 months ago

Thank you for your reply. Fortunately, I have resolved the second question: I carefully recreated another environment, since I suspect I had simply broken the original files. Now I am training on the 30x30 map. Before this, I modified the hokuyo.xacro file and self.velodyne_data as follows (screenshot), so that laser_state gets data with a 20 m range. After that, I designed a tactic that forces the robot to go straight when it sees the goal; the tactic keeps the action constant for about 10 steps. Then I trained the robot on the new map. Early on it could reach the goal, but now it just runs in a circle after 10 minutes.

https://github.com/reiniscimurs/DRL-robot-navigation/assets/130289635/db269bfa-a8b3-414e-8b7e-b24aff95218c

A larger map needs more time to train, but could you please give me some suggestions? Is there anything I have missed?

reiniscimurs commented 4 months ago

Are you using the main branch? The laser sensor is only used to detect collisions. It will not affect how the state is represented.
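To illustrate this point, here is a minimal sketch (not the repository's exact code; `laser_data`, `velodyne_gaps`, and `COLLISION_DIST` are illustrative names): the planar laser scan is only compared against a collision threshold, while the network input is built from the velodyne-derived gap distances, so changing the 2D laser range does not change the state representation.

```python
COLLISION_DIST = 0.35  # illustrative collision threshold in meters

def observe_collision(laser_data):
    # The 2D laser is only used to decide whether the robot has crashed.
    min_laser = min(laser_data)
    return min_laser < COLLISION_DIST, min_laser

def build_state(velodyne_gaps, distance, theta, action):
    # The policy input comes from the velodyne-based gap distances plus
    # goal distance, heading, and the last action, not from the 2D laser.
    return list(velodyne_gaps) + [distance, theta, action[0], action[1]]
```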

sattlelite commented 4 months ago

Yes, I use the main branch, and I use state[8:-12]. I changed two lines of code in velodyne_env.py, then printed state[8:-12] and checked the terminal:

(screenshots attached)

Additionally, the map I created has many cylinders, and every time I train I find that after a while almost half of the posts have fallen over. I suspected that the robot's position was being reset onto the cardboard boxes, which made it crash violently and knock over the posts, so I added the tags <self_collide>1</self_collide> and <kinematic>1</kinematic> to each of them in the world file to fix them in place, because I don't want large changes in the environment to affect the robot's training. Thanks for your patience and replies; I just want it to train successfully on this new map.

sattlelite commented 4 months ago

Hello Mr. Reiniscimurs, I have trained the model on the 30x30 map with 2x acceleration for 14 hours, and I found that the robot cannot find the right policy to reach the goal; the average reward was very low in the end. I added OU noise with sigma=0.7, dt=0.09, mu=0. I increased goal_reach_dist from 0.3 to 0.6. I also multiplied the final noise by an attenuation factor (just using expl_noise). (image upload did not complete: 56AA9BFD-924B-4C81-8317-B9ABCCBAB321.jpeg) Should I adjust the max angular velocity? Could you please give some advice?

reiniscimurs commented 4 months ago

I think I am missing the overall picture of the changes you have made to the code here. Can you please explain all the changes in detail in a structured way? It is quite hard for me to follow all the details here.

For instance, you mention changing the max distance in the laser sensor setup. However, the laser sensor is not used to get state information. For that you would have to update the supporting Velodyne Puck sensor files, but that is not mentioned, so it is unclear to me whether it was done or not. It is also unclear to me what you mean by OU noise.

sattlelite commented 2 months ago

Hello Mr. Reiniscimurs, I have finished this task. The state information I use is completely original and I kept all of the configuration in its original state, but I created a new 8x8 environment for testing and found it difficult for the robot to adapt. So I loaded the already trained model into the new environment for secondary training, because I find it very difficult to train from scratch in a new environment. My question is: how do you ensure that the model can be trained successfully in a new environment? I tried many random seed values and the robot ended up spinning. The OU noise I used is Ornstein-Uhlenbeck noise, with the parameters mentioned above (see the sketch below). Thanks.
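For reference, a minimal sketch of Ornstein-Uhlenbeck action noise with the parameters quoted earlier (sigma=0.7, dt=0.09, mu=0). The class, the theta value, and the way the noise is mixed into the TD3 action are illustrative assumptions, not the repository's code.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.7, dt=0.09):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.state = np.copy(self.mu)

    def reset(self):
        self.state = np.copy(self.mu)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = self.theta * (self.mu - self.state) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(len(self.state))
        self.state += dx
        return self.state

# Illustrative use: add decayed noise to the deterministic TD3 action
# noise = OUNoise(action_dim=2)
# action = np.clip(actor_action + decay_factor * noise.sample(), -1.0, 1.0)
```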

reiniscimurs commented 2 months ago

Again, please provide as much information as possible with a full list of changes (code snippets, images and so on). There should not be any major issues with moving to another environment for testing or training. You just need to make sure the laser readings and the state values are capped at the maximum values seen during training.

sattlelite commented 2 months ago

Okay. The following images show the model I trained on the map you provided, using the method you posted and the unmodified configuration, and then tested on the map I created. I adjusted the environment file to ensure that random goal points do not appear inside obstacles. I used your method and did not change the configuration of the Velodyne sensor. The reward value eventually converged. Most of the time the robot can execute the task normally, but sometimes it gets stuck, as shown in the video: Kazam_screencast_00000.webm (attachments: velodyne configuration, reward)

The current results come from directly testing the trained model in the new environment; there were 64 collisions out of 800 tasks (excluding cases of getting trapped), and I am quite satisfied with this result. However, when I start training on the new map from scratch, the robot exhibits the behaviour shown in the video after a period of time. The OU noise I used is shown in the figures (OU, OU1, OU2). (video upload did not complete: Kazam_screencast_00000.webm)

and the reward is -72.525149 in epoch 4: (image)

reiniscimurs commented 2 months ago

I see. Is this the only kind of scenario where the model struggles? Meaning, does it struggle when the goal point is placed quite close to an obstacle?

If so (and that is what is happening here), then it seems to me the issue is the same as I explain in another issue: https://github.com/reiniscimurs/DRL-robot-navigation/issues/157#issuecomment-2373319930

Essentially, the goal point is located too close to the obstacle, so when the model evaluates the state-action pair, it bases the value more on its experience of collisions with obstacles than on reaching the goal. The problem would be entirely about the Q-value for that goal placement, and not so much the environment itself.

sattlelite commented 2 months ago

Hello Mr. Reiniscimurs, thank you very much for your answer. I have reviewed #157 and it has helped me eliminate some mistaken ideas. So how can I solve this problem? If, in the env file, the randomly sampled goal position is kept a certain distance away from the obstacles, that seems to fundamentally solve it. Alternatively, is there a way to force the robot to perform an externally mandated action under certain circumstances, for example a strategy that forces the robot to move in a straight line for a number of time steps towards the goal whenever the front-facing state values ("state [x: x]") are greater than the distance from the robot to the goal? (A sketch of such an override is below.) I have tried training on a 13x13 open map and used this strategy in the hope of making training succeed and reducing the time. I think the robot may have difficulty converging over longer goal distances because of exploration, but I have never succeeded in training from scratch on a new map.
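A minimal sketch of the kind of override described above, purely illustrative: the names (`front_gaps`, `goal_distance`, `heading_error`, `FORCE_STEPS`) are hypothetical, and the trigger simply compares the forward-facing gap readings against the goal distance before holding a straight-line action for a fixed number of steps.

```python
FORCE_STEPS = 10  # number of steps to hold the straight-line action

class StraightLineOverride:
    def __init__(self):
        self.remaining = 0

    def select(self, policy_action, front_gaps, goal_distance, heading_error):
        # Trigger the override when the goal appears to lie in free space ahead:
        # all forward gap readings reach further than the goal itself and the
        # robot is roughly facing the goal.
        if self.remaining == 0 and min(front_gaps) > goal_distance and abs(heading_error) < 0.2:
            self.remaining = FORCE_STEPS
        if self.remaining > 0:
            self.remaining -= 1
            return [1.0, 0.0]  # max linear velocity, zero angular velocity (normalized)
        return policy_action
```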

reiniscimurs commented 2 months ago

Yes, that would be one option: remove (or shrink) the dead zone around obstacles where goal points cannot be placed. Forcing a logic that checks whether the model can just go straight to the goal would also make the robot go there, although it would probably not solve the issue of training the model to execute such actions on its own. What you could also do is increase the sampling rate of situations where the goal is close to an obstacle, which would let the model train on such scenarios more often. However, these are just ideas and I have not tried them out in practice.
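A minimal sketch of how such goal sampling could look, assuming a hypothetical list of obstacle centers and tunable clearance / oversampling parameters; this is not the repository's actual change_goal logic, just an illustration of combining a no-go radius with oversampling of near-obstacle goals.

```python
import math
import random

OBSTACLES = [(2.0, 3.5), (-4.0, 1.0)]  # hypothetical obstacle centers (x, y)
CLEARANCE = 0.5            # goals are never placed closer than this to an obstacle
FAR_GOAL_ACCEPT = 0.7      # far-from-obstacle goals are kept only part of the time

def sample_goal(map_half_size=15.0):
    while True:
        x = random.uniform(-map_half_size, map_half_size)
        y = random.uniform(-map_half_size, map_half_size)
        d_min = min(math.hypot(x - ox, y - oy) for ox, oy in OBSTACLES)
        if d_min < CLEARANCE:
            continue  # never inside the clearance radius
        near_obstacle = d_min < CLEARANCE + 1.0
        # Always keep goals just outside the clearance band, but reject a share
        # of the far-away goals, which biases training toward near-obstacle goals.
        if near_obstacle or random.random() < FAR_GOAL_ACCEPT:
            return x, y
```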