reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in the ROS Gazebo simulator. Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Slow training speed and low GPU utilization #147

Open · Sau1-Goodman opened this issue 2 weeks ago

Sau1-Goodman commented 2 weeks ago

Hi, thank you for your work, it's amazing! I'm a student who has just started with DRL. I set up the simulation environment according to the tutorial and trained with your original program (by executing 'python3 train_velodyne_td3.py'). In RViz I can see the robot running normally (just like the GIF in the example), but GPU usage is very low (power: 48 W / 170 W, memory usage: 3074 MiB / 12050 MiB) and the time between epochs is also very long (about 10 minutes).

My setup: AMD 5800X CPU, RTX 3060 GPU, NVIDIA driver 470.256.02, with cudatoolkit 11.3.1 installed in the Anaconda environment. Running 'torch.cuda.is_available()' in the Python environment returns True.

Is this training speed and GPU usage normal? Thank you very much for your answer!

[screenshot: GPU usage]

reiniscimurs commented 2 weeks ago

Hi,

CUDA will mostly be used only during the train call of the model, and memory consumption depends on your batch size, gradients, and model size. There is no particular reason why consumption should be high.

10 minutes between epochs seems reasonable. By default, an epoch runs for approx. 5000 steps with a step length of 0.1 seconds, which means one epoch will take at least 8.3 minutes. So that makes sense to me.
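As a quick back-of-envelope check of those numbers:

```python
# Back-of-envelope check of the epoch wall time quoted above.
steps_per_epoch = 5000   # approx. steps per epoch (default)
step_length_s = 0.1      # simulated step length in seconds
epoch_minutes = steps_per_epoch * step_length_s / 60
print(f"minimum epoch time: {epoch_minutes:.1f} min")  # ~8.3 min
```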

Sau1-Goodman commented 2 weeks ago

Thank you very much for your response. I have two more queries that I'm hoping to clarify:

1. Is it feasible to adjust some parameters, such as increasing the batch_size on line 119 of train_velodyne_td3.py, to raise GPU utilization? This could potentially reduce training time and better leverage the computational power of the GPU. Do you think this approach is viable? Additionally, do you have any other suggestions to optimize this process?

2. Are the training results in the TD3 -> result & run folder? I'm trying to deploy these trained results onto an actual robot. Could you kindly offer me some tips on how to utilize these results?

I would greatly appreciate any suggestions or pointers you might have. Many thanks!

reiniscimurs commented 2 weeks ago
  1. It would increase GPU consumption, but only during backpropagation; the average consumption will probably stay the same. It would not realistically speed up training, since most of the time is spent collecting samples/executing the policy (see the timing sketch after this list). See the tutorial for details: https://medium.com/p/b744852345ac
  2. No, the weights are stored in pytorch_models; see the description in each folder for what is stored there. See test_velodyne_td3.py for how to load the model weights (a short loading sketch follows below). Deploying on a real robot will depend entirely on the robot and sensors used, but you can adapt the env file with the proper topics once you have connected everything to ROS.
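As a rough, self-contained illustration (not code from the repo), you can compare one simulated 0.1 s environment step against one gradient update on a TD3-sized actor. The 24-dimensional state, the 800/600 hidden sizes, and the batch size of 40 are assumptions meant to mirror the defaults in this repo:

```python
import time

import torch
import torch.nn as nn

# Standalone timing sketch (not from the repo): one simulated 0.1 s
# environment step vs. one gradient update on a TD3-sized actor.
# State dim 24, hidden sizes 800/600, and batch size 40 are
# assumptions meant to mirror the repo's defaults.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actor = nn.Sequential(
    nn.Linear(24, 800), nn.ReLU(),
    nn.Linear(800, 600), nn.ReLU(),
    nn.Linear(600, 2), nn.Tanh(),
).to(device)
optimizer = torch.optim.Adam(actor.parameters())
batch = torch.randn(40, 24, device=device)

t0 = time.time()
time.sleep(0.1)  # stand-in for one Gazebo step at 0.1 s step length
env_time = time.time() - t0

t0 = time.time()
loss = actor(batch).pow(2).mean()  # dummy loss, just to drive backprop
optimizer.zero_grad()
loss.backward()
optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the GPU so the timing is honest
train_time = time.time() - t0

print(f"env step: {env_time*1000:.1f} ms, train update: {train_time*1000:.1f} ms")
```

On most GPUs the train update finishes in a few milliseconds, so the 0.1 s environment step dominates each loop iteration no matter how large the batch is.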
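And here is a minimal sketch of loading trained weights for inference. It assumes the actor architecture from the training script; the './pytorch_models/TD3_velodyne_actor.pth' path and file naming are assumptions to verify against the repo's save()/load() methods:

```python
import torch
import torch.nn as nn

# Minimal loading sketch. The architecture below assumes the actor
# from the training script; the weight file name is an assumption
# to verify against the repo's save()/load() methods.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, action_dim)

    def forward(self, s):
        s = torch.relu(self.layer_1(s))
        s = torch.relu(self.layer_2(s))
        return torch.tanh(self.layer_3(s))

actor = Actor(state_dim=24, action_dim=2)
actor.load_state_dict(
    torch.load("./pytorch_models/TD3_velodyne_actor.pth", map_location="cpu")
)
actor.eval()

# At inference time, build the state vector from your robot's sensors
# and read out the (linear, angular) action.
with torch.no_grad():
    state = torch.zeros(1, 24)  # placeholder state
    action = actor(state).squeeze(0)
print(action)
```
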
Sau1-Goodman commented 2 weeks ago

Thank you very much for your answer, I will work harder!