reiniscimurs / DRL-robot-navigation

Deep Reinforcement Learning for mobile robot navigation in the ROS Gazebo simulator. Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random goal point in a simulated environment while avoiding obstacles.
MIT License

Slow training speed and low GPU utilization #147

Open · Sau1-Goodman opened this issue 2 weeks ago

Sau1-Goodman commented 2 weeks ago

Hi, thank you for your work, it's amazing! I'm a student who has just started with DRL. I set up the simulation environment according to the tutorial and trained with your original program (by executing 'python3 train_velodyne_td3.py'). In RViz I can see the robot running normally (just like the GIF in the example), but GPU usage is very low (power: 48 W / 170 W, memory usage: 3074 MiB / 12050 MiB) and the time between epochs is also very long (about 10 minutes).

My setup: AMD 5800X CPU, RTX 3060 GPU, NVIDIA driver 470.256.02, with cudatoolkit 11.3.1 installed in the Anaconda environment. Running 'torch.cuda.is_available()' in the Python environment returns True.

Is this training speed and GPU usage normal? Thank you very much for your answer!

[screenshot: GPU usage]

reiniscimurs commented 2 weeks ago

Hi,

CUDA will mostly be used only during the train call of the model, and memory consumption depends on your batch size, gradients, and model size. There is no particular reason why consumption should be high.

10 minutes between epochs seems reasonable. By default, an epoch runs for approx. 5000 steps with a step length of 0.1 seconds, which means one epoch will take at least 8.3 minutes. So that makes sense to me.
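As a quick back-of-envelope check of those numbers:

```python
# Back-of-envelope check of the epoch wall time quoted above.
steps_per_epoch = 5000   # approx. steps per epoch (default)
step_length_s = 0.1      # simulated step length in seconds
epoch_minutes = steps_per_epoch * step_length_s / 60
print(f"minimum epoch time: {epoch_minutes:.1f} min")  # ~8.3 min
```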

Sau1-Goodman commented 2 weeks ago

Thank you very much for your response. I have two more queries that I'm hoping to clarify:

1. Is it feasible to adjust some parameters, such as increasing the batch_size on line 119 of train_velodyne_td3.py, to raise GPU utilization? This could potentially reduce training time and better leverage the computational power of the GPU. Do you think this approach is viable? Additionally, do you have any other suggestions to optimize this process?

2. Are the training results in the TD3 -> result & run folder? I'm trying to deploy these trained results onto an actual robot. Could you kindly offer me some tips on how to utilize these results?

I would greatly appreciate any suggestions or pointers you might have. Many thanks!

reiniscimurs commented 2 weeks ago
  1. It would increase GPU consumption, but only during backpropagation; the average consumption will probably stay the same. It would not realistically speed up training, since most of the time is spent collecting samples/executing the policy (see the timing sketch after this list). See the tutorial for details: https://medium.com/p/b744852345ac
  2. No, the weights are stored in pytorch_models; see the description in each folder for what is stored there. See test_velodyne_td3.py for how to load the model weights (a short loading sketch follows below). Deploying on a real robot will depend entirely on the robot and sensors used, but you can adapt the env file with the proper topics once you have connected everything to ROS.
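As a rough, self-contained illustration (not code from the repo), you can compare one simulated 0.1 s environment step against one gradient update on a TD3-sized actor. The 24-dimensional state, the 800/600 hidden sizes, and the batch size of 40 are assumptions meant to mirror the defaults in this repo:

```python
import time

import torch
import torch.nn as nn

# Standalone timing sketch (not from the repo): one simulated 0.1 s
# environment step vs. one gradient update on a TD3-sized actor.
# State dim 24, hidden sizes 800/600, and batch size 40 are
# assumptions meant to mirror the repo's defaults.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actor = nn.Sequential(
    nn.Linear(24, 800), nn.ReLU(),
    nn.Linear(800, 600), nn.ReLU(),
    nn.Linear(600, 2), nn.Tanh(),
).to(device)
optimizer = torch.optim.Adam(actor.parameters())
batch = torch.randn(40, 24, device=device)

t0 = time.time()
time.sleep(0.1)  # stand-in for one Gazebo step at 0.1 s step length
env_time = time.time() - t0

t0 = time.time()
loss = actor(batch).pow(2).mean()  # dummy loss, just to drive backprop
optimizer.zero_grad()
loss.backward()
optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the GPU so the timing is honest
train_time = time.time() - t0

print(f"env step: {env_time*1000:.1f} ms, train update: {train_time*1000:.1f} ms")
```

On most GPUs the train update finishes in a few milliseconds, so the 0.1 s environment step dominates each loop iteration no matter how large the batch is.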
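And here is a minimal sketch of loading trained weights for inference. It assumes the actor architecture from the training script; the './pytorch_models/TD3_velodyne_actor.pth' path and file naming are assumptions to verify against the repo's save()/load() methods:

```python
import torch
import torch.nn as nn

# Minimal loading sketch. The architecture below assumes the actor
# from the training script; the weight file name is an assumption
# to verify against the repo's save()/load() methods.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.layer_1 = nn.Linear(state_dim, 800)
        self.layer_2 = nn.Linear(800, 600)
        self.layer_3 = nn.Linear(600, action_dim)

    def forward(self, s):
        s = torch.relu(self.layer_1(s))
        s = torch.relu(self.layer_2(s))
        return torch.tanh(self.layer_3(s))

actor = Actor(state_dim=24, action_dim=2)
actor.load_state_dict(
    torch.load("./pytorch_models/TD3_velodyne_actor.pth", map_location="cpu")
)
actor.eval()

# At inference time, build the state vector from your robot's sensors
# and read out the (linear, angular) action.
with torch.no_grad():
    state = torch.zeros(1, 24)  # placeholder state
    action = actor(state).squeeze(0)
print(action)
```
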
Sau1-Goodman commented 2 weeks ago

Thank you very much for your answer, I will work harder!