Open Sau1-Goodman opened 5 months ago
Hi,
CUDA will mostly be used only during the train call of the model, and memory consumption depends on your batch size, gradients, and model size. There is no substantial reason why consumption should be high.
10 minutes between epochs seems reasonable. By default, an epoch will run for approximately 5000 steps with a step length of 0.1 seconds, so one epoch will take at least 8.3 minutes. That makes sense to me.
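The arithmetic behind that estimate, as a quick check:

```python
# Quick check of the per-epoch time quoted above: 5000 steps at
# 0.1 s per step gives the minimum wall time of one epoch.
steps_per_epoch = 5000
step_length_s = 0.1

epoch_minutes = steps_per_epoch * step_length_s / 60
print(f"Minimum epoch time: {epoch_minutes:.1f} minutes")  # → 8.3 minutes
```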
Thank you very much for your response. I have two more queries that I'm hoping to clarify:
1. I'm unsure whether it's feasible to adjust some parameters, such as increasing the batch_size on line 119 of train_velodyne_td3.py, to improve GPU utilization. This could potentially reduce training time and make better use of the GPU's computational power. Do you think this approach is viable? Additionally, do you have any other suggestions for optimizing this process?
2. Are the training results in the TD3 -> result & run folders? I'm trying to deploy these trained results onto an actual robot. Could you kindly offer me some tips on how to utilize them? I would greatly appreciate any suggestions or pointers you might have.
Many thanks!
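Regarding the first question above, the intuition that larger batches raise GPU utilization can be checked with a small timing experiment. This is only a generic sketch, not the repo's code: the network shape (a 24-dimensional state is assumed) and batch sizes are illustrative.

```python
# Sketch: time a forward/backward pass at two batch sizes to see why
# larger batches raise GPU utilization. The layer sizes and the
# 24-dimensional input are assumptions, not the repo's exact network.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(24, 800), nn.ReLU(), nn.Linear(800, 2)).to(device)
loss_fn = nn.MSELoss()

def time_batch(batch_size, iters=50):
    """Average seconds per forward/backward pass at a given batch size."""
    x = torch.randn(batch_size, 24, device=device)
    y = torch.randn(batch_size, 2, device=device)
    start = time.perf_counter()
    for _ in range(iters):
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    return (time.perf_counter() - start) / iters

# Per-sample cost usually drops as the batch grows, because the GPU
# processes the whole batch in parallel.
for bs in (40, 256):
    print(f"batch={bs}: {time_batch(bs) / bs * 1e6:.2f} us/sample")
```

Note that a larger batch also averages gradients over more samples, which can change learning dynamics, so it is a trade-off rather than a free speedup.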
They are in the pytorch_models folder; see the description in each folder for what is stored in them. See test_velodyne_td3.py for how to load the model weights. Deploying on a real robot will depend entirely on the robot and sensors used, but you can adapt the env file with the proper topics once you have connected everything to ROS.

Thank you very much for your answer, I will work harder!
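For the loading step mentioned in the answer, test_velodyne_td3.py shows the repo's exact pattern; a minimal standalone sketch might look like the following. The class name, layer sizes, state/action dimensions, and file path here are all assumptions for illustration.

```python
# Minimal sketch of loading saved actor weights for deployment.
# The class name, layer sizes, and file path are assumptions --
# follow test_velodyne_td3.py for the repo's exact code.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 800), nn.ReLU(),
            nn.Linear(800, 600), nn.ReLU(),
            nn.Linear(600, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.layers(state)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actor = Actor(state_dim=24, action_dim=2).to(device)
# Uncomment with the actual checkpoint path from pytorch_models:
# actor.load_state_dict(torch.load("pytorch_models/TD3_velodyne_actor.pth",
#                                  map_location=device))
actor.eval()

# At inference time, feed the robot's current observation through the actor:
state = torch.zeros(1, 24, device=device)  # placeholder observation
with torch.no_grad():
    action = actor(state)
print(action.shape)  # torch.Size([1, 2])
```

On a real robot the placeholder observation would be built from the ROS sensor topics configured in the env file.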
Hi, thank you for your work, it's amazing! I'm a student who has just started DRL. I set up the simulation environment according to the tutorial and trained with your original program (by executing 'python3 train_velodyne_td3.py'). In RViz I can see that the robot runs normally (just like the GIF in the example), but the GPU usage is very low (power: 48W/170W, memory usage: 3074MiB/12050MiB) and the time between epochs is very long (about 10 minutes). My computer's CPU is an AMD 5800X, the GPU is an RTX 3060, the NVIDIA driver is 470.256.02, and cudatoolkit 11.3.1 is installed in the Anaconda environment. Executing 'torch.cuda.is_available()' in the Python environment outputs True. Is this training speed and GPU usage normal? Thank you very much for your answer!
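As a quick sanity check that training really runs on the GPU (a generic PyTorch sketch, not the repo's code), you can report the active device and how much memory PyTorch itself has allocated:

```python
import torch

# Confirm CUDA is usable and report the active device.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    print("Device:", torch.cuda.get_device_name(dev))
    # Memory PyTorch has allocated for tensors -- typically lower than
    # what nvidia-smi shows, since nvidia-smi includes cached blocks
    # and the CUDA context itself.
    print(f"Allocated: {torch.cuda.memory_allocated(dev) / 2**20:.0f} MiB")
    print(f"Reserved:  {torch.cuda.memory_reserved(dev) / 2**20:.0f} MiB")
```

Low utilization with these numbers looking sane usually just means the bottleneck is elsewhere (e.g. the CPU-bound simulation), not that CUDA is misconfigured.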