Hi, thank you for sharing your excellent work! I want to train your RL agent on a machine with six 1080Ti GPUs (six CARLA servers, one per GPU) and 56 CPU cores. But it takes around 5-6 minutes per 12,288 steps, so the total 1e8 steps from the training config would take around 50 days, which is not acceptable. Do you know what the possible reason is, or how to improve the speed? Thank you!
By the way, RL training is not a GPU-demanding task; running a single CARLA server on each of multiple GPUs is less efficient due to the synchronization overhead. For RL training we run 6 CARLA servers on 1 GPU; each epoch (12,288 steps) takes roughly 10 minutes on an AWS g4 instance equipped with a single NVIDIA T4 GPU.
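For reference, here is a minimal sketch of how one might pin several CARLA servers to a single GPU. The `CarlaUE4.sh` path, the port spacing, and the offscreen-rendering flag are assumptions rather than the repo's actual launch script; in particular, offscreen flags differ across CARLA versions:

```python
# Minimal sketch: launch several CARLA servers pinned to one GPU.
# CARLA_SH, the port spacing, and -RenderOffScreen are assumptions;
# older CARLA versions use `DISPLAY= ./CarlaUE4.sh -opengl` instead
# of -RenderOffScreen for offscreen rendering.
import os
import subprocess

CARLA_SH = "/opt/carla/CarlaUE4.sh"  # hypothetical install path
NUM_SERVERS = 6
BASE_PORT = 2000

procs = []
for i in range(NUM_SERVERS):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")  # all servers share GPU 0
    port = BASE_PORT + 4 * i  # space ports out; each server uses a few
    procs.append(subprocess.Popen(
        [CARLA_SH, f"-carla-rpc-port={port}", "-RenderOffScreen"],
        env=env,
    ))

for p in procs:
    p.wait()  # keep the launcher alive while the servers run
```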
Thank you very much. I misunderstood the `total_timesteps: 1e8` in the training config; the actual number of steps needed is only 1e7, not 1e8. That works out to 10,000,000 / 12,288 ≈ 814 epochs × 6 min ≈ 4,883 min ≈ 3.4 days.
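Plugging the corrected step count into the per-epoch timing reported in this thread, a quick sanity check of that estimate:

```python
# Sanity check of the revised training-time estimate,
# using the numbers from this thread.
total_steps = 10_000_000   # total_timesteps is 1e7, not 1e8
steps_per_epoch = 12_288
minutes_per_epoch = 6      # ~5-6 min per epoch on the 6x1080Ti machine

epochs = total_steps / steps_per_epoch   # ~814 epochs
minutes = epochs * minutes_per_epoch     # ~4883 minutes
print(f"{epochs:.0f} epochs -> {minutes:.0f} min -> {minutes / 1440:.1f} days")
# 814 epochs -> 4883 min -> 3.4 days
```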
penghao-wu closed this issue 2 years ago.