Open SerialIterator opened 5 years ago
I think I have this backwards, huh? Workers are on the CPU and policy updates are done on the GPU?
Workers are on the CPU and policy updates are done on the GPU?
Yes. The simulation runs on the CPU, and the gradient updates are then done on the GPU (every n_steps for PPO2).
You should also be aware that a CUDA core is quite different from a CPU core.
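To make the "every n_steps" remark concrete, here is a minimal sketch of the arithmetic, assuming PPO2 collects n_steps transitions from each of the n_envs workers before every gradient update. The function names are hypothetical and are not part of stable-baselines; 128 is used only as an example value for n_steps.

```python
def samples_per_update(n_envs: int, n_steps: int) -> int:
    """Transitions gathered by the CPU workers before one gradient update."""
    return n_envs * n_steps


def num_updates(total_timesteps: int, n_envs: int, n_steps: int) -> int:
    """How many gradient updates fit into a training budget (integer division)."""
    return total_timesteps // samples_per_update(n_envs, n_steps)


if __name__ == "__main__":
    # Example values only: more workers means a bigger batch per update,
    # and therefore fewer updates for the same number of timesteps.
    for n_envs in (8, 32, 64):
        batch = samples_per_update(n_envs, 128)
        updates = num_updates(1_000_000, n_envs, 128)
        print(f"n_envs={n_envs:3d}  batch={batch:5d}  updates={updates}")
```

So adding workers does not add GPU work per sample; it mainly changes how many samples the CPU has to produce before the GPU gets anything to do.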
Thanks. So workers are on the CPU and gradient updates are on the GPU. From what I've seen while fiddling with different environments so far, the number of CPU cores and the amount of RAM are the limiting factors for how many workers you can run. The more workers, the more "exploration" you can do in a given time period. But a larger environment (stacked frames with CnnLstm, for example) uses far more RAM, which then becomes the limiting factor. The GPU should be used more efficiently with more workers, since data transfer is slower but throughput is higher, right? And the limiting factor for the GPU would be its memory, which has to hold all the rollouts and/or the policy's parameters?
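A rough way to reason about the memory side of this is to estimate how many bytes one batch of rollout observations takes. The sketch below is only an assumption-laden estimate: the function name is hypothetical, it counts observations only (actions, rewards, value estimates, and recurrent states are ignored), and the 84x84x4 uint8 frame stack is just an example shape.

```python
from typing import Sequence


def rollout_obs_bytes(n_envs: int, n_steps: int, obs_shape: Sequence[int],
                      bytes_per_element: int = 1) -> int:
    """Bytes needed to hold the observations of one rollout batch."""
    elements_per_obs = 1
    for dim in obs_shape:
        elements_per_obs *= dim
    return n_envs * n_steps * elements_per_obs * bytes_per_element


if __name__ == "__main__":
    # Example: 32 workers, 128 steps each, 84x84 frames stacked 4 deep, uint8.
    size = rollout_obs_bytes(32, 128, (84, 84, 4))
    print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

The estimate grows linearly in both the number of workers and n_steps, so doubling either doubles the observation buffer.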
Describe the question As far as I understand, when using a GPU, SubprocVecEnv runs multiple workers, each running its own environment on the GPU, and then updates the model once it has gathered all synchronous rollouts. When I set n_cpu = 8, I should expect 8 workers (envs) to be initialized and run on the GPU. I assume this would be parallelized across 8 CUDA cores. When I run a PPO2 model with n_cpu = 8, I can see the GPU being utilized and then the CPU, as if the rollouts are being pushed to the CPU to update the model. My GPU has thousands of CUDA cores, but I only appear to get a training speed-up until n_cpu = 32; after that, the CPU seems to run for a very long time between nupdates. From what I can see, my CPU is unable to handle the amount of rollouts from n_cpu > 32? Am I correct that the workers are on the GPU and model updates are done on the CPU? That would mean that many CPU cores are necessary to handle the increase in workers, which are created at a linear rate relative to CUDA cores?
Code example
System Info Describe the characteristic of your environment:
Additional context Trying to figure out a rough calculation for maximum hardware performance of stable-baselines PPO2 so I can efficiently fire up a cloud instance without too much trial and error.
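For the kind of rough calculation asked about here, one possible starting point is a toy throughput model. This is a sketch under strong assumptions, not a description of how stable-baselines actually schedules work: each CPU core steps one environment at a time, inter-process overhead is ignored, gradient cost is linear in batch size, and the timing values in the example are placeholders you would have to measure yourself. All names are hypothetical.

```python
import math


def seconds_per_update(n_envs: int, n_steps: int, n_cores: int,
                       env_step_s: float, grad_s_per_sample: float) -> float:
    """Toy estimate of wall-clock time for one rollout plus one update."""
    # With more workers than cores, environments are stepped in "waves".
    waves = math.ceil(n_envs / n_cores)
    rollout_s = n_steps * waves * env_step_s
    update_s = n_envs * n_steps * grad_s_per_sample
    return rollout_s + update_s


def samples_per_second(n_envs: int, n_steps: int, n_cores: int,
                       env_step_s: float, grad_s_per_sample: float) -> float:
    """Training throughput implied by the toy model."""
    total = seconds_per_update(n_envs, n_steps, n_cores,
                               env_step_s, grad_s_per_sample)
    return n_envs * n_steps / total


if __name__ == "__main__":
    # Placeholder timings: 1 ms per env step, 10 us of GPU time per sample.
    for n_envs in (4, 8, 16, 32):
        sps = samples_per_second(n_envs, 128, n_cores=8,
                                 env_step_s=1e-3, grad_s_per_sample=1e-5)
        print(f"n_envs={n_envs:3d}  ~{sps:,.0f} samples/s")
```

In this model throughput rises until the number of workers reaches the number of CPU cores and then flattens, which is at least consistent with the speed-up stalling at some n_cpu; the real plateau point depends on the environment, the policy, and the hardware.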