ryanjulian opened this issue 5 years ago
The issue you linked from PyTorch was fixed quite a while ago. If garage's PyTorch implementation is slower than TF, it most likely has something to do with our implementation.
@lywong92 Since you added the PyTorch DDPG and PPO, do you have any observations on their performance versus TF? That would tell us whether this is about PyTorch in general or just TRPO.
I didn't pay much attention to the actual time it took to run DDPG in torch vs. tf. Are we comparing the total time it takes to run the algorithms with the same hyperparameters?
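For an apples-to-apples comparison, a minimal wall-clock timing helper like the one below could wrap each training run (the `run_tf_ddpg` call in the usage comment is hypothetical, standing in for whatever launcher script is being benchmarked):

```python
import time
from contextlib import contextmanager

@contextmanager
def wall_clock(label):
    """Print the wall-clock time taken by a code block, e.g. one training run."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Hypothetical usage, same seed and hyperparameters for both frameworks:
# with wall_clock("tf DDPG"):
#     run_tf_ddpg(seed=1, n_epochs=100)
# with wall_clock("torch DDPG"):
#     run_torch_ddpg(seed=1, n_epochs=100)
```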
This might be unrelated, but we found big performance differences between using `llvm-openmp` vs. `intel-openmp`. Weirdly enough, this is observed even when we use the GPU for both the forward and backward pass. A strange dependency issue in conda is causing this (e.g. the newest version of the libgcc runtime depends on the package `_openmp_mutex`, which brings along the LLVM OpenMP runtime instead of the Intel one). Worth checking which OpenMP implementation you're using.
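One way to check, sketched below and assuming a Linux host with `/proc` available: scan the process's memory maps for the known OpenMP runtime libraries after importing torch (`libgomp` = GNU, `libomp` = LLVM, `libiomp5` = Intel):

```python
import re

def loaded_openmp_runtimes(maps_path="/proc/self/maps"):
    """Return the OpenMP runtime libraries mapped into this process.

    libgomp = GNU, libomp = LLVM, libiomp5 = Intel. Run this after
    `import torch` to see which runtime torch actually pulled in.
    """
    names = set()
    try:
        with open(maps_path) as f:
            for line in f:
                match = re.search(r"(libiomp5|libgomp|libomp)[^/]*\.so", line)
                if match:
                    names.add(match.group(1))
    except FileNotFoundError:  # non-Linux: /proc is unavailable
        pass
    return sorted(names)

# e.g.: import torch; print(loaded_openmp_runtimes())
```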
I've encountered run-time issues with PyTorch on CPU before that were improved by artificially limiting the number of threads with a call such as `torch.set_num_threads(4)`. I'm not sure why exactly, but it seems PyTorch sometimes oversubscribes threads on CPU.
See https://github.com/pytorch/pytorch/issues/975 for more info
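A minimal sketch of that workaround, with the environment-variable caps applied before any heavy import (the value `4` here is just an example; the right cap depends on the machine):

```python
import os

# Cap OpenMP/MKL thread pools before torch (or numpy) is imported;
# intra-op parallelism can oversubscribe CPU cores on small workloads
# (see pytorch/pytorch#975), making training slower rather than faster.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

# After importing torch, the same cap can be applied directly:
# import torch
# torch.set_num_threads(4)
```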
PyTorch TRPO appears roughly 50% slower than TF TRPO. I'm not sure about PPO, but I expect the wall-clock gap to be similar.
To close this issue, either make PyTorch perform at least as well as TF, or confirm that we've done the best we can with PyTorch on CPU.