Open vwxyzjn opened 2 months ago
To fix the default installation, I tried removing the torchrl-nightly and tensordict-nightly entries from requirements/requirements-envpool.txt and then installing from it.
Then I was able to run
pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
pip install tensordict-nightly
python leanrl/ppo_atari_envpool_torchcompile.py \
--seed 1 \
--total-timesteps 50000 \
--compile \
--cudagraphs
However, I still ran into the torch._dynamo.exc.TorchRuntimeError: Failed running call_function mylib.step.default issue.
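The requirements edit described above can also be scripted. This is a minimal sketch, assuming the two nightly packages sit one per line in the file (possibly commented out with a leading #). The helper names drop_requirements and filter_file and the output filename are hypothetical and not part of the LeanRL repository.

```python
from pathlib import Path


def drop_requirements(lines, unwanted):
    """Return the requirement lines whose package name is not in `unwanted`."""
    kept = []
    for line in lines:
        # Treat a commented-out entry ("# torchrl-nightly") like the entry itself.
        name = line.strip().lstrip("#").strip()
        # Cut off version specifiers or extras such as "==1.0", ">=2", "[extra]".
        for sep in ("==", ">=", "<=", "~=", ">", "<", "["):
            name = name.split(sep)[0].strip()
        if name in unwanted:
            continue
        kept.append(line)
    return kept


def filter_file(src, dst, unwanted):
    """Write a copy of the requirements file `src` to `dst` without `unwanted`."""
    src_lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(drop_requirements(src_lines, unwanted)) + "\n")


# Hypothetical usage (the output filename is an assumption):
# filter_file(
#     "requirements/requirements-envpool.txt",
#     "requirements-envpool-filtered.txt",
#     {"torchrl-nightly", "tensordict-nightly"},
# )
# followed by: pip install -r requirements-envpool-filtered.txt
```

Writing to a separate file leaves the repository's own requirements file untouched, which makes it easy to revert once the upstream fix lands.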
Same issues here! Were you able to solve them, @vwxyzjn?
I am able to run the code only if I comment out this CustomOp line. However, the code then runs at 1.8k fps with compile and cudagraphs instead of the reported 6.8k.
These should be fixed by #4 (hopefully!). LMK if they aren't!
@vmoens we can install and run the code now without any problems. However, I am currently unable to get the fps reported in the README. I am getting 400fps for ppo (cleanRL) and 1900 for ppo (leanRL with compile and cudagraphs).
That's an even better speedup than the one reported, no?
Not really, but something was wrong on my end. After rebooting my computers, I am able to reproduce the results in the readme (at least for the compiled+cudagraphs version). Concretely, running on a machine with 32 cores I am getting:
ppo_atari_envpool.py -- 3.4k sps
ppo_atari_envpool_torchcompile.py -- 6.1k sps
As you can see, I am able to reproduce your results (which are awesome!), but the baseline is also faster on my end, probably because of the number of cores I am using, which envpool can make good use of. Great work! :)
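For reference, the relative speedups implied by the numbers quoted in this thread can be worked out as below. This is only arithmetic on the figures reported above (400 vs 1900 fps before the reboot, 3.4k vs 6.1k sps after); the function name speedup is illustrative.

```python
def speedup(baseline_sps, optimized_sps):
    """Ratio of optimized throughput to baseline throughput."""
    return optimized_sps / baseline_sps


# Figures reported before the reboot: cleanRL PPO vs leanRL PPO (compile + cudagraphs).
before_reboot = speedup(400, 1900)

# Figures reported after the reboot on the 32-core machine.
after_reboot = speedup(3400, 6100)

print(f"before reboot: {before_reboot:.2f}x")  # before reboot: 4.75x
print(f"after reboot:  {after_reboot:.2f}x")   # after reboot:  1.79x
```

The ratio shrinks after the reboot because the baseline itself got much faster, not because the compiled version got slower.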
There seem to be some issues with getting the environment set up. I tried two installation methods.
Installation 1
One is to do
which seems to get stuck finding a torchrl_nightly version.
Installation 2
I tried building an environment from scratch on two separate machines with different GPUs, and both hit the same error, as follows: