Closed by vwxyzjn 2 years ago
Finally got a finished run, and it looks close to the result in their blog.
Oh wow, this is really nice! How long did the experiment take?
Almost 11 days with envpool and a 1080.
Oh wow, that's taking a really long time. Given the insane amount of compute required, I think running it for three random seeds might not be necessary…
Yeah, I also don't want to spend so much time running it lol.
`rnd_ppo.py` is a bit dated, and I recommend refactoring it to match the style of the other PPO scripts, which would include:
- Rename `rnd_ppo.py` to `ppo_rnd.py`.
- Use `from gym.wrappers.normalize import RunningMeanStd` instead of implementing it ourselves (note the implementation might be a bit different).
- Use a `make_env` function like https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/ppo_atari.py#L88-L103
- … (`ProbsVisualizationWrapper`)
- Use `def get_value` and `def get_action_and_value` for the `Agent` class.
- … `curiosity_reward` instead? https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/rnd_ppo.py#L848
- Rename `total_reward_per_env` to `curiosity_return`: https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/rnd_ppo.py#L854

Overall I suggest selecting `ppo_atari.py` and `rnd_ppo.py` and using `Compare Selected` in VSCode to see the file difference, then minimizing that difference.

Types of changes
Checklist:
- … `pre-commit run --all-files` passes (required).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments.
- … `--capture-video` flag toggled on (required).
- … `mkdocs serve`.
- … (`width=500` and `height=300`).
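
On the `RunningMeanStd` point from the review above: for anyone following along, here is a rough sketch of the kind of running statistics such a helper keeps. This is not gym's code and not what `rnd_ppo.py` does; `RunningMeanStdSketch` and `normalize_rewards` are hypothetical names, the scalar-only version is a simplification, and scaling curiosity rewards by the running standard deviation is an assumed use case.

```python
import math


class RunningMeanStdSketch:
    """Hypothetical scalar tracker of a running mean and variance."""

    def __init__(self):
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, batch):
        """Fold a batch of floats into the running statistics."""
        batch_count = len(batch)
        if batch_count == 0:
            return
        batch_mean = sum(batch) / batch_count
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / batch_count

        total = self.count + batch_count
        delta = batch_mean - self.mean
        # Combine the old and new second moments (parallel variance formula).
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta ** 2 * self.count * batch_count / total
        )
        self.mean += delta * batch_count / total
        self.var = m2 / total
        self.count = total


def normalize_rewards(rewards, stats, eps=1e-8):
    """Scale rewards by the running standard deviation (assumed use case)."""
    std = math.sqrt(stats.var + eps)
    return [r / std for r in rewards]


if __name__ == "__main__":
    stats = RunningMeanStdSketch()
    stats.update([1.0, 2.0])
    stats.update([3.0, 4.0])
    print(stats.mean, stats.var, normalize_rewards([2.0], stats))
```

Feeding the data in batches gives the same mean and population variance as computing them over all samples at once, which is why the statistics can be updated once per rollout without storing past rewards.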