Add `rnd_ppo.py` documentation and refactor

vwxyzjn commented 2 years ago

rnd_ppo.py is a bit dated, and I recommend refactoring it to match the style of the other PPO files, which would include:

[x] change the name from rnd_ppo.py to ppo_rnd.py
[x] use from gym.wrappers.normalize import RunningMeanStd instead of implementing it ourselves (note the implementation might be a bit different).
[x] create a make_env function like https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/ppo_atari.py#L88-L103
[x] remove the visualization (i.e., ProbsVisualizationWrapper)
[x] use def get_value and def get_action_and_value for the Agent class
[x] remove https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/rnd_ppo.py#L706-L708
[x] maybe log the average curiosity_reward instead? https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/rnd_ppo.py#L848
[x] rename total_reward_per_env to curiosity_return https://github.com/vwxyzjn/cleanrl/blob/0b3f8eae7d07b90a0ee129ffe290bd82e5b57a14/cleanrl/rnd_ppo.py#L854
[x] Add SPS (steps per second) metric.
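For the two logging items above (average curiosity reward and SPS), here is a minimal sketch of what the computed values could look like. It assumes plain Python lists in place of the tensors the script uses, and the helper names `steps_per_second` and `mean_curiosity_reward` are hypothetical, not taken from the repository.

```python
import time
from typing import Sequence


def steps_per_second(global_step: int, start_time: float, now: float) -> int:
    """Environment steps per wall-clock second since training started."""
    # Guard against a zero elapsed time on the very first call.
    elapsed = max(now - start_time, 1e-8)
    return int(global_step / elapsed)


def mean_curiosity_reward(curiosity_rewards: Sequence[Sequence[float]]) -> float:
    """Average intrinsic reward over every step and environment of one rollout.

    `curiosity_rewards` is assumed to be indexed as [step][env].
    """
    flat = [reward for step in curiosity_rewards for reward in step]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    start_time = time.time()
    # Hypothetical rollout: 2 steps across 2 environments.
    rollout = [[1.0, 3.0], [2.0, 2.0]]
    global_step = len(rollout) * len(rollout[0])
    print("SPS:", steps_per_second(global_step, start_time, time.time()))
    print("mean curiosity reward:", mean_curiosity_reward(rollout))
```

Logging the mean rather than a single environment's value keeps the curve comparable across different numbers of parallel environments.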

Overall, I suggest selecting ppo_atari.py and rnd_ppo.py and using Compare Selected in VS Code to see the differences between the two files, then minimizing them.
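If a scripted check is preferred over the editor view, the size of the difference can be tracked with the standard library. This is only a rough sketch: the function name `count_changed_lines` is hypothetical, and the file paths in the usage note are assumed to be relative to the repository root.

```python
import difflib
from typing import Tuple


def count_changed_lines(a_text: str, b_text: str) -> Tuple[int, int]:
    """Return (removed, added) line counts between two file contents."""
    a_lines = a_text.splitlines()
    b_lines = b_text.splitlines()
    removed = added = 0
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # lines present only in the first file
            added += j2 - j1    # lines present only in the second file
    return removed, added


if __name__ == "__main__":
    old = "x\ny\nz\n"
    new = "x\nY\nz\n"
    print(count_changed_lines(old, new))
```

To use it on the real files, read both with `open("cleanrl/ppo_atari.py").read()` and the RND counterpart, then watch the two counts shrink as the refactor progresses.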

Types of changes

[ ] Bug fix
[ ] New feature
[ ] New algorithm
[x] Documentation

Checklist:

[x] I've read the CONTRIBUTION guide (required).
[x] I have ensured pre-commit run --all-files passes (required).
[x] I have updated the documentation accordingly.
[ ] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments.

[x] I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
[ ] I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
[ ] I have updated the documentation and previewed the changes via mkdocs serve.
- [x] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [x] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with width=500 and height=300).
- [ ] I have added links to the tracked experiments.
[ ] I have updated the tests accordingly (if applicable).

yooceii commented 2 years ago

Finally got a finished run, and it looks close to the result in their blog.

vwxyzjn commented 2 years ago

Oh wow, this is really nice! How long did the experiment take?

yooceii commented 2 years ago

Almost 11 days with envpool and 1080.

vwxyzjn commented 2 years ago

Oh wow that’s taking a really long time. I think given the insane amount of computing required, running it for three random seeds might not be necessary…

yooceii commented 2 years ago

Yeah, I also don't want to spend so much time running it lol.

vwxyzjn / cleanrl

Add `rnd_ppo.py` documentation and refactor #127
