Closed Howuhh closed 2 years ago
So here is the tricky part - the original implementation actually uses 0.999 for gamma, but 0.99 for the normalization wrapper. See https://github.com/openai/train-procgen/blob/1a2ae2194a61f76a733a39339530401c024c3ad8/train_procgen/train.py#L43
This would cause a performance change, unfortunately. There are two ways to go forward:
gym.wrappers.NormalizeReward(envs, gamma=args.gamma).
https://github.com/vwxyzjn/cleanrl/blob/6387191dbee74927b2872b2eb1759c72361d806f/benchmark/ppo.sh#L39-L44
https://github.com/vwxyzjn/cleanrl/blob/6387191dbee74927b2872b2eb1759c72361d806f/benchmark/ppg.sh#L3-L8
@Howuhh what do you think we should do?
@vwxyzjn To be honest, I think this is a bug in the original code, not a feature, so it would be more accurate to rerun for correct results. However, procgen is an image-based env, and for now I don't have the resources to train on images.
Ok, no worries. I will take care of it from here. @Dipamc77 I don't have the GPU memory to run the PPG experiments. Would you mind running them with this PR? I can take care of the PPO procgen experiments.
Running the PPO experiments now. Also tried a fun thing by adding a wandb tag like
WANDB_TAGS=$(git describe --tags) xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids starpilot bossfight bigfish \
--command "poetry run python cleanrl/ppo_procgen.py --track --capture-video" \
--num-seeds 3 \
--workers 1
which produces runs like
@dosssman I think this tagging system could somehow help us phase out past openrlbenchmark experiments without deleting them. I will have to think about the workflow a bit more.
Following up here
The bigfish performance degradation could easily be due to a random seed.
No major performance regression. Going to document this change and merge.
I have just updated all of the experiments and documentation. @Howuhh @dosssman could you give it a pass, please? Thank you!
@vwxyzjn Seems okay to me. Thanks for redoing the experiments btw.
Description
Fixes the incorrect gamma in the reward normalization wrapper when a non-default gamma is used. See https://github.com/vwxyzjn/cleanrl/issues/203.
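To illustrate why the gamma passed to the normalization wrapper matters, here is a minimal scalar sketch of return-based reward normalization: a discounted running return is tracked, and each reward is divided by the running standard deviation of that return. This is a simplified stand-in written for this example, not the code of gym.wrappers.NormalizeReward; the names RunningVar and normalized_rewards are hypothetical, and episode resets are ignored.

```python
import math


class RunningVar:
    """Welford-style running mean and variance over scalar samples (hypothetical helper)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; falls back to 1.0 before any sample is seen.
        return self.m2 / self.count if self.count else 1.0


def normalized_rewards(rewards, gamma, eps=1e-8):
    """Scale each reward by the running std of the discounted return.

    Assumption: a single environment with no episode boundaries.
    """
    discounted_return = 0.0
    stats = RunningVar()
    scaled = []
    for r in rewards:
        discounted_return = discounted_return * gamma + r
        stats.update(discounted_return)
        scaled.append(r / math.sqrt(stats.var + eps))
    return scaled


# Same reward stream, two different gammas for the normalizer.
rewards = [1.0] * 1000
scaled_099 = normalized_rewards(rewards, gamma=0.99)
scaled_0999 = normalized_rewards(rewards, gamma=0.999)
```

In this sketch the same reward stream ends up on a different scale depending on the normalizer's gamma, which is why switching the wrapper from its default to args.gamma can change the effective reward magnitude seen by the agent.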
Types of changes
Checklist:
pre-commit run --all-files passes (required).
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve.
width=500 and height=300).