Closed vmoens closed 4 months ago
@wertyuilife2 can you confirm that this makes sense?
@vmoens I did some further searching and found that many open-source PER implementations maintain the historical maximum priority, such as Dopamine.
But the original PER paper and other source code in my domain use the buffer's maximum priority, including EfficientZero, whose v2 is the SOTA data-efficient method in RL.
So overall, both approaches make sense, and this is not a bug (my bad). I believe the buffer's-maximum approach is more robust to the priority values.
I raised the issue because I found in practice that, during the early stages of training, when a transition is sampled for the first time, its PER weight is typically 1e-8
(the value of epsilon), which is odd. This happens because `max_priority = 1`,
while the Bellman error I get early in training is 0 (the network outputs are initialized to 0 for stability).
So this is basically an additional feature for better priority adaptation, similar to priority normalization. I don't think it is essential (if implementing it turns out to be complicated), but it could be available as an option; I believe it can increase sample efficiency and training stability.
So, should I make erasing `max_priority` during `extend` optional?
Yep, I think so. Maybe in some tasks the historical max will be better.
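The two strategies being discussed can be sketched side by side. This is a hedged illustration, not TorchRL's API; the class `Priorities` and its methods are hypothetical:

```python
class Priorities:
    """Toy priority store contrasting buffer-max vs historical-max defaults."""

    def __init__(self, use_buffer_max: bool = True):
        self.use_buffer_max = use_buffer_max
        self.priorities: list = []
        self.historical_max = 1.0  # never decreases in the historical-max scheme

    def add(self, n: int = 1) -> None:
        # New transitions enter the buffer with the current default priority.
        default = self.max_priority()
        self.priorities.extend([default] * n)

    def update(self, index: int, priority: float) -> None:
        # Called after sampling, with the new TD-error-based priority.
        self.priorities[index] = priority
        self.historical_max = max(self.historical_max, priority)

    def max_priority(self) -> float:
        if self.use_buffer_max and self.priorities:
            # "Buffer max": the default adapts downward once stale
            # high priorities are overwritten or evicted.
            return max(self.priorities)
        # "Historical max": stays at the largest priority ever seen.
        return self.historical_max
```

With `use_buffer_max=True`, once early-training priorities collapse toward epsilon, newly extended transitions get a comparable default rather than 1.0, which is the adaptation the thread is asking to make optional.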
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2215
:x: 4 New Failures
As of commit 40f7b2d5f790dccfb224205b39d5a43420476379 with merge base 0813dc008aaaa25fb16af1bd76931350cf944237:
NEW FAILURES - The following jobs have failed:
* [Habitat Tests on Linux / tests (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980127580) ([gh](https://github.com/pytorch/rl/actions/runs/9431457231/job/25980127580)) `RuntimeError: Command docker exec -t 198e6d02a3c72b3cb670cedd4984410f90fad8d15d59101565d21ebe1d4b159f /exec failed with exit code 139`
* [Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980128774) ([gh](https://github.com/pytorch/rl/actions/runs/9431457238/job/25980128774)) `test/test_transforms.py::TestVecNorm::test_state_dict_vecnorm`
* [Unit-tests on Linux / tests-optdeps (3.10, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980128846) ([gh](https://github.com/pytorch/rl/actions/runs/9431457238/job/25980128846)) `RuntimeError: Command docker exec -t 1ecb9db5fc18483b992e0ad7ed5988866de2730aacf79a4fcb08c53b8843d7d2 /exec failed with exit code 1`
* [Unit-tests on Windows / unittests-cpu / windows-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980127211) ([gh](https://github.com/pytorch/rl/actions/runs/9431457241/job/25980127211)) `The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128`
This comment was automatically generated by Dr. CI and updates every 15 minutes.