pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License
2.25k stars, 297 forks

[BugFix] Fix max-priority update #2215

Closed vmoens closed 4 months ago

pytorch-bot[bot] commented 4 months ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2215

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

:x: 4 New Failures

As of commit 40f7b2d5f790dccfb224205b39d5a43420476379 with merge base 0813dc008aaaa25fb16af1bd76931350cf944237:

NEW FAILURES - The following jobs have failed:

* [Habitat Tests on Linux / tests (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980127580) ([gh](https://github.com/pytorch/rl/actions/runs/9431457231/job/25980127580)) `RuntimeError: Command docker exec -t 198e6d02a3c72b3cb670cedd4984410f90fad8d15d59101565d21ebe1d4b159f /exec failed with exit code 139`
* [Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980128774) ([gh](https://github.com/pytorch/rl/actions/runs/9431457238/job/25980128774)) `test/test_transforms.py::TestVecNorm::test_state_dict_vecnorm`
* [Unit-tests on Linux / tests-optdeps (3.10, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980128846) ([gh](https://github.com/pytorch/rl/actions/runs/9431457238/job/25980128846)) `RuntimeError: Command docker exec -t 1ecb9db5fc18483b992e0ad7ed5988866de2730aacf79a4fcb08c53b8843d7d2 /exec failed with exit code 1`
* [Unit-tests on Windows / unittests-cpu / windows-job](https://hud.pytorch.org/pr/pytorch/rl/2215#25980127211) ([gh](https://github.com/pytorch/rl/actions/runs/9431457241/job/25980127211)) `The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128`

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens commented 4 months ago

@wertyuilife2 can you confirm that this makes sense?

github-actions[bot] commented 4 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 3. Worsened: 2.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1043s | 58.8602ms | 16.9894 Ops/s | 17.6957 Ops/s | -3.99% |
| test_sync | 37.1610ms | 30.7970ms | 32.4707 Ops/s | 32.5672 Ops/s | -0.30% |
| test_async | 48.3588ms | 28.8045ms | 34.7168 Ops/s | 33.7761 Ops/s | +2.79% |
| test_simple | 0.4605s | 0.3977s | 2.5144 Ops/s | 2.6398 Ops/s | -4.75% |
| test_transformed | 0.5367s | 0.5360s | 1.8657 Ops/s | 1.8770 Ops/s | -0.60% |
| test_serial | 1.3348s | 1.2833s | 0.7792 Ops/s | 0.7788 Ops/s | +0.05% |
| test_parallel | 1.1384s | 1.0853s | 0.9214 Ops/s | 0.9214 Ops/s | +0.00% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1160ms | 21.2276μs | 47.1084 KOps/s | 46.5276 KOps/s | +1.25% |
| test_step_mdp_speed[True-True-True-True-False] | 39.8250μs | 12.8734μs | 77.6798 KOps/s | 76.1392 KOps/s | +2.02% |
| test_step_mdp_speed[True-True-True-False-True] | 46.6070μs | 12.5464μs | 79.7044 KOps/s | 78.3356 KOps/s | +1.75% |
| test_step_mdp_speed[True-True-True-False-False] | 28.9740μs | 7.6003μs | 131.5735 KOps/s | 128.7339 KOps/s | +2.21% |
| test_step_mdp_speed[True-True-False-True-True] | 46.5370μs | 22.4981μs | 44.4482 KOps/s | 43.8488 KOps/s | +1.37% |
| test_step_mdp_speed[True-True-False-True-False] | 69.5100μs | 14.1474μs | 70.6842 KOps/s | 69.3626 KOps/s | +1.91% |
| test_step_mdp_speed[True-True-False-False-True] | 35.7170μs | 13.8797μs | 72.0474 KOps/s | 71.8321 KOps/s | +0.30% |
| test_step_mdp_speed[True-True-False-False-False] | 39.9420μs | 8.7764μs | 113.9417 KOps/s | 109.6297 KOps/s | +3.93% |
| test_step_mdp_speed[True-False-True-True-True] | 50.8950μs | 24.1007μs | 41.4925 KOps/s | 40.9210 KOps/s | +1.40% |
| test_step_mdp_speed[True-False-True-True-False] | 65.9630μs | 15.4680μs | 64.6494 KOps/s | 64.0287 KOps/s | +0.97% |
| test_step_mdp_speed[True-False-True-False-True] | 39.3730μs | 13.7240μs | 72.8650 KOps/s | 71.5930 KOps/s | +1.78% |
| test_step_mdp_speed[True-False-True-False-False] | 36.8590μs | 8.7441μs | 114.3624 KOps/s | 111.6765 KOps/s | +2.41% |
| test_step_mdp_speed[True-False-False-True-True] | 53.2990μs | 25.0813μs | 39.8703 KOps/s | 39.4315 KOps/s | +1.11% |
| test_step_mdp_speed[True-False-False-True-False] | 55.1830μs | 16.7056μs | 59.8600 KOps/s | 59.4001 KOps/s | +0.77% |
| test_step_mdp_speed[True-False-False-False-True] | 34.7750μs | 14.9684μs | 66.8073 KOps/s | 65.2349 KOps/s | +2.41% |
| test_step_mdp_speed[True-False-False-False-False] | 40.2550μs | 9.9874μs | 100.1260 KOps/s | 99.7417 KOps/s | +0.39% |
| test_step_mdp_speed[False-True-True-True-True] | 51.6860μs | 23.9312μs | 41.7864 KOps/s | 41.7730 KOps/s | +0.03% |
| test_step_mdp_speed[False-True-True-True-False] | 35.8170μs | 15.5180μs | 64.4414 KOps/s | 63.7693 KOps/s | +1.05% |
| test_step_mdp_speed[False-True-True-False-True] | 47.1780μs | 16.0036μs | 62.4860 KOps/s | 62.1575 KOps/s | +0.53% |
| test_step_mdp_speed[False-True-True-False-False] | 45.7350μs | 10.0420μs | 99.5819 KOps/s | 100.2382 KOps/s | -0.65% |
| test_step_mdp_speed[False-True-False-True-True] | 81.6520μs | 24.7959μs | 40.3292 KOps/s | 39.6193 KOps/s | +1.79% |
| test_step_mdp_speed[False-True-False-True-False] | 37.5500μs | 16.7234μs | 59.7966 KOps/s | 59.6463 KOps/s | +0.25% |
| test_step_mdp_speed[False-True-False-False-True] | 41.3060μs | 17.2385μs | 58.0098 KOps/s | 58.3641 KOps/s | -0.61% |
| test_step_mdp_speed[False-True-False-False-False] | 32.9820μs | 11.2561μs | 88.8406 KOps/s | 89.0066 KOps/s | -0.19% |
| test_step_mdp_speed[False-False-True-True-True] | 58.1190μs | 26.4074μs | 37.8682 KOps/s | 37.6319 KOps/s | +0.63% |
| test_step_mdp_speed[False-False-True-True-False] | 65.8530μs | 17.9931μs | 55.5767 KOps/s | 55.0455 KOps/s | +0.97% |
| test_step_mdp_speed[False-False-True-False-True] | 44.6330μs | 17.2401μs | 58.0042 KOps/s | 58.0825 KOps/s | -0.13% |
| test_step_mdp_speed[False-False-True-False-False] | 39.1730μs | 11.2879μs | 88.5904 KOps/s | 88.6854 KOps/s | -0.11% |
| test_step_mdp_speed[False-False-False-True-True] | 40.9660μs | 28.0060μs | 35.7067 KOps/s | 35.7087 KOps/s | -0.01% |
| test_step_mdp_speed[False-False-False-True-False] | 49.6520μs | 19.1233μs | 52.2923 KOps/s | 51.7268 KOps/s | +1.09% |
| test_step_mdp_speed[False-False-False-False-True] | 48.4210μs | 18.3601μs | 54.4660 KOps/s | 55.1702 KOps/s | -1.28% |
| test_step_mdp_speed[False-False-False-False-False] | 34.7240μs | 12.2876μs | 81.3830 KOps/s | 82.1562 KOps/s | -0.94% |
| test_values[generalized_advantage_estimate-True-True] | 12.7881ms | 9.7993ms | 102.0479 Ops/s | 104.3926 Ops/s | -2.25% |
| test_values[vec_generalized_advantage_estimate-True-True] | 37.7328ms | 35.1124ms | 28.4800 Ops/s | 27.9398 Ops/s | +1.93% |
| test_values[td0_return_estimate-False-False] | 0.2215ms | 0.1686ms | 5.9328 KOps/s | 5.8474 KOps/s | +1.46% |
| test_values[td1_return_estimate-False-False] | 24.4535ms | 24.0948ms | 41.5028 Ops/s | 43.0704 Ops/s | -3.64% |
| test_values[vec_td1_return_estimate-False-False] | 49.8435ms | 35.7092ms | 28.0040 Ops/s | 28.4233 Ops/s | -1.48% |
| test_values[td_lambda_return_estimate-True-False] | 36.4228ms | 35.1292ms | 28.4663 Ops/s | 29.4509 Ops/s | -3.34% |
| test_values[vec_td_lambda_return_estimate-True-False] | 39.3531ms | 35.0850ms | 28.5022 Ops/s | 28.2981 Ops/s | +0.72% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.6676ms | 8.5767ms | 116.5947 Ops/s | 121.5702 Ops/s | -4.09% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.5040ms | 2.0047ms | 498.8180 Ops/s | 491.3307 Ops/s | +1.52% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5437ms | 0.3633ms | 2.7528 KOps/s | 2.8373 KOps/s | -2.98% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 47.0912ms | 45.0314ms | 22.2067 Ops/s | 21.1452 Ops/s | **+5.02%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.7554ms | 3.0333ms | 329.6766 Ops/s | 330.0294 Ops/s | -0.11% |
| test_dqn_speed | 1.6946ms | 1.3564ms | 737.2303 Ops/s | 716.7874 Ops/s | +2.85% |
| test_ddpg_speed | 3.8097ms | 2.9013ms | 344.6734 Ops/s | 343.1917 Ops/s | +0.43% |
| test_sac_speed | 9.0322ms | 8.5376ms | 117.1293 Ops/s | 114.0446 Ops/s | +2.70% |
| test_redq_speed | 15.2159ms | 13.4464ms | 74.3695 Ops/s | 74.0607 Ops/s | +0.42% |
| test_redq_deprec_speed | 15.4307ms | 13.9122ms | 71.8795 Ops/s | 72.2616 Ops/s | -0.53% |
| test_td3_speed | 16.5480ms | 8.5518ms | 116.9342 Ops/s | 116.7568 Ops/s | +0.15% |
| test_cql_speed | 37.7928ms | 36.8340ms | 27.1488 Ops/s | 27.1071 Ops/s | +0.15% |
| test_a2c_speed | 8.6934ms | 7.6181ms | 131.2664 Ops/s | 130.9282 Ops/s | +0.26% |
| test_ppo_speed | 9.6943ms | 7.9505ms | 125.7779 Ops/s | 128.8597 Ops/s | -2.39% |
| test_reinforce_speed | 7.7625ms | 6.7031ms | 149.1852 Ops/s | 149.8170 Ops/s | -0.42% |
| test_iql_speed | 34.0312ms | 33.2067ms | 30.1144 Ops/s | 30.2121 Ops/s | -0.32% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.3949ms | 3.5430ms | 282.2431 Ops/s | 285.0538 Ops/s | -0.99% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8825ms | 0.5050ms | 1.9804 KOps/s | 1.9775 KOps/s | +0.15% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6163ms | 0.4758ms | 2.1018 KOps/s | 2.0805 KOps/s | +1.03% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.9631ms | 3.5593ms | 280.9555 Ops/s | 283.8965 Ops/s | -1.04% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9706ms | 0.4924ms | 2.0310 KOps/s | 1.9989 KOps/s | +1.61% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6633ms | 0.4708ms | 2.1239 KOps/s | 2.1127 KOps/s | +0.53% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.0045ms | 1.7123ms | 584.0103 Ops/s | 581.9883 Ops/s | +0.35% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.3006ms | 1.6233ms | 616.0252 Ops/s | 610.1367 Ops/s | +0.97% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.7005ms | 3.6661ms | 272.7678 Ops/s | 274.3353 Ops/s | -0.57% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2082ms | 0.6191ms | 1.6152 KOps/s | 1.6091 KOps/s | +0.38% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7831ms | 0.5947ms | 1.6814 KOps/s | 1.6827 KOps/s | -0.07% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.2187ms | 3.5693ms | 280.1686 Ops/s | 282.1928 Ops/s | -0.72% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0418ms | 0.5114ms | 1.9555 KOps/s | 1.9904 KOps/s | -1.76% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6347ms | 0.4776ms | 2.0937 KOps/s | 2.0867 KOps/s | +0.34% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.8358ms | 3.4662ms | 288.4997 Ops/s | 289.4559 Ops/s | -0.33% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7055ms | 0.5224ms | 1.9141 KOps/s | 1.9963 KOps/s | -4.12% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.7849ms | 0.4994ms | 2.0025 KOps/s | 2.1062 KOps/s | -4.92% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.3947ms | 3.6576ms | 273.4053 Ops/s | 277.2647 Ops/s | -1.39% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7862ms | 0.6199ms | 1.6133 KOps/s | 1.6006 KOps/s | +0.79% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7625ms | 0.5975ms | 1.6737 KOps/s | 1.6558 KOps/s | +1.08% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1011s | 7.5503ms | 132.4453 Ops/s | 169.5401 Ops/s | **-21.88%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.6654ms | 12.5282ms | 79.8198 Ops/s | 68.7219 Ops/s | **+16.15%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.5349ms | 1.0512ms | 951.2659 Ops/s | 943.2838 Ops/s | +0.85% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1001s | 5.6204ms | 177.9221 Ops/s | 177.9252 Ops/s | -0.00% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 15.0053ms | 12.5625ms | 79.6020 Ops/s | 78.7909 Ops/s | +1.03% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.5962ms | 1.0469ms | 955.2409 Ops/s | 934.4257 Ops/s | +2.23% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 99.7729ms | 5.7405ms | 174.2019 Ops/s | 123.1605 Ops/s | **+41.44%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1086s | 14.7651ms | 67.7273 Ops/s | 77.9960 Ops/s | **-13.17%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.8096ms | 1.1901ms | 840.2485 Ops/s | 828.8785 Ops/s | +1.37% |

github-actions[bot] commented 4 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 3. Worsened: 3.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1182s | 0.1178s | 8.4898 Ops/s | 8.4326 Ops/s | +0.68% |
| test_sync | 0.1063s | 0.1056s | 9.4653 Ops/s | 9.5627 Ops/s | -1.02% |
| test_async | 0.1962s | 98.6730ms | 10.1345 Ops/s | 10.1655 Ops/s | -0.31% |
| test_single_pixels | 0.1295s | 0.1283s | 7.7964 Ops/s | 7.8389 Ops/s | -0.54% |
| test_sync_pixels | 83.8544ms | 80.1277ms | 12.4801 Ops/s | 12.3994 Ops/s | +0.65% |
| test_async_pixels | 0.1595s | 69.5722ms | 14.3736 Ops/s | 14.6743 Ops/s | -2.05% |
| test_simple | 0.8963s | 0.8350s | 1.1976 Ops/s | 1.2244 Ops/s | -2.19% |
| test_transformed | 1.1391s | 1.0904s | 0.9171 Ops/s | 0.9344 Ops/s | -1.85% |
| test_serial | 2.5624s | 2.5099s | 0.3984 Ops/s | 0.4038 Ops/s | -1.33% |
| test_parallel | 2.4315s | 2.3633s | 0.4231 Ops/s | 0.4278 Ops/s | -1.09% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1576ms | 33.6553μs | 29.7130 KOps/s | 28.4457 KOps/s | +4.46% |
| test_step_mdp_speed[True-True-True-True-False] | 45.3810μs | 19.6828μs | 50.8058 KOps/s | 48.8177 KOps/s | +4.07% |
| test_step_mdp_speed[True-True-True-False-True] | 0.1397ms | 19.2062μs | 52.0664 KOps/s | 49.9615 KOps/s | +4.21% |
| test_step_mdp_speed[True-True-True-False-False] | 0.1345ms | 11.5670μs | 86.4528 KOps/s | 86.4352 KOps/s | +0.02% |
| test_step_mdp_speed[True-True-False-True-True] | 92.1620μs | 35.8353μs | 27.9054 KOps/s | 27.0577 KOps/s | +3.13% |
| test_step_mdp_speed[True-True-False-True-False] | 46.8910μs | 22.1102μs | 45.2279 KOps/s | 45.3198 KOps/s | -0.20% |
| test_step_mdp_speed[True-True-False-False-True] | 52.4820μs | 21.2134μs | 47.1400 KOps/s | 45.5795 KOps/s | +3.42% |
| test_step_mdp_speed[True-True-False-False-False] | 29.0810μs | 13.3718μs | 74.7841 KOps/s | 74.0719 KOps/s | +0.96% |
| test_step_mdp_speed[True-False-True-True-True] | 62.5710μs | 37.3409μs | 26.7803 KOps/s | 26.0053 KOps/s | +2.98% |
| test_step_mdp_speed[True-False-True-True-False] | 0.2080ms | 23.7885μs | 42.0372 KOps/s | 42.0319 KOps/s | +0.01% |
| test_step_mdp_speed[True-False-True-False-True] | 0.1941ms | 21.4292μs | 46.6654 KOps/s | 47.1253 KOps/s | -0.98% |
| test_step_mdp_speed[True-False-True-False-False] | 0.1257ms | 13.3476μs | 74.9200 KOps/s | 74.2369 KOps/s | +0.92% |
| test_step_mdp_speed[True-False-False-True-True] | 0.2221ms | 39.9251μs | 25.0469 KOps/s | 24.7500 KOps/s | +1.20% |
| test_step_mdp_speed[True-False-False-True-False] | 43.7110μs | 25.9037μs | 38.6046 KOps/s | 38.5737 KOps/s | +0.08% |
| test_step_mdp_speed[True-False-False-False-True] | 0.1164ms | 23.2000μs | 43.1034 KOps/s | 42.5682 KOps/s | +1.26% |
| test_step_mdp_speed[True-False-False-False-False] | 70.7920μs | 15.2094μs | 65.7490 KOps/s | 65.0622 KOps/s | +1.06% |
| test_step_mdp_speed[False-True-True-True-True] | 0.1174ms | 38.1813μs | 26.1908 KOps/s | 25.8177 KOps/s | +1.45% |
| test_step_mdp_speed[False-True-True-True-False] | 43.3110μs | 23.7328μs | 42.1358 KOps/s | 40.7886 KOps/s | +3.30% |
| test_step_mdp_speed[False-True-True-False-True] | 74.6810μs | 25.1655μs | 39.7369 KOps/s | 39.1705 KOps/s | +1.45% |
| test_step_mdp_speed[False-True-True-False-False] | 0.1462ms | 15.2698μs | 65.4889 KOps/s | 62.7860 KOps/s | +4.30% |
| test_step_mdp_speed[False-True-False-True-True] | 64.7910μs | 39.7635μs | 25.1487 KOps/s | 25.7441 KOps/s | -2.31% |
| test_step_mdp_speed[False-True-False-True-False] | 0.1003ms | 25.4460μs | 39.2989 KOps/s | 37.8416 KOps/s | +3.85% |
| test_step_mdp_speed[False-True-False-False-True] | 0.2215ms | 26.8060μs | 37.3050 KOps/s | 36.5015 KOps/s | +2.20% |
| test_step_mdp_speed[False-True-False-False-False] | 0.1940ms | 17.0718μs | 58.5760 KOps/s | 56.9456 KOps/s | +2.86% |
| test_step_mdp_speed[False-False-True-True-True] | 0.2120ms | 41.5007μs | 24.0960 KOps/s | 23.3437 KOps/s | +3.22% |
| test_step_mdp_speed[False-False-True-True-False] | 50.5410μs | 27.7592μs | 36.0241 KOps/s | 35.1016 KOps/s | +2.63% |
| test_step_mdp_speed[False-False-True-False-True] | 58.1710μs | 27.0750μs | 36.9344 KOps/s | 36.7259 KOps/s | +0.57% |
| test_step_mdp_speed[False-False-True-False-False] | 40.7010μs | 17.0810μs | 58.5444 KOps/s | 56.0468 KOps/s | +4.46% |
| test_step_mdp_speed[False-False-False-True-True] | 64.1320μs | 43.9558μs | 22.7501 KOps/s | 22.3458 KOps/s | +1.81% |
| test_step_mdp_speed[False-False-False-True-False] | 69.6110μs | 30.1445μs | 33.1735 KOps/s | 33.3447 KOps/s | -0.51% |
| test_step_mdp_speed[False-False-False-False-True] | 48.5210μs | 28.8129μs | 34.7067 KOps/s | 34.5105 KOps/s | +0.57% |
| test_step_mdp_speed[False-False-False-False-False] | 0.1100ms | 19.1191μs | 52.3037 KOps/s | 51.0965 KOps/s | +2.36% |
| test_values[generalized_advantage_estimate-True-True] | 25.3792ms | 24.2590ms | 41.2219 Ops/s | 42.0674 Ops/s | -2.01% |
| test_values[vec_generalized_advantage_estimate-True-True] | 88.8726ms | 2.6679ms | 374.8328 Ops/s | 363.8501 Ops/s | +3.02% |
| test_values[td0_return_estimate-False-False] | 93.9320μs | 67.3642μs | 14.8447 KOps/s | 15.1845 KOps/s | -2.24% |
| test_values[td1_return_estimate-False-False] | 54.1823ms | 53.1385ms | 18.8187 Ops/s | 18.8414 Ops/s | -0.12% |
| test_values[vec_td1_return_estimate-False-False] | 1.3892ms | 1.0717ms | 933.0713 Ops/s | 938.0656 Ops/s | -0.53% |
| test_values[td_lambda_return_estimate-True-False] | 88.0548ms | 84.8988ms | 11.7787 Ops/s | 11.7837 Ops/s | -0.04% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3558ms | 1.0651ms | 938.8776 Ops/s | 942.9374 Ops/s | -0.43% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 26.3019ms | 24.8714ms | 40.2068 Ops/s | 38.9117 Ops/s | +3.33% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9590ms | 0.7061ms | 1.4162 KOps/s | 1.4160 KOps/s | +0.01% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8158ms | 0.6576ms | 1.5207 KOps/s | 1.5453 KOps/s | -1.59% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.7308ms | 1.4580ms | 685.8562 Ops/s | 686.3101 Ops/s | -0.07% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8430ms | 0.6692ms | 1.4944 KOps/s | 1.5059 KOps/s | -0.77% |
| test_dqn_speed | 1.8264ms | 1.4540ms | 687.7703 Ops/s | 693.8680 Ops/s | -0.88% |
| test_ddpg_speed | 3.2972ms | 3.0095ms | 332.2797 Ops/s | 335.5814 Ops/s | -0.98% |
| test_sac_speed | 9.2288ms | 8.5807ms | 116.5407 Ops/s | 118.0311 Ops/s | -1.26% |
| test_redq_speed | 12.5131ms | 10.9948ms | 90.9525 Ops/s | 83.7090 Ops/s | **+8.65%** |
| test_redq_deprec_speed | 12.9905ms | 12.2703ms | 81.4975 Ops/s | 86.4076 Ops/s | **-5.68%** |
| test_td3_speed | 17.6180ms | 8.5166ms | 117.4182 Ops/s | 121.3378 Ops/s | -3.23% |
| test_cql_speed | 28.5733ms | 26.7734ms | 37.3506 Ops/s | 37.9203 Ops/s | -1.50% |
| test_a2c_speed | 6.5701ms | 5.8716ms | 170.3112 Ops/s | 174.3038 Ops/s | -2.29% |
| test_ppo_speed | 6.6060ms | 6.1406ms | 162.8506 Ops/s | 165.9254 Ops/s | -1.85% |
| test_reinforce_speed | 5.2094ms | 4.8890ms | 204.5416 Ops/s | 207.8513 Ops/s | -1.59% |
| test_iql_speed | 21.4055ms | 20.5310ms | 48.7069 Ops/s | 48.1808 Ops/s | +1.09% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.0514ms | 4.6413ms | 215.4564 Ops/s | 213.3100 Ops/s | +1.01% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8181ms | 0.6116ms | 1.6350 KOps/s | 1.6142 KOps/s | +1.29% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.9341ms | 0.6007ms | 1.6649 KOps/s | 1.6730 KOps/s | -0.49% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.0360ms | 4.6613ms | 214.5302 Ops/s | 216.7009 Ops/s | -1.00% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7787ms | 0.6110ms | 1.6368 KOps/s | 1.6267 KOps/s | +0.62% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.9007ms | 0.5965ms | 1.6763 KOps/s | 1.6605 KOps/s | +0.95% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.4486ms | 2.1486ms | 465.4113 Ops/s | 477.6289 Ops/s | -2.56% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 6.5543ms | 2.0827ms | 480.1438 Ops/s | 495.9933 Ops/s | -3.20% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.2402ms | 4.8140ms | 207.7254 Ops/s | 208.4852 Ops/s | -0.36% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.8062ms | 0.7497ms | 1.3339 KOps/s | 1.3405 KOps/s | -0.49% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9395ms | 0.7269ms | 1.3756 KOps/s | 1.3546 KOps/s | +1.56% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.0441ms | 4.6706ms | 214.1048 Ops/s | 212.6052 Ops/s | +0.71% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.3934ms | 0.6137ms | 1.6294 KOps/s | 1.6222 KOps/s | +0.44% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7980ms | 0.5915ms | 1.6907 KOps/s | 1.6956 KOps/s | -0.29% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.2211ms | 4.6894ms | 213.2476 Ops/s | 216.3348 Ops/s | -1.43% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8002ms | 0.5991ms | 1.6693 KOps/s | 1.6561 KOps/s | +0.80% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.7518ms | 0.5873ms | 1.7026 KOps/s | 1.2774 KOps/s | **+33.29%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.1273ms | 4.8011ms | 208.2839 Ops/s | 210.5258 Ops/s | -1.06% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.6579ms | 0.7386ms | 1.3539 KOps/s | 1.3666 KOps/s | -0.93% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9390ms | 0.7195ms | 1.3899 KOps/s | 1.4062 KOps/s | -1.16% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1282s | 7.3544ms | 135.9732 Ops/s | 136.9164 Ops/s | -0.69% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.8038ms | 15.8194ms | 63.2137 Ops/s | 54.8889 Ops/s | **+15.17%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.6766ms | 1.3353ms | 748.8747 Ops/s | 770.2602 Ops/s | -2.78% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1205s | 9.5155ms | 105.0918 Ops/s | 139.3734 Ops/s | **-24.60%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 18.5375ms | 15.8210ms | 63.2072 Ops/s | 62.9612 Ops/s | +0.39% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.8991ms | 1.4568ms | 686.4375 Ops/s | 754.1752 Ops/s | **-8.98%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1209s | 7.3746ms | 135.6005 Ops/s | 136.7233 Ops/s | -0.82% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 18.3290ms | 15.9232ms | 62.8013 Ops/s | 63.1243 Ops/s | -0.51% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.6162ms | 1.5173ms | 659.0750 Ops/s | 632.0176 Ops/s | +4.28% |

wertyuilife2 commented 4 months ago

@vmoens I searched a bit further and found that many open-source libraries implementing PER, such as dopamine, maintain the historical maximum priority.

However, the original PER paper and some other codebases in my domain use the buffer's maximum priority approach, including EfficientZero, whose v2 is the SOTA data-efficient method in RL.

So, overall, both approaches make sense, and this is not a bug (my bad). That said, I believe the buffer's maximum priority approach is more robust to the priority value.
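
To make the difference between the two approaches concrete, here is a minimal pure-Python sketch. It is not TorchRL code: `ToyPriorityStore`, its `mode` argument, and the `demo` helper are hypothetical names invented for this illustration, and the O(N) scan in "buffer" mode stands in for whatever tree structure a real implementation would likely use.

```python
class ToyPriorityStore:
    """Toy ring buffer of priorities (illustrative only, not TorchRL code)."""

    def __init__(self, capacity, mode="historical", initial_priority=1.0):
        if mode not in ("historical", "buffer"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.capacity = capacity
        self.mode = mode
        self.initial_priority = initial_priority
        self.priorities = []  # one entry per stored transition
        self.cursor = 0  # next slot to write
        self.historical_max = initial_priority  # running max, never decreases

    def _default_priority(self, write_idx):
        if self.mode == "historical":
            return self.historical_max
        # "buffer" mode: max over the entries that remain after the write.
        remaining = [p for i, p in enumerate(self.priorities) if i != write_idx]
        return max(remaining, default=self.initial_priority)

    def add(self):
        """Insert one new transition and return the priority it was given."""
        idx = self.cursor
        priority = self._default_priority(idx)
        if idx < len(self.priorities):
            self.priorities[idx] = priority  # overwrite the oldest entry
        else:
            self.priorities.append(priority)
        self.cursor = (idx + 1) % self.capacity
        return priority

    def update(self, idx, priority):
        """Set a new priority for a stored transition (e.g. after a TD update)."""
        self.priorities[idx] = priority
        self.historical_max = max(self.historical_max, priority)


def demo(mode):
    store = ToyPriorityStore(capacity=4, mode=mode)
    store.add()
    store.add()
    store.update(0, 5.0)  # one large TD error early on
    store.update(1, 0.5)
    store.update(0, 0.2)  # the large priority is later replaced
    return store.add()  # priority assigned to a brand-new transition


print(demo("historical"))  # 5.0
print(demo("buffer"))      # 0.5
```

In this toy run the historical maximum keeps the stale value 5.0 for new transitions, while the buffer maximum follows the priorities that are actually stored.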

wertyuilife2 commented 4 months ago

I raised the issue because I found in practice that during the early stages of training, when a transition is sampled for the first time, its PER weight is typically 1e-8 (the value of epsilon), which is weird. This is because max_priority=1 and the Bellman error I get early in training is 0 (because the network outputs are initialized to 0 for stability).

So, basically, this is more of an additional feature for better priority adaptation, similar to priority normalization. I don't think it is essential (if you find it complicated to implement), but it could be available as an option. I believe it can increase sample efficiency and training stability.
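
One way to reproduce that number is the small sketch below. It assumes the max-normalized importance weight from the PER paper, w_i = (N * P(i))^(-beta) / max_j w_j with P(i) = p_i^alpha / sum_k p_k^alpha, and alpha = beta = 1. The function name `per_weights` and the four-item buffer are made up for the example, so this shows the effect without claiming to match TorchRL's exact computation.

```python
def per_weights(priorities, alpha=1.0, beta=1.0):
    """Max-normalized importance weights for a list of raw priorities."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    n = len(scaled)
    # w_i = (N * P(i)) ** (-beta), with P(i) = scaled_i / total
    raw = [(n * s / total) ** (-beta) for s in scaled]
    top = max(raw)
    return [w / top for w in raw]


eps = 1e-8
# Three transitions whose priority was already updated with a Bellman error
# of 0 (priority = 0 + eps), plus one fresh transition that still carries
# the default max_priority = 1.
priorities = [eps, eps, eps, 1.0]
weights = per_weights(priorities)
print(weights)  # the last weight is on the order of 1e-8, the others are 1
```

Under these assumptions the weight of the fresh transition equals the ratio eps / max_priority, which is why it comes out at the value of epsilon.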

vmoens commented 4 months ago

So should I make the erasing of the max_priority during extend optional?

wertyuilife2 commented 4 months ago

Yep, I think so. In some tasks the historical max may work better.