**Closed** — vmoens closed this 1 month ago
Note: Links to docs will display an error until the docs builds have been completed.
As of commit caa258f9e5c8d22571d49bcdddbbe68b81d353d4 with merge base 726e95955009c73dc0242424182222e59a9056d7:
* [Habitat Tests on Linux / tests (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940351256) ([gh](https://github.com/pytorch/rl/actions/runs/9416665747/job/25940351256)) `RuntimeError: Command docker exec -t bd9b40a89edf1030bfb9fa47d73d81cd2323326f53d7ab9a303f5d9b850d2d11 /exec failed with exit code 1`
* [Libs Tests on Linux / unittests-gym (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940293828) ([gh](https://github.com/pytorch/rl/actions/runs/9416665746/job/25940293828)) `RuntimeError: Command docker exec -t 385f35062da772aa27699b3a16718b96fe1d99318dcc6cc84d6f6955b3c1a1e2 /exec failed with exit code 1`
* [Libs Tests on Linux / unittests-sklearn (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940294029) ([gh](https://github.com/pytorch/rl/actions/runs/9416665746/job/25940294029)) `RuntimeError: Command docker exec -t 2ce6862202943d37144a99377297a686139fead1154668847367d9096482dd23 /exec failed with exit code 1`
* [RLHF Tests on Linux / unittests (3.9, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940307777) ([gh](https://github.com/pytorch/rl/actions/runs/9416665750/job/25940307777)) `RuntimeError: Command docker exec -t 659b77bfcaa7450ba47a9aca3dcd6d5226f3fa4a21385f40e89dc76fab1ee83e /exec failed with exit code 1`
* [Unit-tests on Linux / tests-optdeps (3.10, 12.1) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940358798) ([gh](https://github.com/pytorch/rl/actions/runs/9416665739/job/25940358798)) `RuntimeError: Command docker exec -t f456252507690dea75b915ee7addfc29e39362dd61da55c82a7fc37506654739 /exec failed with exit code 1`
* [Unit-tests on Windows / unittests-cpu / windows-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940269101) ([gh](https://github.com/pytorch/rl/actions/runs/9416665748/job/25940269101)) `The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128`
* [Wheels / test-wheel (linux, ubuntu-20.04, 3.10)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940348592) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940348592))
* [Wheels / test-wheel (linux, ubuntu-20.04, 3.11)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940348824) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940348824)) `##[error]The operation was canceled.`
* [Wheels / test-wheel (linux, ubuntu-20.04, 3.8)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940347960) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940347960)) `##[error]The operation was canceled.`
* [Wheels / test-wheel (linux, ubuntu-20.04, 3.9)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940348242) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940348242)) `ModuleNotFoundError: No module named 'dm_env'`
* [Wheels / test-wheel-windows (3.11)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940447162) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940447162)) `ModuleNotFoundError: No module named 'dm_env'`
* [Wheels / test-wheel-windows (3.8)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940446452) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940446452))
* [Lint / c-source / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940268266) ([gh](https://github.com/pytorch/rl/actions/runs/9416665743/job/25940268266)) (matched **linux** rule in [flaky-rules.json](https://github.com/pytorch/test-infra/blob/generated-stats/stats/flaky-rules.json)) `The process '/usr/bin/git' failed with exit code 128`
* [Lint / python-source-and-configs / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940267980) ([gh](https://github.com/pytorch/rl/actions/runs/9416665743/job/25940267980)) (matched **linux** rule in [flaky-rules.json](https://github.com/pytorch/test-infra/blob/generated-stats/stats/flaky-rules.json)) `The process '/usr/bin/git' failed with exit code 128`
* [Wheels / test-wheel-windows (3.10)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940446922) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940446922)) (matched **win** rule in [flaky-rules.json](https://github.com/pytorch/test-infra/blob/generated-stats/stats/flaky-rules.json)) `##[error]The operation was canceled.`
* [Wheels / test-wheel-windows (3.9)](https://hud.pytorch.org/pr/pytorch/rl/2202#25940446686) ([gh](https://github.com/pytorch/rl/actions/runs/9416665718/job/25940446686)) (matched **win** rule in [flaky-rules.json](https://github.com/pytorch/test-infra/blob/generated-stats/stats/flaky-rules.json)) `##[error]The operation was canceled.`
👉 Rebase onto the `viable/strict` branch to avoid these failures
* [Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job](https://hud.pytorch.org/pr/pytorch/rl/2202#25940359083) ([gh](https://github.com/pytorch/rl/actions/runs/9416665739/job/25940359083)) ([trunk failure](https://hud.pytorch.org/pytorch/rl/commit/726e95955009c73dc0242424182222e59a9056d7#25936887556)) `test/test_transforms.py::TestVecNorm::test_state_dict_vecnorm`
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@wertyuilife2
Should we reduce (i.e., average) the priorities within each trajectory while we're at it? I don't think it would require much compute, and it would make sure that all items are equally weighted within a trajectory.
Take the following 2 trajectories with associated priorities:

Item: `[0, 1, 2, 3, 4, 5, 6, 7]` Traj: `[0, 0, 0, 1, 1, 1, 1, 1]` Priority: `[10, 1, 1, 10, 1, 2, 1, 1]`
Currently, items 0 and 3, having a higher priority, have more chances of being sampled as start points, and hence you will get more slices starting with them. If we reduce (average), we get (10 + 1 + 1)/3 = 4 for the first trajectory and (10 + 1 + 2 + 1 + 1)/5 = 3 for the second:

Priority: `[4, 4, 4, 3, 3, 3, 3, 3]`
At this point, the start point is equally likely within a trajectory, but some trajectories have a higher probability of being sampled (which seems to make more sense to me?)
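The averaging described above can be sketched as follows (a standalone illustration; `reduce_priorities` is a hypothetical helper, not the torchrl API):

```python
import numpy as np

def reduce_priorities(traj_ids, priorities):
    """Replace each item's priority with the mean priority of its trajectory,
    so every item in a trajectory is equally likely as a slice start point."""
    traj_ids = np.asarray(traj_ids)
    priorities = np.asarray(priorities, dtype=float)
    out = np.empty_like(priorities)
    for tid in np.unique(traj_ids):
        mask = traj_ids == tid
        out[mask] = priorities[mask].mean()
    return out

traj = [0, 0, 0, 1, 1, 1, 1, 1]
prio = [10, 1, 1, 10, 1, 2, 1, 1]
print(reduce_priorities(traj, prio))  # [4. 4. 4. 3. 3. 3. 3. 3.]
```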
I guess any solution will make someone unhappy...
@vmoens I believe that when discussing `PrioritizedSampler`, there is only one correct approach: we should not reduce the priorities of each trajectory while we are at it.
The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.
When discussing `PrioritizedSliceSampler`, we face the choice of whether to reduce the priorities of each slice while we are at it. My suggestion is to leave this choice to the user, since both the calculation of priorities and the calling of `update_priority()` are handled by the user. In other words, we still should not reduce the priorities of each slice.
I think your thoughts are more likely associated with an "episodic buffer", but in my view, the current implementation of ReplayBuffer is not episodic, so there is no need to unify the priority of the entire trajectory.
> The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.
Got it, thanks for that; indeed, that's how I edited the docstring (users should be in charge of setting the proper priority). But when we say "prioritized slice sampler", I can imagine someone thinking: I have a transition with high priority, therefore there is a chance of finding it anywhere in my sample (not just at the beginning) -- whereas now there is a higher chance of finding it at the beginning of a slice than at the end.
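This asymmetry can be seen in a toy simulation (assumed start-point semantics for illustration, not the actual `PrioritizedSliceSampler` internals): slice starts are drawn proportionally to per-item priority, then `slice_len` items are read forward, so a high-priority transition lands at offset 0 of a slice far more often than at any later offset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, slice_len, hot = 100, 4, 50
prio = np.ones(n_items)
prio[hot] = 100.0                  # one "important" transition

valid = n_items - slice_len + 1    # number of valid start indices
p = prio[:valid] / prio[:valid].sum()
starts = rng.choice(valid, size=10_000, p=p)

# offset k means the hot item is the (k+1)-th element of the slice;
# starting at hot - k places it at offset k
pos_counts = [int(np.sum(starts == hot - k)) for k in range(slice_len)]
```

Here `pos_counts[0]` (hot item at the start of the slice) dwarfs the counts at every later offset, since only the start index itself is priority-weighted.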
I wrote dedicated tests under `test_slice_sampler_prioritized`.
TODO: