pytorch/tensordict
TensorDict is a PyTorch-dedicated tensor container.
MIT License
[Feature] ``split_keys``
#909
Closed
vmoens
closed
3 months ago
vmoens
commented
3 months ago
Stack from ghstack (oldest at bottom):
-> #909
cc @shagunsodhani
github-actions[bot]
commented
3 months ago
⚠ Warning: Result of CPU Benchmark Tests
Total Benchmarks: 144. Improved: 20. Worsened: 5.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 59.3410μs | 21.9841μs | 45.4874 KOps/s | 43.0191 KOps/s | **+5.74%** |
| test_plain_set_stack_nested | 58.6690μs | 22.0295μs | 45.3937 KOps/s | 42.7256 KOps/s | **+6.24%** |
| test_plain_set_nested_inplace | 62.6680μs | 24.0002μs | 41.6663 KOps/s | 39.3344 KOps/s | **+5.93%** |
| test_plain_set_stack_nested_inplace | 66.0740μs | 23.8505μs | 41.9278 KOps/s | 39.6558 KOps/s | **+5.73%** |
| test_items | 41.7180μs | 2.6529μs | 376.9516 KOps/s | 372.2480 KOps/s | +1.26% |
| test_items_nested | 1.3480ms | 0.3739ms | 2.6744 KOps/s | 2.7137 KOps/s | -1.45% |
| test_items_nested_locked | 0.6613ms | 0.3698ms | 2.7041 KOps/s | 2.7017 KOps/s | +0.09% |
| test_items_nested_leaf | 0.1927ms | 87.6998μs | 11.4025 KOps/s | 11.7473 KOps/s | -2.93% |
| test_items_stack_nested | 0.4789ms | 0.3740ms | 2.6740 KOps/s | 2.6621 KOps/s | +0.45% |
| test_items_stack_nested_leaf | 0.1733ms | 86.9908μs | 11.4955 KOps/s | 11.6350 KOps/s | -1.20% |
| test_items_stack_nested_locked | 0.6073ms | 0.3729ms | 2.6820 KOps/s | 2.7223 KOps/s | -1.48% |
| test_keys | 57.7850μs | 4.8000μs | 208.3323 KOps/s | 233.6138 KOps/s | **-10.82%** |
| test_keys_nested | 0.2546ms | 0.1441ms | 6.9381 KOps/s | 6.9491 KOps/s | -0.16% |
| test_keys_nested_locked | 0.6778ms | 0.1492ms | 6.7046 KOps/s | 6.6563 KOps/s | +0.73% |
| test_keys_nested_leaf | 0.2568ms | 0.1240ms | 8.0658 KOps/s | 8.0590 KOps/s | +0.08% |
| test_keys_stack_nested | 0.2493ms | 0.1431ms | 6.9895 KOps/s | 6.9627 KOps/s | +0.39% |
| test_keys_stack_nested_leaf | 0.2165ms | 0.1222ms | 8.1807 KOps/s | 8.1891 KOps/s | -0.10% |
| test_keys_stack_nested_locked | 0.2981ms | 0.1493ms | 6.6977 KOps/s | 6.7109 KOps/s | -0.20% |
| test_values | 8.1877μs | 1.1678μs | 856.3226 KOps/s | 867.1853 KOps/s | -1.25% |
| test_values_nested | 0.1063ms | 50.0283μs | 19.9887 KOps/s | 19.6231 KOps/s | +1.86% |
| test_values_nested_locked | 0.1601ms | 49.7069μs | 20.1179 KOps/s | 19.4743 KOps/s | +3.31% |
| test_values_nested_leaf | 83.3870μs | 44.5798μs | 22.4317 KOps/s | 22.1473 KOps/s | +1.28% |
| test_values_stack_nested | 0.1093ms | 51.4623μs | 19.4317 KOps/s | 19.3686 KOps/s | +0.33% |
| test_values_stack_nested_leaf | 82.5740μs | 44.2510μs | 22.5984 KOps/s | 22.1676 KOps/s | +1.94% |
| test_values_stack_nested_locked | 94.4270μs | 50.8094μs | 19.6814 KOps/s | 19.3292 KOps/s | +1.82% |
| test_membership | 1.9993μs | 0.7340μs | 1.3623 MOps/s | 1.0758 MOps/s | **+26.63%** |
| test_membership_nested | 24.2250μs | 2.7305μs | 366.2382 KOps/s | 358.1389 KOps/s | +2.26% |
| test_membership_nested_leaf | 36.5380μs | 2.7578μs | 362.6073 KOps/s | 369.1612 KOps/s | -1.78% |
| test_membership_stacked_nested | 34.9150μs | 2.6806μs | 373.0534 KOps/s | 369.5835 KOps/s | +0.94% |
| test_membership_stacked_nested_leaf | 24.7060μs | 2.7132μs | 368.5747 KOps/s | 368.4246 KOps/s | +0.04% |
| test_membership_nested_last | 38.8330μs | 3.9789μs | 251.3288 KOps/s | 252.3909 KOps/s | -0.42% |
| test_membership_nested_leaf_last | 41.8680μs | 4.0097μs | 249.3936 KOps/s | 247.1798 KOps/s | +0.90% |
| test_membership_stacked_nested_last | 27.2910μs | 4.5731μs | 218.6700 KOps/s | 217.6217 KOps/s | +0.48% |
| test_membership_stacked_nested_leaf_last | 46.5670μs | 4.5724μs | 218.7040 KOps/s | 215.0543 KOps/s | +1.70% |
| test_nested_getleaf | 36.6080μs | 10.5926μs | 94.4057 KOps/s | 92.5206 KOps/s | +2.04% |
| test_nested_get | 42.3590μs | 9.9368μs | 100.6359 KOps/s | 95.8062 KOps/s | **+5.04%** |
| test_stacked_getleaf | 41.9780μs | 10.9856μs | 91.0280 KOps/s | 92.3985 KOps/s | -1.48% |
| test_stacked_get | 40.9560μs | 9.8802μs | 101.2122 KOps/s | 99.5072 KOps/s | +1.71% |
| test_nested_getitemleaf | 49.7830μs | 11.0509μs | 90.4903 KOps/s | 88.9862 KOps/s | +1.69% |
| test_nested_getitem | 45.6060μs | 10.1037μs | 98.9733 KOps/s | 95.8041 KOps/s | +3.31% |
| test_stacked_getitemleaf | 55.2330μs | 10.9712μs | 91.1474 KOps/s | 88.3494 KOps/s | +3.17% |
| test_stacked_getitem | 49.1120μs | 10.0346μs | 99.6557 KOps/s | 95.9775 KOps/s | +3.83% |
| test_lock_nested | 1.0646ms | 0.5195ms | 1.9251 KOps/s | 1.6760 KOps/s | **+14.86%** |
| test_lock_stack_nested | 0.7115ms | 0.4764ms | 2.0990 KOps/s | 2.1082 KOps/s | -0.44% |
| test_unlock_nested | 0.8266ms | 0.4439ms | 2.2527 KOps/s | 2.3213 KOps/s | -2.96% |
| test_unlock_stack_nested | 0.6749ms | 0.3921ms | 2.5505 KOps/s | 2.5597 KOps/s | -0.36% |
| test_flatten_speed | 0.2578ms | 0.1080ms | 9.2574 KOps/s | 9.6909 KOps/s | -4.47% |
| test_unflatten_speed | 0.5994ms | 0.4313ms | 2.3185 KOps/s | 2.2890 KOps/s | +1.29% |
| test_common_ops | 3.9419ms | 1.1482ms | 870.9232 Ops/s | 832.0812 Ops/s | +4.67% |
| test_creation | 21.1490μs | 2.5387μs | 393.8984 KOps/s | 400.0343 KOps/s | -1.53% |
| test_creation_empty | 61.1440μs | 19.2217μs | 52.0245 KOps/s | 46.7166 KOps/s | **+11.36%** |
| test_creation_nested_1 | 62.2860μs | 22.4404μs | 44.5624 KOps/s | 40.0158 KOps/s | **+11.36%** |
| test_creation_nested_2 | 77.0840μs | 26.5287μs | 37.6950 KOps/s | 34.4210 KOps/s | **+9.51%** |
| test_clone | 0.1196ms | 17.3552μs | 57.6195 KOps/s | 56.5500 KOps/s | +1.89% |
| test_getitem[int] | 1.0983ms | 12.5837μs | 79.4678 KOps/s | 77.9481 KOps/s | +1.95% |
| test_getitem[slice_int] | 0.1290ms | 31.9771μs | 31.2723 KOps/s | 30.7607 KOps/s | +1.66% |
| test_getitem[range] | 0.3248ms | 57.1626μs | 17.4939 KOps/s | 17.5140 KOps/s | -0.11% |
| test_getitem[tuple] | 0.1421ms | 26.5870μs | 37.6123 KOps/s | 37.3992 KOps/s | +0.57% |
| test_getitem[list] | 0.3472ms | 52.2945μs | 19.1225 KOps/s | 18.9965 KOps/s | +0.66% |
| test_setitem_dim[int] | 68.4580μs | 34.4536μs | 29.0246 KOps/s | 27.0427 KOps/s | **+7.33%** |
| test_setitem_dim[slice_int] | 0.1324ms | 73.2277μs | 13.6560 KOps/s | 13.0901 KOps/s | +4.32% |
| test_setitem_dim[range] | 0.1477ms | 93.0380μs | 10.7483 KOps/s | 10.1080 KOps/s | **+6.33%** |
| test_setitem_dim[tuple] | 0.1075ms | 60.1957μs | 16.6125 KOps/s | 15.8294 KOps/s | +4.95% |
| test_setitem | 0.2184ms | 31.3193μs | 31.9292 KOps/s | 31.0685 KOps/s | +2.77% |
| test_set | 0.1630ms | 29.9880μs | 33.3467 KOps/s | 32.0258 KOps/s | +4.12% |
| test_set_shared | 1.2601ms | 0.2169ms | 4.6108 KOps/s | 4.4450 KOps/s | +3.73% |
| test_update | 0.2058ms | 36.6177μs | 27.3092 KOps/s | 24.7186 KOps/s | **+10.48%** |
| test_update_nested | 0.1889ms | 46.8112μs | 21.3624 KOps/s | 19.7944 KOps/s | **+7.92%** |
| test_update__nested | 0.1803ms | 34.7315μs | 28.7923 KOps/s | 28.1918 KOps/s | +2.13% |
| test_set_nested | 0.1795ms | 31.5403μs | 31.7055 KOps/s | 29.4764 KOps/s | **+7.56%** |
| test_set_nested_new | 0.1940ms | 36.9629μs | 27.0542 KOps/s | 25.7492 KOps/s | **+5.07%** |
| test_select | 0.2001ms | 53.9389μs | 18.5395 KOps/s | 18.0954 KOps/s | +2.45% |
| test_select_nested | 0.1762ms | 60.6338μs | 16.4925 KOps/s | 16.5080 KOps/s | -0.09% |
| test_exclude_nested | 0.9402ms | 82.0760μs | 12.1838 KOps/s | 12.3856 KOps/s | -1.63% |
| test_empty[True] | 0.4481ms | 0.3436ms | 2.9100 KOps/s | 2.9082 KOps/s | +0.06% |
| test_empty[False] | 14.4773μs | 1.2682μs | 788.5314 KOps/s | 801.7494 KOps/s | -1.65% |
| test_unbind_speed | 0.4046ms | 0.3266ms | 3.0621 KOps/s | 3.0121 KOps/s | +1.66% |
| test_unbind_speed_stack0 | 0.4603ms | 0.3147ms | 3.1776 KOps/s | 3.2017 KOps/s | -0.75% |
| test_unbind_speed_stack1 | 86.1493ms | 0.8261ms | 1.2104 KOps/s | 1.4294 KOps/s | **-15.32%** |
| test_split | 86.7829ms | 2.2216ms | 450.1228 Ops/s | 404.6044 Ops/s | **+11.25%** |
| test_chunk | 79.9746ms | 2.2228ms | 449.8794 Ops/s | 475.3857 Ops/s | **-5.37%** |
| test_creation[device0] | 3.7805ms | 0.1215ms | 8.2315 KOps/s | 8.2677 KOps/s | -0.44% |
| test_creation_from_tensor | 0.2900ms | 0.1205ms | 8.3019 KOps/s | 8.0396 KOps/s | +3.26% |
| test_add_one[memmap_tensor0] | 0.2520ms | 7.9495μs | 125.7936 KOps/s | 122.1732 KOps/s | +2.96% |
| test_contiguous[memmap_tensor0] | 21.0490μs | 2.2225μs | 449.9388 KOps/s | 450.2586 KOps/s | -0.07% |
| test_stack[memmap_tensor0] | 50.5140μs | 5.8676μs | 170.4267 KOps/s | 162.5466 KOps/s | +4.85% |
| test_memmaptd_index | 0.6384ms | 0.4342ms | 2.3032 KOps/s | 2.2724 KOps/s | +1.35% |
| test_memmaptd_index_astensor | 0.7804ms | 0.5138ms | 1.9464 KOps/s | 1.9177 KOps/s | +1.50% |
| test_memmaptd_index_op | 1.9054ms | 1.0952ms | 913.0619 Ops/s | 878.6655 Ops/s | +3.91% |
| test_serialize_model | 0.2045s | 0.1394s | 7.1713 Ops/s | 7.4600 Ops/s | -3.87% |
| test_serialize_model_pickle | 0.4669s | 0.3945s | 2.5349 Ops/s | 2.4756 Ops/s | +2.40% |
| test_serialize_weights | 0.1587s | 0.1301s | 7.6872 Ops/s | 7.8218 Ops/s | -1.72% |
| test_serialize_weights_returnearly | 0.1738s | 0.1652s | 6.0527 Ops/s | 5.9375 Ops/s | +1.94% |
| test_serialize_weights_pickle | 1.2515s | 0.8758s | 1.1418 Ops/s | 2.4343 Ops/s | **-53.10%** |
| test_serialize_weights_filesystem | 0.1500s | 0.1431s | 6.9887 Ops/s | 6.9767 Ops/s | +0.17% |
| test_serialize_model_filesystem | 0.1547s | 0.1444s | 6.9244 Ops/s | 5.9630 Ops/s | **+16.12%** |
| test_reshape_pytree | 86.1710μs | 39.4521μs | 25.3472 KOps/s | 25.1581 KOps/s | +0.75% |
| test_reshape_td | 0.1093ms | 49.1754μs | 20.3354 KOps/s | 20.1268 KOps/s | +1.04% |
| test_view_pytree | 81.5020μs | 39.2581μs | 25.4724 KOps/s | 25.4645 KOps/s | +0.03% |
| test_view_td | 0.1080ms | 56.0340μs | 17.8463 KOps/s | 18.3847 KOps/s | -2.93% |
| test_unbind_pytree | 0.1031ms | 36.8761μs | 27.1178 KOps/s | 27.4261 KOps/s | -1.12% |
| test_unbind_td | 0.3666ms | 49.2271μs | 20.3140 KOps/s | 20.7694 KOps/s | -2.19% |
| test_split_pytree | 0.1313ms | 39.2364μs | 25.4865 KOps/s | 24.6808 KOps/s | +3.26% |
| test_split_td | 0.1964ms | 59.7415μs | 16.7388 KOps/s | 16.4527 KOps/s | +1.74% |
| test_add_pytree | 0.1006ms | 44.6771μs | 22.3828 KOps/s | 22.3288 KOps/s | +0.24% |
| test_add_td | 0.2243ms | 85.0301μs | 11.7605 KOps/s | 11.0640 KOps/s | **+6.30%** |
| test_distributed | 0.2563ms | 0.1283ms | 7.7924 KOps/s | 7.4699 KOps/s | +4.32% |
| test_tdmodule | 59.0200μs | 18.4019μs | 54.3422 KOps/s | 55.1023 KOps/s | -1.38% |
| test_tdmodule_dispatch | 86.0710μs | 37.8458μs | 26.4230 KOps/s | 25.8744 KOps/s | +2.12% |
| test_tdseq | 56.2350μs | 19.2914μs | 51.8365 KOps/s | 49.8366 KOps/s | +4.01% |
| test_tdseq_dispatch | 59.9130μs | 40.6168μs | 24.6204 KOps/s | 22.7458 KOps/s | **+8.24%** |
| test_instantiation_functorch | 2.4404ms | 1.5730ms | 635.7466 Ops/s | 639.0671 Ops/s | -0.52% |
| test_instantiation_td | 1.8662ms | 1.1510ms | 868.7751 Ops/s | 869.0379 Ops/s | -0.03% |
| test_exec_functorch | 0.2948ms | 0.1824ms | 5.4818 KOps/s | 5.4248 KOps/s | +1.05% |
| test_exec_functional_call | 0.3537ms | 0.1727ms | 5.7920 KOps/s | 5.5248 KOps/s | +4.84% |
| test_exec_td | 0.4116ms | 0.1776ms | 5.6293 KOps/s | 5.6338 KOps/s | -0.08% |
| test_exec_td_decorator | 0.4003ms | 0.2547ms | 3.9269 KOps/s | 3.8120 KOps/s | +3.01% |
| test_vmap_mlp_speed[True-True] | 1.0072ms | 0.6082ms | 1.6442 KOps/s | 1.6198 KOps/s | +1.50% |
| test_vmap_mlp_speed[True-False] | 0.9195ms | 0.5988ms | 1.6701 KOps/s | 1.6277 KOps/s | +2.61% |
| test_vmap_mlp_speed[False-True] | 0.8293ms | 0.4946ms | 2.0219 KOps/s | 1.9914 KOps/s | +1.53% |
| test_vmap_mlp_speed[False-False] | 0.8525ms | 0.4930ms | 2.0283 KOps/s | 1.9817 KOps/s | +2.35% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1402ms | 0.6880ms | 1.4534 KOps/s | 1.4142 KOps/s | +2.77% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9234ms | 0.6905ms | 1.4481 KOps/s | 1.4088 KOps/s | +2.79% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8382ms | 0.5723ms | 1.7474 KOps/s | 1.7199 KOps/s | +1.60% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8732ms | 0.5738ms | 1.7427 KOps/s | 1.6730 KOps/s | +4.16% |
| test_to_module_speed[True] | 2.0073ms | 1.8084ms | 552.9779 Ops/s | 550.8552 Ops/s | +0.39% |
| test_to_module_speed[False] | 2.3171ms | 1.7588ms | 568.5854 Ops/s | 565.4237 Ops/s | +0.56% |
| test_tc_init | 94.1360μs | 45.5000μs | 21.9780 KOps/s | 21.5506 KOps/s | +1.98% |
| test_tc_init_nested | 0.1626ms | 91.5509μs | 10.9229 KOps/s | 10.7602 KOps/s | +1.51% |
| test_tc_first_layer_tensor | 48.4510μs | 8.9289μs | 111.9960 KOps/s | 112.5495 KOps/s | -0.49% |
| test_tc_first_layer_nontensor | 50.5150μs | 8.8573μs | 112.9006 KOps/s | 112.3408 KOps/s | +0.50% |
| test_tc_second_layer_tensor | 39.9950μs | 2.7677μs | 361.3133 KOps/s | 357.8207 KOps/s | +0.98% |
| test_tc_second_layer_nontensor | 45.9060μs | 10.0021μs | 99.9794 KOps/s | 100.3131 KOps/s | -0.33% |
| test_unbind | 0.1112s | 14.3823ms | 69.5298 Ops/s | 71.9933 Ops/s | -3.42% |
| test_full_like | 9.1689ms | 8.1587ms | 122.5687 Ops/s | 119.4839 Ops/s | +2.58% |
| test_zeros_like | 13.7204ms | 7.1133ms | 140.5809 Ops/s | 155.2246 Ops/s | **-9.43%** |
| test_ones_like | 12.5549ms | 7.3979ms | 135.1735 Ops/s | 131.6575 Ops/s | +2.67% |
| test_clone | 14.5183ms | 9.3802ms | 106.6074 Ops/s | 108.4671 Ops/s | -1.71% |
| test_squeeze | 70.9730μs | 14.3891μs | 69.4970 KOps/s | 66.4957 KOps/s | +4.51% |
| test_unsqueeze | 0.2433ms | 97.0159μs | 10.3076 KOps/s | 10.1945 KOps/s | +1.11% |
| test_split | 0.4351ms | 0.2062ms | 4.8503 KOps/s | 4.7704 KOps/s | +1.68% |
| test_permute | 0.3689ms | 0.2231ms | 4.4832 KOps/s | 4.5161 KOps/s | -0.73% |
| test_stack | 29.0349ms | 24.8992ms | 40.1620 Ops/s | 38.8452 Ops/s | +3.39% |
| test_cat | 30.2031ms | 25.1404ms | 39.7766 Ops/s | 39.5959 Ops/s | +0.46% |
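
For readers checking the numbers: the Change values in the table above are consistent with a plain relative difference between the Ops column and the Ops on Repo `HEAD` column. The sketch below reproduces three rows under that reading. The function names and the 5% cut-off for flagging a row are assumptions inferred from which rows are emphasized in the table, not something taken from the benchmark bot's source.

```python
def relative_change(ops_branch: float, ops_head: float) -> float:
    """Percent change in throughput of the PR branch relative to HEAD."""
    return (ops_branch - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on HEAD) pairs copied from the CPU table; units cancel in the ratio.
rows = {
    "test_plain_set_nested": (45.4874, 43.0191),
    "test_keys": (208.3323, 233.6138),
    "test_items": (376.9516, 372.2480),
}

for name, (branch, head) in rows.items():
    pct = relative_change(branch, head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under the assumed threshold, the five strongly negative CPU rows (`test_keys`, `test_unbind_speed_stack1`, `test_chunk`, `test_serialize_weights_pickle`, `test_zeros_like`) match the reported "Worsened: 5".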
github-actions[bot]
commented
3 months ago
⚠ Warning: Result of GPU Benchmark Tests
Total Benchmarks: 219. Improved: 28. Worsened: 22.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 27.3710μs | 15.1414μs | 66.0441 KOps/s | 56.9060 KOps/s | **+16.06%** |
| test_plain_set_stack_nested | 32.9500μs | 15.2527μs | 65.5623 KOps/s | 55.9915 KOps/s | **+17.09%** |
| test_plain_set_nested_inplace | 30.7800μs | 16.2390μs | 61.5803 KOps/s | 53.1371 KOps/s | **+15.89%** |
| test_plain_set_stack_nested_inplace | 40.5900μs | 16.2598μs | 61.5013 KOps/s | 53.0424 KOps/s | **+15.95%** |
| test_items | 21.5810μs | 4.7301μs | 211.4124 KOps/s | 215.9728 KOps/s | -2.11% |
| test_items_nested | 0.4209ms | 0.3916ms | 2.5537 KOps/s | 2.5719 KOps/s | -0.71% |
| test_items_nested_locked | 0.4411ms | 0.3991ms | 2.5058 KOps/s | 2.5396 KOps/s | -1.33% |
| test_items_nested_leaf | 0.1002ms | 85.8826μs | 11.6438 KOps/s | 11.5775 KOps/s | +0.57% |
| test_items_stack_nested | 0.4381ms | 0.3939ms | 2.5388 KOps/s | 2.5336 KOps/s | +0.21% |
| test_items_stack_nested_leaf | 0.1064ms | 84.9205μs | 11.7757 KOps/s | 11.3560 KOps/s | +3.70% |
| test_items_stack_nested_locked | 0.4431ms | 0.3958ms | 2.5266 KOps/s | 2.5448 KOps/s | -0.72% |
| test_keys | 18.9200μs | 4.3799μs | 228.3158 KOps/s | 227.1586 KOps/s | +0.51% |
| test_keys_nested | 82.2910μs | 65.7747μs | 15.2034 KOps/s | 15.1743 KOps/s | +0.19% |
| test_keys_nested_locked | 0.7448ms | 73.3553μs | 13.6323 KOps/s | 13.7854 KOps/s | -1.11% |
| test_keys_nested_leaf | 75.9120μs | 57.7254μs | 17.3234 KOps/s | 17.2827 KOps/s | +0.24% |
| test_keys_stack_nested | 86.4710μs | 66.7233μs | 14.9873 KOps/s | 15.0741 KOps/s | -0.58% |
| test_keys_stack_nested_leaf | 75.4020μs | 57.9562μs | 17.2544 KOps/s | 17.4592 KOps/s | -1.17% |
| test_keys_stack_nested_locked | 96.5410μs | 71.9576μs | 13.8971 KOps/s | 13.6545 KOps/s | +1.78% |
| test_values | 6.7270μs | 1.7654μs | 566.4363 KOps/s | 568.8099 KOps/s | -0.42% |
| test_values_nested | 57.7110μs | 33.8075μs | 29.5793 KOps/s | 29.1466 KOps/s | +1.48% |
| test_values_nested_locked | 51.4710μs | 35.8068μs | 27.9277 KOps/s | 27.7797 KOps/s | +0.53% |
| test_values_nested_leaf | 47.3210μs | 30.0946μs | 33.2285 KOps/s | 33.1844 KOps/s | +0.13% |
| test_values_stack_nested | 56.2010μs | 34.4410μs | 29.0351 KOps/s | 28.5758 KOps/s | +1.61% |
| test_values_stack_nested_leaf | 47.4000μs | 30.5866μs | 32.6940 KOps/s | 32.4504 KOps/s | +0.75% |
| test_values_stack_nested_locked | 53.8800μs | 36.6364μs | 27.2953 KOps/s | 27.3799 KOps/s | -0.31% |
| test_membership | 1.4300μs | 0.5432μs | 1.8409 MOps/s | 1.8184 MOps/s | +1.24% |
| test_membership_nested | 17.6000μs | 2.1026μs | 475.6073 KOps/s | 484.3089 KOps/s | -1.80% |
| test_membership_nested_leaf | 12.7900μs | 2.0160μs | 496.0348 KOps/s | 500.5605 KOps/s | -0.90% |
| test_membership_stacked_nested | 20.2200μs | 2.0540μs | 486.8443 KOps/s | 492.0337 KOps/s | -1.05% |
| test_membership_stacked_nested_leaf | 20.7910μs | 2.0654μs | 484.1714 KOps/s | 486.1489 KOps/s | -0.41% |
| test_membership_nested_last | 20.2600μs | 2.9800μs | 335.5740 KOps/s | 333.9366 KOps/s | +0.49% |
| test_membership_nested_leaf_last | 16.5000μs | 2.9780μs | 335.7950 KOps/s | 334.8886 KOps/s | +0.27% |
| test_membership_stacked_nested_last | 29.2300μs | 9.1698μs | 109.0540 KOps/s | 266.7679 KOps/s | **-59.12%** |
| test_membership_stacked_nested_leaf_last | 22.3210μs | 9.1891μs | 108.8241 KOps/s | 265.7073 KOps/s | **-59.04%** |
| test_nested_getleaf | 25.5200μs | 7.9590μs | 125.6445 KOps/s | 124.2156 KOps/s | +1.15% |
| test_nested_get | 29.3610μs | 7.5696μs | 132.1070 KOps/s | 131.5146 KOps/s | +0.45% |
| test_stacked_getleaf | 24.7300μs | 8.0595μs | 124.0771 KOps/s | 124.5102 KOps/s | -0.35% |
| test_stacked_get | 23.2810μs | 7.5337μs | 132.7373 KOps/s | 132.3468 KOps/s | +0.30% |
| test_nested_getitemleaf | 23.1800μs | 8.1517μs | 122.6738 KOps/s | 121.7048 KOps/s | +0.80% |
| test_nested_getitem | 26.1400μs | 7.7070μs | 129.7525 KOps/s | 129.6167 KOps/s | +0.10% |
| test_stacked_getitemleaf | 25.2000μs | 8.1645μs | 122.4812 KOps/s | 120.9418 KOps/s | +1.27% |
| test_stacked_getitem | 28.6300μs | 7.7472μs | 129.0785 KOps/s | 129.5912 KOps/s | -0.40% |
| test_lock_nested | 9.5736ms | 0.4847ms | 2.0632 KOps/s | 2.1069 KOps/s | -2.08% |
| test_lock_stack_nested | 0.4801ms | 0.4253ms | 2.3513 KOps/s | 2.3302 KOps/s | +0.91% |
| test_unlock_nested | 0.8755ms | 0.3976ms | 2.5153 KOps/s | 2.5473 KOps/s | -1.26% |
| test_unlock_stack_nested | 0.3852ms | 0.3456ms | 2.8935 KOps/s | 2.8586 KOps/s | +1.22% |
| test_flatten_speed | 0.5164ms | 0.1057ms | 9.4627 KOps/s | 9.3900 KOps/s | +0.77% |
| test_unflatten_speed | 0.3745ms | 0.2983ms | 3.3519 KOps/s | 3.3627 KOps/s | -0.32% |
| test_common_ops | 1.5009ms | 1.2727ms | 785.7325 Ops/s | 732.1976 Ops/s | **+7.31%** |
| test_creation | 12.9810μs | 1.9920μs | 502.0124 KOps/s | 501.8603 KOps/s | +0.03% |
| test_creation_empty | 30.5900μs | 14.1625μs | 70.6092 KOps/s | 52.6688 KOps/s | **+34.06%** |
| test_creation_nested_1 | 35.5810μs | 16.3726μs | 61.0777 KOps/s | 47.2739 KOps/s | **+29.20%** |
| test_creation_nested_2 | 45.8700μs | 18.7632μs | 53.2957 KOps/s | 41.8751 KOps/s | **+27.27%** |
| test_clone | 63.4410μs | 31.2997μs | 31.9491 KOps/s | 33.1140 KOps/s | -3.52% |
| test_getitem[int] | 1.1798ms | 17.5408μs | 57.0100 KOps/s | 59.1789 KOps/s | -3.66% |
| test_getitem[slice_int] | 0.1545ms | 29.3815μs | 34.0350 KOps/s | 35.7460 KOps/s | -4.79% |
| test_getitem[range] | 0.2897ms | 0.1170ms | 8.5447 KOps/s | 8.8840 KOps/s | -3.82% |
| test_getitem[tuple] | 90.7422ms | 31.7554μs | 31.4907 KOps/s | 40.2574 KOps/s | **-21.78%** |
| test_getitem[list] | 0.2209ms | 0.1058ms | 9.4528 KOps/s | 9.3472 KOps/s | +1.13% |
| test_setitem_dim[int] | 69.1710μs | 48.8945μs | 20.4522 KOps/s | 17.2388 KOps/s | **+18.64%** |
| test_setitem_dim[slice_int] | 94.8010μs | 73.7816μs | 13.5535 KOps/s | 12.6462 KOps/s | **+7.17%** |
| test_setitem_dim[range] | 0.1696ms | 0.1365ms | 7.3258 KOps/s | 7.0829 KOps/s | +3.43% |
| test_setitem_dim[tuple] | 88.6020μs | 66.3972μs | 15.0609 KOps/s | 13.7305 KOps/s | **+9.69%** |
| test_setitem | 69.6410μs | 43.1666μs | 23.1661 KOps/s | 21.4825 KOps/s | **+7.84%** |
| test_set | 76.1810μs | 41.6784μs | 23.9932 KOps/s | 21.3725 KOps/s | **+12.26%** |
| test_set_shared | 0.3623ms | 55.3823μs | 18.0563 KOps/s | 18.1430 KOps/s | -0.48% |
| test_update | 78.6410μs | 48.5991μs | 20.5765 KOps/s | 18.2136 KOps/s | **+12.97%** |
| test_update_nested | 81.4320μs | 57.2262μs | 17.4745 KOps/s | 16.5794 KOps/s | **+5.40%** |
| test_update__nested | 91.9910μs | 63.6018μs | 15.7228 KOps/s | 16.5414 KOps/s | -4.95% |
| test_set_nested | 72.2810μs | 44.7855μs | 22.3286 KOps/s | 21.7380 KOps/s | +2.72% |
| test_set_nested_new | 85.1610μs | 48.1560μs | 20.7659 KOps/s | 20.0749 KOps/s | +3.44% |
| test_select | 97.7510μs | 65.1728μs | 15.3438 KOps/s | 15.3615 KOps/s | -0.12% |
| test_select_nested | 0.2871ms | 52.0244μs | 19.2218 KOps/s | 19.1171 KOps/s | +0.55% |
| test_exclude_nested | 92.0800μs | 72.5222μs | 13.7889 KOps/s | 14.0325 KOps/s | -1.74% |
| test_empty[True] | 0.3478ms | 0.2944ms | 3.3971 KOps/s | 3.3224 KOps/s | +2.25% |
| test_empty[False] | 2.3940μs | 0.9301μs | 1.0752 MOps/s | 1.0671 MOps/s | +0.75% |
| test_to | 58.5410μs | 37.4401μs | 26.7093 KOps/s | 26.8413 KOps/s | -0.49% |
| test_to_nonblocking | 45.2910μs | 23.5774μs | 42.4135 KOps/s | 42.4475 KOps/s | -0.08% |
| test_unbind_speed | 1.0720ms | 0.2994ms | 3.3395 KOps/s | 3.3922 KOps/s | -1.56% |
| test_unbind_speed_stack0 | 0.3500ms | 0.2971ms | 3.3661 KOps/s | 3.3429 KOps/s | +0.69% |
| test_unbind_speed_stack1 | 90.2608ms | 0.7562ms | 1.3225 KOps/s | 1.2807 KOps/s | +3.26% |
| test_split | 93.0321ms | 2.3577ms | 424.1448 Ops/s | 444.1165 Ops/s | -4.50% |
| test_chunk | 92.6370ms | 2.3539ms | 424.8271 Ops/s | 439.8925 Ops/s | -3.42% |
| test_creation[device0] | 0.1574ms | 0.1087ms | 9.2021 KOps/s | 9.0345 KOps/s | +1.86% |
| test_creation_from_tensor | 0.1641ms | 0.1059ms | 9.4456 KOps/s | 9.3002 KOps/s | +1.56% |
| test_add_one[memmap_tensor0] | 26.6800μs | 9.3508μs | 106.9424 KOps/s | 117.1469 KOps/s | **-8.71%** |
| test_contiguous[memmap_tensor0] | 20.5800μs | 2.1557μs | 463.8893 KOps/s | 464.5684 KOps/s | -0.15% |
| test_stack[memmap_tensor0] | 36.6710μs | 6.5553μs | 152.5472 KOps/s | 155.8835 KOps/s | -2.14% |
| test_memmaptd_index | 1.2773ms | 0.4266ms | 2.3439 KOps/s | 2.4185 KOps/s | -3.08% |
| test_memmaptd_index_astensor | 0.8426ms | 0.4950ms | 2.0201 KOps/s | 2.0963 KOps/s | -3.64% |
| test_memmaptd_index_op | 1.4278ms | 0.9912ms | 1.0089 KOps/s | 976.5858 Ops/s | +3.31% |
| test_serialize_model | 99.1610ms | 95.3631ms | 10.4862 Ops/s | 10.1076 Ops/s | +3.75% |
| test_serialize_model_pickle | 1.3494s | 1.2388s | 0.8072 Ops/s | 0.8056 Ops/s | +0.21% |
| test_serialize_weights | 0.1871s | 0.1015s | 9.8518 Ops/s | 9.2748 Ops/s | **+6.22%** |
| test_serialize_weights_returnearly | 0.2923s | 86.2885ms | 11.5890 Ops/s | 14.2379 Ops/s | **-18.60%** |
| test_serialize_weights_pickle | 1.3486s | 1.2367s | 0.8086 Ops/s | 0.8076 Ops/s | +0.12% |
| test_reshape_pytree | 69.2610μs | 38.1827μs | 26.1899 KOps/s | 25.8260 KOps/s | +1.41% |
| test_reshape_td | 0.1619ms | 42.2956μs | 23.6431 KOps/s | 21.5891 KOps/s | **+9.51%** |
| test_view_pytree | 0.2606ms | 37.7340μs | 26.5013 KOps/s | 25.4500 KOps/s | +4.13% |
| test_view_td | 0.2791ms | 51.0204μs | 19.6000 KOps/s | 21.2773 KOps/s | **-7.88%** |
| test_unbind_pytree | 0.1717ms | 36.5937μs | 27.3271 KOps/s | 27.7837 KOps/s | -1.64% |
| test_unbind_td | 0.4487ms | 45.5212μs | 21.9678 KOps/s | 21.6340 KOps/s | +1.54% |
| test_split_pytree | 0.3469ms | 48.9149μs | 20.4437 KOps/s | 19.7128 KOps/s | +3.71% |
| test_split_td | 0.1722ms | 61.5977μs | 16.2344 KOps/s | 14.5771 KOps/s | **+11.37%** |
| test_add_pytree | 0.2831ms | 62.8777μs | 15.9039 KOps/s | 17.0210 KOps/s | **-6.56%** |
| test_add_td | 0.1134ms | 86.7095μs | 11.5328 KOps/s | 9.6200 KOps/s | **+19.88%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4161ms | 0.2109ms | 4.7413 KOps/s | 4.5267 KOps/s | +4.74% |
| test_compile_add_one_nested[tensordict-eager] | 0.2625ms | 0.1764ms | 5.6703 KOps/s | 5.8342 KOps/s | -2.81% |
| test_compile_add_one_nested[pytree-compile] | 0.1899ms | 0.1496ms | 6.6842 KOps/s | 6.9510 KOps/s | -3.84% |
| test_compile_add_one_nested[pytree-eager] | 0.2523ms | 0.1989ms | 5.0267 KOps/s | 5.2484 KOps/s | -4.22% |
| test_compile_copy_nested[tensordict-compile] | 50.8810μs | 22.1227μs | 45.2024 KOps/s | 44.6647 KOps/s | +1.20% |
| test_compile_copy_nested[tensordict-eager] | 85.9810μs | 47.9744μs | 20.8445 KOps/s | 20.3801 KOps/s | +2.28% |
| test_compile_copy_nested[pytree-compile] | 0.1100ms | 72.5158μs | 13.7901 KOps/s | 13.8436 KOps/s | -0.39% |
| test_compile_copy_nested[pytree-eager] | 86.7310μs | 59.6629μs | 16.7608 KOps/s | 16.4691 KOps/s | +1.77% |
| test_compile_add_one_flat[tensordict-compile] | 0.3568ms | 0.3251ms | 3.0764 KOps/s | 3.0887 KOps/s | -0.40% |
| test_compile_add_one_flat[tensordict-eager] | 0.2674ms | 0.2230ms | 4.4849 KOps/s | 4.5636 KOps/s | -1.72% |
| test_compile_add_one_flat[tensorclass-compile] | 0.1684ms | 0.1337ms | 7.4782 KOps/s | 7.7359 KOps/s | -3.33% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1423ms | 68.1509μs | 14.6733 KOps/s | 16.0561 KOps/s | **-8.61%** |
| test_compile_add_one_flat[pytree-compile] | 0.3898ms | 0.3251ms | 3.0755 KOps/s | 3.1165 KOps/s | -1.31% |
| test_compile_add_one_flat[pytree-eager] | 0.7126ms | 0.6499ms | 1.5388 KOps/s | 1.6097 KOps/s | -4.41% |
| test_compile_add_self_flat[tensordict-eager] | 0.3372ms | 0.2724ms | 3.6714 KOps/s | 3.7293 KOps/s | -1.55% |
| test_compile_add_self_flat[tensordict-compile] | 0.3941ms | 0.3277ms | 3.0511 KOps/s | 3.0708 KOps/s | -0.64% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1830ms | 77.0512μs | 12.9784 KOps/s | 13.4235 KOps/s | -3.32% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2973ms | 0.1312ms | 7.6205 KOps/s | 7.7391 KOps/s | -1.53% |
| test_compile_add_self_flat[pytree-eager] | 0.6512ms | 0.5517ms | 1.8126 KOps/s | 1.8620 KOps/s | -2.66% |
| test_compile_add_self_flat[pytree-compile] | 0.3726ms | 0.3244ms | 3.0830 KOps/s | 3.0996 KOps/s | -0.53% |
| test_compile_copy_flat[tensordict-compile] | 44.5710μs | 19.3565μs | 51.6622 KOps/s | 54.0426 KOps/s | -4.40% |
| test_compile_copy_flat[tensordict-eager] | 58.7810μs | 32.4388μs | 30.8272 KOps/s | 31.6346 KOps/s | -2.55% |
| test_compile_copy_flat[pytree-compile] | 0.1098ms | 76.8954μs | 13.0047 KOps/s | 13.3156 KOps/s | -2.34% |
| test_compile_copy_flat[pytree-eager] | 91.9610μs | 61.5015μs | 16.2598 KOps/s | 16.4945 KOps/s | -1.42% |
| test_compile_assign_and_add[tensordict-compile] | 2.5295ms | 0.9293ms | 1.0761 KOps/s | 1.0878 KOps/s | -1.07% |
| test_compile_assign_and_add[tensordict-eager] | 3.5678ms | 3.3070ms | 302.3920 Ops/s | 306.0762 Ops/s | -1.20% |
| test_compile_assign_and_add[pytree-compile] | 2.5050ms | 0.9130ms | 1.0953 KOps/s | 1.1121 KOps/s | -1.50% |
| test_compile_assign_and_add[pytree-eager] | 3.4382ms | 3.3102ms | 302.0988 Ops/s | 292.3938 Ops/s | +3.32% |
| test_compile_indexing[tensor-tensordict-compile] | 0.1556ms | 0.1175ms | 8.5081 KOps/s | 9.1893 KOps/s | **-7.41%** |
| test_compile_indexing[tensor-tensordict-eager] | 0.2398ms | 67.4258μs | 14.8311 KOps/s | 15.3794 KOps/s | -3.57% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1546ms | 0.1066ms | 9.3836 KOps/s | 9.8849 KOps/s | **-5.07%** |
| test_compile_indexing[tensor-tensorclass-eager] | 86.5310μs | 47.3413μs | 21.1232 KOps/s | 22.9062 KOps/s | **-7.78%** |
| test_compile_indexing[tensor-pytree-compile] | 0.1697ms | 0.1090ms | 9.1710 KOps/s | 9.3497 KOps/s | -1.91% |
| test_compile_indexing[tensor-pytree-eager] | 95.2510μs | 48.3535μs | 20.6810 KOps/s | 20.9085 KOps/s | -1.09% |
| test_compile_indexing[slice-tensordict-compile] | 0.2029ms | 0.1450ms | 6.8987 KOps/s | 7.0965 KOps/s | -2.79% |
| test_compile_indexing[slice-tensordict-eager] | 0.1970ms | 27.4978μs | 36.3666 KOps/s | 38.6458 KOps/s | **-5.90%** |
| test_compile_indexing[slice-tensorclass-compile] | 0.1945ms | 0.1307ms | 7.6520 KOps/s | 7.6378 KOps/s | +0.19% |
| test_compile_indexing[slice-tensorclass-eager] | 67.3910μs | 22.8736μs | 43.7185 KOps/s | 43.9232 KOps/s | -0.47% |
| test_compile_indexing[slice-pytree-compile] | 0.1831ms | 0.1358ms | 7.3619 KOps/s | 7.8090 KOps/s | **-5.73%** |
| test_compile_indexing[slice-pytree-eager] | 61.1310μs | 23.2748μs | 42.9649 KOps/s | 43.9756 KOps/s | -2.30% |
| test_compile_indexing[int-tensordict-compile] | 0.1858ms | 0.1450ms | 6.8968 KOps/s | 7.2133 KOps/s | -4.39% |
| test_compile_indexing[int-tensordict-eager] | 0.5119ms | 27.0115μs | 37.0212 KOps/s | 38.2362 KOps/s | -3.18% |
| test_compile_indexing[int-tensorclass-compile] | 0.2015ms | 0.1348ms | 7.4202 
KOps/s | 7.4668 KOps/s | $\color{#d91a1a}-0.62\\%$ | | test_compile_indexing[int-tensorclass-eager] | 51.5610μs | 22.5299μs | 44.3855 KOps/s | 41.4261 KOps/s | $\textbf{\color{#35bf28}+7.14\\%}$ | | test_compile_indexing[int-pytree-compile] | 0.2025ms | 0.1321ms | 7.5710 KOps/s | 7.4343 KOps/s | $\color{#35bf28}+1.84\\%$ | | test_compile_indexing[int-pytree-eager] | 52.2200μs | 22.7864μs | 43.8859 KOps/s | 45.5609 KOps/s | $\color{#d91a1a}-3.68\\%$ | | test_mod_add[eager] | 73.1710μs | 36.2539μs | 27.5832 KOps/s | 24.3936 KOps/s | $\textbf{\color{#35bf28}+13.08\\%}$ | | test_mod_add[compile] | 0.2515ms | 68.5279μs | 14.5926 KOps/s | 13.8913 KOps/s | $\textbf{\color{#35bf28}+5.05\\%}$ | | test_mod_add[compile-overhead] | 0.2813ms | 0.1489ms | 6.7137 KOps/s | 6.9530 KOps/s | $\color{#d91a1a}-3.44\\%$ | | test_mod_wrap[eager] | 0.3545ms | 0.2695ms | 3.7100 KOps/s | 3.8816 KOps/s | $\color{#d91a1a}-4.42\\%$ | | test_mod_wrap[compile] | 1.2157ms | 0.2976ms | 3.3606 KOps/s | 3.4602 KOps/s | $\color{#d91a1a}-2.88\\%$ | | test_mod_wrap[compile-overhead] | 8.0950ms | 4.3303ms | 230.9317 Ops/s | 228.0280 Ops/s | $\color{#35bf28}+1.27\\%$ | | test_mod_wrap_and_backward[eager] | 1.6972ms | 1.4605ms | 684.6807 Ops/s | 754.1422 Ops/s | $\textbf{\color{#d91a1a}-9.21\\%}$ | | test_mod_wrap_and_backward[compile] | 1.6104ms | 1.4623ms | 683.8627 Ops/s | 753.8905 Ops/s | $\textbf{\color{#d91a1a}-9.29\\%}$ | | test_mod_wrap_and_backward[compile-overhead] | 1.4641ms | 0.9861ms | 1.0141 KOps/s | 1.1245 KOps/s | $\textbf{\color{#d91a1a}-9.82\\%}$ | | test_seq_add[eager] | 0.1602ms | 0.1084ms | 9.2219 KOps/s | 9.2415 KOps/s | $\color{#d91a1a}-0.21\\%$ | | test_seq_add[compile] | 0.1197ms | 85.1953μs | 11.7377 KOps/s | 11.8967 KOps/s | $\color{#d91a1a}-1.34\\%$ | | test_seq_add[compile-overhead] | 0.1555ms | 0.1215ms | 8.2330 KOps/s | 8.1970 KOps/s | $\color{#35bf28}+0.44\\%$ | | test_seq_wrap[eager] | 0.4877ms | 0.4114ms | 2.4309 KOps/s | 2.3684 KOps/s | $\color{#35bf28}+2.64\\%$ | | 
test_seq_wrap[compile] | 1.5088ms | 0.3295ms | 3.0352 KOps/s | 3.0949 KOps/s | $\color{#d91a1a}-1.93\\%$ | | test_seq_wrap[compile-overhead] | 0.3088s | 0.1463s | 6.8372 Ops/s | 6.7425 Ops/s | $\color{#35bf28}+1.41\\%$ | | test_func_call_runtime[False-eager] | 0.8527ms | 0.7587ms | 1.3180 KOps/s | 1.4001 KOps/s | $\textbf{\color{#d91a1a}-5.86\\%}$ | | test_func_call_runtime[False-compile] | 0.9341ms | 0.8208ms | 1.2184 KOps/s | 1.2633 KOps/s | $\color{#d91a1a}-3.56\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4145ms | 0.3586ms | 2.7889 KOps/s | 2.7976 KOps/s | $\color{#d91a1a}-0.31\\%$ | | test_func_call_runtime[True-eager] | 1.1115ms | 0.9893ms | 1.0108 KOps/s | 1.0315 KOps/s | $\color{#d91a1a}-2.00\\%$ | | test_func_call_runtime[True-compile] | 0.9558ms | 0.8566ms | 1.1674 KOps/s | 1.1902 KOps/s | $\color{#d91a1a}-1.91\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4452ms | 0.3996ms | 2.5026 KOps/s | 2.5052 KOps/s | $\color{#d91a1a}-0.10\\%$ | | test_distributed | 0.2720ms | 68.6665μs | 14.5631 KOps/s | 14.9607 KOps/s | $\color{#d91a1a}-2.66\\%$ | | test_tdmodule | 29.0300μs | 14.2895μs | 69.9814 KOps/s | 60.2981 KOps/s | $\textbf{\color{#35bf28}+16.06\\%}$ | | test_tdmodule_dispatch | 44.6400μs | 29.1443μs | 34.3120 KOps/s | 28.7129 KOps/s | $\textbf{\color{#35bf28}+19.50\\%}$ | | test_tdseq | 30.2500μs | 14.6686μs | 68.1729 KOps/s | 57.3207 KOps/s | $\textbf{\color{#35bf28}+18.93\\%}$ | | test_tdseq_dispatch | 46.9200μs | 31.0136μs | 32.2439 KOps/s | 26.8054 KOps/s | $\textbf{\color{#35bf28}+20.29\\%}$ | | test_instantiation_functorch | 2.1614ms | 2.0311ms | 492.3320 Ops/s | 505.6022 Ops/s | $\color{#d91a1a}-2.62\\%$ | | test_instantiation_td | 2.0039ms | 1.3135ms | 761.3393 Ops/s | 785.8436 Ops/s | $\color{#d91a1a}-3.12\\%$ | | test_exec_functorch | 0.2712ms | 0.2276ms | 4.3940 KOps/s | 4.6076 KOps/s | $\color{#d91a1a}-4.64\\%$ | | test_exec_functional_call | 0.2979ms | 0.2322ms | 4.3072 KOps/s | 4.7255 KOps/s | 
$\textbf{\color{#d91a1a}-8.85\\%}$ | | test_exec_td | 0.2913ms | 0.2328ms | 4.2954 KOps/s | 4.7491 KOps/s | $\textbf{\color{#d91a1a}-9.55\\%}$ | | test_exec_td_decorator | 0.5110ms | 0.2961ms | 3.3767 KOps/s | 3.5250 KOps/s | $\color{#d91a1a}-4.21\\%$ | | test_vmap_mlp_speed[True-True] | 0.8371ms | 0.6707ms | 1.4910 KOps/s | 1.5288 KOps/s | $\color{#d91a1a}-2.47\\%$ | | test_vmap_mlp_speed[True-False] | 0.6991ms | 0.6470ms | 1.5457 KOps/s | 1.5416 KOps/s | $\color{#35bf28}+0.26\\%$ | | test_vmap_mlp_speed[False-True] | 0.6586ms | 0.5988ms | 1.6700 KOps/s | 1.7682 KOps/s | $\textbf{\color{#d91a1a}-5.55\\%}$ | | test_vmap_mlp_speed[False-False] | 0.6650ms | 0.5927ms | 1.6872 KOps/s | 1.7679 KOps/s | $\color{#d91a1a}-4.57\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4735ms | 0.7370ms | 1.3568 KOps/s | 1.3752 KOps/s | $\color{#d91a1a}-1.34\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9198ms | 0.7324ms | 1.3653 KOps/s | 1.3736 KOps/s | $\color{#d91a1a}-0.60\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.7536ms | 0.6467ms | 1.5462 KOps/s | 1.5822 KOps/s | $\color{#d91a1a}-2.27\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8538ms | 0.6420ms | 1.5577 KOps/s | 1.5909 KOps/s | $\color{#d91a1a}-2.08\\%$ | | test_vmap_transformer_speed[True-True] | 8.6960ms | 8.5929ms | 116.3754 Ops/s | 117.3778 Ops/s | $\color{#d91a1a}-0.85\\%$ | | test_vmap_transformer_speed[True-False] | 9.9493ms | 8.5856ms | 116.4736 Ops/s | 117.7642 Ops/s | $\color{#d91a1a}-1.10\\%$ | | test_vmap_transformer_speed[False-True] | 9.1462ms | 8.5031ms | 117.6043 Ops/s | 119.2965 Ops/s | $\color{#d91a1a}-1.42\\%$ | | test_vmap_transformer_speed[False-False] | 8.6143ms | 8.5146ms | 117.4448 Ops/s | 119.2193 Ops/s | $\color{#d91a1a}-1.49\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 20.6980ms | 20.5527ms | 48.6555 Ops/s | 49.3404 Ops/s | $\color{#d91a1a}-1.39\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.3069ms | 20.6385ms | 48.4532 Ops/s | 49.6135 
Ops/s | $\color{#d91a1a}-2.34\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.3946ms | 20.3337ms | 49.1794 Ops/s | 50.0310 Ops/s | $\color{#d91a1a}-1.70\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 20.9650ms | 20.3444ms | 49.1535 Ops/s | 50.0771 Ops/s | $\color{#d91a1a}-1.84\\%$ | | test_to_module_speed[True] | 1.9913ms | 1.4713ms | 679.6804 Ops/s | 683.6216 Ops/s | $\color{#d91a1a}-0.58\\%$ | | test_to_module_speed[False] | 1.8510ms | 1.4266ms | 700.9575 Ops/s | 689.4729 Ops/s | $\color{#35bf28}+1.67\\%$ | | test_tc_init | 53.3610μs | 33.9812μs | 29.4281 KOps/s | 25.4945 KOps/s | $\textbf{\color{#35bf28}+15.43\\%}$ | | test_tc_init_nested | 89.4710μs | 66.6847μs | 14.9959 KOps/s | 12.4471 KOps/s | $\textbf{\color{#35bf28}+20.48\\%}$ | | test_tc_first_layer_tensor | 18.5400μs | 3.9871μs | 250.8065 KOps/s | 253.4125 KOps/s | $\color{#d91a1a}-1.03\\%$ | | test_tc_first_layer_nontensor | 18.9200μs | 4.0049μs | 249.6947 KOps/s | 252.4676 KOps/s | $\color{#d91a1a}-1.10\\%$ | | test_tc_second_layer_tensor | 30.3403μs | 1.2813μs | 780.4631 KOps/s | 768.3119 KOps/s | $\color{#35bf28}+1.58\\%$ | | test_tc_second_layer_nontensor | 19.9400μs | 4.5977μs | 217.4984 KOps/s | 217.8786 KOps/s | $\color{#d91a1a}-0.17\\%$ | | test_unbind | 0.3237s | 12.7907ms | 78.1821 Ops/s | 76.3736 Ops/s | $\color{#35bf28}+2.37\\%$ | | test_full_like | 0.6613ms | 0.5796ms | 1.7252 KOps/s | 1.7368 KOps/s | $\color{#d91a1a}-0.67\\%$ | | test_zeros_like | 0.2668ms | 0.1978ms | 5.0545 KOps/s | 5.0549 KOps/s | $-0.01\\%$ | | test_ones_like | 0.2269ms | 0.1976ms | 5.0606 KOps/s | 5.0602 KOps/s | $+0.01\\%$ | | test_clone | 0.4414ms | 0.4154ms | 2.4072 KOps/s | 2.4162 KOps/s | $\color{#d91a1a}-0.37\\%$ | | test_squeeze | 41.2500μs | 11.2432μs | 88.9425 KOps/s | 88.7744 KOps/s | $\color{#35bf28}+0.19\\%$ | | test_unsqueeze | 0.2639ms | 84.5050μs | 11.8336 KOps/s | 12.7367 KOps/s | $\textbf{\color{#d91a1a}-7.09\\%}$ | | test_split | 0.5042ms | 0.1865ms | 5.3622 KOps/s | 
5.6489 KOps/s | $\textbf{\color{#d91a1a}-5.07\\%}$ | | test_permute | 0.2450ms | 0.1875ms | 5.3343 KOps/s | 5.4260 KOps/s | $\color{#d91a1a}-1.69\\%$ | | test_stack | 1.2580ms | 0.9080ms | 1.1013 KOps/s | 1.0941 KOps/s | $\color{#35bf28}+0.66\\%$ | | test_cat | 1.2548ms | 1.2313ms | 812.1279 Ops/s | 811.9622 Ops/s | $\color{#35bf28}+0.02\\%$ |