pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] fix construction of lazy stacks from tds #903

Closed: vmoens closed this 1 month ago

github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large ⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}7$.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 49.2720μs | 22.2855μs | 44.8723 KOps/s | 42.7390 KOps/s | $\color{#35bf28}+4.99\\%$ |
| test_plain_set_stack_nested | 48.8110μs | 22.9259μs | 43.6187 KOps/s | 42.2797 KOps/s | $\color{#35bf28}+3.17\\%$ |
| test_plain_set_nested_inplace | 82.2340μs | 24.4881μs | 40.8362 KOps/s | 39.0839 KOps/s | $\color{#35bf28}+4.48\\%$ |
| test_plain_set_stack_nested_inplace | 63.8400μs | 24.4623μs | 40.8793 KOps/s | 38.4389 KOps/s | $\textbf{\color{#35bf28}+6.35\\%}$ |
| test_items | 15.5200μs | 2.6138μs | 382.5777 KOps/s | 379.9580 KOps/s | $\color{#35bf28}+0.69\\%$ |
| test_items_nested | 0.4566ms | 0.3600ms | 2.7782 KOps/s | 2.7890 KOps/s | $\color{#d91a1a}-0.39\\%$ |
| test_items_nested_locked | 0.7499ms | 0.3601ms | 2.7770 KOps/s | 2.7786 KOps/s | $\color{#d91a1a}-0.05\\%$ |
| test_items_nested_leaf | 0.1846ms | 86.7740μs | 11.5242 KOps/s | 11.4488 KOps/s | $\color{#35bf28}+0.66\\%$ |
| test_items_stack_nested | 3.0199ms | 0.3680ms | 2.7175 KOps/s | 2.7586 KOps/s | $\color{#d91a1a}-1.49\\%$ |
| test_items_stack_nested_leaf | 0.1507ms | 89.1076μs | 11.2224 KOps/s | 11.3845 KOps/s | $\color{#d91a1a}-1.42\\%$ |
| test_items_stack_nested_locked | 0.4800ms | 0.3631ms | 2.7543 KOps/s | 2.6225 KOps/s | $\textbf{\color{#35bf28}+5.02\\%}$ |
| test_keys | 39.8940μs | 3.9425μs | 253.6479 KOps/s | 257.3159 KOps/s | $\color{#d91a1a}-1.43\\%$ |
| test_keys_nested | 0.3227ms | 0.1416ms | 7.0642 KOps/s | 6.8183 KOps/s | $\color{#35bf28}+3.61\\%$ |
| test_keys_nested_locked | 0.7158ms | 0.1470ms | 6.8024 KOps/s | 6.4852 KOps/s | $\color{#35bf28}+4.89\\%$ |
| test_keys_nested_leaf | 0.2857ms | 0.1271ms | 7.8684 KOps/s | 7.8971 KOps/s | $\color{#d91a1a}-0.36\\%$ |
| test_keys_stack_nested | 0.2704ms | 0.1438ms | 6.9563 KOps/s | 6.7306 KOps/s | $\color{#35bf28}+3.35\\%$ |
| test_keys_stack_nested_leaf | 0.1744ms | 0.1227ms | 8.1478 KOps/s | 7.9321 KOps/s | $\color{#35bf28}+2.72\\%$ |
| test_keys_stack_nested_locked | 0.2947ms | 0.1497ms | 6.6785 KOps/s | 6.5688 KOps/s | $\color{#35bf28}+1.67\\%$ |
| test_values | 6.0763μs | 1.1877μs | 841.9538 KOps/s | 883.7682 KOps/s | $\color{#d91a1a}-4.73\\%$ |
| test_values_nested | 99.0660μs | 50.5060μs | 19.7996 KOps/s | 19.5500 KOps/s | $\color{#35bf28}+1.28\\%$ |
| test_values_nested_locked | 0.1092ms | 50.3240μs | 19.8713 KOps/s | 19.5833 KOps/s | $\color{#35bf28}+1.47\\%$ |
| test_values_nested_leaf | 95.4180μs | 45.2899μs | 22.0800 KOps/s | 21.6677 KOps/s | $\color{#35bf28}+1.90\\%$ |
| test_values_stack_nested | 0.1042ms | 51.9152μs | 19.2622 KOps/s | 19.6115 KOps/s | $\color{#d91a1a}-1.78\\%$ |
| test_values_stack_nested_leaf | 0.1003ms | 46.1028μs | 21.6907 KOps/s | 21.9318 KOps/s | $\color{#d91a1a}-1.10\\%$ |
| test_values_stack_nested_locked | 0.1286ms | 51.4685μs | 19.4294 KOps/s | 19.7335 KOps/s | $\color{#d91a1a}-1.54\\%$ |
| test_membership | 2.5122μs | 0.7212μs | 1.3866 MOps/s | 1.3687 MOps/s | $\color{#35bf28}+1.31\\%$ |
| test_membership_nested | 40.2150μs | 2.6685μs | 374.7382 KOps/s | 364.9507 KOps/s | $\color{#35bf28}+2.68\\%$ |
| test_membership_nested_leaf | 29.7250μs | 2.7212μs | 367.4848 KOps/s | 364.2572 KOps/s | $\color{#35bf28}+0.89\\%$ |
| test_membership_stacked_nested | 28.0430μs | 2.6975μs | 370.7182 KOps/s | 367.1618 KOps/s | $\color{#35bf28}+0.97\\%$ |
| test_membership_stacked_nested_leaf | 34.7850μs | 2.7168μs | 368.0791 KOps/s | 358.3240 KOps/s | $\color{#35bf28}+2.72\\%$ |
| test_membership_nested_last | 30.1970μs | 4.0280μs | 248.2624 KOps/s | 244.4917 KOps/s | $\color{#35bf28}+1.54\\%$ |
| test_membership_nested_leaf_last | 24.1560μs | 4.0332μs | 247.9432 KOps/s | 242.5214 KOps/s | $\color{#35bf28}+2.24\\%$ |
| test_membership_stacked_nested_last | 51.0060μs | 6.4856μs | 154.1874 KOps/s | 245.6979 KOps/s | $\textbf{\color{#d91a1a}-37.25\\%}$ |
| test_membership_stacked_nested_leaf_last | 36.8190μs | 6.5368μs | 152.9811 KOps/s | 244.5942 KOps/s | $\textbf{\color{#d91a1a}-37.46\\%}$ |
| test_nested_getleaf | 51.7970μs | 10.9354μs | 91.4462 KOps/s | 91.4411 KOps/s | $+0.01\\%$ |
| test_nested_get | 43.1110μs | 10.3899μs | 96.2477 KOps/s | 94.6849 KOps/s | $\color{#35bf28}+1.65\\%$ |
| test_stacked_getleaf | 52.8080μs | 10.9223μs | 91.5560 KOps/s | 91.5072 KOps/s | $\color{#35bf28}+0.05\\%$ |
| test_stacked_get | 47.3690μs | 10.2592μs | 97.4735 KOps/s | 95.7215 KOps/s | $\color{#35bf28}+1.83\\%$ |
| test_nested_getitemleaf | 53.9910μs | 11.3022μs | 88.4784 KOps/s | 87.0689 KOps/s | $\color{#35bf28}+1.62\\%$ |
| test_nested_getitem | 43.8820μs | 10.4269μs | 95.9059 KOps/s | 94.6114 KOps/s | $\color{#35bf28}+1.37\\%$ |
| test_stacked_getitemleaf | 49.8640μs | 11.2563μs | 88.8394 KOps/s | 87.5458 KOps/s | $\color{#35bf28}+1.48\\%$ |
| test_stacked_getitem | 46.0860μs | 10.3558μs | 96.5644 KOps/s | 94.4899 KOps/s | $\color{#35bf28}+2.20\\%$ |
| test_lock_nested | 1.9583ms | 0.5054ms | 1.9788 KOps/s | 1.7331 KOps/s | $\textbf{\color{#35bf28}+14.18\\%}$ |
| test_lock_stack_nested | 0.9949ms | 0.4747ms | 2.1067 KOps/s | 2.0808 KOps/s | $\color{#35bf28}+1.24\\%$ |
| test_unlock_nested | 0.8011ms | 0.4247ms | 2.3548 KOps/s | 2.0233 KOps/s | $\textbf{\color{#35bf28}+16.38\\%}$ |
| test_unlock_stack_nested | 0.6360ms | 0.3855ms | 2.5940 KOps/s | 2.5304 KOps/s | $\color{#35bf28}+2.51\\%$ |
| test_flatten_speed | 0.1868ms | 0.1058ms | 9.4544 KOps/s | 8.8539 KOps/s | $\textbf{\color{#35bf28}+6.78\\%}$ |
| test_unflatten_speed | 0.7633ms | 0.4492ms | 2.2264 KOps/s | 2.2241 KOps/s | $\color{#35bf28}+0.11\\%$ |
| test_common_ops | 3.2686ms | 1.1575ms | 863.9434 Ops/s | 852.9139 Ops/s | $\color{#35bf28}+1.29\\%$ |
| test_creation | 20.3580μs | 2.4193μs | 413.3345 KOps/s | 402.7558 KOps/s | $\color{#35bf28}+2.63\\%$ |
| test_creation_empty | 49.4730μs | 19.7741μs | 50.5711 KOps/s | 46.8033 KOps/s | $\textbf{\color{#35bf28}+8.05\\%}$ |
| test_creation_nested_1 | 76.2720μs | 23.0009μs | 43.4767 KOps/s | 40.3662 KOps/s | $\textbf{\color{#35bf28}+7.71\\%}$ |
| test_creation_nested_2 | 68.9390μs | 27.2763μs | 36.6619 KOps/s | 34.1308 KOps/s | $\textbf{\color{#35bf28}+7.42\\%}$ |
| test_clone | 72.1450μs | 17.6846μs | 56.5464 KOps/s | 56.8244 KOps/s | $\color{#d91a1a}-0.49\\%$ |
| test_getitem[int] | 1.0281ms | 12.6741μs | 78.9012 KOps/s | 78.7254 KOps/s | $\color{#35bf28}+0.22\\%$ |
| test_getitem[slice_int] | 0.1272ms | 32.1613μs | 31.0933 KOps/s | 31.0135 KOps/s | $\color{#35bf28}+0.26\\%$ |
| test_getitem[range] | 0.1655ms | 55.7199μs | 17.9469 KOps/s | 17.8422 KOps/s | $\color{#35bf28}+0.59\\%$ |
| test_getitem[tuple] | 0.1350ms | 26.2153μs | 38.1457 KOps/s | 38.0744 KOps/s | $\color{#35bf28}+0.19\\%$ |
| test_getitem[list] | 0.1994ms | 51.2810μs | 19.5004 KOps/s | 19.4239 KOps/s | $\color{#35bf28}+0.39\\%$ |
| test_setitem_dim[int] | 71.3630μs | 33.6576μs | 29.7110 KOps/s | 27.0762 KOps/s | $\textbf{\color{#35bf28}+9.73\\%}$ |
| test_setitem_dim[slice_int] | 0.1090ms | 71.8159μs | 13.9245 KOps/s | 13.2837 KOps/s | $\color{#35bf28}+4.82\\%$ |
| test_setitem_dim[range] | 0.2081ms | 93.6950μs | 10.6729 KOps/s | 10.3656 KOps/s | $\color{#35bf28}+2.96\\%$ |
| test_setitem_dim[tuple] | 0.1076ms | 59.8471μs | 16.7092 KOps/s | 16.2308 KOps/s | $\color{#35bf28}+2.95\\%$ |
| test_setitem | 0.1306ms | 31.5577μs | 31.6880 KOps/s | 31.6993 KOps/s | $\color{#d91a1a}-0.04\\%$ |
| test_set | 96.5010μs | 30.9964μs | 32.2619 KOps/s | 32.3249 KOps/s | $\color{#d91a1a}-0.20\\%$ |
| test_set_shared | 1.1797ms | 0.2146ms | 4.6608 KOps/s | 4.5523 KOps/s | $\color{#35bf28}+2.38\\%$ |
| test_update | 0.1861ms | 39.1495μs | 25.5431 KOps/s | 24.9457 KOps/s | $\color{#35bf28}+2.39\\%$ |
| test_update_nested | 0.1165ms | 49.0008μs | 20.4078 KOps/s | 19.8303 KOps/s | $\color{#35bf28}+2.91\\%$ |
| test_update__nested | 94.8880μs | 35.3859μs | 28.2598 KOps/s | 28.7580 KOps/s | $\color{#d91a1a}-1.73\\%$ |
| test_set_nested | 0.1066ms | 33.4605μs | 29.8859 KOps/s | 29.9224 KOps/s | $\color{#d91a1a}-0.12\\%$ |
| test_set_nested_new | 1.0410ms | 38.2306μs | 26.1571 KOps/s | 26.0420 KOps/s | $\color{#35bf28}+0.44\\%$ |
| test_select | 0.1260ms | 55.3252μs | 18.0750 KOps/s | 18.2649 KOps/s | $\color{#d91a1a}-1.04\\%$ |
| test_select_nested | 0.1268ms | 61.0573μs | 16.3781 KOps/s | 15.9928 KOps/s | $\color{#35bf28}+2.41\\%$ |
| test_exclude_nested | 0.1561ms | 80.5609μs | 12.4130 KOps/s | 12.0675 KOps/s | $\color{#35bf28}+2.86\\%$ |
| test_empty[True] | 0.6316ms | 0.3431ms | 2.9143 KOps/s | 2.8726 KOps/s | $\color{#35bf28}+1.45\\%$ |
| test_empty[False] | 10.2115μs | 1.2786μs | 782.1150 KOps/s | 775.0943 KOps/s | $\color{#35bf28}+0.91\\%$ |
| test_unbind_speed | 0.4084ms | 0.3233ms | 3.0932 KOps/s | 3.1095 KOps/s | $\color{#d91a1a}-0.52\\%$ |
| test_unbind_speed_stack0 | 0.6216ms | 0.3149ms | 3.1752 KOps/s | 3.1448 KOps/s | $\color{#35bf28}+0.97\\%$ |
| test_unbind_speed_stack1 | 74.8160ms | 0.8046ms | 1.2428 KOps/s | 1.2170 KOps/s | $\color{#35bf28}+2.12\\%$ |
| test_split | 75.5066ms | 2.2335ms | 447.7210 Ops/s | 445.7276 Ops/s | $\color{#35bf28}+0.45\\%$ |
| test_chunk | 77.1338ms | 2.2352ms | 447.3848 Ops/s | 442.8774 Ops/s | $\color{#35bf28}+1.02\\%$ |
| test_creation[device0] | 4.1172ms | 0.1228ms | 8.1427 KOps/s | 8.4186 KOps/s | $\color{#d91a1a}-3.28\\%$ |
| test_creation_from_tensor | 0.2440ms | 0.1200ms | 8.3346 KOps/s | 8.3406 KOps/s | $\color{#d91a1a}-0.07\\%$ |
| test_add_one[memmap_tensor0] | 0.2157ms | 7.9516μs | 125.7608 KOps/s | 124.9103 KOps/s | $\color{#35bf28}+0.68\\%$ |
| test_contiguous[memmap_tensor0] | 17.3420μs | 2.2770μs | 439.1660 KOps/s | 458.1799 KOps/s | $\color{#d91a1a}-4.15\\%$ |
| test_stack[memmap_tensor0] | 76.0620μs | 6.1112μs | 163.6342 KOps/s | 165.2353 KOps/s | $\color{#d91a1a}-0.97\\%$ |
| test_memmaptd_index | 1.1875ms | 0.4443ms | 2.2505 KOps/s | 2.3377 KOps/s | $\color{#d91a1a}-3.73\\%$ |
| test_memmaptd_index_astensor | 1.0132ms | 0.5260ms | 1.9011 KOps/s | 1.9603 KOps/s | $\color{#d91a1a}-3.02\\%$ |
| test_memmaptd_index_op | 1.4525ms | 1.1224ms | 890.9379 Ops/s | 899.6723 Ops/s | $\color{#d91a1a}-0.97\\%$ |
| test_serialize_model | 0.2000s | 0.1380s | 7.2467 Ops/s | 7.9011 Ops/s | $\textbf{\color{#d91a1a}-8.28\\%}$ |
| test_serialize_model_pickle | 0.4419s | 0.3899s | 2.5650 Ops/s | 2.5152 Ops/s | $\color{#35bf28}+1.98\\%$ |
| test_serialize_weights | 0.1303s | 0.1259s | 7.9430 Ops/s | 7.9491 Ops/s | $\color{#d91a1a}-0.08\\%$ |
| test_serialize_weights_returnearly | 0.1771s | 0.1685s | 5.9334 Ops/s | 6.1534 Ops/s | $\color{#d91a1a}-3.58\\%$ |
| test_serialize_weights_pickle | 0.4661s | 0.4351s | 2.2981 Ops/s | 2.5171 Ops/s | $\textbf{\color{#d91a1a}-8.70\\%}$ |
| test_serialize_weights_filesystem | 0.1477s | 0.1424s | 7.0239 Ops/s | 7.1923 Ops/s | $\color{#d91a1a}-2.34\\%$ |
| test_serialize_model_filesystem | 0.1556s | 0.1492s | 6.7036 Ops/s | 6.4918 Ops/s | $\color{#35bf28}+3.26\\%$ |
| test_reshape_pytree | 87.9050μs | 39.3162μs | 25.4348 KOps/s | 25.6527 KOps/s | $\color{#d91a1a}-0.85\\%$ |
| test_reshape_td | 0.1080ms | 49.0949μs | 20.3687 KOps/s | 20.6882 KOps/s | $\color{#d91a1a}-1.54\\%$ |
| test_view_pytree | 94.7470μs | 39.4754μs | 25.3322 KOps/s | 26.5172 KOps/s | $\color{#d91a1a}-4.47\\%$ |
| test_view_td | 0.1209ms | 55.2754μs | 18.0912 KOps/s | 18.6606 KOps/s | $\color{#d91a1a}-3.05\\%$ |
| test_unbind_pytree | 0.1087ms | 35.6410μs | 28.0576 KOps/s | 28.6428 KOps/s | $\color{#d91a1a}-2.04\\%$ |
| test_unbind_td | 0.3461ms | 47.6733μs | 20.9761 KOps/s | 21.5453 KOps/s | $\color{#d91a1a}-2.64\\%$ |
| test_split_pytree | 0.1019ms | 39.4119μs | 25.3731 KOps/s | 26.2251 KOps/s | $\color{#d91a1a}-3.25\\%$ |
| test_split_td | 0.4871ms | 60.9138μs | 16.4166 KOps/s | 16.5681 KOps/s | $\color{#d91a1a}-0.91\\%$ |
| test_add_pytree | 0.1237ms | 44.3986μs | 22.5232 KOps/s | 22.7363 KOps/s | $\color{#d91a1a}-0.94\\%$ |
| test_add_td | 0.2224ms | 86.7191μs | 11.5315 KOps/s | 10.8774 KOps/s | $\textbf{\color{#35bf28}+6.01\\%}$ |
| test_distributed | 0.2550ms | 0.1279ms | 7.8157 KOps/s | 7.6006 KOps/s | $\color{#35bf28}+2.83\\%$ |
| test_tdmodule | 57.0170μs | 17.5747μs | 56.8999 KOps/s | 52.6870 KOps/s | $\textbf{\color{#35bf28}+8.00\\%}$ |
| test_tdmodule_dispatch | 62.2360μs | 35.8375μs | 27.9037 KOps/s | 25.7655 KOps/s | $\textbf{\color{#35bf28}+8.30\\%}$ |
| test_tdseq | 37.7000μs | 19.0754μs | 52.4235 KOps/s | 50.2583 KOps/s | $\color{#35bf28}+4.31\\%$ |
| test_tdseq_dispatch | 69.7100μs | 40.3265μs | 24.7976 KOps/s | 23.6025 KOps/s | $\textbf{\color{#35bf28}+5.06\\%}$ |
| test_instantiation_functorch | 1.7763ms | 1.5799ms | 632.9606 Ops/s | 629.1871 Ops/s | $\color{#35bf28}+0.60\\%$ |
| test_instantiation_td | 2.1670ms | 1.1505ms | 869.2249 Ops/s | 874.5055 Ops/s | $\color{#d91a1a}-0.60\\%$ |
| test_exec_functorch | 0.3290ms | 0.1870ms | 5.3467 KOps/s | 5.4983 KOps/s | $\color{#d91a1a}-2.76\\%$ |
| test_exec_functional_call | 0.3506ms | 0.1739ms | 5.7517 KOps/s | 5.6680 KOps/s | $\color{#35bf28}+1.48\\%$ |
| test_exec_td | 0.4519ms | 0.1744ms | 5.7355 KOps/s | 5.7015 KOps/s | $\color{#35bf28}+0.60\\%$ |
| test_exec_td_decorator | 1.1743ms | 0.2627ms | 3.8068 KOps/s | 3.8418 KOps/s | $\color{#d91a1a}-0.91\\%$ |
| test_vmap_mlp_speed[True-True] | 0.7648ms | 0.6104ms | 1.6383 KOps/s | 1.6391 KOps/s | $\color{#d91a1a}-0.05\\%$ |
| test_vmap_mlp_speed[True-False] | 0.9684ms | 0.6094ms | 1.6408 KOps/s | 1.6101 KOps/s | $\color{#35bf28}+1.91\\%$ |
| test_vmap_mlp_speed[False-True] | 0.9141ms | 0.5080ms | 1.9684 KOps/s | 1.9466 KOps/s | $\color{#35bf28}+1.12\\%$ |
| test_vmap_mlp_speed[False-False] | 0.7456ms | 0.4971ms | 2.0115 KOps/s | 1.9617 KOps/s | $\color{#35bf28}+2.54\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.9845ms | 0.7058ms | 1.4169 KOps/s | 1.4049 KOps/s | $\color{#35bf28}+0.85\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.1267ms | 0.7138ms | 1.4010 KOps/s | 1.4119 KOps/s | $\color{#d91a1a}-0.78\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7668ms | 0.5825ms | 1.7169 KOps/s | 1.6957 KOps/s | $\color{#35bf28}+1.25\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8899ms | 0.5863ms | 1.7057 KOps/s | 1.6964 KOps/s | $\color{#35bf28}+0.55\\%$ |
| test_to_module_speed[True] | 89.3216ms | 2.0427ms | 489.5486 Ops/s | 548.2843 Ops/s | $\textbf{\color{#d91a1a}-10.71\\%}$ |
| test_to_module_speed[False] | 2.4572ms | 1.8288ms | 546.7943 Ops/s | 560.6109 Ops/s | $\color{#d91a1a}-2.46\\%$ |
| test_tc_init | 0.1054ms | 44.5061μs | 22.4688 KOps/s | 20.5063 KOps/s | $\textbf{\color{#35bf28}+9.57\\%}$ |
| test_tc_init_nested | 0.1802ms | 90.5153μs | 11.0479 KOps/s | 10.1199 KOps/s | $\textbf{\color{#35bf28}+9.17\\%}$ |
| test_tc_first_layer_tensor | 48.0000μs | 9.1330μs | 109.4927 KOps/s | 104.3052 KOps/s | $\color{#35bf28}+4.97\\%$ |
| test_tc_first_layer_nontensor | 51.4670μs | 9.1167μs | 109.6890 KOps/s | 105.4846 KOps/s | $\color{#35bf28}+3.99\\%$ |
| test_tc_second_layer_tensor | 40.9260μs | 2.8149μs | 355.2560 KOps/s | 323.4543 KOps/s | $\textbf{\color{#35bf28}+9.83\\%}$ |
| test_tc_second_layer_nontensor | 44.2730μs | 10.2736μs | 97.3367 KOps/s | 93.1295 KOps/s | $\color{#35bf28}+4.52\\%$ |
| test_unbind | 0.1039s | 13.2547ms | 75.4449 Ops/s | 73.4936 Ops/s | $\color{#35bf28}+2.66\\%$ |
| test_full_like | 8.5927ms | 7.5402ms | 132.6226 Ops/s | 143.1277 Ops/s | $\textbf{\color{#d91a1a}-7.34\\%}$ |
| test_zeros_like | 12.6501ms | 6.6179ms | 151.1054 Ops/s | 154.5221 Ops/s | $\color{#d91a1a}-2.21\\%$ |
| test_ones_like | 13.5620ms | 7.2937ms | 137.1045 Ops/s | 156.9763 Ops/s | $\textbf{\color{#d91a1a}-12.66\\%}$ |
| test_clone | 12.6512ms | 8.7817ms | 113.8730 Ops/s | 106.0753 Ops/s | $\textbf{\color{#35bf28}+7.35\\%}$ |
| test_squeeze | 66.0240μs | 14.8111μs | 67.5169 KOps/s | 70.2024 KOps/s | $\color{#d91a1a}-3.83\\%$ |
| test_unsqueeze | 0.1715ms | 96.8117μs | 10.3293 KOps/s | 10.6634 KOps/s | $\color{#d91a1a}-3.13\\%$ |
| test_split | 0.4295ms | 0.2064ms | 4.8439 KOps/s | 4.8394 KOps/s | $\color{#35bf28}+0.09\\%$ |
| test_permute | 0.4050ms | 0.2236ms | 4.4721 KOps/s | 4.4801 KOps/s | $\color{#d91a1a}-0.18\\%$ |
| test_stack | 27.9592ms | 23.3586ms | 42.8107 Ops/s | 41.0213 Ops/s | $\color{#35bf28}+4.36\\%$ |
| test_cat | 28.0039ms | 22.8349ms | 43.7926 Ops/s | 41.7223 Ops/s | $\color{#35bf28}+4.96\\%$ |
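
For readers checking the CPU numbers by hand: the Change column appears consistent with the relative difference between Ops and Ops on Repo `HEAD`, and the Improved/Worsened totals (17 and 7) match the number of bolded rows, all of which change by roughly 5% or more. The sketch below reproduces that reading. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to HEAD (same units)."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bolded rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on HEAD) copied from the CPU table, both in KOps/s.
rows = [
    ("test_plain_set_nested", 44.8723, 42.7390),
    ("test_plain_set_stack_nested_inplace", 40.8793, 38.4389),
    ("test_membership_stacked_nested_last", 154.1874, 245.6979),
    ("test_values", 841.9538, 883.7682),
]

for name, ops_pr, ops_head in rows:
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table rounds Ops to four decimals, recomputed percentages can differ from the reported ones in the last digit.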
github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large ⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}19$.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.5557ms | 17.7750μs | 56.2588 KOps/s | 64.4906 KOps/s | $\textbf{\color{#d91a1a}-12.76\\%}$ |
| test_plain_set_stack_nested | 37.7800μs | 17.6788μs | 56.5648 KOps/s | 64.3421 KOps/s | $\textbf{\color{#d91a1a}-12.09\\%}$ |
| test_plain_set_nested_inplace | 0.1306ms | 18.6041μs | 53.7515 KOps/s | 60.1450 KOps/s | $\textbf{\color{#d91a1a}-10.63\\%}$ |
| test_plain_set_stack_nested_inplace | 47.0200μs | 18.5481μs | 53.9139 KOps/s | 60.0125 KOps/s | $\textbf{\color{#d91a1a}-10.16\\%}$ |
| test_items | 0.1710ms | 4.7158μs | 212.0527 KOps/s | 210.3511 KOps/s | $\color{#35bf28}+0.81\\%$ |
| test_items_nested | 0.4881ms | 0.3984ms | 2.5101 KOps/s | 2.5527 KOps/s | $\color{#d91a1a}-1.67\\%$ |
| test_items_nested_locked | 0.4207ms | 0.3959ms | 2.5260 KOps/s | 2.5277 KOps/s | $\color{#d91a1a}-0.07\\%$ |
| test_items_nested_leaf | 0.2660ms | 87.0645μs | 11.4857 KOps/s | 11.5636 KOps/s | $\color{#d91a1a}-0.67\\%$ |
| test_items_stack_nested | 0.5779ms | 0.3949ms | 2.5321 KOps/s | 2.5361 KOps/s | $\color{#d91a1a}-0.16\\%$ |
| test_items_stack_nested_leaf | 0.1247ms | 85.2597μs | 11.7289 KOps/s | 11.5445 KOps/s | $\color{#35bf28}+1.60\\%$ |
| test_items_stack_nested_locked | 0.4320ms | 0.3982ms | 2.5113 KOps/s | 2.5121 KOps/s | $\color{#d91a1a}-0.03\\%$ |
| test_keys | 26.7800μs | 4.4111μs | 226.7011 KOps/s | 228.0134 KOps/s | $\color{#d91a1a}-0.58\\%$ |
| test_keys_nested | 94.1010μs | 65.4208μs | 15.2856 KOps/s | 15.0163 KOps/s | $\color{#35bf28}+1.79\\%$ |
| test_keys_nested_locked | 1.8405ms | 72.4586μs | 13.8010 KOps/s | 13.6211 KOps/s | $\color{#35bf28}+1.32\\%$ |
| test_keys_nested_leaf | 86.5910μs | 57.5019μs | 17.3907 KOps/s | 17.2470 KOps/s | $\color{#35bf28}+0.83\\%$ |
| test_keys_stack_nested | 0.1071ms | 66.5519μs | 15.0259 KOps/s | 14.6902 KOps/s | $\color{#35bf28}+2.28\\%$ |
| test_keys_stack_nested_leaf | 88.1110μs | 58.0555μs | 17.2249 KOps/s | 17.3583 KOps/s | $\color{#d91a1a}-0.77\\%$ |
| test_keys_stack_nested_locked | 0.1697ms | 71.9281μs | 13.9028 KOps/s | 13.6341 KOps/s | $\color{#35bf28}+1.97\\%$ |
| test_values | 8.9467μs | 1.7711μs | 564.6098 KOps/s | 567.6834 KOps/s | $\color{#d91a1a}-0.54\\%$ |
| test_values_nested | 55.5000μs | 34.1625μs | 29.2718 KOps/s | 29.6903 KOps/s | $\color{#d91a1a}-1.41\\%$ |
| test_values_nested_locked | 53.9410μs | 35.8601μs | 27.8861 KOps/s | 28.2742 KOps/s | $\color{#d91a1a}-1.37\\%$ |
| test_values_nested_leaf | 44.7810μs | 30.2347μs | 33.0746 KOps/s | 33.5551 KOps/s | $\color{#d91a1a}-1.43\\%$ |
| test_values_stack_nested | 0.1682ms | 34.3697μs | 29.0954 KOps/s | 28.9175 KOps/s | $\color{#35bf28}+0.62\\%$ |
| test_values_stack_nested_leaf | 59.9800μs | 31.1034μs | 32.1508 KOps/s | 32.2735 KOps/s | $\color{#d91a1a}-0.38\\%$ |
| test_values_stack_nested_locked | 66.5210μs | 36.3518μs | 27.5090 KOps/s | 27.6234 KOps/s | $\color{#d91a1a}-0.41\\%$ |
| test_membership | 1.6641μs | 0.5411μs | 1.8479 MOps/s | 1.8490 MOps/s | $\color{#d91a1a}-0.06\\%$ |
| test_membership_nested | 17.0110μs | 2.1260μs | 470.3590 KOps/s | 460.0811 KOps/s | $\color{#35bf28}+2.23\\%$ |
| test_membership_nested_leaf | 15.9450μs | 2.0512μs | 487.5193 KOps/s | 479.9789 KOps/s | $\color{#35bf28}+1.57\\%$ |
| test_membership_stacked_nested | 14.8200μs | 2.0974μs | 476.7795 KOps/s | 464.8107 KOps/s | $\color{#35bf28}+2.57\\%$ |
| test_membership_stacked_nested_leaf | 42.5910μs | 2.0723μs | 482.5461 KOps/s | 469.8513 KOps/s | $\color{#35bf28}+2.70\\%$ |
| test_membership_nested_last | 22.3600μs | 3.0481μs | 328.0724 KOps/s | 326.7638 KOps/s | $\color{#35bf28}+0.40\\%$ |
| test_membership_nested_leaf_last | 33.1910μs | 3.0194μs | 331.1900 KOps/s | 323.5101 KOps/s | $\color{#35bf28}+2.37\\%$ |
| test_membership_stacked_nested_last | 48.1610μs | 9.1899μs | 108.8149 KOps/s | 328.4704 KOps/s | $\textbf{\color{#d91a1a}-66.87\\%}$ |
| test_membership_stacked_nested_leaf_last | 39.0800μs | 9.1682μs | 109.0725 KOps/s | 330.8880 KOps/s | $\textbf{\color{#d91a1a}-67.04\\%}$ |
| test_nested_getleaf | 24.6600μs | 8.0613μs | 124.0496 KOps/s | 124.8787 KOps/s | $\color{#d91a1a}-0.66\\%$ |
| test_nested_get | 33.7700μs | 7.5603μs | 132.2699 KOps/s | 132.6862 KOps/s | $\color{#d91a1a}-0.31\\%$ |
| test_stacked_getleaf | 37.6210μs | 8.0429μs | 124.3339 KOps/s | 123.8938 KOps/s | $\color{#35bf28}+0.36\\%$ |
| test_stacked_get | 24.0410μs | 7.5288μs | 132.8226 KOps/s | 132.5334 KOps/s | $\color{#35bf28}+0.22\\%$ |
| test_nested_getitemleaf | 25.4310μs | 8.2245μs | 121.5874 KOps/s | 122.2118 KOps/s | $\color{#d91a1a}-0.51\\%$ |
| test_nested_getitem | 24.8500μs | 7.7022μs | 129.8337 KOps/s | 129.4906 KOps/s | $\color{#35bf28}+0.26\\%$ |
| test_stacked_getitemleaf | 38.9300μs | 8.2458μs | 121.2741 KOps/s | 121.3170 KOps/s | $\color{#d91a1a}-0.04\\%$ |
| test_stacked_getitem | 35.0910μs | 7.7134μs | 129.6439 KOps/s | 129.3848 KOps/s | $\color{#35bf28}+0.20\\%$ |
| test_lock_nested | 3.1760ms | 0.4766ms | 2.0980 KOps/s | 2.0500 KOps/s | $\color{#35bf28}+2.34\\%$ |
| test_lock_stack_nested | 0.5390ms | 0.4262ms | 2.3463 KOps/s | 2.2258 KOps/s | $\textbf{\color{#35bf28}+5.41\\%}$ |
| test_unlock_nested | 0.8314ms | 0.3878ms | 2.5790 KOps/s | 2.4726 KOps/s | $\color{#35bf28}+4.30\\%$ |
| test_unlock_stack_nested | 0.4528ms | 0.3401ms | 2.9403 KOps/s | 2.7469 KOps/s | $\textbf{\color{#35bf28}+7.04\\%}$ |
| test_flatten_speed | 0.1979ms | 0.1046ms | 9.5559 KOps/s | 9.4804 KOps/s | $\color{#35bf28}+0.80\\%$ |
| test_unflatten_speed | 0.3822ms | 0.2932ms | 3.4111 KOps/s | 3.4048 KOps/s | $\color{#35bf28}+0.19\\%$ |
| test_common_ops | 1.7645ms | 1.3207ms | 757.1743 Ops/s | 736.8874 Ops/s | $\color{#35bf28}+2.75\\%$ |
| test_creation | 31.4800μs | 1.9560μs | 511.2362 KOps/s | 520.5153 KOps/s | $\color{#d91a1a}-1.78\\%$ |
| test_creation_empty | 75.7710μs | 18.7442μs | 53.3499 KOps/s | 68.9361 KOps/s | $\textbf{\color{#d91a1a}-22.61\\%}$ |
| test_creation_nested_1 | 45.9500μs | 20.7194μs | 48.2639 KOps/s | 59.8615 KOps/s | $\textbf{\color{#d91a1a}-19.37\\%}$ |
| test_creation_nested_2 | 91.9320μs | 23.3955μs | 42.7433 KOps/s | 51.8206 KOps/s | $\textbf{\color{#d91a1a}-17.52\\%}$ |
| test_clone | 0.1836ms | 30.4522μs | 32.8384 KOps/s | 28.6156 KOps/s | $\textbf{\color{#35bf28}+14.76\\%}$ |
| test_getitem[int] | 1.2668ms | 17.1500μs | 58.3092 KOps/s | 57.0690 KOps/s | $\color{#35bf28}+2.17\\%$ |
| test_getitem[slice_int] | 0.1544ms | 28.9378μs | 34.5569 KOps/s | 30.7933 KOps/s | $\textbf{\color{#35bf28}+12.22\\%}$ |
| test_getitem[range] | 0.3157ms | 0.1177ms | 8.4931 KOps/s | 8.4797 KOps/s | $\color{#35bf28}+0.16\\%$ |
| test_getitem[tuple] | 0.1536ms | 25.4528μs | 39.2884 KOps/s | 35.4929 KOps/s | $\textbf{\color{#35bf28}+10.69\\%}$ |
| test_getitem[list] | 0.2589ms | 0.1061ms | 9.4210 KOps/s | 8.8463 KOps/s | $\textbf{\color{#35bf28}+6.50\\%}$ |
| test_setitem_dim[int] | 76.8410μs | 56.7336μs | 17.6262 KOps/s | 18.2070 KOps/s | $\color{#d91a1a}-3.19\\%$ |
| test_setitem_dim[slice_int] | 0.1088ms | 79.8111μs | 12.5296 KOps/s | 12.2735 KOps/s | $\color{#35bf28}+2.09\\%$ |
| test_setitem_dim[range] | 0.2997ms | 0.1439ms | 6.9503 KOps/s | 6.8150 KOps/s | $\color{#35bf28}+1.99\\%$ |
| test_setitem_dim[tuple] | 0.2193ms | 73.1998μs | 13.6612 KOps/s | 13.6339 KOps/s | $\color{#35bf28}+0.20\\%$ |
| test_setitem | 0.2446ms | 46.1866μs | 21.6513 KOps/s | 21.0280 KOps/s | $\color{#35bf28}+2.96\\%$ |
| test_set | 0.2007ms | 43.9157μs | 22.7709 KOps/s | 21.5389 KOps/s | $\textbf{\color{#35bf28}+5.72\\%}$ |
| test_set_shared | 0.4218ms | 54.0796μs | 18.4913 KOps/s | 17.0318 KOps/s | $\textbf{\color{#35bf28}+8.57\\%}$ |
| test_update | 0.2302ms | 51.8522μs | 19.2856 KOps/s | 18.9158 KOps/s | $\color{#35bf28}+1.95\\%$ |
| test_update_nested | 0.2144ms | 60.3234μs | 16.5773 KOps/s | 16.2732 KOps/s | $\color{#35bf28}+1.87\\%$ |
| test_update__nested | 0.2124ms | 60.3047μs | 16.5824 KOps/s | 14.3662 KOps/s | $\textbf{\color{#35bf28}+15.43\\%}$ |
| test_set_nested | 0.2165ms | 46.4296μs | 21.5380 KOps/s | 20.5453 KOps/s | $\color{#35bf28}+4.83\\%$ |
| test_set_nested_new | 0.1957ms | 48.0899μs | 20.7944 KOps/s | 18.6242 KOps/s | $\textbf{\color{#35bf28}+11.65\\%}$ |
| test_select | 0.2256ms | 64.4585μs | 15.5139 KOps/s | 14.8647 KOps/s | $\color{#35bf28}+4.37\\%$ |
| test_select_nested | 89.8010μs | 53.8237μs | 18.5792 KOps/s | 18.6596 KOps/s | $\color{#d91a1a}-0.43\\%$ |
| test_exclude_nested | 0.1104ms | 72.2201μs | 13.8466 KOps/s | 13.6629 KOps/s | $\color{#35bf28}+1.34\\%$ |
| test_empty[True] | 0.3866ms | 0.3006ms | 3.3267 KOps/s | 3.3738 KOps/s | $\color{#d91a1a}-1.40\\%$ |
| test_empty[False] | 5.0151μs | 0.9365μs | 1.0678 MOps/s | 1.0865 MOps/s | $\color{#d91a1a}-1.72\\%$ |
| test_to | 0.1773ms | 38.1719μs | 26.1972 KOps/s | 25.6537 KOps/s | $\color{#35bf28}+2.12\\%$ |
| test_to_nonblocking | 41.7500μs | 23.8157μs | 41.9891 KOps/s | 42.2358 KOps/s | $\color{#d91a1a}-0.58\\%$ |
| test_unbind_speed | 0.4693ms | 0.3039ms | 3.2902 KOps/s | 3.1773 KOps/s | $\color{#35bf28}+3.55\\%$ |
| test_unbind_speed_stack0 | 0.3375ms | 0.2938ms | 3.4033 KOps/s | 3.1910 KOps/s | $\textbf{\color{#35bf28}+6.65\\%}$ |
| test_unbind_speed_stack1 | 94.3889ms | 0.8353ms | 1.1972 KOps/s | 1.2603 KOps/s | $\textbf{\color{#d91a1a}-5.00\\%}$ |
| test_split | 2.2574ms | 2.1247ms | 470.6576 Ops/s | 413.3124 Ops/s | $\textbf{\color{#35bf28}+13.87\\%}$ |
| test_chunk | 94.6066ms | 2.5246ms | 396.0976 Ops/s | 408.3535 Ops/s | $\color{#d91a1a}-3.00\\%$ |
| test_creation[device0] | 0.2528ms | 0.1044ms | 9.5755 KOps/s | 9.3261 KOps/s | $\color{#35bf28}+2.67\\%$ |
| test_creation_from_tensor | 0.2436ms | 0.1012ms | 9.8810 KOps/s | 9.3659 KOps/s | $\textbf{\color{#35bf28}+5.50\\%}$ |
| test_add_one[memmap_tensor0] | 21.5200μs | 8.8741μs | 112.6871 KOps/s | 103.9685 KOps/s | $\textbf{\color{#35bf28}+8.39\\%}$ |
| test_contiguous[memmap_tensor0] | 44.4700μs | 2.2469μs | 445.0482 KOps/s | 441.2230 KOps/s | $\color{#35bf28}+0.87\\%$ |
| test_stack[memmap_tensor0] | 39.7010μs | 6.7028μs | 149.1921 KOps/s | 138.8193 KOps/s | $\textbf{\color{#35bf28}+7.47\\%}$ |
| test_memmaptd_index | 1.1083ms | 0.4292ms | 2.3300 KOps/s | 2.2693 KOps/s | $\color{#35bf28}+2.68\\%$ |
| test_memmaptd_index_astensor | 0.8040ms | 0.4922ms | 2.0317 KOps/s | 1.9417 KOps/s | $\color{#35bf28}+4.63\\%$ |
| test_memmaptd_index_op | 1.5093ms | 1.0452ms | 956.7995 Ops/s | 968.4984 Ops/s | $\color{#d91a1a}-1.21\\%$ |
| test_serialize_model | 0.1005s | 96.6665ms | 10.3448 Ops/s | 10.1450 Ops/s | $\color{#35bf28}+1.97\\%$ |
| test_serialize_model_pickle | 1.3508s | 1.2387s | 0.8073 Ops/s | 0.8074 Ops/s | $-0.01\\%$ |
| test_serialize_weights | 0.1896s | 0.1032s | 9.6880 Ops/s | 9.0520 Ops/s | $\textbf{\color{#35bf28}+7.03\\%}$ |
| test_serialize_weights_returnearly | 86.0354ms | 72.6054ms | 13.7731 Ops/s | 13.8793 Ops/s | $\color{#d91a1a}-0.77\\%$ |
| test_serialize_weights_pickle | 1.3492s | 1.2371s | 0.8083 Ops/s | 0.8091 Ops/s | $\color{#d91a1a}-0.09\\%$ |
| test_reshape_pytree | 0.1330ms | 38.0069μs | 26.3110 KOps/s | 25.2405 KOps/s | $\color{#35bf28}+4.24\\%$ |
| test_reshape_td | 0.2510ms | 44.0678μs | 22.6923 KOps/s | 22.2275 KOps/s | $\color{#35bf28}+2.09\\%$ |
| test_view_pytree | 0.1731ms | 38.5993μs | 25.9072 KOps/s | 25.3894 KOps/s | $\color{#35bf28}+2.04\\%$ |
| test_view_td | 0.2550ms | 51.7039μs | 19.3409 KOps/s | 19.2409 KOps/s | $\color{#35bf28}+0.52\\%$ |
| test_unbind_pytree | 0.1576ms | 36.8850μs | 27.1113 KOps/s | 26.6550 KOps/s | $\color{#35bf28}+1.71\\%$ |
| test_unbind_td | 0.4105ms | 44.5419μs | 22.4508 KOps/s | 21.0619 KOps/s | $\textbf{\color{#35bf28}+6.59\\%}$ |
| test_split_pytree | 0.1852ms | 49.9656μs | 20.0138 KOps/s | 19.6104 KOps/s | $\color{#35bf28}+2.06\\%$ |
| test_split_td | 0.5159ms | 60.2476μs | 16.5982 KOps/s | 15.9707 KOps/s | $\color{#35bf28}+3.93\\%$ |
| test_add_pytree | 0.2107ms | 59.7795μs | 16.7281 KOps/s | 16.0681 KOps/s | $\color{#35bf28}+4.11\\%$ |
| test_add_td | 0.3403ms | 94.2810μs | 10.6066 KOps/s | 10.7413 KOps/s | $\color{#d91a1a}-1.25\\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.4157ms | 0.2129ms | 4.6964 KOps/s | 4.7297 KOps/s | $\color{#d91a1a}-0.70\\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.3232ms | 0.1758ms | 5.6899 KOps/s | 5.6763 KOps/s | $\color{#35bf28}+0.24\\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.2867ms | 0.1477ms | 6.7712 KOps/s | 6.7202 KOps/s | $\color{#35bf28}+0.76\\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.3476ms | 0.1951ms | 5.1252 KOps/s | 4.8860 KOps/s | $\color{#35bf28}+4.90\\%$ |
| test_compile_copy_nested[tensordict-compile] | 0.1635ms | 21.8359μs | 45.7960 KOps/s | 45.4991 KOps/s | $\color{#35bf28}+0.65\\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1699ms | 48.2036μs | 20.7453 KOps/s | 20.6016 KOps/s | $\color{#35bf28}+0.70\\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1643ms | 72.8520μs | 13.7265 KOps/s | 13.5990 KOps/s | $\color{#35bf28}+0.94\\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1117ms | 59.3152μs | 16.8591 KOps/s | 16.7050 KOps/s | $\color{#35bf28}+0.92\\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.5108ms | 0.3296ms | 3.0342 KOps/s | 3.0013 KOps/s | $\color{#35bf28}+1.10\\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.3899ms | 0.2236ms | 4.4723 KOps/s | 4.4695 KOps/s | $\color{#35bf28}+0.06\\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.2936ms | 0.1313ms | 7.6157 KOps/s | 7.5547 KOps/s | $\color{#35bf28}+0.81\\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.2195ms | 64.7936μs | 15.4336 KOps/s | 15.1172 KOps/s | $\color{#35bf28}+2.09\\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.4803ms | 0.3312ms | 3.0195 KOps/s | 3.0247 KOps/s | $\color{#d91a1a}-0.17\\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.9213ms | 0.7175ms | 1.3937 KOps/s | 1.4964 KOps/s | $\textbf{\color{#d91a1a}-6.87\\%}$ |
| test_compile_add_self_flat[tensordict-eager] | 0.4037ms | 0.2772ms | 3.6078 KOps/s | 3.6564 KOps/s | $\color{#d91a1a}-1.33\\%$ |
| test_compile_add_self_flat[tensordict-compile] | 0.4612ms | 0.3327ms | 3.0054 KOps/s | 2.9934 KOps/s | $\color{#35bf28}+0.40\\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2216ms | 77.3537μs | 12.9276 KOps/s | 12.7999 KOps/s | $\color{#35bf28}+1.00\\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.2634ms | 0.1323ms | 7.5567 KOps/s | 7.5561 KOps/s | $+0.01\\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.7248ms | 0.5545ms | 1.8036 KOps/s | 1.7488 KOps/s | $\color{#35bf28}+3.13\\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.5083ms | 0.3287ms | 3.0425 KOps/s | 3.0392 KOps/s | $\color{#35bf28}+0.11\\%$ |
| test_compile_copy_flat[tensordict-compile] | 0.1523ms | 18.9090μs | 52.8849 KOps/s | 53.9929 KOps/s | $\color{#d91a1a}-2.05\\%$ |
| test_compile_copy_flat[tensordict-eager] | 65.8410μs | 32.4137μs | 30.8512 KOps/s | 30.9899 KOps/s | $\color{#d91a1a}-0.45\\%$ |
| test_compile_copy_flat[pytree-compile] | 0.2150ms | 76.3190μs | 13.1029 KOps/s | 12.9872 KOps/s | $\color{#35bf28}+0.89\\%$ |
| test_compile_copy_flat[pytree-eager] | 84.8310μs | 60.8901μs | 16.4230 KOps/s | 16.4295 KOps/s | $\color{#d91a1a}-0.04\\%$ |
| test_compile_assign_and_add[tensordict-compile] | 2.7217ms | 0.9836ms | 1.0167 KOps/s | 1.0226 KOps/s | $\color{#d91a1a}-0.58\\%$ |
| test_compile_assign_and_add[tensordict-eager] | 3.8358ms | 3.4155ms | 292.7788 Ops/s | 289.1979 Ops/s | $\color{#35bf28}+1.24\\%$ |
| test_compile_assign_and_add[pytree-compile] | 2.6694ms | 0.9606ms | 1.0411 KOps/s | 1.0452 KOps/s | $\color{#d91a1a}-0.40\\%$ |
| test_compile_assign_and_add[pytree-eager] | 3.5325ms | 3.3180ms | 301.3839 Ops/s | 295.2306 Ops/s | $\color{#35bf28}+2.08\\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 0.4110ms | 0.1116ms | 8.9583 KOps/s | 8.8301 KOps/s | $\color{#35bf28}+1.45\\%$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.2783ms | 69.7782μs | 14.3311 KOps/s | 15.3014 KOps/s | $\textbf{\color{#d91a1a}-6.34\\%}$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2539ms | 0.1050ms | 9.5251 KOps/s | 9.5045 KOps/s | $\color{#35bf28}+0.22\\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2652ms | 45.4343μs | 22.0098 KOps/s | 19.3399 KOps/s | $\textbf{\color{#35bf28}+13.81\\%}$ |
| test_compile_indexing[tensor-pytree-compile] | 0.3085ms | 0.1082ms | 9.2425 KOps/s | 9.0026 KOps/s | $\color{#35bf28}+2.66\\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.2795ms | 50.1573μs | 19.9373 KOps/s | 19.4761 KOps/s | $\color{#35bf28}+2.37\\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.3085ms | 0.1442ms | 6.9330 KOps/s | 6.8157 KOps/s | $\color{#35bf28}+1.72\\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.2961ms | 27.9792μs | 35.7408 KOps/s | 36.0173 KOps/s | $\color{#d91a1a}-0.77\\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.3424ms | 0.1328ms | 7.5309 KOps/s | 7.4292 KOps/s | $\color{#35bf28}+1.37\\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 75.3420μs | 22.6619μs | 44.1270 KOps/s | 42.1635 KOps/s | $\color{#35bf28}+4.66\\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.3450ms | 0.1325ms | 7.5446 KOps/s | 7.2385 KOps/s | $\color{#35bf28}+4.23\\%$ |
| test_compile_indexing[slice-pytree-eager] | 0.2304ms | 23.1728μs | 43.1541 KOps/s | 39.8582 KOps/s | $\textbf{\color{#35bf28}+8.27\\%}$ |
| test_compile_indexing[int-tensordict-compile] | 0.3499ms | 0.1406ms | 7.1105 KOps/s | 6.8952 KOps/s | $\color{#35bf28}+3.12\\%$ |
| test_compile_indexing[int-tensordict-eager] | 0.5063ms | 26.7913μs | 37.3256 KOps/s | 36.4183 KOps/s | $\color{#35bf28}+2.49\\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.3364ms | 0.1323ms | 7.5568 KOps/s | 7.2621 KOps/s | $\color{#35bf28}+4.06\\%$ |
| test_compile_indexing[int-tensorclass-eager] | 0.2276ms | 22.5831μs | 44.2809 KOps/s | 41.7496 KOps/s | $\textbf{\color{#35bf28}+6.06\\%}$ |
| test_compile_indexing[int-pytree-compile] | 0.3287ms | 0.1323ms | 7.5563 KOps/s | 7.1268 KOps/s | $\textbf{\color{#35bf28}+6.03\\%}$ |
| test_compile_indexing[int-pytree-eager] | 0.2277ms | 22.4790μs | 44.4859 KOps/s | 42.7885 KOps/s | $\color{#35bf28}+3.97\\%$ |
| test_mod_add[eager] | 0.2664ms | 39.9478μs | 25.0327 KOps/s | 24.7728 KOps/s | $\color{#35bf28}+1.05\\%$ |
| test_mod_add[compile] | 0.3247ms | 68.2205μs | 14.6584 KOps/s | 14.2964 KOps/s | $\color{#35bf28}+2.53\\%$ |
| test_mod_add[compile-overhead] | 0.2640ms | 0.1478ms | 6.7675 KOps/s | 6.6567 KOps/s | $\color{#35bf28}+1.66\\%$ |
| test_mod_wrap[eager] | 0.4869ms | 0.2571ms | 3.8892 KOps/s | 3.9292 KOps/s | $\color{#d91a1a}-1.02\\%$ |
| test_mod_wrap[compile] | 0.5265ms | 0.3005ms | 3.3282 KOps/s | 3.2720 KOps/s | $\color{#35bf28}+1.72\\%$ |
| test_mod_wrap[compile-overhead] | 8.3368ms | 4.3705ms | 228.8076 Ops/s | 232.8334 Ops/s | $\color{#d91a1a}-1.73\\%$ |
| test_mod_wrap_and_backward[eager] | 1.6655ms | 1.4407ms | 694.0898 Ops/s | 679.2056 Ops/s | $\color{#35bf28}+2.19\\%$ |
| test_mod_wrap_and_backward[compile] | 1.6952ms | 1.4844ms | 673.6620 Ops/s | 669.3179 Ops/s | $\color{#35bf28}+0.65\\%$ |
| test_mod_wrap_and_backward[compile-overhead] | 1.4901ms | 1.0429ms | 958.9033 Ops/s | 982.5331 Ops/s | $\color{#d91a1a}-2.40\\%$ |
| test_seq_add[eager] | 0.2577ms | 0.1140ms | 8.7758 KOps/s | 8.9901 KOps/s | $\color{#d91a1a}-2.38\\%$ |
| test_seq_add[compile] | 0.2329ms | 83.8188μs | 11.9305 KOps/s | 11.1822 KOps/s | $\textbf{\color{#35bf28}+6.69\\%}$ |
| test_seq_add[compile-overhead] | 0.2675ms | 0.1228ms | 8.1442 KOps/s | 8.1818 KOps/s | $\color{#d91a1a}-0.46\\%$ |
| test_seq_wrap[eager] | 0.5891ms | 0.4274ms | 2.3398 KOps/s | 2.3887 KOps/s | $\color{#d91a1a}-2.05\\%$ |
| test_seq_wrap[compile] | 1.5400ms | 0.3324ms | 3.0084 KOps/s | 2.9258 KOps/s | $\color{#35bf28}+2.82\\%$ |
| test_seq_wrap[compile-overhead] | 
0.3090s | 0.1482s | 6.7461 Ops/s | 6.6559 Ops/s | $\color{#35bf28}+1.36\\%$ | | test_func_call_runtime[False-eager] | 0.9092ms | 0.7409ms | 1.3498 KOps/s | 1.2741 KOps/s | $\textbf{\color{#35bf28}+5.94\\%}$ | | test_func_call_runtime[False-compile] | 0.9898ms | 0.8238ms | 1.2139 KOps/s | 1.1385 KOps/s | $\textbf{\color{#35bf28}+6.62\\%}$ | | test_func_call_runtime[False-compile-overhead] | 0.5221ms | 0.3750ms | 2.6669 KOps/s | 2.6159 KOps/s | $\color{#35bf28}+1.95\\%$ | | test_func_call_runtime[True-eager] | 1.1549ms | 0.9904ms | 1.0097 KOps/s | 968.4513 Ops/s | $\color{#35bf28}+4.26\\%$ | | test_func_call_runtime[True-compile] | 1.0395ms | 0.8673ms | 1.1530 KOps/s | 1.0780 KOps/s | $\textbf{\color{#35bf28}+6.95\\%}$ | | test_func_call_runtime[True-compile-overhead] | 0.5864ms | 0.4157ms | 2.4058 KOps/s | 2.3803 KOps/s | $\color{#35bf28}+1.07\\%$ | | test_distributed | 1.7373ms | 76.9848μs | 12.9896 KOps/s | 14.2620 KOps/s | $\textbf{\color{#d91a1a}-8.92\\%}$ | | test_tdmodule | 0.1443ms | 16.8598μs | 59.3128 KOps/s | 65.1928 KOps/s | $\textbf{\color{#d91a1a}-9.02\\%}$ | | test_tdmodule_dispatch | 72.2410μs | 34.6459μs | 28.8634 KOps/s | 32.2912 KOps/s | $\textbf{\color{#d91a1a}-10.62\\%}$ | | test_tdseq | 35.3710μs | 17.3231μs | 57.7265 KOps/s | 64.2527 KOps/s | $\textbf{\color{#d91a1a}-10.16\\%}$ | | test_tdseq_dispatch | 57.6710μs | 36.5307μs | 27.3743 KOps/s | 30.7333 KOps/s | $\textbf{\color{#d91a1a}-10.93\\%}$ | | test_instantiation_functorch | 2.1648ms | 1.9964ms | 500.8971 Ops/s | 495.6300 Ops/s | $\color{#35bf28}+1.06\\%$ | | test_instantiation_td | 1.9924ms | 1.3053ms | 766.0889 Ops/s | 736.3354 Ops/s | $\color{#35bf28}+4.04\\%$ | | test_exec_functorch | 0.3780ms | 0.2290ms | 4.3663 KOps/s | 4.1961 KOps/s | $\color{#35bf28}+4.06\\%$ | | test_exec_functional_call | 0.3773ms | 0.2190ms | 4.5670 KOps/s | 4.2854 KOps/s | $\textbf{\color{#35bf28}+6.57\\%}$ | | test_exec_td | 0.3648ms | 0.2210ms | 4.5245 KOps/s | 4.2283 KOps/s | 
$\textbf{\color{#35bf28}+7.01\\%}$ | | test_exec_td_decorator | 0.8802ms | 0.2958ms | 3.3806 KOps/s | 3.2182 KOps/s | $\textbf{\color{#35bf28}+5.04\\%}$ | | test_vmap_mlp_speed[True-True] | 0.8280ms | 0.6663ms | 1.5009 KOps/s | 1.4531 KOps/s | $\color{#35bf28}+3.29\\%$ | | test_vmap_mlp_speed[True-False] | 0.8303ms | 0.6669ms | 1.4994 KOps/s | 1.4612 KOps/s | $\color{#35bf28}+2.62\\%$ | | test_vmap_mlp_speed[False-True] | 0.7317ms | 0.5793ms | 1.7263 KOps/s | 1.6456 KOps/s | $\color{#35bf28}+4.90\\%$ | | test_vmap_mlp_speed[False-False] | 0.7431ms | 0.5854ms | 1.7083 KOps/s | 1.6520 KOps/s | $\color{#35bf28}+3.41\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 0.9921ms | 0.7467ms | 1.3393 KOps/s | 1.3037 KOps/s | $\color{#35bf28}+2.73\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9774ms | 0.7488ms | 1.3354 KOps/s | 1.3141 KOps/s | $\color{#35bf28}+1.63\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8447ms | 0.6475ms | 1.5444 KOps/s | 1.4890 KOps/s | $\color{#35bf28}+3.73\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8088ms | 0.6442ms | 1.5524 KOps/s | 1.4880 KOps/s | $\color{#35bf28}+4.33\\%$ | | test_vmap_transformer_speed[True-True] | 8.8648ms | 8.6854ms | 115.1361 Ops/s | 111.3599 Ops/s | $\color{#35bf28}+3.39\\%$ | | test_vmap_transformer_speed[True-False] | 8.9124ms | 8.6699ms | 115.3410 Ops/s | 113.6364 Ops/s | $\color{#35bf28}+1.50\\%$ | | test_vmap_transformer_speed[False-True] | 8.7661ms | 8.5754ms | 116.6123 Ops/s | 115.7123 Ops/s | $\color{#35bf28}+0.78\\%$ | | test_vmap_transformer_speed[False-False] | 8.7402ms | 8.5264ms | 117.2830 Ops/s | 115.6560 Ops/s | $\color{#35bf28}+1.41\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 20.9735ms | 20.7188ms | 48.2654 Ops/s | 47.9909 Ops/s | $\color{#35bf28}+0.57\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 20.9253ms | 20.7079ms | 48.2908 Ops/s | 47.3015 Ops/s | $\color{#35bf28}+2.09\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.7158ms | 
20.5520ms | 48.6569 Ops/s | 48.5021 Ops/s | $\color{#35bf28}+0.32\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 20.6592ms | 20.4465ms | 48.9081 Ops/s | 47.8652 Ops/s | $\color{#35bf28}+2.18\\%$ | | test_to_module_speed[True] | 2.9361ms | 1.5288ms | 654.1107 Ops/s | 665.3487 Ops/s | $\color{#d91a1a}-1.69\\%$ | | test_to_module_speed[False] | 1.9551ms | 1.5003ms | 666.5130 Ops/s | 678.6096 Ops/s | $\color{#d91a1a}-1.78\\%$ | | test_tc_init | 82.7110μs | 38.9301μs | 25.6870 KOps/s | 29.3480 KOps/s | $\textbf{\color{#d91a1a}-12.47\\%}$ | | test_tc_init_nested | 0.1986ms | 80.2709μs | 12.4578 KOps/s | 14.9950 KOps/s | $\textbf{\color{#d91a1a}-16.92\\%}$ | | test_tc_first_layer_tensor | 17.0100μs | 3.9443μs | 253.5311 KOps/s | 251.2656 KOps/s | $\color{#35bf28}+0.90\\%$ | | test_tc_first_layer_nontensor | 17.1800μs | 3.9946μs | 250.3389 KOps/s | 250.4444 KOps/s | $\color{#d91a1a}-0.04\\%$ | | test_tc_second_layer_tensor | 6.6177μs | 1.2815μs | 780.3474 KOps/s | 773.9883 KOps/s | $\color{#35bf28}+0.82\\%$ | | test_tc_second_layer_nontensor | 19.8900μs | 4.5848μs | 218.1104 KOps/s | 218.6508 KOps/s | $\color{#d91a1a}-0.25\\%$ | | test_unbind | 0.3237s | 13.1133ms | 76.2586 Ops/s | 74.7226 Ops/s | $\color{#35bf28}+2.06\\%$ | | test_full_like | 0.7663ms | 0.5789ms | 1.7276 KOps/s | 1.7275 KOps/s | $+0.00\\%$ | | test_zeros_like | 0.3606ms | 0.1980ms | 5.0513 KOps/s | 5.0493 KOps/s | $\color{#35bf28}+0.04\\%$ | | test_ones_like | 0.3849ms | 0.1979ms | 5.0531 KOps/s | 5.0531 KOps/s | $+0.00\\%$ | | test_clone | 0.5629ms | 0.4146ms | 2.4121 KOps/s | 2.4156 KOps/s | $\color{#d91a1a}-0.15\\%$ | | test_squeeze | 96.8410μs | 11.8301μs | 84.5302 KOps/s | 85.1088 KOps/s | $\color{#d91a1a}-0.68\\%$ | | test_unsqueeze | 0.2680ms | 86.4104μs | 11.5727 KOps/s | 11.5379 KOps/s | $\color{#35bf28}+0.30\\%$ | | test_split | 0.4714ms | 0.1851ms | 5.4019 KOps/s | 5.4355 KOps/s | $\color{#d91a1a}-0.62\\%$ | | test_permute | 0.3289ms | 0.1984ms | 5.0399 KOps/s | 5.0710 KOps/s | 
$\color{#d91a1a}-0.61\\%$ | | test_stack | 1.3446ms | 0.9034ms | 1.1070 KOps/s | 1.0913 KOps/s | $\color{#35bf28}+1.43\\%$ | | test_cat | 1.3796ms | 1.2317ms | 811.8576 Ops/s | 811.7141 Ops/s | $\color{#35bf28}+0.02\\%$ |
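
The `Change` column appears to be the relative difference between the `Ops` column and the `Ops on Repo HEAD` column. The sketch below recomputes it under that assumption; the helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark tooling. Units are normalized first, because some rows mix `KOps/s` and `Ops/s`.

```python
# Sketch: recompute the "Change" column from the two throughput columns.
# Assumption (not confirmed by the bot): Change = (Ops - Ops on HEAD) / Ops on HEAD.

# Only the two units that occur in the table above are handled.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '12.4578 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percentage change in throughput of this PR relative to HEAD."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# test_tc_init_nested: the table reports -16.92%
print(f"{relative_change('12.4578 KOps/s', '14.9950 KOps/s'):+.2f}%")
# test_func_call_runtime[True-eager]: mixed units, the table reports +4.26%
print(f"{relative_change('1.0097 KOps/s', '968.4513 Ops/s'):+.2f}%")
```

Because the table shows rounded values, a recomputed figure can differ from the reported one in the last digit: `test_stack` recomputes to +1.44% against the reported +1.43%.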