pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] fix _expand_to_match_shape for single bool tensor #902

Closed: vmoens closed this 1 month ago

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 26. Worsened: 13.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 45.5250μs | 21.9617μs | 45.5338 KOps/s | 44.0560 KOps/s | +3.35% |
| test_plain_set_stack_nested | 63.4780μs | 22.1794μs | 45.0869 KOps/s | 43.9914 KOps/s | +2.49% |
| test_plain_set_nested_inplace | 89.7970μs | 23.9012μs | 41.8389 KOps/s | 40.3770 KOps/s | +3.62% |
| test_plain_set_stack_nested_inplace | 76.5330μs | 24.1039μs | 41.4871 KOps/s | 41.0286 KOps/s | +1.12% |
| test_items | 24.0540μs | 2.5795μs | 387.6664 KOps/s | 357.7499 KOps/s | **+8.36%** |
| test_items_nested | 0.7392ms | 0.3611ms | 2.7695 KOps/s | 2.7502 KOps/s | +0.70% |
| test_items_nested_locked | 2.1839ms | 0.3632ms | 2.7534 KOps/s | 2.7660 KOps/s | -0.45% |
| test_items_nested_leaf | 0.3255ms | 89.1020μs | 11.2231 KOps/s | 11.6143 KOps/s | -3.37% |
| test_items_stack_nested | 0.5639ms | 0.3645ms | 2.7434 KOps/s | 2.7288 KOps/s | +0.54% |
| test_items_stack_nested_leaf | 0.1630ms | 85.4703μs | 11.7000 KOps/s | 11.5941 KOps/s | +0.91% |
| test_items_stack_nested_locked | 0.7707ms | 0.3696ms | 2.7054 KOps/s | 2.7677 KOps/s | -2.25% |
| test_keys | 47.6290μs | 3.8536μs | 259.4956 KOps/s | 244.1452 KOps/s | **+6.29%** |
| test_keys_nested | 0.3137ms | 0.1430ms | 6.9935 KOps/s | 6.8945 KOps/s | +1.44% |
| test_keys_nested_locked | 1.9063ms | 0.1537ms | 6.5081 KOps/s | 6.6518 KOps/s | -2.16% |
| test_keys_nested_leaf | 0.2186ms | 0.1237ms | 8.0815 KOps/s | 8.0074 KOps/s | +0.93% |
| test_keys_stack_nested | 0.2408ms | 0.1447ms | 6.9100 KOps/s | 6.8815 KOps/s | +0.42% |
| test_keys_stack_nested_leaf | 0.2161ms | 0.1238ms | 8.0749 KOps/s | 8.0556 KOps/s | +0.24% |
| test_keys_stack_nested_locked | 0.2599ms | 0.1493ms | 6.6962 KOps/s | 6.6383 KOps/s | +0.87% |
| test_values | 10.3143μs | 1.1639μs | 859.1836 KOps/s | 861.2822 KOps/s | -0.24% |
| test_values_nested | 91.4700μs | 48.9993μs | 20.4085 KOps/s | 20.0089 KOps/s | +2.00% |
| test_values_nested_locked | 94.2850μs | 48.7434μs | 20.5156 KOps/s | 19.9248 KOps/s | +2.97% |
| test_values_nested_leaf | 96.3670μs | 44.7104μs | 22.3662 KOps/s | 22.3119 KOps/s | +0.24% |
| test_values_stack_nested | 0.1002ms | 50.6312μs | 19.7507 KOps/s | 19.9275 KOps/s | -0.89% |
| test_values_stack_nested_leaf | 96.4800μs | 44.2706μs | 22.5884 KOps/s | 22.3222 KOps/s | +1.19% |
| test_values_stack_nested_locked | 0.1024ms | 51.1807μs | 19.5386 KOps/s | 20.0695 KOps/s | -2.65% |
| test_membership | 2.8383μs | 0.7063μs | 1.4159 MOps/s | 1.1023 MOps/s | **+28.45%** |
| test_membership_nested | 27.9420μs | 2.6925μs | 371.3982 KOps/s | 372.0381 KOps/s | -0.17% |
| test_membership_nested_leaf | 29.2140μs | 2.7090μs | 369.1374 KOps/s | 326.7643 KOps/s | **+12.97%** |
| test_membership_stacked_nested | 28.4840μs | 2.6800μs | 373.1388 KOps/s | 369.4733 KOps/s | +0.99% |
| test_membership_stacked_nested_leaf | 30.5770μs | 2.7093μs | 369.0963 KOps/s | 369.4705 KOps/s | -0.10% |
| test_membership_nested_last | 41.9380μs | 3.9699μs | 251.8933 KOps/s | 250.1299 KOps/s | +0.71% |
| test_membership_nested_leaf_last | 24.2750μs | 4.0002μs | 249.9870 KOps/s | 250.9570 KOps/s | -0.39% |
| test_membership_stacked_nested_last | 58.9900μs | 12.9737μs | 77.0792 KOps/s | 252.3239 KOps/s | **-69.45%** |
| test_membership_stacked_nested_leaf_last | 43.1510μs | 13.1713μs | 75.9226 KOps/s | 248.8684 KOps/s | **-69.49%** |
| test_nested_getleaf | 57.0260μs | 11.0861μs | 90.2031 KOps/s | 95.1283 KOps/s | **-5.18%** |
| test_nested_get | 51.8170μs | 10.6157μs | 94.1997 KOps/s | 99.3060 KOps/s | **-5.14%** |
| test_stacked_getleaf | 50.0640μs | 10.9301μs | 91.4908 KOps/s | 94.0257 KOps/s | -2.70% |
| test_stacked_get | 33.6420μs | 10.4294μs | 95.8824 KOps/s | 99.0112 KOps/s | -3.16% |
| test_nested_getitemleaf | 53.8700μs | 11.3772μs | 87.8949 KOps/s | 90.0638 KOps/s | -2.41% |
| test_nested_getitem | 60.3820μs | 10.5287μs | 94.9789 KOps/s | 97.7026 KOps/s | -2.79% |
| test_stacked_getitemleaf | 40.4650μs | 11.2858μs | 88.6073 KOps/s | 89.9988 KOps/s | -1.55% |
| test_stacked_getitem | 56.8350μs | 10.5348μs | 94.9237 KOps/s | 97.4208 KOps/s | -2.56% |
| test_lock_nested | 3.0446ms | 0.5262ms | 1.9006 KOps/s | 1.6648 KOps/s | **+14.16%** |
| test_lock_stack_nested | 0.6106ms | 0.4677ms | 2.1381 KOps/s | 2.0306 KOps/s | **+5.29%** |
| test_unlock_nested | 0.9455ms | 0.4449ms | 2.2478 KOps/s | 1.9542 KOps/s | **+15.03%** |
| test_unlock_stack_nested | 0.5492ms | 0.3819ms | 2.6186 KOps/s | 2.4484 KOps/s | **+6.95%** |
| test_flatten_speed | 0.2396ms | 0.1078ms | 9.2736 KOps/s | 9.3572 KOps/s | -0.89% |
| test_unflatten_speed | 0.6077ms | 0.4529ms | 2.2078 KOps/s | 2.2503 KOps/s | -1.89% |
| test_common_ops | 5.3983ms | 1.1482ms | 870.9199 Ops/s | 850.5103 Ops/s | +2.40% |
| test_creation | 26.3890μs | 2.5970μs | 385.0580 KOps/s | 399.5637 KOps/s | -3.63% |
| test_creation_empty | 65.9130μs | 18.9621μs | 52.7368 KOps/s | 48.3839 KOps/s | **+9.00%** |
| test_creation_nested_1 | 50.9250μs | 22.1614μs | 45.1235 KOps/s | 42.3411 KOps/s | **+6.57%** |
| test_creation_nested_2 | 90.2990μs | 26.4394μs | 37.8224 KOps/s | 35.5785 KOps/s | **+6.31%** |
| test_clone | 87.3230μs | 17.8868μs | 55.9072 KOps/s | 55.9399 KOps/s | -0.06% |
| test_getitem[int] | 0.9763ms | 13.3332μs | 75.0008 KOps/s | 76.8919 KOps/s | -2.46% |
| test_getitem[slice_int] | 0.1541ms | 34.0167μs | 29.3973 KOps/s | 30.5000 KOps/s | -3.62% |
| test_getitem[range] | 0.4018ms | 57.1003μs | 17.5131 KOps/s | 17.5687 KOps/s | -0.32% |
| test_getitem[tuple] | 0.1322ms | 27.2979μs | 36.6328 KOps/s | 36.5675 KOps/s | +0.18% |
| test_getitem[list] | 0.2565ms | 52.0509μs | 19.2120 KOps/s | 19.3084 KOps/s | -0.50% |
| test_setitem_dim[int] | 57.7170μs | 32.7733μs | 30.5126 KOps/s | 26.6784 KOps/s | **+14.37%** |
| test_setitem_dim[slice_int] | 0.1685ms | 70.7185μs | 14.1406 KOps/s | 13.1765 KOps/s | **+7.32%** |
| test_setitem_dim[range] | 0.1389ms | 92.2327μs | 10.8421 KOps/s | 10.1312 KOps/s | **+7.02%** |
| test_setitem_dim[tuple] | 0.1077ms | 58.9015μs | 16.9775 KOps/s | 15.6625 KOps/s | **+8.40%** |
| test_setitem | 0.1111ms | 30.8819μs | 32.3814 KOps/s | 31.3893 KOps/s | +3.16% |
| test_set | 0.1247ms | 30.0619μs | 33.2647 KOps/s | 31.5649 KOps/s | **+5.39%** |
| test_set_shared | 3.8803ms | 0.2219ms | 4.5075 KOps/s | 4.4708 KOps/s | +0.82% |
| test_update | 0.1658ms | 36.6309μs | 27.2994 KOps/s | 25.5093 KOps/s | **+7.02%** |
| test_update_nested | 0.2261ms | 47.4724μs | 21.0649 KOps/s | 19.9409 KOps/s | **+5.64%** |
| test_update__nested | 0.1950ms | 34.8115μs | 28.7262 KOps/s | 28.3219 KOps/s | +1.43% |
| test_set_nested | 0.1649ms | 32.2656μs | 30.9928 KOps/s | 30.0308 KOps/s | +3.20% |
| test_set_nested_new | 0.1400ms | 37.0331μs | 27.0029 KOps/s | 25.7790 KOps/s | +4.75% |
| test_select | 0.1462ms | 54.8544μs | 18.2301 KOps/s | 18.1048 KOps/s | +0.69% |
| test_select_nested | 0.1283ms | 60.5448μs | 16.5167 KOps/s | 16.6768 KOps/s | -0.96% |
| test_exclude_nested | 0.1628ms | 81.6546μs | 12.2467 KOps/s | 12.4947 KOps/s | -1.98% |
| test_empty[True] | 0.4698ms | 0.3475ms | 2.8775 KOps/s | 2.9709 KOps/s | -3.14% |
| test_empty[False] | 11.7895μs | 1.2930μs | 773.3822 KOps/s | 819.6408 KOps/s | **-5.64%** |
| test_unbind_speed | 0.7840ms | 0.3350ms | 2.9849 KOps/s | 2.9951 KOps/s | -0.34% |
| test_unbind_speed_stack0 | 0.5580ms | 0.3086ms | 3.2409 KOps/s | 3.0356 KOps/s | **+6.76%** |
| test_unbind_speed_stack1 | 84.0227ms | 0.8102ms | 1.2342 KOps/s | 1.3879 KOps/s | **-11.07%** |
| test_split | 84.2526ms | 2.2327ms | 447.8861 Ops/s | 425.0238 Ops/s | **+5.38%** |
| test_chunk | 83.9164ms | 2.2321ms | 448.0180 Ops/s | 420.9829 Ops/s | **+6.42%** |
| test_creation[device0] | 0.2793ms | 0.1227ms | 8.1528 KOps/s | 8.0588 KOps/s | +1.17% |
| test_creation_from_tensor | 3.3868ms | 0.1229ms | 8.1370 KOps/s | 8.1320 KOps/s | +0.06% |
| test_add_one[memmap_tensor0] | 0.1872ms | 7.6793μs | 130.2203 KOps/s | 120.5144 KOps/s | **+8.05%** |
| test_contiguous[memmap_tensor0] | 25.9180μs | 2.2112μs | 452.2454 KOps/s | 450.3832 KOps/s | +0.41% |
| test_stack[memmap_tensor0] | 72.2750μs | 6.0014μs | 166.6291 KOps/s | 164.9375 KOps/s | +1.03% |
| test_memmaptd_index | 1.1382ms | 0.4386ms | 2.2802 KOps/s | 2.2862 KOps/s | -0.26% |
| test_memmaptd_index_astensor | 0.7801ms | 0.5166ms | 1.9357 KOps/s | 1.9206 KOps/s | +0.79% |
| test_memmaptd_index_op | 1.4163ms | 1.0637ms | 940.1244 Ops/s | 900.7616 Ops/s | +4.37% |
| test_serialize_model | 0.1352s | 0.1283s | 7.7947 Ops/s | 7.0631 Ops/s | **+10.36%** |
| test_serialize_model_pickle | 0.4465s | 0.3941s | 2.5377 Ops/s | 2.4965 Ops/s | +1.65% |
| test_serialize_weights | 0.1457s | 0.1292s | 7.7397 Ops/s | 7.8621 Ops/s | -1.56% |
| test_serialize_weights_returnearly | 0.1858s | 0.1664s | 6.0096 Ops/s | 6.0906 Ops/s | -1.33% |
| test_serialize_weights_pickle | 1.0404s | 0.7489s | 1.3352 Ops/s | 2.4615 Ops/s | **-45.76%** |
| test_serialize_weights_filesystem | 0.1477s | 0.1448s | 6.9084 Ops/s | 6.9730 Ops/s | -0.93% |
| test_serialize_model_filesystem | 0.1552s | 0.1490s | 6.7123 Ops/s | 6.7188 Ops/s | -0.10% |
| test_reshape_pytree | 89.7170μs | 39.4581μs | 25.3434 KOps/s | 25.4025 KOps/s | -0.23% |
| test_reshape_td | 0.1352ms | 49.5370μs | 20.1869 KOps/s | 19.4473 KOps/s | +3.80% |
| test_view_pytree | 95.6790μs | 39.2206μs | 25.4968 KOps/s | 25.6273 KOps/s | -0.51% |
| test_view_td | 0.1187ms | 55.1220μs | 18.1416 KOps/s | 17.2698 KOps/s | **+5.05%** |
| test_unbind_pytree | 84.5980μs | 35.7377μs | 27.9816 KOps/s | 28.1659 KOps/s | -0.65% |
| test_unbind_td | 0.3589ms | 48.6826μs | 20.5412 KOps/s | 17.8744 KOps/s | **+14.92%** |
| test_split_pytree | 0.1077ms | 39.1108μs | 25.5684 KOps/s | 26.4003 KOps/s | -3.15% |
| test_split_td | 0.5476ms | 60.5027μs | 16.5282 KOps/s | 16.1651 KOps/s | +2.25% |
| test_add_pytree | 0.1217ms | 44.0748μs | 22.6887 KOps/s | 22.4682 KOps/s | +0.98% |
| test_add_td | 0.1734ms | 82.1569μs | 12.1718 KOps/s | 11.2424 KOps/s | **+8.27%** |
| test_distributed | 0.2937ms | 0.1318ms | 7.5861 KOps/s | 7.5281 KOps/s | +0.77% |
| test_tdmodule | 40.7560μs | 17.5237μs | 57.0654 KOps/s | 57.2777 KOps/s | -0.37% |
| test_tdmodule_dispatch | 78.8770μs | 37.2370μs | 26.8550 KOps/s | 26.9176 KOps/s | -0.23% |
| test_tdseq | 41.9380μs | 19.6569μs | 50.8726 KOps/s | 52.0305 KOps/s | -2.23% |
| test_tdseq_dispatch | 85.8200μs | 43.0589μs | 23.2240 KOps/s | 24.0971 KOps/s | -3.62% |
| test_instantiation_functorch | 2.9127ms | 1.5996ms | 625.1481 Ops/s | 637.1188 Ops/s | -1.88% |
| test_instantiation_td | 1.8868ms | 1.1770ms | 849.5863 Ops/s | 882.3800 Ops/s | -3.72% |
| test_exec_functorch | 0.3198ms | 0.1801ms | 5.5526 KOps/s | 5.4188 KOps/s | +2.47% |
| test_exec_functional_call | 0.2930ms | 0.1693ms | 5.9068 KOps/s | 5.7416 KOps/s | +2.88% |
| test_exec_td | 0.3253ms | 0.1722ms | 5.8081 KOps/s | 5.8716 KOps/s | -1.08% |
| test_exec_td_decorator | 83.8653ms | 0.3218ms | 3.1079 KOps/s | 3.9247 KOps/s | **-20.81%** |
| test_vmap_mlp_speed[True-True] | 0.8064ms | 0.6111ms | 1.6365 KOps/s | 1.6191 KOps/s | +1.07% |
| test_vmap_mlp_speed[True-False] | 1.3898ms | 0.6191ms | 1.6154 KOps/s | 1.6322 KOps/s | -1.03% |
| test_vmap_mlp_speed[False-True] | 0.7495ms | 0.4962ms | 2.0152 KOps/s | 1.9881 KOps/s | +1.36% |
| test_vmap_mlp_speed[False-False] | 0.6995ms | 0.4954ms | 2.0185 KOps/s | 1.9735 KOps/s | +2.28% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2148ms | 0.6980ms | 1.4327 KOps/s | 1.4159 KOps/s | +1.18% |
| test_vmap_mlp_speed_decorator[True-False] | 1.2134ms | 0.7026ms | 1.4233 KOps/s | 1.4189 KOps/s | +0.31% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8101ms | 0.5771ms | 1.7328 KOps/s | 1.7194 KOps/s | +0.78% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7292ms | 0.5776ms | 1.7314 KOps/s | 1.7054 KOps/s | +1.52% |
| test_to_module_speed[True] | 2.8508ms | 1.8597ms | 537.7103 Ops/s | 558.4528 Ops/s | -3.71% |
| test_to_module_speed[False] | 4.1727ms | 1.8502ms | 540.4939 Ops/s | 565.2006 Ops/s | -4.37% |
| test_tc_init | 0.1001ms | 43.9544μs | 22.7508 KOps/s | 22.0774 KOps/s | +3.05% |
| test_tc_init_nested | 0.1874ms | 87.2828μs | 11.4570 KOps/s | 11.1972 KOps/s | +2.32% |
| test_tc_first_layer_tensor | 51.5160μs | 9.4453μs | 105.8731 KOps/s | 109.4267 KOps/s | -3.25% |
| test_tc_first_layer_nontensor | 40.6260μs | 9.2622μs | 107.9654 KOps/s | 108.8915 KOps/s | -0.85% |
| test_tc_second_layer_tensor | 78.9380μs | 2.9365μs | 340.5458 KOps/s | 356.0622 KOps/s | -4.36% |
| test_tc_second_layer_nontensor | 35.0760μs | 10.5991μs | 94.3479 KOps/s | 98.3132 KOps/s | -4.03% |
| test_unbind | 0.1090s | 13.9113ms | 71.8840 Ops/s | 69.7159 Ops/s | +3.11% |
| test_full_like | 10.4053ms | 8.4895ms | 117.7920 Ops/s | 141.7561 Ops/s | **-16.91%** |
| test_zeros_like | 11.7537ms | 7.0446ms | 141.9529 Ops/s | 142.5975 Ops/s | -0.45% |
| test_ones_like | 16.4042ms | 8.1388ms | 122.8679 Ops/s | 133.5150 Ops/s | **-7.97%** |
| test_clone | 18.6706ms | 10.0841ms | 99.1663 Ops/s | 109.6168 Ops/s | **-9.53%** |
| test_squeeze | 71.2320μs | 15.1322μs | 66.0841 KOps/s | 69.0604 KOps/s | -4.31% |
| test_unsqueeze | 0.1650ms | 97.9576μs | 10.2085 KOps/s | 10.1546 KOps/s | +0.53% |
| test_split | 0.4583ms | 0.2108ms | 4.7432 KOps/s | 4.7645 KOps/s | -0.45% |
| test_permute | 0.3733ms | 0.2241ms | 4.4618 KOps/s | 4.3284 KOps/s | +3.08% |
| test_stack | 33.9953ms | 26.9606ms | 37.0912 Ops/s | 40.4623 Ops/s | **-8.33%** |
| test_cat | 38.1198ms | 28.0177ms | 35.6917 Ops/s | 40.1871 Ops/s | **-11.19%** |

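
For readers checking the CPU numbers by hand: the columns appear to be related by `Ops = 1 / Mean` and `Change = (Ops - Ops on HEAD) / Ops on HEAD`. The report does not state these formulas; they are inferred from the rows, and they reproduce the printed values for the rows tried below. The 5% cutoff is likewise an assumption: every bold entry in the CPU table has a magnitude of at least 5.05%, no non-bold entry exceeds 4.75%, and the bold entries (26 positive, 13 negative) match the Improved and Worsened totals. The function names are hypothetical and not part of the benchmark tooling. A minimal sketch:

```python
def ops_from_mean(mean_seconds: float) -> float:
    """Operations per second implied by a mean runtime in seconds."""
    return 1.0 / mean_seconds


def change_percent(ops: float, ops_head: float) -> float:
    """Relative throughput change versus the HEAD baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "neutral"


# Rows copied from the CPU table: (name, mean in seconds, Ops, Ops on HEAD),
# with both Ops values in KOps/s.
rows = [
    ("test_plain_set_nested", 21.9617e-6, 45.5338, 44.0560),
    ("test_membership_stacked_nested_last", 12.9737e-6, 77.0792, 252.3239),
]

for name, mean_s, ops, ops_head in rows:
    implied_kops = ops_from_mean(mean_s) / 1e3
    change = change_percent(ops, ops_head)
    print(f"{name}: {implied_kops:.4f} KOps/s, {change:+.2f}% ({classify(change)})")
```

Under these assumptions, the first row counts as neutral and the second as a regression, which agrees with which of the two is bold in the table.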
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 25. Worsened: 14.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 27.4200μs | 16.6275μs | 60.1413 KOps/s | 55.7803 KOps/s | **+7.82%** |
| test_plain_set_stack_nested | 34.6610μs | 16.2696μs | 61.4642 KOps/s | 56.1647 KOps/s | **+9.44%** |
| test_plain_set_nested_inplace | 37.9720μs | 17.4744μs | 57.2267 KOps/s | 52.2553 KOps/s | **+9.51%** |
| test_plain_set_stack_nested_inplace | 45.0410μs | 17.2813μs | 57.8659 KOps/s | 52.3854 KOps/s | **+10.46%** |
| test_items | 23.4900μs | 4.7139μs | 212.1371 KOps/s | 211.2163 KOps/s | +0.44% |
| test_items_nested | 0.4542ms | 0.3949ms | 2.5322 KOps/s | 2.5693 KOps/s | -1.44% |
| test_items_nested_locked | 0.4581ms | 0.4004ms | 2.4976 KOps/s | 2.5179 KOps/s | -0.81% |
| test_items_nested_leaf | 0.1276ms | 87.2761μs | 11.4579 KOps/s | 11.4943 KOps/s | -0.32% |
| test_items_stack_nested | 0.4529ms | 0.3944ms | 2.5355 KOps/s | 2.5452 KOps/s | -0.38% |
| test_items_stack_nested_leaf | 0.1203ms | 88.4886μs | 11.3009 KOps/s | 11.2733 KOps/s | +0.24% |
| test_items_stack_nested_locked | 0.4573ms | 0.4020ms | 2.4874 KOps/s | 2.5462 KOps/s | -2.31% |
| test_keys | 21.9100μs | 4.4491μs | 224.7628 KOps/s | 226.7229 KOps/s | -0.86% |
| test_keys_nested | 83.7120μs | 67.0394μs | 14.9166 KOps/s | 14.7062 KOps/s | +1.43% |
| test_keys_nested_locked | 1.8916ms | 75.0286μs | 13.3283 KOps/s | 13.4313 KOps/s | -0.77% |
| test_keys_nested_leaf | 77.3810μs | 57.5560μs | 17.3744 KOps/s | 17.5021 KOps/s | -0.73% |
| test_keys_stack_nested | 84.0520μs | 67.4978μs | 14.8153 KOps/s | 15.0656 KOps/s | -1.66% |
| test_keys_stack_nested_leaf | 75.3110μs | 57.9153μs | 17.2666 KOps/s | 17.0262 KOps/s | +1.41% |
| test_keys_stack_nested_locked | 93.4210μs | 74.6499μs | 13.3959 KOps/s | 13.4594 KOps/s | -0.47% |
| test_values | 9.5907μs | 1.8011μs | 555.2276 KOps/s | 555.0669 KOps/s | +0.03% |
| test_values_nested | 97.7720μs | 34.4806μs | 29.0018 KOps/s | 29.5414 KOps/s | -1.83% |
| test_values_nested_locked | 51.0410μs | 35.8828μs | 27.8685 KOps/s | 27.8277 KOps/s | +0.15% |
| test_values_nested_leaf | 50.7110μs | 30.3097μs | 32.9927 KOps/s | 33.0660 KOps/s | -0.22% |
| test_values_stack_nested | 58.1210μs | 35.0536μs | 28.5277 KOps/s | 28.7017 KOps/s | -0.61% |
| test_values_stack_nested_leaf | 46.9910μs | 31.5750μs | 31.6706 KOps/s | 32.8461 KOps/s | -3.58% |
| test_values_stack_nested_locked | 66.4110μs | 37.3864μs | 26.7477 KOps/s | 27.6401 KOps/s | -3.23% |
| test_membership | 2.1212μs | 0.5777μs | 1.7309 MOps/s | 1.7825 MOps/s | -2.90% |
| test_membership_nested | 17.9100μs | 2.0491μs | 488.0199 KOps/s | 486.1990 KOps/s | +0.37% |
| test_membership_nested_leaf | 13.1505μs | 2.0007μs | 499.8188 KOps/s | 513.0497 KOps/s | -2.58% |
| test_membership_stacked_nested | 22.7900μs | 2.0996μs | 476.2768 KOps/s | 473.4756 KOps/s | +0.59% |
| test_membership_stacked_nested_leaf | 15.2700μs | 2.0881μs | 478.9039 KOps/s | 479.9968 KOps/s | -0.23% |
| test_membership_nested_last | 21.8500μs | 3.0348μs | 329.5138 KOps/s | 335.7276 KOps/s | -1.85% |
| test_membership_nested_leaf_last | 15.0500μs | 2.9993μs | 333.4155 KOps/s | 333.7747 KOps/s | -0.11% |
| test_membership_stacked_nested_last | 25.3300μs | 3.8235μs | 261.5373 KOps/s | 326.2389 KOps/s | **-19.83%** |
| test_membership_stacked_nested_leaf_last | 15.1700μs | 3.7916μs | 263.7442 KOps/s | 331.4646 KOps/s | **-20.43%** |
| test_nested_getleaf | 25.8500μs | 8.1136μs | 123.2495 KOps/s | 124.1011 KOps/s | -0.69% |
| test_nested_get | 28.8100μs | 7.5317μs | 132.7729 KOps/s | 132.6260 KOps/s | +0.11% |
| test_stacked_getleaf | 25.6900μs | 8.0805μs | 123.7546 KOps/s | 123.7242 KOps/s | +0.02% |
| test_stacked_get | 23.5000μs | 7.5628μs | 132.2268 KOps/s | 131.9494 KOps/s | +0.21% |
| test_nested_getitemleaf | 22.2410μs | 8.3263μs | 120.1012 KOps/s | 121.8553 KOps/s | -1.44% |
| test_nested_getitem | 24.0600μs | 7.6826μs | 130.1643 KOps/s | 129.3548 KOps/s | +0.63% |
| test_stacked_getitemleaf | 24.1310μs | 8.1735μs | 122.3466 KOps/s | 121.2356 KOps/s | +0.92% |
| test_stacked_getitem | 24.9000μs | 7.6947μs | 129.9590 KOps/s | 128.4970 KOps/s | +1.14% |
| test_lock_nested | 4.3851ms | 0.4808ms | 2.0800 KOps/s | 2.1162 KOps/s | -1.71% |
| test_lock_stack_nested | 0.5020ms | 0.4363ms | 2.2917 KOps/s | 2.3101 KOps/s | -0.80% |
| test_unlock_nested | 0.8696ms | 0.4014ms | 2.4913 KOps/s | 2.5255 KOps/s | -1.35% |
| test_unlock_stack_nested | 0.4254ms | 0.3539ms | 2.8253 KOps/s | 2.8522 KOps/s | -0.95% |
| test_flatten_speed | 0.2139ms | 0.1085ms | 9.2139 KOps/s | 9.3483 KOps/s | -1.44% |
| test_unflatten_speed | 0.3890ms | 0.3027ms | 3.3033 KOps/s | 3.3528 KOps/s | -1.48% |
| test_common_ops | 1.7608ms | 1.3041ms | 766.8076 Ops/s | 755.4838 Ops/s | +1.50% |
| test_creation | 16.2310μs | 2.0390μs | 490.4280 KOps/s | 500.3419 KOps/s | -1.98% |
| test_creation_empty | 34.2510μs | 15.9307μs | 62.7720 KOps/s | 52.4190 KOps/s | **+19.75%** |
| test_creation_nested_1 | 38.2400μs | 17.7328μs | 56.3927 KOps/s | 48.2570 KOps/s | **+16.86%** |
| test_creation_nested_2 | 38.1710μs | 20.8432μs | 47.9773 KOps/s | 42.5916 KOps/s | **+12.64%** |
| test_clone | 53.7310μs | 32.3456μs | 30.9161 KOps/s | 34.1344 KOps/s | **-9.43%** |
| test_getitem[int] | 1.1253ms | 17.1136μs | 58.4330 KOps/s | 60.4198 KOps/s | -3.29% |
| test_getitem[slice_int] | 0.1488ms | 28.5179μs | 35.0656 KOps/s | 34.8834 KOps/s | +0.52% |
| test_getitem[range] | 0.2962ms | 0.1139ms | 8.7823 KOps/s | 8.7827 KOps/s | -0.00% |
| test_getitem[tuple] | 0.1531ms | 25.5525μs | 39.1352 KOps/s | 39.1228 KOps/s | +0.03% |
| test_getitem[list] | 0.2259ms | 0.1026ms | 9.7501 KOps/s | 9.5003 KOps/s | +2.63% |
| test_setitem_dim[int] | 74.2120μs | 51.6020μs | 19.3791 KOps/s | 17.8856 KOps/s | **+8.35%** |
| test_setitem_dim[slice_int] | 0.1273ms | 81.9282μs | 12.2058 KOps/s | 12.8242 KOps/s | -4.82% |
| test_setitem_dim[range] | 0.1727ms | 0.1451ms | 6.8935 KOps/s | 7.0253 KOps/s | -1.88% |
| test_setitem_dim[tuple] | 90.5820μs | 73.8534μs | 13.5403 KOps/s | 13.5004 KOps/s | +0.30% |
| test_setitem | 72.2510μs | 46.8522μs | 21.3437 KOps/s | 22.8240 KOps/s | **-6.49%** |
| test_set | 68.3710μs | 45.5662μs | 21.9461 KOps/s | 23.1938 KOps/s | **-5.38%** |
| test_set_shared | 0.3999ms | 56.0141μs | 17.8527 KOps/s | 18.6479 KOps/s | -4.26% |
| test_update | 83.2010μs | 50.4002μs | 19.8412 KOps/s | 18.9368 KOps/s | +4.78% |
| test_update_nested | 95.6010μs | 61.0634μs | 16.3764 KOps/s | 15.6692 KOps/s | +4.51% |
| test_update__nested | 0.1022ms | 65.7648μs | 15.2057 KOps/s | 16.6525 KOps/s | **-8.69%** |
| test_set_nested | 82.2510μs | 48.5883μs | 20.5811 KOps/s | 21.7755 KOps/s | **-5.49%** |
| test_set_nested_new | 0.5492ms | 52.0438μs | 19.2146 KOps/s | 19.3044 KOps/s | -0.47% |
| test_select | 0.1079ms | 68.2853μs | 14.6444 KOps/s | 15.3077 KOps/s | -4.33% |
| test_select_nested | 79.6520μs | 54.0718μs | 18.4939 KOps/s | 18.2731 KOps/s | +1.21% |
| test_exclude_nested | 0.1060ms | 72.3312μs | 13.8253 KOps/s | 14.0302 KOps/s | -1.46% |
| test_empty[True] | 0.3722ms | 0.3030ms | 3.3000 KOps/s | 3.3181 KOps/s | -0.54% |
| test_empty[False] | 2.5312μs | 0.9924μs | 1.0077 MOps/s | 1.0478 MOps/s | -3.82% |
| test_to | 64.8310μs | 38.7991μs | 25.7738 KOps/s | 26.5540 KOps/s | -2.94% |
| test_to_nonblocking | 48.5810μs | 23.7870μs | 42.0397 KOps/s | 42.3945 KOps/s | -0.84% |
| test_unbind_speed | 0.3826ms | 0.3264ms | 3.0640 KOps/s | 3.3060 KOps/s | **-7.32%** |
| test_unbind_speed_stack0 | 0.3973ms | 0.3164ms | 3.1607 KOps/s | 3.3078 KOps/s | -4.45% |
| test_unbind_speed_stack1 | 95.2872ms | 0.8771ms | 1.1402 KOps/s | 1.2525 KOps/s | **-8.97%** |
| test_split | 2.4179ms | 2.0861ms | 479.3662 Ops/s | 430.8863 Ops/s | **+11.25%** |
| test_chunk | 95.1120ms | 2.5063ms | 398.9897 Ops/s | 429.7260 Ops/s | **-7.15%** |
| test_creation[device0] | 0.1423ms | 0.1030ms | 9.7088 KOps/s | 9.8276 KOps/s | -1.21% |
| test_creation_from_tensor | 0.1555ms | 0.1006ms | 9.9413 KOps/s | 10.0686 KOps/s | -1.26% |
| test_add_one[memmap_tensor0] | 20.9110μs | 8.8569μs | 112.9064 KOps/s | 115.6682 KOps/s | -2.39% |
| test_contiguous[memmap_tensor0] | 23.6810μs | 2.1782μs | 459.0896 KOps/s | 473.0169 KOps/s | -2.94% |
| test_stack[memmap_tensor0] | 32.5510μs | 6.7225μs | 148.7537 KOps/s | 153.9780 KOps/s | -3.39% |
| test_memmaptd_index | 1.0577ms | 0.4180ms | 2.3923 KOps/s | 2.4417 KOps/s | -2.02% |
| test_memmaptd_index_astensor | 0.9072ms | 0.4929ms | 2.0290 KOps/s | 2.0789 KOps/s | -2.40% |
| test_memmaptd_index_op | 1.4730ms | 1.0034ms | 996.5980 Ops/s | 960.0169 Ops/s | +3.81% |
| test_serialize_model | 99.9471ms | 95.2779ms | 10.4956 Ops/s | 10.2492 Ops/s | +2.40% |
| test_serialize_model_pickle | 1.3529s | 1.2384s | 0.8075 Ops/s | 0.8059 Ops/s | +0.21% |
| test_serialize_weights | 0.1884s | 0.1033s | 9.6783 Ops/s | 10.2960 Ops/s | **-6.00%** |
| test_serialize_weights_returnearly | 75.3639ms | 70.1493ms | 14.2553 Ops/s | 11.5553 Ops/s | **+23.37%** |
| test_serialize_weights_pickle | 1.4034s | 1.2454s | 0.8030 Ops/s | 0.8085 Ops/s | -0.68% |
| test_reshape_pytree | 64.8910μs | 38.2897μs | 26.1167 KOps/s | 26.2489 KOps/s | -0.50% |
| test_reshape_td | 0.2570ms | 43.7454μs | 22.8596 KOps/s | 22.5761 KOps/s | +1.26% |
| test_view_pytree | 60.6910μs | 37.7317μs | 26.5029 KOps/s | 26.4404 KOps/s | +0.24% |
| test_view_td | 0.2525ms | 53.7250μs | 18.6133 KOps/s | 19.7699 KOps/s | **-5.85%** |
| test_unbind_pytree | 68.3010μs | 37.0955μs | 26.9574 KOps/s | 27.6617 KOps/s | -2.55% |
| test_unbind_td | 91.6892ms | 52.8417μs | 18.9245 KOps/s | 21.7569 KOps/s | **-13.02%** |
| test_split_pytree | 78.9810μs | 50.6513μs | 19.7428 KOps/s | 19.6698 KOps/s | +0.37% |
| test_split_td | 0.2648ms | 58.9889μs | 16.9523 KOps/s | 16.6808 KOps/s | +1.63% |
| test_add_pytree | 95.3930μs | 58.0741μs | 17.2194 KOps/s | 17.4374 KOps/s | -1.25% |
| test_add_td | 0.3776ms | 90.2655μs | 11.0784 KOps/s | 10.5932 KOps/s | +4.58% |
| test_compile_add_one_nested[tensordict-compile] | 0.4127ms | 0.2110ms | 4.7383 KOps/s | 4.9021 KOps/s | -3.34% |
| test_compile_add_one_nested[tensordict-eager] | 0.2740ms | 0.1738ms | 5.7552 KOps/s | 5.7834 KOps/s | -0.49% |
| test_compile_add_one_nested[pytree-compile] | 0.2130ms | 0.1430ms | 6.9919 KOps/s | 7.0930 KOps/s | -1.43% |
| test_compile_add_one_nested[pytree-eager] | 0.2486ms | 0.1899ms | 5.2671 KOps/s | 5.2768 KOps/s | -0.18% |
| test_compile_copy_nested[tensordict-compile] | 42.2100μs | 21.6576μs | 46.1732 KOps/s | 44.6901 KOps/s | +3.32% |
| test_compile_copy_nested[tensordict-eager] | 82.3410μs | 49.9581μs | 20.0168 KOps/s | 20.7224 KOps/s | -3.41% |
| test_compile_copy_nested[pytree-compile] | 0.1207ms | 74.3966μs | 13.4415 KOps/s | 13.6327 KOps/s | -1.40% |
| test_compile_copy_nested[pytree-eager] | 92.1710μs | 61.1767μs | 16.3461 KOps/s | 16.5005 KOps/s | -0.94% |
| test_compile_add_one_flat[tensordict-compile] | 0.4201ms | 0.3308ms | 3.0226 KOps/s | 3.1329 KOps/s | -3.52% |
| test_compile_add_one_flat[tensordict-eager] | 0.3150ms | 0.2255ms | 4.4337 KOps/s | 4.4753 KOps/s | -0.93% |
| test_compile_add_one_flat[tensorclass-compile] | 0.1811ms | 0.1313ms | 7.6137 KOps/s | 7.9805 KOps/s | -4.60% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1301ms | 63.2368μs | 15.8136 KOps/s | 15.9363 KOps/s | -0.77% |
| test_compile_add_one_flat[pytree-compile] | 0.3934ms | 0.3271ms | 3.0567 KOps/s | 3.1549 KOps/s | -3.11% |
| test_compile_add_one_flat[pytree-eager] | 0.6748ms | 0.6060ms | 1.6500 KOps/s | 1.6164 KOps/s | +2.08% |
| test_compile_add_self_flat[tensordict-eager] | 0.3360ms | 0.2753ms | 3.6323 KOps/s | 3.6553 KOps/s | -0.63% |
| test_compile_add_self_flat[tensordict-compile] | 0.3949ms | 0.3301ms | 3.0292 KOps/s | 3.1406 KOps/s | -3.55% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1755ms | 76.4164μs | 13.0862 KOps/s | 12.9380 KOps/s | +1.15% |
| test_compile_add_self_flat[tensorclass-compile] | 0.1883ms | 0.1315ms | 7.6020 KOps/s | 7.9416 KOps/s | -4.28% |
| test_compile_add_self_flat[pytree-eager] | 0.5906ms | 0.5272ms | 1.8969 KOps/s | 1.7707 KOps/s | **+7.13%** |
| test_compile_add_self_flat[pytree-compile] | 0.3908ms | 0.3218ms | 3.1072 KOps/s | 3.1230 KOps/s | -0.51% |
| test_compile_copy_flat[tensordict-compile] | 44.4810μs | 18.8745μs | 52.9815 KOps/s | 55.4555 KOps/s | -4.46% |
| test_compile_copy_flat[tensordict-eager] | 59.8610μs | 32.7118μs | 30.5700 KOps/s | 30.9447 KOps/s | -1.21% |
| test_compile_copy_flat[pytree-compile] | 0.1088ms | 76.5091μs | 13.0703 KOps/s | 13.3911 KOps/s | -2.40% |
| test_compile_copy_flat[pytree-eager] | 84.7210μs | 61.2903μs | 16.3158 KOps/s | 16.4438 KOps/s | -0.78% |
| test_compile_assign_and_add[tensordict-compile] | 2.7824ms | 0.9872ms | 1.0130 KOps/s | 1.0374 KOps/s | -2.35% |
| test_compile_assign_and_add[tensordict-eager] | 3.6383ms | 3.2648ms | 306.3016 Ops/s | 307.2478 Ops/s | -0.31% |
| test_compile_assign_and_add[pytree-compile] | 2.5631ms | 0.9253ms | 1.0807 KOps/s | 1.0787 KOps/s | +0.19% |
| test_compile_assign_and_add[pytree-eager] | 3.3847ms | 3.1636ms | 316.0940 Ops/s | 319.4569 Ops/s | -1.05% |
| test_compile_indexing[tensor-tensordict-compile] | 0.1879ms | 0.1096ms | 9.1261 KOps/s | 9.4073 KOps/s | -2.99% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2232ms | 63.9160μs | 15.6455 KOps/s | 15.3719 KOps/s | +1.78% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1717ms | 0.1058ms | 9.4494 KOps/s | 10.1503 KOps/s | **-6.91%** |
| test_compile_indexing[tensor-tensorclass-eager] | 98.9010μs | 43.9990μs | 22.7278 KOps/s | 21.8880 KOps/s | +3.84% |
| test_compile_indexing[tensor-pytree-compile] | 0.1720ms | 0.1036ms | 9.6538 KOps/s | 9.7274 KOps/s | -0.76% |
| test_compile_indexing[tensor-pytree-eager] | 79.8110μs | 43.9812μs | 22.7370 KOps/s | 21.0350 KOps/s | **+8.09%** |
| test_compile_indexing[slice-tensordict-compile] | 0.1908ms | 0.1387ms | 7.2107 KOps/s | 7.3777 KOps/s | -2.26% |
| test_compile_indexing[slice-tensordict-eager] | 0.1862ms | 26.3717μs | 37.9194 KOps/s | 39.1969 KOps/s | -3.26% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1794ms | 0.1288ms | 7.7666 KOps/s | 7.9562 KOps/s | -2.38% |
| test_compile_indexing[slice-tensorclass-eager] | 64.5710μs | 22.6478μs | 44.1544 KOps/s | 44.7776 KOps/s | -1.39% |
| test_compile_indexing[slice-pytree-compile] | 0.2104ms | 0.1291ms | 7.7462 KOps/s | 7.9163 KOps/s | -2.15% |
| test_compile_indexing[slice-pytree-eager] | 50.2810μs | 22.5431μs | 44.3595 KOps/s | 41.9962 KOps/s | **+5.63%** |
| test_compile_indexing[int-tensordict-compile] | 0.1972ms | 0.1371ms | 7.2923 KOps/s | 7.4710 KOps/s | -2.39% |
| test_compile_indexing[int-tensordict-eager] | 0.5220ms | 25.4578μs | 39.2806 KOps/s | 39.6523 KOps/s | -0.94% |
| test_compile_indexing[int-tensorclass-compile] | 0.1809ms | 0.1290ms | 7.7526 KOps/s | 7.9787 KOps/s | -2.83% |
| test_compile_indexing[int-tensorclass-eager] | 55.5710μs | 22.5234μs | 44.3983 KOps/s | 45.5605 KOps/s | -2.55% |
| test_compile_indexing[int-pytree-compile] | 0.1965ms | 0.1281ms | 7.8060 KOps/s | 7.9830 KOps/s | -2.22% |
| test_compile_indexing[int-pytree-eager] | 53.7800μs | 22.1152μs | 45.2178 KOps/s | 45.6170 KOps/s | -0.88% |
| test_mod_add[eager] | 98.4710μs | 36.4800μs | 27.4123 KOps/s | 24.6742 KOps/s | **+11.10%** |
| test_mod_add[compile] | 0.1238ms | 68.5807μs | 14.5814 KOps/s | 14.9464 KOps/s | -2.44% |
| test_mod_add[compile-overhead] | 0.2582ms | 0.1460ms | 6.8512 KOps/s | 6.9089 KOps/s | -0.84% |
| test_mod_wrap[eager] | 0.3487ms | 0.2444ms | 4.0918 KOps/s | 3.8263 KOps/s | **+6.94%** |
| test_mod_wrap[compile] | 0.3540ms | 0.2923ms | 3.4207 KOps/s | 3.2672 KOps/s | +4.70% |
| test_mod_wrap[compile-overhead] | 8.4125ms | 4.4096ms | 226.7754 Ops/s | 229.3415 Ops/s | -1.12% |
| test_mod_wrap_and_backward[eager] | 1.5140ms | 1.4022ms | 713.1805 Ops/s | 743.1192 Ops/s | -4.03% |
| test_mod_wrap_and_backward[compile] | 1.6520ms | 1.4435ms | 692.7837 Ops/s | 690.1405 Ops/s | +0.38% |
| test_mod_wrap_and_backward[compile-overhead] | 1.4780ms | 0.9926ms | 1.0075 KOps/s | 1.0104 KOps/s | -0.29% |
| test_seq_add[eager] | 0.2204ms | 0.1069ms | 9.3524 KOps/s | 8.9882 KOps/s | +4.05% |
| test_seq_add[compile] | 0.1353ms | 84.5005μs | 11.8342 KOps/s | 11.9647 KOps/s | -1.09% |
| test_seq_add[compile-overhead] | 0.1853ms | 0.1208ms | 8.2752 KOps/s | 8.1858 KOps/s | +1.09% |
| test_seq_wrap[eager] | 0.5097ms | 0.4146ms | 2.4123 KOps/s | 2.3130 KOps/s | +4.29% |
| test_seq_wrap[compile] | 1.5376ms | 0.3265ms | 3.0628 KOps/s | 3.0228 KOps/s | +1.32% |
| test_seq_wrap[compile-overhead] | 0.3100s | 
0.1483s | 6.7447 Ops/s | 6.7216 Ops/s | $\color{#35bf28}+0.34\\%$ | | test_func_call_runtime[False-eager] | 0.8236ms | 0.7351ms | 1.3603 KOps/s | 1.4055 KOps/s | $\color{#d91a1a}-3.22\\%$ | | test_func_call_runtime[False-compile] | 0.8681ms | 0.8063ms | 1.2403 KOps/s | 1.2316 KOps/s | $\color{#35bf28}+0.70\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4156ms | 0.3674ms | 2.7221 KOps/s | 2.7123 KOps/s | $\color{#35bf28}+0.36\\%$ | | test_func_call_runtime[True-eager] | 1.0569ms | 0.9681ms | 1.0330 KOps/s | 1.0298 KOps/s | $\color{#35bf28}+0.31\\%$ | | test_func_call_runtime[True-compile] | 0.9457ms | 0.8422ms | 1.1873 KOps/s | 1.1777 KOps/s | $\color{#35bf28}+0.82\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4689ms | 0.4079ms | 2.4516 KOps/s | 2.4160 KOps/s | $\color{#35bf28}+1.48\\%$ | | test_distributed | 0.2630ms | 70.4561μs | 14.1932 KOps/s | 13.7128 KOps/s | $\color{#35bf28}+3.50\\%$ | | test_tdmodule | 83.0410μs | 15.0949μs | 66.2473 KOps/s | 54.6456 KOps/s | $\textbf{\color{#35bf28}+21.23\\%}$ | | test_tdmodule_dispatch | 50.7410μs | 30.8341μs | 32.4316 KOps/s | 27.1476 KOps/s | $\textbf{\color{#35bf28}+19.46\\%}$ | | test_tdseq | 31.8400μs | 15.7014μs | 63.6886 KOps/s | 56.0165 KOps/s | $\textbf{\color{#35bf28}+13.70\\%}$ | | test_tdseq_dispatch | 50.3010μs | 33.0111μs | 30.2928 KOps/s | 25.6177 KOps/s | $\textbf{\color{#35bf28}+18.25\\%}$ | | test_instantiation_functorch | 2.0684ms | 1.9752ms | 506.2675 Ops/s | 506.5423 Ops/s | $\color{#d91a1a}-0.05\\%$ | | test_instantiation_td | 2.0188ms | 1.3179ms | 758.7708 Ops/s | 769.3446 Ops/s | $\color{#d91a1a}-1.37\\%$ | | test_exec_functorch | 0.2772ms | 0.2202ms | 4.5422 KOps/s | 4.5484 KOps/s | $\color{#d91a1a}-0.14\\%$ | | test_exec_functional_call | 0.2605ms | 0.2087ms | 4.7908 KOps/s | 4.4861 KOps/s | $\textbf{\color{#35bf28}+6.79\\%}$ | | test_exec_td | 0.2913ms | 0.2089ms | 4.7862 KOps/s | 4.6087 KOps/s | $\color{#35bf28}+3.85\\%$ | | test_exec_td_decorator | 0.4960ms | 0.2842ms | 
3.5182 KOps/s | 3.3840 KOps/s | $\color{#35bf28}+3.96\\%$ | | test_vmap_mlp_speed[True-True] | 0.7758ms | 0.6420ms | 1.5577 KOps/s | 1.4706 KOps/s | $\textbf{\color{#35bf28}+5.93\\%}$ | | test_vmap_mlp_speed[True-False] | 0.7259ms | 0.6395ms | 1.5636 KOps/s | 1.4864 KOps/s | $\textbf{\color{#35bf28}+5.20\\%}$ | | test_vmap_mlp_speed[False-True] | 0.6426ms | 0.5613ms | 1.7817 KOps/s | 1.7324 KOps/s | $\color{#35bf28}+2.85\\%$ | | test_vmap_mlp_speed[False-False] | 0.6448ms | 0.5860ms | 1.7065 KOps/s | 1.6827 KOps/s | $\color{#35bf28}+1.42\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.5539ms | 0.7235ms | 1.3821 KOps/s | 1.3546 KOps/s | $\color{#35bf28}+2.03\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9000ms | 0.7194ms | 1.3900 KOps/s | 1.3641 KOps/s | $\color{#35bf28}+1.89\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.7820ms | 0.6278ms | 1.5929 KOps/s | 1.5480 KOps/s | $\color{#35bf28}+2.90\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.7484ms | 0.6281ms | 1.5921 KOps/s | 1.5765 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_vmap_transformer_speed[True-True] | 8.6877ms | 8.4784ms | 117.9464 Ops/s | 116.7469 Ops/s | $\color{#35bf28}+1.03\\%$ | | test_vmap_transformer_speed[True-False] | 8.7365ms | 8.4429ms | 118.4426 Ops/s | 116.6183 Ops/s | $\color{#35bf28}+1.56\\%$ | | test_vmap_transformer_speed[False-True] | 8.5760ms | 8.3169ms | 120.2375 Ops/s | 117.5465 Ops/s | $\color{#35bf28}+2.29\\%$ | | test_vmap_transformer_speed[False-False] | 8.7319ms | 8.3458ms | 119.8213 Ops/s | 118.4971 Ops/s | $\color{#35bf28}+1.12\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 20.7067ms | 20.3157ms | 49.2230 Ops/s | 48.5215 Ops/s | $\color{#35bf28}+1.45\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.1312ms | 20.1782ms | 49.5585 Ops/s | 48.7671 Ops/s | $\color{#35bf28}+1.62\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.1909ms | 19.9431ms | 50.1427 Ops/s | 49.8228 Ops/s | $\color{#35bf28}+0.64\\%$ | | 
test_vmap_transformer_speed_decorator[False-False] | 20.8096ms | 19.9837ms | 50.0407 Ops/s | 49.0351 Ops/s | $\color{#35bf28}+2.05\\%$ | | test_to_module_speed[True] | 2.7672ms | 1.5216ms | 657.2084 Ops/s | 658.7015 Ops/s | $\color{#d91a1a}-0.23\\%$ | | test_to_module_speed[False] | 2.0350ms | 1.5052ms | 664.3767 Ops/s | 663.5525 Ops/s | $\color{#35bf28}+0.12\\%$ | | test_tc_init | 61.4110μs | 35.9707μs | 27.8004 KOps/s | 25.9630 KOps/s | $\textbf{\color{#35bf28}+7.08\\%}$ | | test_tc_init_nested | 94.8810μs | 71.8255μs | 13.9226 KOps/s | 12.3680 KOps/s | $\textbf{\color{#35bf28}+12.57\\%}$ | | test_tc_first_layer_tensor | 19.1510μs | 3.9884μs | 250.7253 KOps/s | 247.6769 KOps/s | $\color{#35bf28}+1.23\\%$ | | test_tc_first_layer_nontensor | 27.7500μs | 4.0210μs | 248.6914 KOps/s | 246.2047 KOps/s | $\color{#35bf28}+1.01\\%$ | | test_tc_second_layer_tensor | 33.7380μs | 1.3011μs | 768.5742 KOps/s | 760.1912 KOps/s | $\color{#35bf28}+1.10\\%$ | | test_tc_second_layer_nontensor | 21.5910μs | 4.6742μs | 213.9399 KOps/s | 215.8284 KOps/s | $\color{#d91a1a}-0.87\\%$ | | test_unbind | 0.3253s | 12.9979ms | 76.9356 Ops/s | 71.7861 Ops/s | $\textbf{\color{#35bf28}+7.17\\%}$ | | test_full_like | 0.6535ms | 0.5784ms | 1.7288 KOps/s | 1.7311 KOps/s | $\color{#d91a1a}-0.14\\%$ | | test_zeros_like | 0.2596ms | 0.1979ms | 5.0543 KOps/s | 5.0614 KOps/s | $\color{#d91a1a}-0.14\\%$ | | test_ones_like | 0.2257ms | 0.1976ms | 5.0597 KOps/s | 5.0651 KOps/s | $\color{#d91a1a}-0.11\\%$ | | test_clone | 0.4467ms | 0.4144ms | 2.4132 KOps/s | 2.4204 KOps/s | $\color{#d91a1a}-0.30\\%$ | | test_squeeze | 27.5810μs | 11.7168μs | 85.3477 KOps/s | 83.4446 KOps/s | $\color{#35bf28}+2.28\\%$ | | test_unsqueeze | 0.2495ms | 86.0552μs | 11.6205 KOps/s | 11.7908 KOps/s | $\color{#d91a1a}-1.44\\%$ | | test_split | 0.4329ms | 0.1791ms | 5.5835 KOps/s | 5.5562 KOps/s | $\color{#35bf28}+0.49\\%$ | | test_permute | 0.2561ms | 0.1911ms | 5.2339 KOps/s | 5.1307 KOps/s | $\color{#35bf28}+2.01\\%$ | | 
test_stack | 1.2633ms | 0.9275ms | 1.0782 KOps/s | 1.0961 KOps/s | $\color{#d91a1a}-1.63\\%$ | | test_cat | 1.2568ms | 1.2314ms | 812.0556 Ops/s | 812.3090 Ops/s | $\color{#d91a1a}-0.03\\%$ |