pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] Fix compile + vmap #924

Closed vmoens closed 1 month ago

vmoens commented 1 month ago

Calling `vmap` inside compiled code currently breaks because of the context manager we use to exclude TensorDicts (TDs) from pytree. Instead, we can simply mark TD as a leaf by passing the `is_leaf` argument where appropriate. This will only work with torch > 2.4.
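To illustrate the idea of treating a container as a leaf, here is a minimal pure-Python sketch. It is not the tensordict or torch implementation: `ToyContainer` and `toy_tree_map` are hypothetical names invented for this example, and the only thing borrowed from the description above is the notion of an `is_leaf` predicate that stops the traversal from descending into a node.

```python
from typing import Any, Callable, Optional


class ToyContainer:
    """Hypothetical stand-in for a TensorDict-like container (not the real class)."""

    def __init__(self, **entries: Any) -> None:
        self.entries = dict(entries)


def toy_tree_map(
    fn: Callable[[Any], Any],
    tree: Any,
    is_leaf: Optional[Callable[[Any], bool]] = None,
) -> Any:
    """Apply fn to every leaf of a nested structure.

    A node for which is_leaf returns True is passed to fn whole
    instead of being traversed.
    """
    if is_leaf is not None and is_leaf(tree):
        return fn(tree)
    if isinstance(tree, ToyContainer):
        mapped = {k: toy_tree_map(fn, v, is_leaf) for k, v in tree.entries.items()}
        return ToyContainer(**mapped)
    if isinstance(tree, dict):
        return {k: toy_tree_map(fn, v, is_leaf) for k, v in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(toy_tree_map(fn, v, is_leaf) for v in tree)
    return fn(tree)


def describe(node: Any) -> str:
    """Return the type name of a node, to show what fn actually received."""
    return type(node).__name__


tree = {"a": 1, "b": ToyContainer(x=2.0, y=[3, 4])}

# Default behaviour: the container is opened and its entries are mapped.
traversed = toy_tree_map(describe, tree)

# With is_leaf: the container reaches fn untouched, as a single leaf.
as_leaf = toy_tree_map(
    describe, tree, is_leaf=lambda node: isinstance(node, ToyContainer)
)

print(traversed["b"].entries)  # {'x': 'float', 'y': ['int', 'int']}
print(as_leaf)                 # {'a': 'int', 'b': 'ToyContainer'}
```

The appeal of this approach, as the description suggests, is that the decision is made per call through an argument rather than through global state toggled by a context manager.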

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 219. Improved: 19. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 42.4190μs | 21.6083μs | 46.2785 KOps/s | 45.3332 KOps/s | +2.09% |
| test_plain_set_stack_nested | 58.5190μs | 21.9459μs | 45.5667 KOps/s | 45.4540 KOps/s | +0.25% |
| test_plain_set_nested_inplace | 61.7650μs | 23.9143μs | 41.8159 KOps/s | 41.3495 KOps/s | +1.13% |
| test_plain_set_stack_nested_inplace | 67.7260μs | 23.8162μs | 41.9883 KOps/s | 41.7990 KOps/s | +0.45% |
| test_items | 19.7470μs | 2.6668μs | 374.9786 KOps/s | 345.8320 KOps/s | **+8.43%** |
| test_items_nested | 0.7182ms | 0.3427ms | 2.9176 KOps/s | 2.9737 KOps/s | -1.89% |
| test_items_nested_locked | 1.6010ms | 0.3445ms | 2.9029 KOps/s | 2.9743 KOps/s | -2.40% |
| test_items_nested_leaf | 0.1868ms | 84.2227μs | 11.8733 KOps/s | 11.3794 KOps/s | +4.34% |
| test_items_stack_nested | 0.6050ms | 0.3377ms | 2.9610 KOps/s | 2.9365 KOps/s | +0.84% |
| test_items_stack_nested_leaf | 0.1627ms | 84.7623μs | 11.7977 KOps/s | 11.2614 KOps/s | +4.76% |
| test_items_stack_nested_locked | 0.5538ms | 0.3420ms | 2.9240 KOps/s | 2.9516 KOps/s | -0.94% |
| test_keys | 39.8740μs | 3.8678μs | 258.5426 KOps/s | 249.8973 KOps/s | +3.46% |
| test_keys_nested | 0.2774ms | 0.1453ms | 6.8816 KOps/s | 6.9043 KOps/s | -0.33% |
| test_keys_nested_locked | 0.7685ms | 0.1499ms | 6.6730 KOps/s | 6.7037 KOps/s | -0.46% |
| test_keys_nested_leaf | 0.2364ms | 0.1251ms | 7.9948 KOps/s | 7.9191 KOps/s | +0.96% |
| test_keys_stack_nested | 0.2420ms | 0.1444ms | 6.9274 KOps/s | 6.9015 KOps/s | +0.37% |
| test_keys_stack_nested_leaf | 0.2316ms | 0.1234ms | 8.1056 KOps/s | 8.0480 KOps/s | +0.72% |
| test_keys_stack_nested_locked | 0.2612ms | 0.1487ms | 6.7232 KOps/s | 6.6516 KOps/s | +1.08% |
| test_values | 10.4070μs | 1.1759μs | 850.4312 KOps/s | 859.5905 KOps/s | -1.07% |
| test_values_nested | 88.5050μs | 49.6154μs | 20.1550 KOps/s | 19.7277 KOps/s | +2.17% |
| test_values_nested_locked | 90.8100μs | 49.6484μs | 20.1416 KOps/s | 19.8665 KOps/s | +1.38% |
| test_values_nested_leaf | 90.2390μs | 44.3136μs | 22.5664 KOps/s | 22.1644 KOps/s | +1.81% |
| test_values_stack_nested | 91.3910μs | 49.0932μs | 20.3694 KOps/s | 18.8397 KOps/s | **+8.12%** |
| test_values_stack_nested_leaf | 0.1054ms | 44.7803μs | 22.3312 KOps/s | 21.2390 KOps/s | **+5.14%** |
| test_values_stack_nested_locked | 95.2070μs | 49.5544μs | 20.1798 KOps/s | 19.8073 KOps/s | +1.88% |
| test_membership | 38.4120μs | 0.9145μs | 1.0935 MOps/s | 1.3562 MOps/s | **-19.37%** |
| test_membership_nested | 42.1680μs | 2.6928μs | 371.3559 KOps/s | 381.5961 KOps/s | -2.68% |
| test_membership_nested_leaf | 0.1108ms | 2.6958μs | 370.9461 KOps/s | 378.4908 KOps/s | -1.99% |
| test_membership_stacked_nested | 73.8070μs | 2.6488μs | 377.5352 KOps/s | 377.9176 KOps/s | -0.10% |
| test_membership_stacked_nested_leaf | 0.1165ms | 2.6758μs | 373.7181 KOps/s | 375.2985 KOps/s | -0.42% |
| test_membership_nested_last | 43.2510μs | 3.9329μs | 254.2676 KOps/s | 251.0978 KOps/s | +1.26% |
| test_membership_nested_leaf_last | 33.4020μs | 3.9677μs | 252.0376 KOps/s | 250.4677 KOps/s | +0.63% |
| test_membership_stacked_nested_last | 42.9290μs | 3.9159μs | 255.3689 KOps/s | 247.7363 KOps/s | +3.08% |
| test_membership_stacked_nested_leaf_last | 48.3300μs | 3.9244μs | 254.8130 KOps/s | 252.1399 KOps/s | +1.06% |
| test_nested_getleaf | 0.1292ms | 10.4953μs | 95.2808 KOps/s | 94.5966 KOps/s | +0.72% |
| test_nested_get | 54.5620μs | 9.8125μs | 101.9107 KOps/s | 100.5453 KOps/s | +1.36% |
| test_stacked_getleaf | 39.6540μs | 10.3646μs | 96.4823 KOps/s | 95.9595 KOps/s | +0.54% |
| test_stacked_get | 37.8600μs | 9.8091μs | 101.9463 KOps/s | 100.5097 KOps/s | +1.43% |
| test_nested_getitemleaf | 38.8930μs | 10.8628μs | 92.0575 KOps/s | 89.5355 KOps/s | +2.82% |
| test_nested_getitem | 40.8060μs | 10.1315μs | 98.7019 KOps/s | 97.0032 KOps/s | +1.75% |
| test_stacked_getitemleaf | 37.4300μs | 11.0348μs | 90.6222 KOps/s | 88.4392 KOps/s | +2.47% |
| test_stacked_getitem | 0.1143ms | 10.0304μs | 99.6969 KOps/s | 96.9928 KOps/s | +2.79% |
| test_lock_nested | 7.4563ms | 0.5123ms | 1.9519 KOps/s | 1.9702 KOps/s | -0.93% |
| test_lock_stack_nested | 0.7278ms | 0.4784ms | 2.0905 KOps/s | 2.1100 KOps/s | -0.92% |
| test_unlock_nested | 1.1470ms | 0.4326ms | 2.3116 KOps/s | 2.3622 KOps/s | -2.14% |
| test_unlock_stack_nested | 0.8578ms | 0.3932ms | 2.5435 KOps/s | 2.5602 KOps/s | -0.66% |
| test_flatten_speed | 0.6085ms | 0.1029ms | 9.7158 KOps/s | 9.3945 KOps/s | +3.42% |
| test_unflatten_speed | 1.1967ms | 0.4339ms | 2.3046 KOps/s | 2.2858 KOps/s | +0.82% |
| test_common_ops | 5.6008ms | 1.1256ms | 888.3853 Ops/s | 909.1984 Ops/s | -2.29% |
| test_creation | 15.5090μs | 2.0587μs | 485.7533 KOps/s | 485.8839 KOps/s | -0.03% |
| test_creation_empty | 55.7140μs | 18.3597μs | 54.4670 KOps/s | 54.2099 KOps/s | +0.47% |
| test_creation_nested_1 | 1.3305ms | 21.9411μs | 45.5767 KOps/s | 45.4982 KOps/s | +0.17% |
| test_creation_nested_2 | 65.9430μs | 25.5414μs | 39.1521 KOps/s | 39.5371 KOps/s | -0.97% |
| test_clone | 0.1434ms | 17.0799μs | 58.5485 KOps/s | 59.4588 KOps/s | -1.53% |
| test_getitem[int] | 0.7630ms | 16.5958μs | 60.2563 KOps/s | 60.0063 KOps/s | +0.42% |
| test_getitem[slice_int] | 0.1535ms | 31.8219μs | 31.4249 KOps/s | 31.9483 KOps/s | -1.64% |
| test_getitem[range] | 0.3218ms | 56.9406μs | 17.5622 KOps/s | 17.7589 KOps/s | -1.11% |
| test_getitem[tuple] | 0.1160ms | 25.3365μs | 39.4687 KOps/s | 40.4567 KOps/s | -2.44% |
| test_getitem[list] | 0.3422ms | 52.6694μs | 18.9863 KOps/s | 19.9395 KOps/s | -4.78% |
| test_setitem_dim[int] | 67.3050μs | 42.2186μs | 23.6862 KOps/s | 24.3120 KOps/s | -2.57% |
| test_setitem_dim[slice_int] | 0.1407ms | 73.4666μs | 13.6116 KOps/s | 14.1215 KOps/s | -3.61% |
| test_setitem_dim[range] | 0.4497ms | 95.3903μs | 10.4832 KOps/s | 10.8765 KOps/s | -3.62% |
| test_setitem_dim[tuple] | 0.1315ms | 60.2775μs | 16.5899 KOps/s | 17.3321 KOps/s | -4.28% |
| test_setitem | 0.1379ms | 29.4368μs | 33.9711 KOps/s | 33.3967 KOps/s | +1.72% |
| test_set | 0.2151ms | 29.4662μs | 33.9372 KOps/s | 34.2452 KOps/s | -0.90% |
| test_set_shared | 4.3457ms | 0.2239ms | 4.4654 KOps/s | 4.6604 KOps/s | -4.18% |
| test_update | 0.1761ms | 36.5845μs | 27.3340 KOps/s | 26.9280 KOps/s | +1.51% |
| test_update_nested | 0.1980ms | 46.4733μs | 21.5177 KOps/s | 21.4476 KOps/s | +0.33% |
| test_update__nested | 0.1425ms | 33.7349μs | 29.6429 KOps/s | 29.7284 KOps/s | -0.29% |
| test_set_nested | 0.1596ms | 31.4815μs | 31.7647 KOps/s | 31.3715 KOps/s | +1.25% |
| test_set_nested_new | 0.1888ms | 35.4053μs | 28.2444 KOps/s | 27.3097 KOps/s | +3.42% |
| test_select | 0.1803ms | 52.2072μs | 19.1544 KOps/s | 18.5066 KOps/s | +3.50% |
| test_select_nested | 1.0637ms | 58.0818μs | 17.2171 KOps/s | 17.1973 KOps/s | +0.11% |
| test_exclude_nested | 0.1494ms | 78.1970μs | 12.7882 KOps/s | 12.8867 KOps/s | -0.76% |
| test_empty[True] | 0.4991ms | 0.3209ms | 3.1163 KOps/s | 3.1069 KOps/s | +0.30% |
| test_empty[False] | 29.4747μs | 1.1639μs | 859.1873 KOps/s | 860.5703 KOps/s | -0.16% |
| test_unbind_speed | 0.5132ms | 0.3153ms | 3.1715 KOps/s | 3.2105 KOps/s | -1.21% |
| test_unbind_speed_stack0 | 0.4684ms | 0.3141ms | 3.1833 KOps/s | 3.2261 KOps/s | -1.33% |
| test_unbind_speed_stack1 | 0.1023s | 0.8408ms | 1.1894 KOps/s | 1.3432 KOps/s | **-11.45%** |
| test_split | 99.9267ms | 2.2023ms | 454.0642 Ops/s | 457.3334 Ops/s | -0.71% |
| test_chunk | 97.4919ms | 2.1935ms | 455.8841 Ops/s | 449.6043 Ops/s | +1.40% |
| test_creation[device0] | 0.2980ms | 0.1217ms | 8.2163 KOps/s | 8.3402 KOps/s | -1.49% |
| test_creation_from_tensor | 4.6481ms | 0.1241ms | 8.0568 KOps/s | 8.1726 KOps/s | -1.42% |
| test_add_one[memmap_tensor0] | 0.4523ms | 8.0842μs | 123.6975 KOps/s | 131.9465 KOps/s | **-6.25%** |
| test_contiguous[memmap_tensor0] | 38.1210μs | 1.9880μs | 503.0280 KOps/s | 501.2100 KOps/s | +0.36% |
| test_stack[memmap_tensor0] | 52.5580μs | 5.6557μs | 176.8125 KOps/s | 169.8341 KOps/s | +4.11% |
| test_memmaptd_index | 1.3141ms | 0.4128ms | 2.4222 KOps/s | 2.3834 KOps/s | +1.63% |
| test_memmaptd_index_astensor | 0.7470ms | 0.4929ms | 2.0289 KOps/s | 2.0475 KOps/s | -0.91% |
| test_memmaptd_index_op | 1.5595ms | 1.0918ms | 915.9328 Ops/s | 948.1380 Ops/s | -3.40% |
| test_serialize_model | 0.1293s | 0.1196s | 8.3604 Ops/s | 7.2517 Ops/s | **+15.29%** |
| test_serialize_model_pickle | 0.4401s | 0.3950s | 2.5318 Ops/s | 2.4552 Ops/s | +3.12% |
| test_serialize_weights | 0.2289s | 0.1409s | 7.0997 Ops/s | 8.3752 Ops/s | **-15.23%** |
| test_serialize_weights_returnearly | 0.1734s | 0.1639s | 6.1019 Ops/s | 6.1731 Ops/s | -1.15% |
| test_serialize_weights_pickle | 0.4547s | 0.3991s | 2.5058 Ops/s | 2.4644 Ops/s | +1.68% |
| test_serialize_weights_filesystem | 0.1516s | 0.1452s | 6.8882 Ops/s | 7.0748 Ops/s | -2.64% |
| test_serialize_model_filesystem | 0.1749s | 0.1608s | 6.2208 Ops/s | 6.5388 Ops/s | -4.86% |
| test_reshape_pytree | 0.1304ms | 39.9055μs | 25.0592 KOps/s | 25.2525 KOps/s | -0.77% |
| test_reshape_td | 95.2770μs | 47.0720μs | 21.2441 KOps/s | 20.9203 KOps/s | +1.55% |
| test_view_pytree | 95.6680μs | 39.5778μs | 25.2667 KOps/s | 25.4573 KOps/s | -0.75% |
| test_view_td | 0.1159ms | 53.2477μs | 18.7802 KOps/s | 18.0599 KOps/s | +3.99% |
| test_unbind_pytree | 97.7730μs | 37.5926μs | 26.6010 KOps/s | 27.1680 KOps/s | -2.09% |
| test_unbind_td | 0.3436ms | 46.6772μs | 21.4237 KOps/s | 21.6395 KOps/s | -1.00% |
| test_split_pytree | 0.1124ms | 40.4497μs | 24.7220 KOps/s | 24.8526 KOps/s | -0.53% |
| test_split_td | 0.5970ms | 57.9541μs | 17.2550 KOps/s | 16.8660 KOps/s | +2.31% |
| test_add_pytree | 0.1261ms | 48.1185μs | 20.7820 KOps/s | 21.2727 KOps/s | -2.31% |
| test_add_td | 0.2206ms | 87.4080μs | 11.4406 KOps/s | 11.8679 KOps/s | -3.60% |
| test_compile_add_one_nested[tensordict-compile] | 0.1272ms | 55.4540μs | 18.0330 KOps/s | 18.2580 KOps/s | -1.23% |
| test_compile_add_one_nested[tensordict-eager] | 0.3419ms | 0.1893ms | 5.2824 KOps/s | 5.1811 KOps/s | +1.95% |
| test_compile_add_one_nested[pytree-compile] | 0.1321ms | 55.1517μs | 18.1318 KOps/s | 18.3907 KOps/s | -1.41% |
| test_compile_add_one_nested[pytree-eager] | 0.2898ms | 0.1487ms | 6.7261 KOps/s | 6.8778 KOps/s | -2.21% |
| test_compile_copy_nested[tensordict-compile] | 61.5050μs | 20.5342μs | 48.6992 KOps/s | 48.5939 KOps/s | +0.22% |
| test_compile_copy_nested[tensordict-eager] | 0.1595ms | 65.4119μs | 15.2877 KOps/s | 15.5131 KOps/s | -1.45% |
| test_compile_copy_nested[pytree-compile] | 0.1552ms | 79.0602μs | 12.6486 KOps/s | 12.9164 KOps/s | -2.07% |
| test_compile_copy_nested[pytree-eager] | 0.1400ms | 72.5891μs | 13.7762 KOps/s | 14.0872 KOps/s | -2.21% |
| test_compile_add_one_flat[tensordict-compile] | 0.2854ms | 0.1756ms | 5.6939 KOps/s | 5.7255 KOps/s | -0.55% |
| test_compile_add_one_flat[tensordict-eager] | 0.3522ms | 0.1941ms | 5.1516 KOps/s | 5.1590 KOps/s | -0.14% |
| test_compile_add_one_flat[tensorclass-compile] | 90.2680μs | 39.8148μs | 25.1163 KOps/s | 26.0815 KOps/s | -3.70% |
| test_compile_add_one_flat[tensorclass-eager] | 1.0282ms | 69.5315μs | 14.3820 KOps/s | 13.9333 KOps/s | +3.22% |
| test_compile_add_one_flat[pytree-compile] | 0.3500ms | 0.1774ms | 5.6374 KOps/s | 5.6386 KOps/s | -0.02% |
| test_compile_add_one_flat[pytree-eager] | 0.5977ms | 0.2947ms | 3.3937 KOps/s | 3.4213 KOps/s | -0.81% |
| test_compile_add_self_flat[tensordict-eager] | 0.4493ms | 0.2074ms | 4.8224 KOps/s | 4.8109 KOps/s | +0.24% |
| test_compile_add_self_flat[tensordict-compile] | 0.3903ms | 0.1770ms | 5.6489 KOps/s | 5.7171 KOps/s | -1.19% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1486ms | 61.5348μs | 16.2510 KOps/s | 15.7992 KOps/s | +2.86% |
| test_compile_add_self_flat[tensorclass-compile] | 0.1335ms | 41.0367μs | 24.3684 KOps/s | 25.1830 KOps/s | -3.23% |
| test_compile_add_self_flat[pytree-eager] | 0.3852ms | 0.2420ms | 4.1329 KOps/s | 4.1805 KOps/s | -1.14% |
| test_compile_add_self_flat[pytree-compile] | 0.4344ms | 0.1772ms | 5.6445 KOps/s | 5.7356 KOps/s | -1.59% |
| test_compile_copy_flat[tensordict-compile] | 0.1966ms | 0.1078ms | 9.2791 KOps/s | 9.2064 KOps/s | +0.79% |
| test_compile_copy_flat[tensordict-eager] | 0.1291ms | 56.7106μs | 17.6334 KOps/s | 17.7851 KOps/s | -0.85% |
| test_compile_copy_flat[pytree-compile] | 0.1557ms | 80.4112μs | 12.4361 KOps/s | 12.4876 KOps/s | -0.41% |
| test_compile_copy_flat[pytree-eager] | 0.1356ms | 72.4971μs | 13.7936 KOps/s | 13.9102 KOps/s | -0.84% |
| test_compile_assign_and_add[tensordict-compile] | 0.4188ms | 0.1918ms | 5.2132 KOps/s | 5.2077 KOps/s | +0.11% |
| test_compile_assign_and_add[tensordict-eager] | 3.6456ms | 1.6675ms | 599.6934 Ops/s | 618.7271 Ops/s | -3.08% |
| test_compile_assign_and_add[pytree-compile] | 0.2683ms | 0.1903ms | 5.2554 KOps/s | 5.2152 KOps/s | +0.77% |
| test_compile_assign_and_add[pytree-eager] | 1.2824ms | 1.0921ms | 915.6407 Ops/s | 912.6230 Ops/s | +0.33% |
| test_compile_assign_and_add_stack[compile] | 0.5325ms | 0.4095ms | 2.4421 KOps/s | 2.3903 KOps/s | +2.16% |
| test_compile_assign_and_add_stack[eager] | 6.0917ms | 4.0477ms | 247.0517 Ops/s | 258.3512 Ops/s | -4.37% |
| test_compile_indexing[tensor-tensordict-compile] | 94.0070μs | 32.8211μs | 30.4682 KOps/s | 31.1907 KOps/s | -2.32% |
| test_compile_indexing[tensor-tensordict-eager] | 1.2216ms | 48.6950μs | 20.5360 KOps/s | 21.1874 KOps/s | -3.07% |
| test_compile_indexing[tensor-tensorclass-compile] | 73.8480μs | 29.3157μs | 34.1114 KOps/s | 34.2512 KOps/s | -0.41% |
| test_compile_indexing[tensor-tensorclass-eager] | 98.6140μs | 31.2624μs | 31.9873 KOps/s | 33.8912 KOps/s | **-5.62%** |
| test_compile_indexing[tensor-pytree-compile] | 0.1309ms | 29.2105μs | 34.2343 KOps/s | 34.3832 KOps/s | -0.43% |
| test_compile_indexing[tensor-pytree-eager] | 0.1491ms | 30.6876μs | 32.5865 KOps/s | 33.7135 KOps/s | -3.34% |
| test_compile_indexing[slice-tensordict-compile] | 0.1352ms | 72.8088μs | 13.7346 KOps/s | 13.5865 KOps/s | +1.09% |
| test_compile_indexing[slice-tensordict-eager] | 0.6544ms | 28.1650μs | 35.5051 KOps/s | 35.4058 KOps/s | +0.28% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1352ms | 67.8244μs | 14.7440 KOps/s | 14.5013 KOps/s | +1.67% |
| test_compile_indexing[slice-tensorclass-eager] | 0.1299ms | 24.5722μs | 40.6963 KOps/s | 40.7096 KOps/s | -0.03% |
| test_compile_indexing[slice-pytree-compile] | 0.1239ms | 68.0649μs | 14.6919 KOps/s | 14.7692 KOps/s | -0.52% |
| test_compile_indexing[slice-pytree-eager] | 0.1016ms | 24.0602μs | 41.5624 KOps/s | 41.2039 KOps/s | +0.87% |
| test_compile_indexing[int-tensordict-compile] | 0.1929ms | 72.4001μs | 13.8121 KOps/s | 13.8436 KOps/s | -0.23% |
| test_compile_indexing[int-tensordict-eager] | 0.8821ms | 28.2269μs | 35.4272 KOps/s | 35.9858 KOps/s | -1.55% |
| test_compile_indexing[int-tensorclass-compile] | 0.1949ms | 67.4968μs | 14.8155 KOps/s | 14.8465 KOps/s | -0.21% |
| test_compile_indexing[int-tensorclass-eager] | 0.3937ms | 24.8641μs | 40.2186 KOps/s | 40.6492 KOps/s | -1.06% |
| test_compile_indexing[int-pytree-compile] | 0.1342ms | 67.4779μs | 14.8197 KOps/s | 14.7297 KOps/s | +0.61% |
| test_compile_indexing[int-pytree-eager] | 76.8630μs | 25.3020μs | 39.5225 KOps/s | 41.5613 KOps/s | -4.91% |
| test_mod_add[eager] | 80.2290μs | 25.4654μs | 39.2690 KOps/s | 39.8278 KOps/s | -1.40% |
| test_mod_add[compile] | 82.4130μs | 36.7756μs | 27.1919 KOps/s | 25.9174 KOps/s | +4.92% |
| test_mod_add[compile-overhead] | 94.8770μs | 36.5106μs | 27.3893 KOps/s | 26.0491 KOps/s | **+5.14%** |
| test_mod_wrap[eager] | 0.4107ms | 0.2147ms | 4.6577 KOps/s | 4.5028 KOps/s | +3.44% |
| test_mod_wrap[compile] | 1.6059ms | 0.2296ms | 4.3552 KOps/s | 4.2211 KOps/s | +3.18% |
| test_mod_wrap[compile-overhead] | 0.4147ms | 0.2273ms | 4.3987 KOps/s | 4.3025 KOps/s | +2.24% |
| test_mod_wrap_and_backward[eager] | 12.4056ms | 11.3675ms | 87.9699 Ops/s | 82.4582 Ops/s | **+6.68%** |
| test_mod_wrap_and_backward[compile] | 13.0629ms | 11.5908ms | 86.2753 Ops/s | 78.9348 Ops/s | **+9.30%** |
| test_mod_wrap_and_backward[compile-overhead] | 15.4123ms | 11.7939ms | 84.7894 Ops/s | 80.7059 Ops/s | **+5.06%** |
| test_seq_add[eager] | 0.1710ms | 89.1784μs | 11.2135 KOps/s | 11.5514 KOps/s | -2.93% |
| test_seq_add[compile] | 0.1616ms | 61.6312μs | 16.2255 KOps/s | 15.9260 KOps/s | +1.88% |
| test_seq_add[compile-overhead] | 0.1539ms | 60.0012μs | 16.6663 KOps/s | 16.0673 KOps/s | +3.73% |
| test_seq_wrap[eager] | 0.5397ms | 0.3768ms | 2.6539 KOps/s | 2.5802 KOps/s | +2.86% |
| test_seq_wrap[compile] | 0.7586ms | 0.2638ms | 3.7909 KOps/s | 3.7321 KOps/s | +1.58% |
| test_seq_wrap[compile-overhead] | 0.4348ms | 0.2634ms | 3.7972 KOps/s | 3.7399 KOps/s | +1.53% |
| test_func_call_runtime[False-eager] | 0.9491ms | 0.5328ms | 1.8770 KOps/s | 1.8347 KOps/s | +2.31% |
| test_func_call_runtime[False-compile] | 0.6105ms | 0.4963ms | 2.0151 KOps/s | 1.9636 KOps/s | +2.62% |
| test_func_call_runtime[False-compile-overhead] | 0.6725ms | 0.4988ms | 2.0047 KOps/s | 1.9956 KOps/s | +0.46% |
| test_func_call_runtime[True-eager] | 1.3820ms | 0.7529ms | 1.3281 KOps/s | 1.2672 KOps/s | +4.81% |
| test_func_call_runtime[True-compile] | 0.6721ms | 0.5115ms | 1.9550 KOps/s | 1.9190 KOps/s | +1.88% |
| test_func_call_runtime[True-compile-overhead] | 0.6781ms | 0.5155ms | 1.9397 KOps/s | 1.8895 KOps/s | +2.66% |
| test_func_call_cm_runtime[False-eager] | 1.7051ms | 0.5330ms | 1.8761 KOps/s | 1.8102 KOps/s | +3.64% |
| test_func_call_cm_runtime[False-compile] | 0.6776ms | 0.5002ms | 1.9992 KOps/s | 1.9966 KOps/s | +0.13% |
| test_func_call_cm_runtime[False-compile-overhead] | 1.0868ms | 0.5009ms | 1.9965 KOps/s | 1.9746 KOps/s | +1.11% |
| test_func_call_cm_runtime[True-eager] | 2.6759ms | 0.9171ms | 1.0904 KOps/s | 1.0713 KOps/s | +1.78% |
| test_func_call_cm_runtime[True-compile] | 1.0751ms | 0.8449ms | 1.1835 KOps/s | 1.1259 KOps/s | **+5.12%** |
| test_func_call_cm_runtime[True-compile-overhead] | 1.1659ms | 0.8441ms | 1.1847 KOps/s | 1.1215 KOps/s | **+5.63%** |
| test_distributed | 0.5255ms | 0.1322ms | 7.5621 KOps/s | 7.3488 KOps/s | +2.90% |
| test_tdmodule | 50.2740μs | 18.0621μs | 55.3646 KOps/s | 56.3330 KOps/s | -1.72% |
| test_tdmodule_dispatch | 69.9010μs | 37.0406μs | 26.9974 KOps/s | 27.3855 KOps/s | -1.42% |
| test_tdseq | 48.5710μs | 19.4261μs | 51.4770 KOps/s | 48.4417 KOps/s | **+6.27%** |
| test_tdseq_dispatch | 65.1820μs | 41.3030μs | 24.2113 KOps/s | 23.0970 KOps/s | +4.82% |
| test_instantiation_functorch | 3.5717ms | 1.6849ms | 593.5203 Ops/s | 561.6479 Ops/s | **+5.67%** |
| test_instantiation_td | 2.3552ms | 1.1693ms | 855.2302 Ops/s | 805.7862 Ops/s | **+6.14%** |
| test_exec_functorch | 0.3517ms | 0.1778ms | 5.6236 KOps/s | 5.4329 KOps/s | +3.51% |
| test_exec_functional_call | 0.3487ms | 0.1716ms | 5.8280 KOps/s | 5.7768 KOps/s | +0.89% |
| test_exec_td | 0.2542ms | 0.1709ms | 5.8514 KOps/s | 5.6377 KOps/s | +3.79% |
| test_exec_td_decorator | 0.7989ms | 0.2249ms | 4.4457 KOps/s | 4.2946 KOps/s | +3.52% |
| test_vmap_mlp_speed[True-True] | 0.8761ms | 0.5811ms | 1.7208 KOps/s | 1.6145 KOps/s | **+6.58%** |
| test_vmap_mlp_speed[True-False] | 0.8449ms | 0.5789ms | 1.7274 KOps/s | 1.6325 KOps/s | **+5.82%** |
| test_vmap_mlp_speed[False-True] | 0.6500ms | 0.4773ms | 2.0952 KOps/s | 1.9872 KOps/s | **+5.44%** |
| test_vmap_mlp_speed[False-False] | 1.2051ms | 0.4801ms | 2.0831 KOps/s | 1.9412 KOps/s | **+7.31%** |
| test_vmap_mlp_speed_decorator[True-True] | 0.8320ms | 0.6358ms | 1.5727 KOps/s | 1.4828 KOps/s | **+6.06%** |
| test_vmap_mlp_speed_decorator[True-False] | 0.9287ms | 0.6337ms | 1.5782 KOps/s | 1.4881 KOps/s | **+6.05%** |
| test_vmap_mlp_speed_decorator[False-True] | 0.8032ms | 0.5257ms | 1.9022 KOps/s | 1.8141 KOps/s | +4.85% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7765ms | 0.5269ms | 1.8979 KOps/s | 1.8239 KOps/s | +4.05% |
| test_to_module_speed[True] | 1.5782ms | 1.3144ms | 760.8023 Ops/s | 744.6603 Ops/s | +2.17% |
| test_to_module_speed[False] | 1.5221ms | 1.2915ms | 774.2944 Ops/s | 770.2241 Ops/s | +0.53% |
| test_tc_init | 82.4230μs | 45.3699μs | 22.0410 KOps/s | 21.9906 KOps/s | +0.23% |
| test_tc_init_nested | 0.1582ms | 92.5656μs | 10.8031 KOps/s | 10.8446 KOps/s | -0.38% |
| test_tc_first_layer_tensor | 20.9890μs | 1.4424μs | 693.2984 KOps/s | 685.6095 KOps/s | +1.12% |
| test_tc_first_layer_nontensor | 44.5530μs | 4.2898μs | 233.1126 KOps/s | 233.0576 KOps/s | +0.02% |
| test_tc_second_layer_tensor | 35.6160μs | 2.6970μs | 370.7802 KOps/s | 361.5111 KOps/s | +2.56% |
| test_tc_second_layer_nontensor | 38.9330μs | 5.6612μs | 176.6409 KOps/s | 179.6105 KOps/s | -1.65% |
| test_unbind | 0.4723s | 13.7951ms | 72.4895 Ops/s | 70.3025 Ops/s | +3.11% |
| test_full_like | 9.8486ms | 8.3326ms | 120.0111 Ops/s | 121.2628 Ops/s | -1.03% |
| test_zeros_like | 15.7281ms | 7.0131ms | 142.5897 Ops/s | 139.1631 Ops/s | +2.46% |
| test_ones_like | 15.2020ms | 8.0697ms | 123.9205 Ops/s | 125.9565 Ops/s | -1.62% |
| test_clone | 14.6101ms | 9.4474ms | 105.8489 Ops/s | 104.9596 Ops/s | +0.85% |
| test_squeeze | 63.4080μs | 13.3333μs | 75.0002 KOps/s | 74.3832 KOps/s | +0.83% |
| test_unsqueeze | 0.1754ms | 95.3119μs | 10.4919 KOps/s | 10.6574 KOps/s | -1.55% |
| test_split | 0.4926ms | 0.2037ms | 4.9094 KOps/s | 4.9085 KOps/s | +0.02% |
| test_permute | 0.4514ms | 0.2291ms | 4.3651 KOps/s | 4.4268 KOps/s | -1.39% |
| test_stack | 35.3864ms | 26.4393ms | 37.8225 Ops/s | 37.8016 Ops/s | +0.06% |
| test_cat | 33.3723ms | 25.9367ms | 38.5554 Ops/s | 38.1753 Ops/s | +1.00% |

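
For anyone checking the CPU table by hand, the `Ops` and `Change` columns appear to follow from the other columns. The sketch below assumes `Ops` is the reciprocal of `Mean` and `Change` is the relative difference between `Ops` and `Ops on Repo HEAD`. Both formulas are inferred from the reported numbers, not taken from the benchmark tooling, and the function names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_percent(ops_new: float, ops_head: float) -> float:
    """Relative difference of this PR's throughput versus HEAD, in percent."""
    return (ops_new - ops_head) / ops_head * 100.0


# test_plain_set_nested (CPU): Mean 21.6083 us, 46.2785 vs 45.3332 KOps/s
plain_set_ops = ops_per_second(21.6083e-6)
plain_set_change = relative_change_percent(46.2785e3, 45.3332e3)

# test_membership (CPU): 1.0935 vs 1.3562 MOps/s
membership_change = relative_change_percent(1.0935e6, 1.3562e6)

print(f"{plain_set_ops / 1e3:.2f} KOps/s")  # 46.28 KOps/s
print(f"{plain_set_change:+.2f}%")          # +2.09%
print(f"{membership_change:+.2f}%")         # -19.37%
```

Both results match the table. The emphasized entries in the `Change` column (19 positive, 5 negative) also match the "Improved" and "Worsened" counts in the summary, which suggests those counts cover only changes beyond roughly ±5%.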
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 225. Improved: 25. Worsened: 14.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1441ms | 16.1073μs | 62.0835 KOps/s | 58.2383 KOps/s | **+6.60%** |
| test_plain_set_stack_nested | 31.4010μs | 16.2709μs | 61.4594 KOps/s | 58.1923 KOps/s | **+5.61%** |
| test_plain_set_nested_inplace | 36.2910μs | 17.3855μs | 57.5192 KOps/s | 54.5798 KOps/s | **+5.39%** |
| test_plain_set_stack_nested_inplace | 43.7110μs | 17.2598μs | 57.9380 KOps/s | 54.7921 KOps/s | **+5.74%** |
| test_items | 25.6400μs | 4.7732μs | 209.5045 KOps/s | 213.2016 KOps/s | -1.73% |
| test_items_nested | 0.4297ms | 0.3652ms | 2.7380 KOps/s | 2.7246 KOps/s | +0.49% |
| test_items_nested_locked | 0.4607ms | 0.3693ms | 2.7080 KOps/s | 2.7087 KOps/s | -0.03% |
| test_items_nested_leaf | 0.1134ms | 85.1264μs | 11.7472 KOps/s | 11.6823 KOps/s | +0.56% |
| test_items_stack_nested | 0.4333ms | 0.3733ms | 2.6786 KOps/s | 2.7219 KOps/s | -1.59% |
| test_items_stack_nested_leaf | 0.1136ms | 85.0451μs | 11.7585 KOps/s | 11.5564 KOps/s | +1.75% |
| test_items_stack_nested_locked | 0.4309ms | 0.3737ms | 2.6760 KOps/s | 2.6955 KOps/s | -0.73% |
| test_keys | 38.0710μs | 4.4303μs | 225.7176 KOps/s | 226.6685 KOps/s | -0.42% |
| test_keys_nested | 95.8230μs | 66.5403μs | 15.0285 KOps/s | 15.1369 KOps/s | -0.72% |
| test_keys_nested_locked | 0.7101ms | 73.6208μs | 13.5831 KOps/s | 13.7712 KOps/s | -1.37% |
| test_keys_nested_leaf | 83.4720μs | 58.7254μs | 17.0284 KOps/s | 17.1767 KOps/s | -0.86% |
| test_keys_stack_nested | 0.1330ms | 66.8227μs | 14.9650 KOps/s | 14.9132 KOps/s | +0.35% |
| test_keys_stack_nested_leaf | 85.5610μs | 58.1666μs | 17.1920 KOps/s | 17.1139 KOps/s | +0.46% |
| test_keys_stack_nested_locked | 0.1093ms | 72.8409μs | 13.7285 KOps/s | 13.8241 KOps/s | -0.69% |
| test_values | 8.6537μs | 1.7693μs | 565.2054 KOps/s | 559.2135 KOps/s | +1.07% |
| test_values_nested | 57.1320μs | 33.9819μs | 29.4274 KOps/s | 29.3974 KOps/s | +0.10% |
| test_values_nested_locked | 0.1215ms | 36.0935μs | 27.7058 KOps/s | 27.7851 KOps/s | -0.29% |
| test_values_nested_leaf | 65.6300μs | 30.2402μs | 33.0685 KOps/s | 33.1686 KOps/s | -0.30% |
| test_values_stack_nested | 63.4310μs | 34.5712μs | 28.9258 KOps/s | 28.4647 KOps/s | +1.62% |
| test_values_stack_nested_leaf | 54.5200μs | 30.9212μs | 32.3403 KOps/s | 32.2741 KOps/s | +0.21% |
| test_values_stack_nested_locked | 69.7920μs | 36.9699μs | 27.0490 KOps/s | 27.1348 KOps/s | -0.32% |
| test_membership | 1.7081μs | 0.5514μs | 1.8135 MOps/s | 1.8026 MOps/s | +0.61% |
| test_membership_nested | 12.2850μs | 1.9358μs | 516.5764 KOps/s | 489.8512 KOps/s | **+5.46%** |
| test_membership_nested_leaf | 14.2555μs | 1.9625μs | 509.5540 KOps/s | 508.5565 KOps/s | +0.20% |
| test_membership_stacked_nested | 20.5210μs | 1.9813μs | 504.7088 KOps/s | 490.6563 KOps/s | +2.86% |
| test_membership_stacked_nested_leaf | 15.5500μs | 1.9826μs | 504.3874 KOps/s | 495.1585 KOps/s | +1.86% |
| test_membership_nested_last | 21.3900μs | 2.9399μs | 340.1497 KOps/s | 344.6392 KOps/s | -1.30% |
| test_membership_nested_leaf_last | 30.8910μs | 2.9786μs | 335.7310 KOps/s | 341.7676 KOps/s | -1.77% |
| test_membership_stacked_nested_last | 31.4400μs | 9.1632μs | 109.1318 KOps/s | 232.3674 KOps/s | **-53.03%** |
| test_membership_stacked_nested_leaf_last | 24.8910μs | 9.1725μs | 109.0214 KOps/s | 233.2946 KOps/s | **-53.27%** |
| test_nested_getleaf | 35.5890μs | 8.0755μs | 123.8314 KOps/s | 125.5246 KOps/s | -1.35% |
| test_nested_get | 28.6110μs | 7.5767μs | 131.9839 KOps/s | 133.6493 KOps/s | -1.25% |
| test_stacked_getleaf | 34.6400μs | 8.1177μs | 123.1871 KOps/s | 124.5006 KOps/s | -1.06% |
| test_stacked_get | 24.5710μs | 7.5798μs | 131.9296 KOps/s | 133.7098 KOps/s | -1.33% |
| test_nested_getitemleaf | 34.0710μs | 8.2135μs | 121.7501 KOps/s | 122.5332 KOps/s | -0.64% |
| test_nested_getitem | 22.2700μs | 7.7738μs | 128.6371 KOps/s | 130.3897 KOps/s | -1.34% |
| test_stacked_getitemleaf | 36.2510μs | 8.3397μs | 119.9080 KOps/s | 122.3593 KOps/s | -2.00% |
| test_stacked_getitem | 21.0400μs | 7.7756μs | 128.6068 KOps/s | 130.9940 KOps/s | -1.82% |
| test_lock_nested | 9.4973ms | 0.4905ms | 2.0387 KOps/s | 2.0934 KOps/s | -2.61% |
| test_lock_stack_nested | 0.4889ms | 0.4301ms | 2.3251 KOps/s | 2.2995 KOps/s | +1.11% |
| test_unlock_nested | 0.8908ms | 0.3976ms | 2.5153 KOps/s | 2.5145 KOps/s | +0.03% |
| test_unlock_stack_nested | 0.4017ms | 0.3454ms | 2.8953 KOps/s | 2.8315 KOps/s | +2.25% |
| test_flatten_speed | 0.5253ms | 0.1046ms | 9.5587 KOps/s | 9.4279 KOps/s | +1.39% |
| test_unflatten_speed | 0.3436ms | 0.2872ms | 3.4822 KOps/s | 3.4502 KOps/s | +0.93% |
| test_common_ops | 1.6605ms | 1.3967ms | 715.9569 Ops/s | 717.6373 Ops/s | -0.23% |
| test_creation | 20.8910μs | 1.6715μs | 598.2753 KOps/s | 596.0611 KOps/s | +0.37% |
| test_creation_empty | 41.6310μs | 15.6402μs | 63.9377 KOps/s | 54.6991 KOps/s | **+16.89%** |
| test_creation_nested_1 | 47.5110μs | 17.6278μs | 56.7286 KOps/s | 49.8892 KOps/s | **+13.71%** |
| test_creation_nested_2 | 36.9600μs | 20.2886μs | 49.2886 KOps/s | 43.9278 KOps/s | **+12.20%** |
| test_clone | 65.2810μs | 35.9039μs | 27.8521 KOps/s | 31.6269 KOps/s | **-11.94%** |
| test_getitem[int] | 1.1932ms | 19.6472μs | 50.8979 KOps/s | 57.0697 KOps/s | **-10.81%** |
| test_getitem[slice_int] | 0.1560ms | 32.9896μs | 30.3126 KOps/s | 32.6721 KOps/s | **-7.22%** |
| test_getitem[range] | 0.2992ms | 0.1179ms | 8.4823 KOps/s | 8.6217 KOps/s | -1.62% |
| test_getitem[tuple] | 0.1533ms | 28.5608μs | 35.0130 KOps/s | 37.5722 KOps/s | **-6.81%** |
| test_getitem[list] | 91.0653ms | 0.1222ms | 8.1823 KOps/s | 9.3009 KOps/s | **-12.03%** |
| test_setitem_dim[int] | 75.9610μs | 53.6260μs | 18.6477 KOps/s | 17.1169 KOps/s | **+8.94%** |
| test_setitem_dim[slice_int] | 0.1010ms | 78.3854μs | 12.7575 KOps/s | 12.4341 KOps/s | +2.60% |
| test_setitem_dim[range] | 0.1755ms | 0.1423ms | 7.0253 KOps/s | 6.9551 KOps/s | +1.01% |
| test_setitem_dim[tuple] | 0.1018ms | 76.9037μs | 13.0033 KOps/s | 13.6433 KOps/s | -4.69% |
| test_setitem | 88.3010μs | 49.4471μs | 20.2236 KOps/s | 21.7166 KOps/s | **-6.87%** |
| test_set | 87.5720μs | 48.7267μs | 20.5226 KOps/s | 21.5390 KOps/s | -4.72% |
| test_set_shared | 0.3196ms | 59.9001μs | 16.6945 KOps/s | 17.9477 KOps/s | **-6.98%** |
| test_update | 84.0920μs | 51.2266μs | 19.5211 KOps/s | 17.8872 KOps/s | **+9.13%** |
| test_update_nested | 95.1220μs | 59.4324μs | 16.8258 KOps/s | 15.4816 KOps/s | **+8.68%** |
| test_update__nested | 0.1008ms | 63.5323μs | 15.7400 KOps/s | 14.7114 KOps/s | **+6.99%** |
| test_set_nested | 81.5620μs | 45.5814μs | 21.9388 KOps/s | 20.3534 KOps/s | **+7.79%** |
| test_set_nested_new | 81.7520μs | 49.6484μs | 20.1416 KOps/s | 18.6248 KOps/s | **+8.14%** |
| test_select | 97.1920μs | 64.3549μs | 15.5388 KOps/s | 14.4975 KOps/s | **+7.18%** |
| test_select_nested | 0.4375ms | 53.7560μs | 18.6026 KOps/s | 19.3889 KOps/s | -4.06% |
| test_exclude_nested | 99.8530μs | 69.3230μs | 14.4252 KOps/s | 14.2563 KOps/s | +1.18% |
| test_empty[True] | 0.3296ms | 0.2876ms | 3.4769 KOps/s | 3.5080 KOps/s | -0.89% |
| test_empty[False] | 2.9990μs | 0.8607μs | 1.1618 MOps/s | 1.1604 MOps/s | +0.13% |
| test_to | 67.8220μs | 40.7363μs | 24.5481 KOps/s | 22.9054 KOps/s | **+7.17%** |
| test_to_nonblocking | 55.3600μs | 26.9438μs | 37.1143 KOps/s | 35.1141 KOps/s | **+5.70%** |
| test_unbind_speed | 0.9536ms | 0.3056ms | 3.2720 KOps/s | 3.2068 KOps/s | +2.03% |
| test_unbind_speed_stack0 | 0.3550ms | 0.2979ms | 3.3568 KOps/s | 3.2891 KOps/s | +2.06% |
| test_unbind_speed_stack1 | 89.8912ms | 0.7607ms | 1.3145 KOps/s | 1.3023 KOps/s | +0.93% |
| test_split | 91.6341ms | 2.3639ms | 423.0274 Ops/s | 408.7425 Ops/s | +3.49% |
| test_chunk | 2.3614ms | 2.1693ms | 460.9766 Ops/s | 447.9500 Ops/s | +2.91% |
| test_creation[device0] | 0.2438ms | 0.1077ms | 9.2879 KOps/s | 8.9563 KOps/s | +3.70% |
| test_creation_from_tensor | 0.2282ms | 0.1036ms | 9.6508 KOps/s | 9.2681 KOps/s | +4.13% |
| test_add_one[memmap_tensor0] | 62.7300μs | 9.1518μs | 109.2677 KOps/s | 108.1392 
KOps/s | $\color{#35bf28}+1.04\\%$ | | test_contiguous[memmap_tensor0] | 26.7310μs | 2.2796μs | 438.6798 KOps/s | 445.2869 KOps/s | $\color{#d91a1a}-1.48\\%$ | | test_stack[memmap_tensor0] | 30.8600μs | 7.0403μs | 142.0386 KOps/s | 144.6378 KOps/s | $\color{#d91a1a}-1.80\\%$ | | test_memmaptd_index | 1.2010ms | 0.4353ms | 2.2971 KOps/s | 2.2051 KOps/s | $\color{#35bf28}+4.17\\%$ | | test_memmaptd_index_astensor | 92.7479ms | 0.5790ms | 1.7270 KOps/s | 1.7686 KOps/s | $\color{#d91a1a}-2.35\\%$ | | test_memmaptd_index_op | 1.5427ms | 1.0942ms | 913.8784 Ops/s | 881.1911 Ops/s | $\color{#35bf28}+3.71\\%$ | | test_serialize_model | 93.6259ms | 90.5736ms | 11.0407 Ops/s | 10.8366 Ops/s | $\color{#35bf28}+1.88\\%$ | | test_serialize_model_pickle | 1.3484s | 1.2359s | 0.8091 Ops/s | 0.8083 Ops/s | $\color{#35bf28}+0.10\\%$ | | test_serialize_weights | 91.0367ms | 86.0873ms | 11.6161 Ops/s | 10.9291 Ops/s | $\textbf{\color{#35bf28}+6.29\\%}$ | | test_serialize_weights_returnearly | 59.3357ms | 53.9733ms | 18.5277 Ops/s | 17.5907 Ops/s | $\textbf{\color{#35bf28}+5.33\\%}$ | | test_serialize_weights_pickle | 1.3957s | 1.2426s | 0.8048 Ops/s | 0.8526 Ops/s | $\textbf{\color{#d91a1a}-5.61\\%}$ | | test_reshape_pytree | 71.3010μs | 38.8547μs | 25.7369 KOps/s | 25.4810 KOps/s | $\color{#35bf28}+1.00\\%$ | | test_reshape_td | 73.3710μs | 45.2001μs | 22.1239 KOps/s | 22.7138 KOps/s | $\color{#d91a1a}-2.60\\%$ | | test_view_pytree | 0.1313ms | 38.3231μs | 26.0939 KOps/s | 25.6660 KOps/s | $\color{#35bf28}+1.67\\%$ | | test_view_td | 0.2171ms | 50.9876μs | 19.6126 KOps/s | 19.1617 KOps/s | $\color{#35bf28}+2.35\\%$ | | test_unbind_pytree | 0.1796ms | 38.8417μs | 25.7455 KOps/s | 26.5107 KOps/s | $\color{#d91a1a}-2.89\\%$ | | test_unbind_td | 0.4384ms | 47.0597μs | 21.2496 KOps/s | 21.2867 KOps/s | $\color{#d91a1a}-0.17\\%$ | | test_split_pytree | 98.5710μs | 51.4363μs | 19.4415 KOps/s | 18.5577 KOps/s | $\color{#35bf28}+4.76\\%$ | | test_split_td | 0.2044ms | 61.7402μs | 16.1969 
KOps/s | 16.1863 KOps/s | $\color{#35bf28}+0.07\\%$ | | test_add_pytree | 93.2320μs | 62.3329μs | 16.0429 KOps/s | 16.0156 KOps/s | $\color{#35bf28}+0.17\\%$ | | test_add_td | 0.1327ms | 94.6768μs | 10.5623 KOps/s | 10.0623 KOps/s | $\color{#35bf28}+4.97\\%$ | | test_compile_add_one_nested[tensordict-compile] | 0.4154ms | 0.2138ms | 4.6767 KOps/s | 4.5962 KOps/s | $\color{#35bf28}+1.75\\%$ | | test_compile_add_one_nested[tensordict-eager] | 0.2684ms | 0.1807ms | 5.5347 KOps/s | 5.4966 KOps/s | $\color{#35bf28}+0.69\\%$ | | test_compile_add_one_nested[pytree-compile] | 0.2296ms | 0.1547ms | 6.4651 KOps/s | 6.4017 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_compile_add_one_nested[pytree-eager] | 0.2784ms | 0.2199ms | 4.5475 KOps/s | 4.9253 KOps/s | $\textbf{\color{#d91a1a}-7.67\\%}$ | | test_compile_copy_nested[tensordict-compile] | 65.2410μs | 23.4327μs | 42.6754 KOps/s | 43.5827 KOps/s | $\color{#d91a1a}-2.08\\%$ | | test_compile_copy_nested[tensordict-eager] | 79.4210μs | 49.8822μs | 20.0472 KOps/s | 20.4728 KOps/s | $\color{#d91a1a}-2.08\\%$ | | test_compile_copy_nested[pytree-compile] | 0.1433ms | 74.2686μs | 13.4646 KOps/s | 13.6894 KOps/s | $\color{#d91a1a}-1.64\\%$ | | test_compile_copy_nested[pytree-eager] | 0.1062ms | 60.1611μs | 16.6220 KOps/s | 16.6668 KOps/s | $\color{#d91a1a}-0.27\\%$ | | test_compile_add_one_flat[tensordict-compile] | 0.4053ms | 0.3391ms | 2.9493 KOps/s | 2.9402 KOps/s | $\color{#35bf28}+0.31\\%$ | | test_compile_add_one_flat[tensordict-eager] | 0.3638ms | 0.2246ms | 4.4529 KOps/s | 4.4220 KOps/s | $\color{#35bf28}+0.70\\%$ | | test_compile_add_one_flat[tensorclass-compile] | 0.1682ms | 0.1345ms | 7.4366 KOps/s | 7.4251 KOps/s | $\color{#35bf28}+0.15\\%$ | | test_compile_add_one_flat[tensorclass-eager] | 0.1263ms | 67.9011μs | 14.7273 KOps/s | 15.7716 KOps/s | $\textbf{\color{#d91a1a}-6.62\\%}$ | | test_compile_add_one_flat[pytree-compile] | 0.3959ms | 0.3368ms | 2.9692 KOps/s | 2.9364 KOps/s | $\color{#35bf28}+1.12\\%$ | | 
test_compile_add_one_flat[pytree-eager] | 0.7857ms | 0.6735ms | 1.4848 KOps/s | 1.4939 KOps/s | $\color{#d91a1a}-0.61\\%$ | | test_compile_add_self_flat[tensordict-eager] | 0.7993ms | 0.2734ms | 3.6575 KOps/s | 3.6407 KOps/s | $\color{#35bf28}+0.46\\%$ | | test_compile_add_self_flat[tensordict-compile] | 0.4147ms | 0.3394ms | 2.9462 KOps/s | 2.9309 KOps/s | $\color{#35bf28}+0.52\\%$ | | test_compile_add_self_flat[tensorclass-eager] | 0.1643ms | 80.3594μs | 12.4441 KOps/s | 12.7674 KOps/s | $\color{#d91a1a}-2.53\\%$ | | test_compile_add_self_flat[tensorclass-compile] | 0.2662ms | 0.1352ms | 7.3972 KOps/s | 7.4216 KOps/s | $\color{#d91a1a}-0.33\\%$ | | test_compile_add_self_flat[pytree-eager] | 0.6350ms | 0.5671ms | 1.7632 KOps/s | 1.7150 KOps/s | $\color{#35bf28}+2.81\\%$ | | test_compile_add_self_flat[pytree-compile] | 0.4177ms | 0.3361ms | 2.9752 KOps/s | 2.9063 KOps/s | $\color{#35bf28}+2.37\\%$ | | test_compile_copy_flat[tensordict-compile] | 51.0510μs | 19.9622μs | 50.0948 KOps/s | 52.0083 KOps/s | $\color{#d91a1a}-3.68\\%$ | | test_compile_copy_flat[tensordict-eager] | 63.7810μs | 32.9156μs | 30.3807 KOps/s | 31.1374 KOps/s | $\color{#d91a1a}-2.43\\%$ | | test_compile_copy_flat[pytree-compile] | 0.1238ms | 76.5395μs | 13.0652 KOps/s | 12.8801 KOps/s | $\color{#35bf28}+1.44\\%$ | | test_compile_copy_flat[pytree-eager] | 83.6810μs | 60.6057μs | 16.5001 KOps/s | 16.3417 KOps/s | $\color{#35bf28}+0.97\\%$ | | test_compile_assign_and_add[tensordict-compile] | 2.5385ms | 0.9482ms | 1.0546 KOps/s | 1.0427 KOps/s | $\color{#35bf28}+1.14\\%$ | | test_compile_assign_and_add[tensordict-eager] | 3.5512ms | 3.4362ms | 291.0168 Ops/s | 290.8822 Ops/s | $\color{#35bf28}+0.05\\%$ | | test_compile_assign_and_add[pytree-compile] | 2.5272ms | 0.9313ms | 1.0738 KOps/s | 1.0596 KOps/s | $\color{#35bf28}+1.34\\%$ | | test_compile_assign_and_add[pytree-eager] | 3.6218ms | 3.4885ms | 286.6546 Ops/s | 287.7530 Ops/s | $\color{#d91a1a}-0.38\\%$ | | 
test_compile_indexing[tensor-tensordict-compile] | 0.1511ms | 0.1145ms | 8.7308 KOps/s | 8.7609 KOps/s | $\color{#d91a1a}-0.34\\%$ | | test_compile_indexing[tensor-tensordict-eager] | 0.2349ms | 67.0668μs | 14.9105 KOps/s | 15.0182 KOps/s | $\color{#d91a1a}-0.72\\%$ | | test_compile_indexing[tensor-tensorclass-compile] | 0.1453ms | 0.1079ms | 9.2667 KOps/s | 9.2761 KOps/s | $\color{#d91a1a}-0.10\\%$ | | test_compile_indexing[tensor-tensorclass-eager] | 90.0410μs | 49.9116μs | 20.0354 KOps/s | 20.6334 KOps/s | $\color{#d91a1a}-2.90\\%$ | | test_compile_indexing[tensor-pytree-compile] | 0.1615ms | 0.1125ms | 8.8911 KOps/s | 9.2404 KOps/s | $\color{#d91a1a}-3.78\\%$ | | test_compile_indexing[tensor-pytree-eager] | 83.6710μs | 51.0541μs | 19.5871 KOps/s | 20.5656 KOps/s | $\color{#d91a1a}-4.76\\%$ | | test_compile_indexing[slice-tensordict-compile] | 0.1935ms | 0.1433ms | 6.9760 KOps/s | 6.9657 KOps/s | $\color{#35bf28}+0.15\\%$ | | test_compile_indexing[slice-tensordict-eager] | 0.1991ms | 27.7582μs | 36.0254 KOps/s | 35.6029 KOps/s | $\color{#35bf28}+1.19\\%$ | | test_compile_indexing[slice-tensorclass-compile] | 0.1755ms | 0.1365ms | 7.3264 KOps/s | 7.3157 KOps/s | $\color{#35bf28}+0.15\\%$ | | test_compile_indexing[slice-tensorclass-eager] | 51.9610μs | 23.5333μs | 42.4930 KOps/s | 42.6068 KOps/s | $\color{#d91a1a}-0.27\\%$ | | test_compile_indexing[slice-pytree-compile] | 0.1963ms | 0.1387ms | 7.2099 KOps/s | 7.2043 KOps/s | $\color{#35bf28}+0.08\\%$ | | test_compile_indexing[slice-pytree-eager] | 50.2700μs | 23.4942μs | 42.5637 KOps/s | 42.4147 KOps/s | $\color{#35bf28}+0.35\\%$ | | test_compile_indexing[int-tensordict-compile] | 0.2082ms | 0.1432ms | 6.9828 KOps/s | 6.8161 KOps/s | $\color{#35bf28}+2.45\\%$ | | test_compile_indexing[int-tensordict-eager] | 0.4859ms | 27.7855μs | 35.9900 KOps/s | 35.0022 KOps/s | $\color{#35bf28}+2.82\\%$ | | test_compile_indexing[int-tensorclass-compile] | 0.2292ms | 0.1376ms | 7.2672 KOps/s | 7.2788 KOps/s | 
$\color{#d91a1a}-0.16\\%$ | | test_compile_indexing[int-tensorclass-eager] | 63.1400μs | 23.7627μs | 42.0828 KOps/s | 42.1511 KOps/s | $\color{#d91a1a}-0.16\\%$ | | test_compile_indexing[int-pytree-compile] | 0.1912ms | 0.1359ms | 7.3606 KOps/s | 7.1568 KOps/s | $\color{#35bf28}+2.85\\%$ | | test_compile_indexing[int-pytree-eager] | 51.6610μs | 23.2935μs | 42.9305 KOps/s | 43.4133 KOps/s | $\color{#d91a1a}-1.11\\%$ | | test_mod_add[eager] | 67.2710μs | 38.4569μs | 26.0031 KOps/s | 24.7101 KOps/s | $\textbf{\color{#35bf28}+5.23\\%}$ | | test_mod_add[compile] | 0.1626ms | 70.5395μs | 14.1764 KOps/s | 13.5992 KOps/s | $\color{#35bf28}+4.24\\%$ | | test_mod_add[compile-overhead] | 0.2596ms | 0.1490ms | 6.7111 KOps/s | 6.6340 KOps/s | $\color{#35bf28}+1.16\\%$ | | test_mod_wrap[eager] | 0.3586ms | 0.2617ms | 3.8212 KOps/s | 3.6885 KOps/s | $\color{#35bf28}+3.60\\%$ | | test_mod_wrap[compile] | 1.3664ms | 0.2988ms | 3.3463 KOps/s | 3.3087 KOps/s | $\color{#35bf28}+1.14\\%$ | | test_mod_wrap[compile-overhead] | 8.2961ms | 4.3083ms | 232.1117 Ops/s | 231.5053 Ops/s | $\color{#35bf28}+0.26\\%$ | | test_mod_wrap_and_backward[eager] | 1.6379ms | 1.4881ms | 672.0121 Ops/s | 671.0434 Ops/s | $\color{#35bf28}+0.14\\%$ | | test_mod_wrap_and_backward[compile] | 1.5785ms | 1.4827ms | 674.4307 Ops/s | 717.1715 Ops/s | $\textbf{\color{#d91a1a}-5.96\\%}$ | | test_mod_wrap_and_backward[compile-overhead] | 1.5119ms | 1.0199ms | 980.4625 Ops/s | 1.0984 KOps/s | $\textbf{\color{#d91a1a}-10.73\\%}$ | | test_seq_add[eager] | 0.1622ms | 0.1158ms | 8.6377 KOps/s | 8.3928 KOps/s | $\color{#35bf28}+2.92\\%$ | | test_seq_add[compile] | 0.2296ms | 92.1988μs | 10.8461 KOps/s | 10.8513 KOps/s | $\color{#d91a1a}-0.05\\%$ | | test_seq_add[compile-overhead] | 0.1891ms | 0.1283ms | 7.7941 KOps/s | 7.9904 KOps/s | $\color{#d91a1a}-2.46\\%$ | | test_seq_wrap[eager] | 0.5303ms | 0.4531ms | 2.2070 KOps/s | 2.1915 KOps/s | $\color{#35bf28}+0.71\\%$ | | test_seq_wrap[compile] | 1.6051ms | 0.3392ms | 2.9485 
KOps/s | 2.9826 KOps/s | $\color{#d91a1a}-1.14\\%$ | | test_seq_wrap[compile-overhead] | 0.3030s | 0.1453s | 6.8840 Ops/s | 6.7899 Ops/s | $\color{#35bf28}+1.39\\%$ | | test_func_call_runtime[False-eager] | 0.8502ms | 0.7760ms | 1.2887 KOps/s | 1.3057 KOps/s | $\color{#d91a1a}-1.30\\%$ | | test_func_call_runtime[False-compile] | 1.0317ms | 0.8267ms | 1.2097 KOps/s | 1.2000 KOps/s | $\color{#35bf28}+0.80\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4311ms | 0.3727ms | 2.6834 KOps/s | 2.6694 KOps/s | $\color{#35bf28}+0.52\\%$ | | test_func_call_runtime[True-eager] | 1.0626ms | 0.9752ms | 1.0254 KOps/s | 1.0359 KOps/s | $\color{#d91a1a}-1.01\\%$ | | test_func_call_runtime[True-compile] | 0.9949ms | 0.8742ms | 1.1440 KOps/s | 1.1213 KOps/s | $\color{#35bf28}+2.02\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4786ms | 0.4151ms | 2.4091 KOps/s | 2.4052 KOps/s | $\color{#35bf28}+0.16\\%$ | | test_func_call_cm_runtime[False-eager] | 1.0350ms | 0.7955ms | 1.2571 KOps/s | 1.3104 KOps/s | $\color{#d91a1a}-4.07\\%$ | | test_func_call_cm_runtime[False-compile] | 0.9074ms | 0.8301ms | 1.2046 KOps/s | 1.1997 KOps/s | $\color{#35bf28}+0.41\\%$ | | test_func_call_cm_runtime[False-compile-overhead] | 0.4349ms | 0.3750ms | 2.6666 KOps/s | 2.6838 KOps/s | $\color{#d91a1a}-0.64\\%$ | | test_func_call_cm_runtime[True-eager] | 1.2325ms | 1.0871ms | 919.8415 Ops/s | 925.0839 Ops/s | $\color{#d91a1a}-0.57\\%$ | | test_func_call_cm_runtime[True-compile] | 1.1301ms | 1.0521ms | 950.5090 Ops/s | 935.8289 Ops/s | $\color{#35bf28}+1.57\\%$ | | test_func_call_cm_runtime[True-compile-overhead] | 1.1298ms | 1.0553ms | 947.6420 Ops/s | 943.1171 Ops/s | $\color{#35bf28}+0.48\\%$ | | test_distributed | 0.1641ms | 68.3703μs | 14.6262 KOps/s | 14.3287 KOps/s | $\color{#35bf28}+2.08\\%$ | | test_tdmodule | 73.5920μs | 14.6976μs | 68.0385 KOps/s | 62.6985 KOps/s | $\textbf{\color{#35bf28}+8.52\\%}$ | | test_tdmodule_dispatch | 46.2500μs | 30.2275μs | 33.0824 KOps/s | 30.5015 
KOps/s | $\textbf{\color{#35bf28}+8.46\\%}$ | | test_tdseq | 36.9510μs | 15.5983μs | 64.1097 KOps/s | 58.3920 KOps/s | $\textbf{\color{#35bf28}+9.79\\%}$ | | test_tdseq_dispatch | 61.7820μs | 32.8569μs | 30.4350 KOps/s | 27.8683 KOps/s | $\textbf{\color{#35bf28}+9.21\\%}$ | | test_instantiation_functorch | 2.1664ms | 2.0584ms | 485.8053 Ops/s | 478.4698 Ops/s | $\color{#35bf28}+1.53\\%$ | | test_instantiation_td | 2.1281ms | 1.3388ms | 746.9183 Ops/s | 743.5816 Ops/s | $\color{#35bf28}+0.45\\%$ | | test_exec_functorch | 0.3023ms | 0.2357ms | 4.2421 KOps/s | 4.2657 KOps/s | $\color{#d91a1a}-0.55\\%$ | | test_exec_functional_call | 0.2891ms | 0.2295ms | 4.3571 KOps/s | 4.3610 KOps/s | $\color{#d91a1a}-0.09\\%$ | | test_exec_td | 0.2948ms | 0.2311ms | 4.3278 KOps/s | 4.1485 KOps/s | $\color{#35bf28}+4.32\\%$ | | test_exec_td_decorator | 0.9157ms | 0.2871ms | 3.4836 KOps/s | 3.4818 KOps/s | $\color{#35bf28}+0.05\\%$ | | test_vmap_mlp_speed[True-True] | 0.7845ms | 0.6711ms | 1.4901 KOps/s | 1.4509 KOps/s | $\color{#35bf28}+2.70\\%$ | | test_vmap_mlp_speed[True-False] | 0.7343ms | 0.6681ms | 1.4968 KOps/s | 1.4662 KOps/s | $\color{#35bf28}+2.09\\%$ | | test_vmap_mlp_speed[False-True] | 0.7046ms | 0.5857ms | 1.7075 KOps/s | 1.6687 KOps/s | $\color{#35bf28}+2.32\\%$ | | test_vmap_mlp_speed[False-False] | 0.6970ms | 0.5863ms | 1.7055 KOps/s | 1.6785 KOps/s | $\color{#35bf28}+1.61\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4617ms | 0.7174ms | 1.3938 KOps/s | 1.3677 KOps/s | $\color{#35bf28}+1.91\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.8912ms | 0.7200ms | 1.3888 KOps/s | 1.3642 KOps/s | $\color{#35bf28}+1.81\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.7670ms | 0.6333ms | 1.5790 KOps/s | 1.5565 KOps/s | $\color{#35bf28}+1.45\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8599ms | 0.6439ms | 1.5531 KOps/s | 1.5584 KOps/s | $\color{#d91a1a}-0.34\\%$ | | test_vmap_transformer_speed[True-True] | 9.5634ms | 9.0605ms | 110.3695 Ops/s | 
111.6383 Ops/s | $\color{#d91a1a}-1.14\\%$ | | test_vmap_transformer_speed[True-False] | 9.3532ms | 8.9546ms | 111.6744 Ops/s | 111.9769 Ops/s | $\color{#d91a1a}-0.27\\%$ | | test_vmap_transformer_speed[False-True] | 9.0376ms | 8.8453ms | 113.0546 Ops/s | 112.8926 Ops/s | $\color{#35bf28}+0.14\\%$ | | test_vmap_transformer_speed[False-False] | 9.0813ms | 8.8615ms | 112.8482 Ops/s | 113.2455 Ops/s | $\color{#d91a1a}-0.35\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 21.2330ms | 21.0011ms | 47.6165 Ops/s | 47.8375 Ops/s | $\color{#d91a1a}-0.46\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.3271ms | 21.1140ms | 47.3619 Ops/s | 47.6682 Ops/s | $\color{#d91a1a}-0.64\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 21.1075ms | 20.8943ms | 47.8599 Ops/s | 48.1287 Ops/s | $\color{#d91a1a}-0.56\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 21.1652ms | 20.8688ms | 47.9185 Ops/s | 47.9936 Ops/s | $\color{#d91a1a}-0.16\\%$ | | test_to_module_speed[True] | 1.2714ms | 1.1389ms | 878.0193 Ops/s | 866.4275 Ops/s | $\color{#35bf28}+1.34\\%$ | | test_to_module_speed[False] | 1.2621ms | 1.1271ms | 887.2536 Ops/s | 891.4606 Ops/s | $\color{#d91a1a}-0.47\\%$ | | test_tc_init | 63.0710μs | 38.3756μs | 26.0582 KOps/s | 24.9271 KOps/s | $\color{#35bf28}+4.54\\%$ | | test_tc_init_nested | 0.1143ms | 77.2851μs | 12.9391 KOps/s | 12.2053 KOps/s | $\textbf{\color{#35bf28}+6.01\\%}$ | | test_tc_first_layer_tensor | 4.2002μs | 0.7944μs | 1.2587 MOps/s | 1.2613 MOps/s | $\color{#d91a1a}-0.20\\%$ | | test_tc_first_layer_nontensor | 0.1109ms | 2.5895μs | 386.1719 KOps/s | 389.8215 KOps/s | $\color{#d91a1a}-0.94\\%$ | | test_tc_second_layer_tensor | 6.7070μs | 1.6325μs | 612.5640 KOps/s | 610.2271 KOps/s | $\color{#35bf28}+0.38\\%$ | | test_tc_second_layer_nontensor | 18.4200μs | 3.4340μs | 291.2063 KOps/s | 294.1085 KOps/s | $\color{#d91a1a}-0.99\\%$ | | test_unbind | 0.3233s | 12.7149ms | 78.6476 Ops/s | 78.9603 Ops/s | 
$\color{#d91a1a}-0.40\\%$ | | test_full_like | 0.6557ms | 0.5771ms | 1.7329 KOps/s | 1.7234 KOps/s | $\color{#35bf28}+0.55\\%$ | | test_zeros_like | 0.2683ms | 0.1977ms | 5.0592 KOps/s | 5.0619 KOps/s | $\color{#d91a1a}-0.05\\%$ | | test_ones_like | 0.2276ms | 0.1974ms | 5.0648 KOps/s | 5.0647 KOps/s | $+0.00\\%$ | | test_clone | 0.5351ms | 0.4135ms | 2.4185 KOps/s | 2.4101 KOps/s | $\color{#35bf28}+0.35\\%$ | | test_squeeze | 34.7010μs | 11.4419μs | 87.3981 KOps/s | 88.9303 KOps/s | $\color{#d91a1a}-1.72\\%$ | | test_unsqueeze | 0.2449ms | 83.4415μs | 11.9844 KOps/s | 12.2948 KOps/s | $\color{#d91a1a}-2.52\\%$ | | test_split | 0.4643ms | 0.1763ms | 5.6735 KOps/s | 5.5921 KOps/s | $\color{#35bf28}+1.46\\%$ | | test_permute | 0.2685ms | 0.1956ms | 5.1131 KOps/s | 5.2725 KOps/s | $\color{#d91a1a}-3.02\\%$ | | test_stack | 1.2558ms | 0.8948ms | 1.1176 KOps/s | 1.1266 KOps/s | $\color{#d91a1a}-0.80\\%$ | | test_cat | 1.2527ms | 1.2316ms | 811.9453 Ops/s | 812.0058 Ops/s | $-0.01\\%$ |