pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Performance] Faster multithreaded pin_memory #919

Closed: vmoens closed this 3 months ago

github-actions[bot] commented 3 months ago

$\color{#D29922}\textsf{\Large\⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}15$.

Detailed results

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 57.1070μs | 22.0848μs | 45.2800 KOps/s | 46.5607 KOps/s | $\color{#d91a1a}-2.75\\%$ |
| test_plain_set_stack_nested | 75.2520μs | 22.0770μs | 45.2961 KOps/s | 46.0789 KOps/s | $\color{#d91a1a}-1.70\\%$ |
| test_plain_set_nested_inplace | 62.7690μs | 24.3556μs | 41.0583 KOps/s | 42.4431 KOps/s | $\color{#d91a1a}-3.26\\%$ |
| test_plain_set_stack_nested_inplace | 56.5260μs | 24.3044μs | 41.1448 KOps/s | 42.2092 KOps/s | $\color{#d91a1a}-2.52\\%$ |
| test_items | 29.9760μs | 2.6850μs | 372.4393 KOps/s | 381.6137 KOps/s | $\color{#d91a1a}-2.40\\%$ |
| test_items_nested | 2.5647ms | 0.3663ms | 2.7300 KOps/s | 2.9058 KOps/s | $\textbf{\color{#d91a1a}-6.05\\%}$ |
| test_items_nested_locked | 0.5732ms | 0.3465ms | 2.8859 KOps/s | 2.9504 KOps/s | $\color{#d91a1a}-2.19\\%$ |
| test_items_nested_leaf | 0.1644ms | 82.9631μs | 12.0536 KOps/s | 11.2950 KOps/s | $\textbf{\color{#35bf28}+6.72\\%}$ |
| test_items_stack_nested | 0.5008ms | 0.3442ms | 2.9052 KOps/s | 2.9522 KOps/s | $\color{#d91a1a}-1.59\\%$ |
| test_items_stack_nested_leaf | 0.3604ms | 87.3829μs | 11.4439 KOps/s | 11.5336 KOps/s | $\color{#d91a1a}-0.78\\%$ |
| test_items_stack_nested_locked | 0.6305ms | 0.3480ms | 2.8732 KOps/s | 2.9272 KOps/s | $\color{#d91a1a}-1.85\\%$ |
| test_keys | 45.3260μs | 3.9185μs | 255.1993 KOps/s | 258.0787 KOps/s | $\color{#d91a1a}-1.12\\%$ |
| test_keys_nested | 0.2922ms | 0.1437ms | 6.9603 KOps/s | 6.8264 KOps/s | $\color{#35bf28}+1.96\\%$ |
| test_keys_nested_locked | 0.7192ms | 0.1487ms | 6.7264 KOps/s | 6.6672 KOps/s | $\color{#35bf28}+0.89\\%$ |
| test_keys_nested_leaf | 0.2097ms | 0.1240ms | 8.0657 KOps/s | 8.0042 KOps/s | $\color{#35bf28}+0.77\\%$ |
| test_keys_stack_nested | 0.2397ms | 0.1441ms | 6.9415 KOps/s | 6.8290 KOps/s | $\color{#35bf28}+1.65\\%$ |
| test_keys_stack_nested_leaf | 0.2516ms | 0.1273ms | 7.8561 KOps/s | 8.0614 KOps/s | $\color{#d91a1a}-2.55\\%$ |
| test_keys_stack_nested_locked | 0.2548ms | 0.1492ms | 6.7027 KOps/s | 6.6663 KOps/s | $\color{#35bf28}+0.55\\%$ |
| test_values | 11.3162μs | 1.1660μs | 857.6269 KOps/s | 813.6885 KOps/s | $\textbf{\color{#35bf28}+5.40\\%}$ |
| test_values_nested | 0.1219ms | 50.4613μs | 19.8172 KOps/s | 19.6042 KOps/s | $\color{#35bf28}+1.09\\%$ |
| test_values_nested_locked | 98.0640μs | 50.3925μs | 19.8442 KOps/s | 19.7789 KOps/s | $\color{#35bf28}+0.33\\%$ |
| test_values_nested_leaf | 0.1006ms | 45.1342μs | 22.1562 KOps/s | 21.7799 KOps/s | $\color{#35bf28}+1.73\\%$ |
| test_values_stack_nested | 99.0760μs | 50.9914μs | 19.6111 KOps/s | 19.4891 KOps/s | $\color{#35bf28}+0.63\\%$ |
| test_values_stack_nested_leaf | 95.9810μs | 45.0391μs | 22.2029 KOps/s | 22.1744 KOps/s | $\color{#35bf28}+0.13\\%$ |
| test_values_stack_nested_locked | 0.1123ms | 50.2023μs | 19.9194 KOps/s | 19.3290 KOps/s | $\color{#35bf28}+3.05\\%$ |
| test_membership | 46.2770μs | 0.9452μs | 1.0580 MOps/s | 1.0614 MOps/s | $\color{#d91a1a}-0.32\\%$ |
| test_membership_nested | 29.7360μs | 2.6459μs | 377.9434 KOps/s | 373.8076 KOps/s | $\color{#35bf28}+1.11\\%$ |
| test_membership_nested_leaf | 36.6080μs | 2.6588μs | 376.1156 KOps/s | 373.2082 KOps/s | $\color{#35bf28}+0.78\\%$ |
| test_membership_stacked_nested | 54.2220μs | 2.6478μs | 377.6681 KOps/s | 374.6930 KOps/s | $\color{#35bf28}+0.79\\%$ |
| test_membership_stacked_nested_leaf | 22.6930μs | 2.6645μs | 375.3064 KOps/s | 370.5416 KOps/s | $\color{#35bf28}+1.29\\%$ |
| test_membership_nested_last | 36.5790μs | 4.0199μs | 248.7599 KOps/s | 250.4403 KOps/s | $\color{#d91a1a}-0.67\\%$ |
| test_membership_nested_leaf_last | 33.6330μs | 3.9935μs | 250.4069 KOps/s | 251.6013 KOps/s | $\color{#d91a1a}-0.47\\%$ |
| test_membership_stacked_nested_last | 35.8670μs | 3.9709μs | 251.8300 KOps/s | 251.3566 KOps/s | $\color{#35bf28}+0.19\\%$ |
| test_membership_stacked_nested_leaf_last | 52.5190μs | 3.9829μs | 251.0722 KOps/s | 247.7282 KOps/s | $\color{#35bf28}+1.35\\%$ |
| test_nested_getleaf | 39.3050μs | 10.5866μs | 94.4588 KOps/s | 94.2089 KOps/s | $\color{#35bf28}+0.27\\%$ |
| test_nested_get | 46.8780μs | 9.9423μs | 100.5801 KOps/s | 99.1299 KOps/s | $\color{#35bf28}+1.46\\%$ |
| test_stacked_getleaf | 59.4420μs | 10.5156μs | 95.0970 KOps/s | 94.7991 KOps/s | $\color{#35bf28}+0.31\\%$ |
| test_stacked_get | 41.4580μs | 9.9460μs | 100.5428 KOps/s | 100.1864 KOps/s | $\color{#35bf28}+0.36\\%$ |
| test_nested_getitemleaf | 50.0140μs | 11.0426μs | 90.5587 KOps/s | 90.4676 KOps/s | $\color{#35bf28}+0.10\\%$ |
| test_nested_getitem | 42.5800μs | 10.0947μs | 99.0622 KOps/s | 97.6742 KOps/s | $\color{#35bf28}+1.42\\%$ |
| test_stacked_getitemleaf | 43.9530μs | 11.0586μs | 90.4270 KOps/s | 91.2082 KOps/s | $\color{#d91a1a}-0.86\\%$ |
| test_stacked_getitem | 43.0810μs | 10.1785μs | 98.2459 KOps/s | 98.2177 KOps/s | $\color{#35bf28}+0.03\\%$ |
| test_lock_nested | 86.7710ms | 0.5895ms | 1.6964 KOps/s | 1.9731 KOps/s | $\textbf{\color{#d91a1a}-14.02\\%}$ |
| test_lock_stack_nested | 0.7836ms | 0.4755ms | 2.1030 KOps/s | 2.1065 KOps/s | $\color{#d91a1a}-0.17\\%$ |
| test_unlock_nested | 95.8833ms | 0.5203ms | 1.9221 KOps/s | 2.3781 KOps/s | $\textbf{\color{#d91a1a}-19.17\\%}$ |
| test_unlock_stack_nested | 0.5972ms | 0.3941ms | 2.5372 KOps/s | 2.5708 KOps/s | $\color{#d91a1a}-1.31\\%$ |
| test_flatten_speed | 0.5740ms | 0.1042ms | 9.6010 KOps/s | 9.7047 KOps/s | $\color{#d91a1a}-1.07\\%$ |
| test_unflatten_speed | 0.6977ms | 0.4417ms | 2.2642 KOps/s | 2.2885 KOps/s | $\color{#d91a1a}-1.06\\%$ |
| test_common_ops | 5.2855ms | 1.1251ms | 888.8422 Ops/s | 904.6470 Ops/s | $\color{#d91a1a}-1.75\\%$ |
| test_creation | 0.1089ms | 2.0623μs | 484.9015 KOps/s | 493.1683 KOps/s | $\color{#d91a1a}-1.68\\%$ |
| test_creation_empty | 46.6380μs | 18.5018μs | 54.0488 KOps/s | 57.8514 KOps/s | $\textbf{\color{#d91a1a}-6.57\\%}$ |
| test_creation_nested_1 | 61.3150μs | 21.6463μs | 46.1973 KOps/s | 48.5324 KOps/s | $\color{#d91a1a}-4.81\\%$ |
| test_creation_nested_2 | 79.1290μs | 25.3946μs | 39.3785 KOps/s | 41.3014 KOps/s | $\color{#d91a1a}-4.66\\%$ |
| test_clone | 0.1395ms | 17.4474μs | 57.3150 KOps/s | 60.7700 KOps/s | $\textbf{\color{#d91a1a}-5.69\\%}$ |
| test_getitem[int] | 0.8845ms | 16.7204μs | 59.8072 KOps/s | 61.5784 KOps/s | $\color{#d91a1a}-2.88\\%$ |
| test_getitem[slice_int] | 0.1238ms | 30.5658μs | 32.7163 KOps/s | 32.3253 KOps/s | $\color{#35bf28}+1.21\\%$ |
| test_getitem[range] | 0.3257ms | 58.4175μs | 17.1182 KOps/s | 17.4992 KOps/s | $\color{#d91a1a}-2.18\\%$ |
| test_getitem[tuple] | 0.1418ms | 24.9710μs | 40.0465 KOps/s | 40.1770 KOps/s | $\color{#d91a1a}-0.32\\%$ |
| test_getitem[list] | 0.3249ms | 52.4130μs | 19.0792 KOps/s | 18.9603 KOps/s | $\color{#35bf28}+0.63\\%$ |
| test_setitem_dim[int] | 87.2950μs | 40.6649μs | 24.5912 KOps/s | 24.4863 KOps/s | $\color{#35bf28}+0.43\\%$ |
| test_setitem_dim[slice_int] | 0.1289ms | 72.9932μs | 13.6999 KOps/s | 13.8116 KOps/s | $\color{#d91a1a}-0.81\\%$ |
| test_setitem_dim[range] | 0.1553ms | 95.5499μs | 10.4657 KOps/s | 10.7017 KOps/s | $\color{#d91a1a}-2.21\\%$ |
| test_setitem_dim[tuple] | 0.1602ms | 59.0275μs | 16.9413 KOps/s | 17.4802 KOps/s | $\color{#d91a1a}-3.08\\%$ |
| test_setitem | 0.2148ms | 29.6636μs | 33.7114 KOps/s | 34.8521 KOps/s | $\color{#d91a1a}-3.27\\%$ |
| test_set | 0.1890ms | 28.6961μs | 34.8480 KOps/s | 35.5242 KOps/s | $\color{#d91a1a}-1.90\\%$ |
| test_set_shared | 4.6712ms | 0.2203ms | 4.5403 KOps/s | 4.5086 KOps/s | $\color{#35bf28}+0.70\\%$ |
| test_update | 0.2407ms | 35.8640μs | 27.8831 KOps/s | 28.9971 KOps/s | $\color{#d91a1a}-3.84\\%$ |
| test_update_nested | 0.2072ms | 45.5551μs | 21.9514 KOps/s | 22.2387 KOps/s | $\color{#d91a1a}-1.29\\%$ |
| test_update__nested | 0.2197ms | 35.1654μs | 28.4371 KOps/s | 29.6857 KOps/s | $\color{#d91a1a}-4.21\\%$ |
| test_set_nested | 0.1774ms | 30.8699μs | 32.3940 KOps/s | 33.1996 KOps/s | $\color{#d91a1a}-2.43\\%$ |
| test_set_nested_new | 0.2028ms | 35.9757μs | 27.7966 KOps/s | 28.7791 KOps/s | $\color{#d91a1a}-3.41\\%$ |
| test_select | 0.2110ms | 53.5041μs | 18.6901 KOps/s | 19.1152 KOps/s | $\color{#d91a1a}-2.22\\%$ |
| test_select_nested | 0.1132ms | 59.7009μs | 16.7502 KOps/s | 17.0218 KOps/s | $\color{#d91a1a}-1.60\\%$ |
| test_exclude_nested | 0.1450ms | 78.2145μs | 12.7853 KOps/s | 12.8458 KOps/s | $\color{#d91a1a}-0.47\\%$ |
| test_empty[True] | 1.0666ms | 0.3208ms | 3.1172 KOps/s | 3.0335 KOps/s | $\color{#35bf28}+2.76\\%$ |
| test_empty[False] | 11.5582μs | 1.1570μs | 864.3416 KOps/s | 857.3994 KOps/s | $\color{#35bf28}+0.81\\%$ |
| test_unbind_speed | 0.4335ms | 0.3146ms | 3.1786 KOps/s | 3.2095 KOps/s | $\color{#d91a1a}-0.96\\%$ |
| test_unbind_speed_stack0 | 0.4943ms | 0.3117ms | 3.2083 KOps/s | 3.2790 KOps/s | $\color{#d91a1a}-2.16\\%$ |
| test_unbind_speed_stack1 | 92.9876ms | 0.8145ms | 1.2277 KOps/s | 1.3326 KOps/s | $\textbf{\color{#d91a1a}-7.87\\%}$ |
| test_split | 92.2825ms | 2.1577ms | 463.4498 Ops/s | 465.7897 Ops/s | $\color{#d91a1a}-0.50\\%$ |
| test_chunk | 0.1018s | 2.1917ms | 456.2772 Ops/s | 465.5593 Ops/s | $\color{#d91a1a}-1.99\\%$ |
| test_creation[device0] | 5.0242ms | 0.1241ms | 8.0569 KOps/s | 8.0355 KOps/s | $\color{#35bf28}+0.27\\%$ |
| test_creation_from_tensor | 0.3154ms | 0.1221ms | 8.1924 KOps/s | 8.2032 KOps/s | $\color{#d91a1a}-0.13\\%$ |
| test_add_one[memmap_tensor0] | 0.3803ms | 7.9200μs | 126.2628 KOps/s | 131.3086 KOps/s | $\color{#d91a1a}-3.84\\%$ |
| test_contiguous[memmap_tensor0] | 44.8340μs | 2.0139μs | 496.5413 KOps/s | 494.2394 KOps/s | $\color{#35bf28}+0.47\\%$ |
| test_stack[memmap_tensor0] | 58.9010μs | 5.6602μs | 176.6709 KOps/s | 175.4559 KOps/s | $\color{#35bf28}+0.69\\%$ |
| test_memmaptd_index | 1.1663ms | 0.4050ms | 2.4693 KOps/s | 2.4622 KOps/s | $\color{#35bf28}+0.29\\%$ |
| test_memmaptd_index_astensor | 1.0043ms | 0.4891ms | 2.0445 KOps/s | 2.0626 KOps/s | $\color{#d91a1a}-0.88\\%$ |
| test_memmaptd_index_op | 1.5265ms | 1.0372ms | 964.1534 Ops/s | 980.6135 Ops/s | $\color{#d91a1a}-1.68\\%$ |
| test_serialize_model | 0.1332s | 0.1226s | 8.1546 Ops/s | 7.3475 Ops/s | $\textbf{\color{#35bf28}+10.98\\%}$ |
| test_serialize_model_pickle | 0.4791s | 0.3965s | 2.5219 Ops/s | 2.4964 Ops/s | $\color{#35bf28}+1.02\\%$ |
| test_serialize_weights | 0.1216s | 0.1175s | 8.5093 Ops/s | 8.2631 Ops/s | $\color{#35bf28}+2.98\\%$ |
| test_serialize_weights_returnearly | 0.1788s | 0.1629s | 6.1377 Ops/s | 6.0761 Ops/s | $\color{#35bf28}+1.01\\%$ |
| test_serialize_weights_pickle | 1.0431s | 0.7298s | 1.3702 Ops/s | 2.3937 Ops/s | $\textbf{\color{#d91a1a}-42.76\\%}$ |
| test_serialize_weights_filesystem | 0.1530s | 0.1449s | 6.9031 Ops/s | 6.3896 Ops/s | $\textbf{\color{#35bf28}+8.04\\%}$ |
| test_serialize_model_filesystem | 0.2383s | 0.1601s | 6.2451 Ops/s | 6.4426 Ops/s | $\color{#d91a1a}-3.07\\%$ |
| test_reshape_pytree | 91.8330μs | 39.8776μs | 25.0767 KOps/s | 25.3731 KOps/s | $\color{#d91a1a}-1.17\\%$ |
| test_reshape_td | 0.1085ms | 47.7381μs | 20.9476 KOps/s | 20.9652 KOps/s | $\color{#d91a1a}-0.08\\%$ |
| test_view_pytree | 98.8350μs | 39.7555μs | 25.1538 KOps/s | 25.1534 KOps/s | $+0.00\\%$ |
| test_view_td | 0.1150ms | 52.7244μs | 18.9666 KOps/s | 19.2064 KOps/s | $\color{#d91a1a}-1.25\\%$ |
| test_unbind_pytree | 79.6300μs | 36.9842μs | 27.0386 KOps/s | 27.0531 KOps/s | $\color{#d91a1a}-0.05\\%$ |
| test_unbind_td | 0.3973ms | 46.7042μs | 21.4113 KOps/s | 21.8441 KOps/s | $\color{#d91a1a}-1.98\\%$ |
| test_split_pytree | 79.6800μs | 40.2410μs | 24.8503 KOps/s | 25.1433 KOps/s | $\color{#d91a1a}-1.17\\%$ |
| test_split_td | 0.5615ms | 58.0752μs | 17.2191 KOps/s | 17.1075 KOps/s | $\color{#35bf28}+0.65\\%$ |
| test_add_pytree | 0.1071ms | 48.3430μs | 20.6855 KOps/s | 21.9905 KOps/s | $\textbf{\color{#d91a1a}-5.93\\%}$ |
| test_add_td | 0.1675ms | 82.7515μs | 12.0844 KOps/s | 12.3817 KOps/s | $\color{#d91a1a}-2.40\\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.1370ms | 55.9359μs | 17.8776 KOps/s | 18.2217 KOps/s | $\color{#d91a1a}-1.89\\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.3368ms | 0.1901ms | 5.2596 KOps/s | 5.3141 KOps/s | $\color{#d91a1a}-1.03\\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.3181ms | 55.7755μs | 17.9290 KOps/s | 18.6105 KOps/s | $\color{#d91a1a}-3.66\\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.3888ms | 0.1470ms | 6.8017 KOps/s | 6.8024 KOps/s | $\color{#d91a1a}-0.01\\%$ |
| test_compile_copy_nested[tensordict-compile] | 71.7050μs | 21.1254μs | 47.3365 KOps/s | 48.8215 KOps/s | $\color{#d91a1a}-3.04\\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1265ms | 65.7974μs | 15.1982 KOps/s | 15.5732 KOps/s | $\color{#d91a1a}-2.41\\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1544ms | 79.5820μs | 12.5657 KOps/s | 12.3397 KOps/s | $\color{#35bf28}+1.83\\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1408ms | 72.5307μs | 13.7873 KOps/s | 13.5962 KOps/s | $\color{#35bf28}+1.41\\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.2944ms | 0.1781ms | 5.6144 KOps/s | 5.6511 KOps/s | $\color{#d91a1a}-0.65\\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.4503ms | 0.1958ms | 5.1071 KOps/s | 5.1395 KOps/s | $\color{#d91a1a}-0.63\\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.1073ms | 38.8643μs | 25.7305 KOps/s | 24.8402 KOps/s | $\color{#35bf28}+3.58\\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 1.2776ms | 69.6154μs | 14.3646 KOps/s | 14.2909 KOps/s | $\color{#35bf28}+0.52\\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.3857ms | 0.1775ms | 5.6345 KOps/s | 5.5619 KOps/s | $\color{#35bf28}+1.31\\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.3967ms | 0.2962ms | 3.3756 KOps/s | 3.4060 KOps/s | $\color{#d91a1a}-0.89\\%$ |
| test_compile_add_self_flat[tensordict-eager] | 0.5129ms | 0.2108ms | 4.7446 KOps/s | 4.7165 KOps/s | $\color{#35bf28}+0.60\\%$ |
| test_compile_add_self_flat[tensordict-compile] | 0.5159ms | 0.1796ms | 5.5686 KOps/s | 5.5692 KOps/s | $\color{#d91a1a}-0.01\\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2066ms | 62.9604μs | 15.8830 KOps/s | 15.4452 KOps/s | $\color{#35bf28}+2.83\\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.1123ms | 40.7874μs | 24.5174 KOps/s | 24.8835 KOps/s | $\color{#d91a1a}-1.47\\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.4971ms | 0.2470ms | 4.0488 KOps/s | 4.0635 KOps/s | $\color{#d91a1a}-0.36\\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.3065ms | 0.1768ms | 5.6562 KOps/s | 5.7060 KOps/s | $\color{#d91a1a}-0.87\\%$ |
| test_compile_copy_flat[tensordict-compile] | 0.1948ms | 0.1117ms | 8.9519 KOps/s | 9.0112 KOps/s | $\color{#d91a1a}-0.66\\%$ |
| test_compile_copy_flat[tensordict-eager] | 0.1136ms | 57.8552μs | 17.2845 KOps/s | 17.5904 KOps/s | $\color{#d91a1a}-1.74\\%$ |
| test_compile_copy_flat[pytree-compile] | 0.1562ms | 80.0526μs | 12.4918 KOps/s | 12.3973 KOps/s | $\color{#35bf28}+0.76\\%$ |
| test_compile_copy_flat[pytree-eager] | 0.1343ms | 71.6020μs | 13.9661 KOps/s | 13.7389 KOps/s | $\color{#35bf28}+1.65\\%$ |
| test_compile_assign_and_add[tensordict-compile] | 0.2815ms | 0.1902ms | 5.2580 KOps/s | 5.0754 KOps/s | $\color{#35bf28}+3.60\\%$ |
| test_compile_assign_and_add[tensordict-eager] | 2.6204ms | 1.6638ms | 601.0226 Ops/s | 603.4665 Ops/s | $\color{#d91a1a}-0.40\\%$ |
| test_compile_assign_and_add[pytree-compile] | 0.2768ms | 0.1876ms | 5.3303 KOps/s | 5.2717 KOps/s | $\color{#35bf28}+1.11\\%$ |
| test_compile_assign_and_add[pytree-eager] | 1.8700ms | 1.1201ms | 892.8033 Ops/s | 905.3829 Ops/s | $\color{#d91a1a}-1.39\\%$ |
| test_compile_assign_and_add_stack[compile] | 0.5511ms | 0.4235ms | 2.3610 KOps/s | 2.3028 KOps/s | $\color{#35bf28}+2.53\\%$ |
| test_compile_assign_and_add_stack[eager] | 4.1742ms | 3.8635ms | 258.8326 Ops/s | 261.8443 Ops/s | $\color{#d91a1a}-1.15\\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 0.1024ms | 33.1166μs | 30.1963 KOps/s | 31.0478 KOps/s | $\color{#d91a1a}-2.74\\%$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.6998ms | 48.9006μs | 20.4497 KOps/s | 20.7475 KOps/s | $\color{#d91a1a}-1.44\\%$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1019ms | 28.9215μs | 34.5763 KOps/s | 36.4296 KOps/s | $\textbf{\color{#d91a1a}-5.09\\%}$ |
| test_compile_indexing[tensor-tensorclass-eager] | 86.6330μs | 31.2260μs | 32.0246 KOps/s | 33.2024 KOps/s | $\color{#d91a1a}-3.55\\%$ |
| test_compile_indexing[tensor-pytree-compile] | 88.3960μs | 29.1469μs | 34.3090 KOps/s | 35.9374 KOps/s | $\color{#d91a1a}-4.53\\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.1258ms | 30.8766μs | 32.3870 KOps/s | 32.6666 KOps/s | $\color{#d91a1a}-0.86\\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.1489ms | 73.4865μs | 13.6079 KOps/s | 13.5835 KOps/s | $\color{#35bf28}+0.18\\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.5812ms | 27.8692μs | 35.8820 KOps/s | 35.9103 KOps/s | $\color{#d91a1a}-0.08\\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.1510ms | 67.8387μs | 14.7408 KOps/s | 14.7829 KOps/s | $\color{#d91a1a}-0.28\\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 0.1284ms | 25.6001μs | 39.0623 KOps/s | 40.2156 KOps/s | $\color{#d91a1a}-2.87\\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.1571ms | 67.7565μs | 14.7587 KOps/s | 14.9812 KOps/s | $\color{#d91a1a}-1.48\\%$ |
| test_compile_indexing[slice-pytree-eager] | 75.2610μs | 24.6646μs | 40.5439 KOps/s | 40.0283 KOps/s | $\color{#35bf28}+1.29\\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.1901ms | 72.9827μs | 13.7019 KOps/s | 13.7371 KOps/s | $\color{#d91a1a}-0.26\\%$ |
| test_compile_indexing[int-tensordict-eager] | 1.1926ms | 27.7452μs | 36.0423 KOps/s | 35.8271 KOps/s | $\color{#35bf28}+0.60\\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.1362ms | 66.8537μs | 14.9580 KOps/s | 14.7940 KOps/s | $\color{#35bf28}+1.11\\%$ |
| test_compile_indexing[int-tensorclass-eager] | 69.9720μs | 24.3126μs | 41.1309 KOps/s | 40.4708 KOps/s | $\color{#35bf28}+1.63\\%$ |
| test_compile_indexing[int-pytree-compile] | 0.1900ms | 67.4198μs | 14.8324 KOps/s | 14.8926 KOps/s | $\color{#d91a1a}-0.40\\%$ |
| test_compile_indexing[int-pytree-eager] | 0.5133ms | 24.1045μs | 41.4859 KOps/s | 40.4930 KOps/s | $\color{#35bf28}+2.45\\%$ |
| test_mod_add[eager] | 74.5100μs | 25.6268μs | 39.0216 KOps/s | 42.9430 KOps/s | $\textbf{\color{#d91a1a}-9.13\\%}$ |
| test_mod_add[compile] | 98.6660μs | 37.9978μs | 26.3173 KOps/s | 25.5954 KOps/s | $\color{#35bf28}+2.82\\%$ |
| test_mod_add[compile-overhead] | 88.1350μs | 38.3422μs | 26.0809 KOps/s | 26.1693 KOps/s | $\color{#d91a1a}-0.34\\%$ |
| test_mod_wrap[eager] | 0.4218ms | 0.2111ms | 4.7381 KOps/s | 4.7242 KOps/s | $\color{#35bf28}+0.29\\%$ |
| test_mod_wrap[compile] | 1.8983ms | 0.2280ms | 4.3857 KOps/s | 4.2432 KOps/s | $\color{#35bf28}+3.36\\%$ |
| test_mod_wrap[compile-overhead] | 0.4457ms | 0.2280ms | 4.3863 KOps/s | 4.2699 KOps/s | $\color{#35bf28}+2.73\\%$ |
| test_mod_wrap_and_backward[eager] | 12.8418ms | 11.2115ms | 89.1944 Ops/s | 89.7296 Ops/s | $\color{#d91a1a}-0.60\\%$ |
| test_mod_wrap_and_backward[compile] | 13.4741ms | 11.2329ms | 89.0244 Ops/s | 86.0770 Ops/s | $\color{#35bf28}+3.42\\%$ |
| test_mod_wrap_and_backward[compile-overhead] | 13.7288ms | 11.2775ms | 88.6718 Ops/s | 81.8049 Ops/s | $\textbf{\color{#35bf28}+8.39\\%}$ |
| test_seq_add[eager] | 0.1568ms | 87.0560μs | 11.4869 KOps/s | 11.2579 KOps/s | $\color{#35bf28}+2.03\\%$ |
| test_seq_add[compile] | 0.1540ms | 62.1949μs | 16.0785 KOps/s | 15.9270 KOps/s | $\color{#35bf28}+0.95\\%$ |
| test_seq_add[compile-overhead] | 0.1520ms | 60.3985μs | 16.5567 KOps/s | 16.3756 KOps/s | $\color{#35bf28}+1.11\\%$ |
| test_seq_wrap[eager] | 0.7344ms | 0.3864ms | 2.5877 KOps/s | 2.6219 KOps/s | $\color{#d91a1a}-1.30\\%$ |
| test_seq_wrap[compile] | 0.4944ms | 0.2646ms | 3.7789 KOps/s | 3.7070 KOps/s | $\color{#35bf28}+1.94\\%$ |
| test_seq_wrap[compile-overhead] | 0.5208ms | 0.2675ms | 3.7384 KOps/s | 3.5900 KOps/s | $\color{#35bf28}+4.13\\%$ |
| test_func_call_runtime[False-eager] | 0.6660ms | 0.5408ms | 1.8492 KOps/s | 1.8535 KOps/s | $\color{#d91a1a}-0.23\\%$ |
| test_func_call_runtime[False-compile] | 0.7525ms | 0.4954ms | 2.0186 KOps/s | 1.9587 KOps/s | $\color{#35bf28}+3.06\\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.6315ms | 0.4919ms | 2.0330 KOps/s | 1.9636 KOps/s | $\color{#35bf28}+3.53\\%$ |
| test_func_call_runtime[True-eager] | 1.0910ms | 0.7666ms | 1.3045 KOps/s | 1.3101 KOps/s | $\color{#d91a1a}-0.42\\%$ |
| test_func_call_runtime[True-compile] | 0.8222ms | 0.5178ms | 1.9312 KOps/s | 1.8912 KOps/s | $\color{#35bf28}+2.12\\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.6857ms | 0.5139ms | 1.9458 KOps/s | 1.8962 KOps/s | $\color{#35bf28}+2.62\\%$ |
| test_func_call_cm_runtime[False-eager] | 1.1503ms | 0.5473ms | 1.8271 KOps/s | 1.8807 KOps/s | $\color{#d91a1a}-2.85\\%$ |
| test_func_call_cm_runtime[False-compile] | 0.6275ms | 0.4958ms | 2.0168 KOps/s | 1.9571 KOps/s | $\color{#35bf28}+3.05\\%$ |
| test_func_call_cm_runtime[False-compile-overhead] | 0.9046ms | 0.4974ms | 2.0106 KOps/s | 1.9571 KOps/s | $\color{#35bf28}+2.73\\%$ |
| test_func_call_cm_runtime[True-eager] | 1.3226ms | 0.9074ms | 1.1021 KOps/s | 1.1176 KOps/s | $\color{#d91a1a}-1.39\\%$ |
| test_func_call_cm_runtime[True-compile] | 1.2419ms | 0.8551ms | 1.1694 KOps/s | 1.1688 KOps/s | $\color{#35bf28}+0.05\\%$ |
| test_func_call_cm_runtime[True-compile-overhead] | 0.9823ms | 0.8446ms | 1.1839 KOps/s | 1.1657 KOps/s | $\color{#35bf28}+1.57\\%$ |
| test_distributed | 0.3758ms | 0.1349ms | 7.4133 KOps/s | 7.4385 KOps/s | $\color{#d91a1a}-0.34\\%$ |
| test_tdmodule | 31.8600μs | 17.4894μs | 57.1773 KOps/s | 60.8961 KOps/s | $\textbf{\color{#d91a1a}-6.11\\%}$ |
| test_tdmodule_dispatch | 72.1350μs | 37.2150μs | 26.8709 KOps/s | 28.8362 KOps/s | $\textbf{\color{#d91a1a}-6.82\\%}$ |
| test_tdseq | 49.4330μs | 20.1840μs | 49.5441 KOps/s | 52.2877 KOps/s | $\textbf{\color{#d91a1a}-5.25\\%}$ |
| test_tdseq_dispatch | 0.1054ms | 41.2464μs | 24.2446 KOps/s | 25.4546 KOps/s | $\color{#d91a1a}-4.75\\%$ |
| test_instantiation_functorch | 1.9128ms | 1.6561ms | 603.8350 Ops/s | 603.4674 Ops/s | $\color{#35bf28}+0.06\\%$ |
| test_instantiation_td | 2.0475ms | 1.2079ms | 827.8964 Ops/s | 814.1796 Ops/s | $\color{#35bf28}+1.68\\%$ |
| test_exec_functorch | 0.3095ms | 0.1790ms | 5.5876 KOps/s | 5.5841 KOps/s | $\color{#35bf28}+0.06\\%$ |
| test_exec_functional_call | 0.3135ms | 0.1681ms | 5.9501 KOps/s | 5.8044 KOps/s | $\color{#35bf28}+2.51\\%$ |
| test_exec_td | 0.2613ms | 0.1698ms | 5.8904 KOps/s | 5.8734 KOps/s | $\color{#35bf28}+0.29\\%$ |
| test_exec_td_decorator | 1.1878ms | 0.2241ms | 4.4618 KOps/s | 4.4116 KOps/s | $\color{#35bf28}+1.14\\%$ |
| test_vmap_mlp_speed[True-True] | 1.0173ms | 0.6071ms | 1.6471 KOps/s | 1.6654 KOps/s | $\color{#d91a1a}-1.09\\%$ |
| test_vmap_mlp_speed[True-False] | 0.9379ms | 0.5965ms | 1.6765 KOps/s | 1.6839 KOps/s | $\color{#d91a1a}-0.44\\%$ |
| test_vmap_mlp_speed[False-True] | 0.7724ms | 0.5001ms | 1.9995 KOps/s | 2.0327 KOps/s | $\color{#d91a1a}-1.63\\%$ |
| test_vmap_mlp_speed[False-False] | 0.7471ms | 0.4945ms | 2.0223 KOps/s | 2.0388 KOps/s | $\color{#d91a1a}-0.81\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.5956ms | 0.6565ms | 1.5233 KOps/s | 1.5262 KOps/s | $\color{#d91a1a}-0.19\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9133ms | 0.6554ms | 1.5257 KOps/s | 1.5272 KOps/s | $\color{#d91a1a}-0.09\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.8390ms | 0.5417ms | 1.8460 KOps/s | 1.8577 KOps/s | $\color{#d91a1a}-0.63\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7731ms | 0.5384ms | 1.8572 KOps/s | 1.8674 KOps/s | $\color{#d91a1a}-0.55\\%$ |
| test_to_module_speed[True] | 2.0889ms | 1.3333ms | 749.9943 Ops/s | 737.3661 Ops/s | $\color{#35bf28}+1.71\\%$ |
| test_to_module_speed[False] | 2.0521ms | 1.2956ms | 771.8411 Ops/s | 764.8524 Ops/s | $\color{#35bf28}+0.91\\%$ |
| test_tc_init | 89.6780μs | 44.9286μs | 22.2576 KOps/s | 24.5015 KOps/s | $\textbf{\color{#d91a1a}-9.16\\%}$ |
| test_tc_init_nested | 0.1636ms | 91.4097μs | 10.9398 KOps/s | 12.3431 KOps/s | $\textbf{\color{#d91a1a}-11.37\\%}$ |
| test_tc_first_layer_tensor | 17.9540μs | 1.4879μs | 672.1054 KOps/s | 696.1627 KOps/s | $\color{#d91a1a}-3.46\\%$ |
| test_tc_first_layer_nontensor | 26.3290μs | 4.3789μs | 228.3696 KOps/s | 237.0219 KOps/s | $\color{#d91a1a}-3.65\\%$ |
| test_tc_second_layer_tensor | 26.3090μs | 2.7903μs | 358.3806 KOps/s | 370.8100 KOps/s | $\color{#d91a1a}-3.35\\%$ |
| test_tc_second_layer_nontensor | 39.8850μs | 5.6303μs | 177.6102 KOps/s | 182.7295 KOps/s | $\color{#d91a1a}-2.80\\%$ |
| test_unbind | 0.4872s | 14.3025ms | 69.9178 Ops/s | 71.6086 Ops/s | $\color{#d91a1a}-2.36\\%$ |
| test_full_like | 9.9486ms | 7.8108ms | 128.0276 Ops/s | 122.5817 Ops/s | $\color{#35bf28}+4.44\\%$ |
| test_zeros_like | 14.8902ms | 7.1184ms | 140.4815 Ops/s | 127.8337 Ops/s | $\textbf{\color{#35bf28}+9.89\\%}$ |
| test_ones_like | 13.9081ms | 7.6295ms | 131.0701 Ops/s | 118.1249 Ops/s | $\textbf{\color{#35bf28}+10.96\\%}$ |
| test_clone | 18.7560ms | 9.7591ms | 102.4683 Ops/s | 104.9291 Ops/s | $\color{#d91a1a}-2.35\\%$ |
| test_squeeze | 94.8080μs | 13.3321μs | 75.0069 KOps/s | 74.9941 KOps/s | $\color{#35bf28}+0.02\\%$ |
| test_unsqueeze | 0.2076ms | 93.4858μs | 10.6968 KOps/s | 10.8238 KOps/s | $\color{#d91a1a}-1.17\\%$ |
| test_split | 0.4664ms | 0.1999ms | 5.0013 KOps/s | 4.9534 KOps/s | $\color{#35bf28}+0.97\\%$ |
| test_permute | 0.3686ms | 0.2234ms | 4.4755 KOps/s | 4.4994 KOps/s | $\color{#d91a1a}-0.53\\%$ |
| test_stack | 31.3722ms | 24.6634ms | 40.5459 Ops/s | 39.5499 Ops/s | $\color{#35bf28}+2.52\\%$ |
| test_cat | 31.1815ms | 24.8725ms | 40.2050 Ops/s | 40.9227 Ops/s | $\color{#d91a1a}-1.75\\%$ |

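
For readers checking the numbers, the Change column is consistent with the relative difference between the Ops column and the Ops on Repo `HEAD` column, and Ops is consistent with the reciprocal of Mean. The sketch below reproduces three rows from the CPU table. The function names are hypothetical, and the 5% cutoff used in `classify` is an assumption inferred from the data (the bold rows in the CPU table match the Improved and Worsened counts), not something the report states.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of the PR's throughput against the HEAD baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "within noise"


# (Ops, Ops on HEAD) copied from the CPU table; both values in a row share a unit.
rows = {
    "test_plain_set_nested": (45.2800, 46.5607),
    "test_items_nested_leaf": (12.0536, 11.2950),
    "test_serialize_weights_pickle": (1.3702, 2.3937),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")

# Mean of 22.0848 microseconds corresponds to roughly 45.28 KOps/s.
print(f"{ops_per_second(22.0848e-6) / 1e3:.2f} KOps/s")
```

Because both throughput columns in a row use the same unit, the ratio is unit-free and the K/M prefixes can be ignored when checking a single row.
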
github-actions[bot] commented 3 months ago

$\color{#D29922}\textsf{\Large\⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 225. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}62$.

Detailed results

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1652ms | 18.1731μs | 55.0265 KOps/s | 56.4716 KOps/s | $\color{#d91a1a}-2.56\\%$ |
| test_plain_set_stack_nested | 49.3010μs | 18.4184μs | 54.2935 KOps/s | 56.2893 KOps/s | $\color{#d91a1a}-3.55\\%$ |
| test_plain_set_nested_inplace | 38.3610μs | 19.5038μs | 51.2720 KOps/s | 53.1848 KOps/s | $\color{#d91a1a}-3.60\\%$ |
| test_plain_set_stack_nested_inplace | 54.4210μs | 19.6695μs | 50.8400 KOps/s | 53.3307 KOps/s | $\color{#d91a1a}-4.67\\%$ |
| test_items | 26.5200μs | 4.7346μs | 211.2103 KOps/s | 211.4765 KOps/s | $\color{#d91a1a}-0.13\\%$ |
| test_items_nested | 0.4097ms | 0.3695ms | 2.7066 KOps/s | 2.7318 KOps/s | $\color{#d91a1a}-0.92\\%$ |
| test_items_nested_locked | 0.3977ms | 0.3656ms | 2.7354 KOps/s | 2.7296 KOps/s | $\color{#35bf28}+0.21\\%$ |
| test_items_nested_leaf | 0.1106ms | 85.2963μs | 11.7238 KOps/s | 11.9391 KOps/s | $\color{#d91a1a}-1.80\\%$ |
| test_items_stack_nested | 0.3948ms | 0.3687ms | 2.7126 KOps/s | 2.7087 KOps/s | $\color{#35bf28}+0.14\\%$ |
| test_items_stack_nested_leaf | 0.1111ms | 85.3649μs | 11.7144 KOps/s | 11.8183 KOps/s | $\color{#d91a1a}-0.88\\%$ |
| test_items_stack_nested_locked | 0.3903ms | 0.3600ms | 2.7779 KOps/s | 2.7243 KOps/s | $\color{#35bf28}+1.97\\%$ |
| test_keys | 21.6800μs | 4.4751μs | 223.4577 KOps/s | 227.8195 KOps/s | $\color{#d91a1a}-1.91\\%$ |
| test_keys_nested | 0.1097ms | 71.3560μs | 14.0142 KOps/s | 15.2793 KOps/s | $\textbf{\color{#d91a1a}-8.28\\%}$ |
| test_keys_nested_locked | 0.7583ms | 76.4659μs | 13.0777 KOps/s | 13.7896 KOps/s | $\textbf{\color{#d91a1a}-5.16\\%}$ |
| test_keys_nested_leaf | 78.8510μs | 62.1188μs | 16.0982 KOps/s | 17.3428 KOps/s | $\textbf{\color{#d91a1a}-7.18\\%}$ |
| test_keys_stack_nested | 0.1009ms | 72.2711μs | 13.8368 KOps/s | 15.1204 KOps/s | $\textbf{\color{#d91a1a}-8.49\\%}$ |
| test_keys_stack_nested_leaf | 82.3520μs | 60.5770μs | 16.5079 KOps/s | 17.5412 KOps/s | $\textbf{\color{#d91a1a}-5.89\\%}$ |
| test_keys_stack_nested_locked | 91.5320μs | 76.1046μs | 13.1398 KOps/s | 14.1075 KOps/s | $\textbf{\color{#d91a1a}-6.86\\%}$ |
| test_values | 8.9303μs | 1.7972μs | 556.4277 KOps/s | 571.5008 KOps/s | $\color{#d91a1a}-2.64\\%$ |
| test_values_nested | 49.5710μs | 35.8459μs | 27.8972 KOps/s | 29.4111 KOps/s | $\textbf{\color{#d91a1a}-5.15\\%}$ |
| test_values_nested_locked | 56.2110μs | 37.1864μs | 26.8915 KOps/s | 27.8251 KOps/s | $\color{#d91a1a}-3.36\\%$ |
| test_values_nested_leaf | 0.2288ms | 32.3204μs | 30.9402 KOps/s | 33.1128 KOps/s | $\textbf{\color{#d91a1a}-6.56\\%}$ |
| test_values_stack_nested | 0.2376ms | 36.0922μs | 27.7068 KOps/s | 28.6209 KOps/s | $\color{#d91a1a}-3.19\\%$ |
| test_values_stack_nested_leaf | 0.2255ms | 32.3496μs | 30.9123 KOps/s | 31.9920 KOps/s | $\color{#d91a1a}-3.37\\%$ |
| test_values_stack_nested_locked | 54.0210μs | 37.3098μs | 26.8026 KOps/s | 27.3418 KOps/s | $\color{#d91a1a}-1.97\\%$ |
| test_membership | 1.4210μs | 0.5969μs | 1.6754 MOps/s | 1.8039 MOps/s | $\textbf{\color{#d91a1a}-7.12\\%}$ |
| test_membership_nested | 20.1310μs | 2.0875μs | 479.0413 KOps/s | 515.9024 KOps/s | $\textbf{\color{#d91a1a}-7.14\\%}$ |
| test_membership_nested_leaf | 16.6800μs | 1.9898μs | 502.5573 KOps/s | 516.2376 KOps/s | $\color{#d91a1a}-2.65\\%$ |
| test_membership_stacked_nested | 15.6510μs | 2.1401μs | 467.2657 KOps/s | 487.1205 KOps/s | $\color{#d91a1a}-4.08\\%$ |
| test_membership_stacked_nested_leaf | 30.6110μs | 2.1007μs | 476.0311 KOps/s | 483.6900 KOps/s | $\color{#d91a1a}-1.58\\%$ |
| test_membership_nested_last | 20.0810μs | 3.1341μs | 319.0677 KOps/s | 334.5914 KOps/s | $\color{#d91a1a}-4.64\\%$ |
| test_membership_nested_leaf_last | 30.7510μs | 3.1562μs | 316.8330 KOps/s | 338.2318 KOps/s | $\textbf{\color{#d91a1a}-6.33\\%}$ |
| test_membership_stacked_nested_last | 24.1710μs | 3.1284μs | 319.6477 KOps/s | 108.0223 KOps/s | $\textbf{\color{#35bf28}+195.91\\%}$ |
| test_membership_stacked_nested_leaf_last | 17.6500μs | 3.1154μs | 320.9848 KOps/s | 108.4653 KOps/s | $\textbf{\color{#35bf28}+195.93\\%}$ |
| test_nested_getleaf | 37.6700μs | 8.6081μs | 116.1699 KOps/s | 124.4431 KOps/s | $\textbf{\color{#d91a1a}-6.65\\%}$ |
| test_nested_get | 27.7910μs | 8.0731μs | 123.8676 KOps/s | 132.9464 KOps/s | $\textbf{\color{#d91a1a}-6.83\\%}$ |
| test_stacked_getleaf | 20.8710μs | 8.5880μs | 116.4419 KOps/s | 123.7366 KOps/s | $\textbf{\color{#d91a1a}-5.90\\%}$ |
| test_stacked_get | 35.4300μs | 8.0077μs | 124.8802 KOps/s | 133.4293 KOps/s | $\textbf{\color{#d91a1a}-6.41\\%}$ |
| test_nested_getitemleaf | 34.1710μs | 8.7867μs | 113.8083 KOps/s | 123.0885 KOps/s | $\textbf{\color{#d91a1a}-7.54\\%}$ |
| test_nested_getitem | 26.6000μs | 8.2442μs | 121.2975 KOps/s | 131.0736 KOps/s | $\textbf{\color{#d91a1a}-7.46\\%}$ |
| test_stacked_getitemleaf | 37.8210μs | 8.7182μs | 114.7024 KOps/s | 122.8826 KOps/s | $\textbf{\color{#d91a1a}-6.66\\%}$ |
| test_stacked_getitem | 30.1910μs | 8.2181μs | 121.6826 KOps/s | 131.3094 KOps/s | $\textbf{\color{#d91a1a}-7.33\\%}$ |
| test_lock_nested | 1.2933ms | 0.4729ms | 2.1146 KOps/s | 2.1470 KOps/s | $\color{#d91a1a}-1.51\\%$ |
| test_lock_stack_nested | 0.4666ms | 0.4407ms | 2.2693 KOps/s | 2.3779 KOps/s | $\color{#d91a1a}-4.57\\%$ |
| test_unlock_nested | 0.8318ms | 0.3956ms | 2.5281 KOps/s | 2.5843 KOps/s | $\color{#d91a1a}-2.17\\%$ |
| test_unlock_stack_nested | 0.3832ms | 0.3640ms | 2.7472 KOps/s | 2.9475 KOps/s | $\textbf{\color{#d91a1a}-6.79\\%}$ |
| test_flatten_speed | 0.1908ms | 0.1049ms | 9.5309 KOps/s | 9.6025 KOps/s | $\color{#d91a1a}-0.75\\%$ |
| test_unflatten_speed | 0.3221ms | 0.3015ms | 3.3166 KOps/s | 3.4801 KOps/s | $\color{#d91a1a}-4.70\\%$ |
| test_common_ops | 1.6507ms | 1.3526ms | 739.2913 Ops/s | 748.5035 Ops/s | $\color{#d91a1a}-1.23\\%$ |
| test_creation | 16.9410μs | 1.7310μs | 577.6859 KOps/s | 613.6676 KOps/s | $\textbf{\color{#d91a1a}-5.86\\%}$ |
| test_creation_empty | 43.5910μs | 18.6437μs | 53.6373 KOps/s | 53.7933 KOps/s | $\color{#d91a1a}-0.29\\%$ |
| test_creation_nested_1 | 48.9310μs | 20.4949μs | 48.7925 KOps/s | 48.9650 KOps/s | $\color{#d91a1a}-0.35\\%$ |
| test_creation_nested_2 | 44.9310μs | 22.9678μs | 43.5392 KOps/s | 43.6332 KOps/s | $\color{#d91a1a}-0.22\\%$ |
| test_clone | 0.1924ms | 32.1034μs | 31.1493 KOps/s | 34.8782 KOps/s | $\textbf{\color{#d91a1a}-10.69\\%}$ |
| test_getitem[int] | 1.0864ms | 18.2144μs | 54.9015 KOps/s | 58.8240 KOps/s | $\textbf{\color{#d91a1a}-6.67\\%}$ |
| test_getitem[slice_int] | 0.1414ms | 30.4706μs | 32.8185 KOps/s | 35.2325 KOps/s | $\textbf{\color{#d91a1a}-6.85\\%}$ |
| test_getitem[range] | 0.2544ms | 0.1168ms | 8.5590 KOps/s | 8.9500 KOps/s | $\color{#d91a1a}-4.37\\%$ |
| test_getitem[tuple] | 0.1552ms | 26.5893μs | 37.6091 KOps/s | 40.1730 KOps/s | $\textbf{\color{#d91a1a}-6.38\\%}$ |
| test_getitem[list] | 0.2213ms | 0.1064ms | 9.4003 KOps/s | 9.7335 KOps/s | $\color{#d91a1a}-3.42\\%$ |
| test_setitem_dim[int] | 72.8710μs | 55.0641μs | 18.1606 KOps/s | 18.2617 KOps/s | $\color{#d91a1a}-0.55\\%$ |
| test_setitem_dim[slice_int] | 0.2478ms | 80.1135μs | 12.4823 KOps/s | 12.5884 KOps/s | $\color{#d91a1a}-0.84\\%$ |
| test_setitem_dim[range] | 0.1649ms | 0.1428ms | 7.0046 KOps/s | 7.0818 KOps/s | $\color{#d91a1a}-1.09\\%$ |
| test_setitem_dim[tuple] | 94.4920μs | 72.1991μs | 13.8506 KOps/s | 13.8791 KOps/s | $\color{#d91a1a}-0.21\\%$ |
| test_setitem | 0.2385ms | 45.5420μs | 21.9578 KOps/s | 23.4335 KOps/s | $\textbf{\color{#d91a1a}-6.30\\%}$ |
| test_set | 0.2232ms | 44.9902μs | 22.2271 KOps/s | 24.1571 KOps/s | $\textbf{\color{#d91a1a}-7.99\\%}$ |
| test_set_shared | 0.3858ms | 55.8476μs | 17.9059 KOps/s | 19.2606 KOps/s | $\textbf{\color{#d91a1a}-7.03\\%}$ |
| test_update | 78.1410μs | 53.9749μs | 18.5271 KOps/s | 19.4828 KOps/s | $\color{#d91a1a}-4.91\\%$ |
| test_update_nested | 0.1900ms | 61.0307μs | 16.3852 KOps/s | 16.9587 KOps/s | $\color{#d91a1a}-3.38\\%$ |
| test_update__nested | 0.2374ms | 63.5987μs | 15.7236 KOps/s | 15.4826 KOps/s | $\color{#35bf28}+1.56\\%$ |
| test_set_nested | 0.2030ms | 47.1906μs | 21.1907 KOps/s | 21.2315 KOps/s | $\color{#d91a1a}-0.19\\%$ |
| test_set_nested_new | 0.1964ms | 51.9713μs | 19.2414 KOps/s | 19.6713 KOps/s | $\color{#d91a1a}-2.19\\%$ |
| test_select | 0.2083ms | 66.2664μs | 15.0906 KOps/s | 15.3643 KOps/s | $\color{#d91a1a}-1.78\\%$ |
| test_select_nested | 73.7220μs | 51.8846μs | 19.2735 KOps/s | 19.4587 KOps/s | $\color{#d91a1a}-0.95\\%$ |
| test_exclude_nested | 94.3820μs | 71.9110μs | 13.9061 KOps/s | 14.5590 KOps/s | $\color{#d91a1a}-4.48\\%$ |
| test_empty[True] | 0.3183ms | 0.2899ms | 3.4492 KOps/s | 3.5470 KOps/s | $\color{#d91a1a}-2.76\\%$ |
| test_empty[False] | 2.4490μs | 0.9224μs | 1.0841 MOps/s | 1.1351 MOps/s | $\color{#d91a1a}-4.49\\%$ |
| test_to | 0.1524ms | 37.5294μs | 26.6458 KOps/s | 26.3336 KOps/s | $\color{#35bf28}+1.19\\%$ |
| test_to_nonblocking | 45.4710μs | 23.5941μs | 42.3834 KOps/s | 41.6892 KOps/s | $\color{#35bf28}+1.67\\%$ |
| test_unbind_speed | 0.4022ms | 0.3160ms | 3.1649 KOps/s | 3.2941 KOps/s | $\color{#d91a1a}-3.92\\%$ |
| test_unbind_speed_stack0 | 0.3669ms | 0.3081ms | 3.2461 KOps/s | 3.3834 KOps/s | $\color{#d91a1a}-4.06\\%$ |
| test_unbind_speed_stack1 | 92.2375ms | 0.7896ms | 1.2665 KOps/s | 1.3285 KOps/s | $\color{#d91a1a}-4.67\\%$ |
| test_split | 93.3784ms | 2.4274ms | 411.9703 Ops/s | 439.7694 Ops/s | $\textbf{\color{#d91a1a}-6.32\\%}$ |
| test_chunk | 93.5205ms | 2.4252ms | 412.3407 Ops/s | 436.2673 Ops/s | $\textbf{\color{#d91a1a}-5.48\\%}$ |
| test_creation[device0] | 0.2217ms | 0.1059ms | 9.4426 KOps/s | 9.5805 KOps/s | $\color{#d91a1a}-1.44\\%$ |
| test_creation_from_tensor | 0.1554ms | 0.1021ms | 9.7963 KOps/s | 9.7669 KOps/s | $\color{#35bf28}+0.30\\%$ |
test_add_one[memmap_tensor0] | 96.6020μs | 9.4961μs | 105.3059 KOps/s | 115.8761 KOps/s | $\textbf{\color{#d91a1a}-9.12\\%}$ | | test_contiguous[memmap_tensor0] | 28.6110μs | 2.2761μs | 439.3512 KOps/s | 449.9559 KOps/s | $\color{#d91a1a}-2.36\\%$ | | test_stack[memmap_tensor0] | 25.1600μs | 7.1330μs | 140.1932 KOps/s | 153.4442 KOps/s | $\textbf{\color{#d91a1a}-8.64\\%}$ | | test_memmaptd_index | 1.2235ms | 0.4547ms | 2.1994 KOps/s | 2.3082 KOps/s | $\color{#d91a1a}-4.72\\%$ | | test_memmaptd_index_astensor | 0.8023ms | 0.5225ms | 1.9140 KOps/s | 2.0056 KOps/s | $\color{#d91a1a}-4.57\\%$ | | test_memmaptd_index_op | 1.5282ms | 1.1307ms | 884.4158 Ops/s | 941.8615 Ops/s | $\textbf{\color{#d91a1a}-6.10\\%}$ | | test_serialize_model | 92.1614ms | 90.1035ms | 11.0984 Ops/s | 10.8855 Ops/s | $\color{#35bf28}+1.96\\%$ | | test_serialize_model_pickle | 1.3478s | 1.2361s | 0.8090 Ops/s | 0.8080 Ops/s | $\color{#35bf28}+0.13\\%$ | | test_serialize_weights | 0.1844s | 97.3870ms | 10.2683 Ops/s | 10.8954 Ops/s | $\textbf{\color{#d91a1a}-5.76\\%}$ | | test_serialize_weights_returnearly | 0.2938s | 70.1856ms | 14.2479 Ops/s | 17.6011 Ops/s | $\textbf{\color{#d91a1a}-19.05\\%}$ | | test_serialize_weights_pickle | 1.4009s | 1.2442s | 0.8037 Ops/s | 0.8081 Ops/s | $\color{#d91a1a}-0.55\\%$ | | test_reshape_pytree | 65.0710μs | 40.6542μs | 24.5977 KOps/s | 26.4850 KOps/s | $\textbf{\color{#d91a1a}-7.13\\%}$ | | test_reshape_td | 0.2411ms | 47.2565μs | 21.1611 KOps/s | 23.6024 KOps/s | $\textbf{\color{#d91a1a}-10.34\\%}$ | | test_view_pytree | 80.7420μs | 40.1196μs | 24.9254 KOps/s | 26.6299 KOps/s | $\textbf{\color{#d91a1a}-6.40\\%}$ | | test_view_td | 0.2808ms | 52.1218μs | 19.1858 KOps/s | 20.8998 KOps/s | $\textbf{\color{#d91a1a}-8.20\\%}$ | | test_unbind_pytree | 0.1298ms | 38.5052μs | 25.9705 KOps/s | 27.0123 KOps/s | $\color{#d91a1a}-3.86\\%$ | | test_unbind_td | 0.4289ms | 47.0562μs | 21.2512 KOps/s | 22.3212 KOps/s | $\color{#d91a1a}-4.79\\%$ | | test_split_pytree | 
0.3474ms | 52.7021μs | 18.9746 KOps/s | 20.1790 KOps/s | $\textbf{\color{#d91a1a}-5.97\\%}$ | | test_split_td | 0.3480ms | 64.4012μs | 15.5277 KOps/s | 16.9367 KOps/s | $\textbf{\color{#d91a1a}-8.32\\%}$ | | test_add_pytree | 0.2918ms | 64.7408μs | 15.4462 KOps/s | 16.1752 KOps/s | $\color{#d91a1a}-4.51\\%$ | | test_add_td | 0.3201ms | 98.5577μs | 10.1463 KOps/s | 10.0828 KOps/s | $\color{#35bf28}+0.63\\%$ | | test_compile_add_one_nested[tensordict-compile] | 0.4190ms | 0.2165ms | 4.6189 KOps/s | 4.7588 KOps/s | $\color{#d91a1a}-2.94\\%$ | | test_compile_add_one_nested[tensordict-eager] | 0.3304ms | 0.1760ms | 5.6818 KOps/s | 5.7447 KOps/s | $\color{#d91a1a}-1.10\\%$ | | test_compile_add_one_nested[pytree-compile] | 0.1841ms | 0.1505ms | 6.6436 KOps/s | 6.7726 KOps/s | $\color{#d91a1a}-1.91\\%$ | | test_compile_add_one_nested[pytree-eager] | 0.3489ms | 0.2044ms | 4.8930 KOps/s | 5.1439 KOps/s | $\color{#d91a1a}-4.88\\%$ | | test_compile_copy_nested[tensordict-compile] | 0.1592ms | 22.3428μs | 44.7571 KOps/s | 45.1396 KOps/s | $\color{#d91a1a}-0.85\\%$ | | test_compile_copy_nested[tensordict-eager] | 0.1256ms | 49.4729μs | 20.2131 KOps/s | 20.7855 KOps/s | $\color{#d91a1a}-2.75\\%$ | | test_compile_copy_nested[pytree-compile] | 0.1110ms | 76.6694μs | 13.0430 KOps/s | 13.4753 KOps/s | $\color{#d91a1a}-3.21\\%$ | | test_compile_copy_nested[pytree-eager] | 81.0610μs | 62.0316μs | 16.1208 KOps/s | 16.3166 KOps/s | $\color{#d91a1a}-1.20\\%$ | | test_compile_add_one_flat[tensordict-compile] | 0.4516ms | 0.3330ms | 3.0026 KOps/s | 3.0461 KOps/s | $\color{#d91a1a}-1.43\\%$ | | test_compile_add_one_flat[tensordict-eager] | 0.3536ms | 0.2256ms | 4.4330 KOps/s | 4.4665 KOps/s | $\color{#d91a1a}-0.75\\%$ | | test_compile_add_one_flat[tensorclass-compile] | 0.2086ms | 0.1361ms | 7.3454 KOps/s | 7.6647 KOps/s | $\color{#d91a1a}-4.17\\%$ | | test_compile_add_one_flat[tensorclass-eager] | 0.2148ms | 64.0114μs | 15.6222 KOps/s | 15.7585 KOps/s | $\color{#d91a1a}-0.86\\%$ | | 
test_compile_add_one_flat[pytree-compile] | 0.3835ms | 0.3339ms | 2.9950 KOps/s | 3.0342 KOps/s | $\color{#d91a1a}-1.29\\%$ | | test_compile_add_one_flat[pytree-eager] | 0.8118ms | 0.6672ms | 1.4989 KOps/s | 1.5345 KOps/s | $\color{#d91a1a}-2.32\\%$ | | test_compile_add_self_flat[tensordict-eager] | 0.3823ms | 0.2754ms | 3.6307 KOps/s | 3.6594 KOps/s | $\color{#d91a1a}-0.78\\%$ | | test_compile_add_self_flat[tensordict-compile] | 0.4225ms | 0.3357ms | 2.9792 KOps/s | 3.0171 KOps/s | $\color{#d91a1a}-1.26\\%$ | | test_compile_add_self_flat[tensorclass-eager] | 0.2197ms | 77.3118μs | 12.9346 KOps/s | 13.0832 KOps/s | $\color{#d91a1a}-1.14\\%$ | | test_compile_add_self_flat[tensorclass-compile] | 0.2947ms | 0.1358ms | 7.3621 KOps/s | 7.3638 KOps/s | $\color{#d91a1a}-0.02\\%$ | | test_compile_add_self_flat[pytree-eager] | 0.7522ms | 0.5755ms | 1.7376 KOps/s | 1.7952 KOps/s | $\color{#d91a1a}-3.21\\%$ | | test_compile_add_self_flat[pytree-compile] | 0.4559ms | 0.3344ms | 2.9904 KOps/s | 3.0439 KOps/s | $\color{#d91a1a}-1.76\\%$ | | test_compile_copy_flat[tensordict-compile] | 0.1696ms | 18.9601μs | 52.7423 KOps/s | 54.3807 KOps/s | $\color{#d91a1a}-3.01\\%$ | | test_compile_copy_flat[tensordict-eager] | 56.8920μs | 32.5825μs | 30.6914 KOps/s | 31.2924 KOps/s | $\color{#d91a1a}-1.92\\%$ | | test_compile_copy_flat[pytree-compile] | 0.1065ms | 77.2478μs | 12.9453 KOps/s | 13.0445 KOps/s | $\color{#d91a1a}-0.76\\%$ | | test_compile_copy_flat[pytree-eager] | 92.1510μs | 60.4317μs | 16.5476 KOps/s | 16.2252 KOps/s | $\color{#35bf28}+1.99\\%$ | | test_compile_assign_and_add[tensordict-compile] | 2.6209ms | 0.9540ms | 1.0482 KOps/s | 1.0677 KOps/s | $\color{#d91a1a}-1.83\\%$ | | test_compile_assign_and_add[tensordict-eager] | 3.6892ms | 3.4767ms | 287.6302 Ops/s | 305.4643 Ops/s | $\textbf{\color{#d91a1a}-5.84\\%}$ | | test_compile_assign_and_add[pytree-compile] | 2.5830ms | 0.9428ms | 1.0607 KOps/s | 1.0857 KOps/s | $\color{#d91a1a}-2.30\\%$ | | 
test_compile_assign_and_add[pytree-eager] | 3.5043ms | 3.4176ms | 292.6026 Ops/s | 308.6347 Ops/s | $\textbf{\color{#d91a1a}-5.19\\%}$ | | test_compile_indexing[tensor-tensordict-compile] | 0.1494ms | 0.1148ms | 8.7121 KOps/s | 8.6024 KOps/s | $\color{#35bf28}+1.27\\%$ | | test_compile_indexing[tensor-tensordict-eager] | 0.2436ms | 65.3005μs | 15.3138 KOps/s | 15.3667 KOps/s | $\color{#d91a1a}-0.34\\%$ | | test_compile_indexing[tensor-tensorclass-compile] | 0.2693ms | 0.1078ms | 9.2791 KOps/s | 9.6376 KOps/s | $\color{#d91a1a}-3.72\\%$ | | test_compile_indexing[tensor-tensorclass-eager] | 0.1969ms | 48.0889μs | 20.7948 KOps/s | 22.7746 KOps/s | $\textbf{\color{#d91a1a}-8.69\\%}$ | | test_compile_indexing[tensor-pytree-compile] | 0.2596ms | 0.1095ms | 9.1365 KOps/s | 9.3770 KOps/s | $\color{#d91a1a}-2.56\\%$ | | test_compile_indexing[tensor-pytree-eager] | 0.2463ms | 48.1600μs | 20.7641 KOps/s | 21.5079 KOps/s | $\color{#d91a1a}-3.46\\%$ | | test_compile_indexing[slice-tensordict-compile] | 0.3142ms | 0.1468ms | 6.8119 KOps/s | 7.0114 KOps/s | $\color{#d91a1a}-2.85\\%$ | | test_compile_indexing[slice-tensordict-eager] | 0.1978ms | 28.7636μs | 34.7662 KOps/s | 37.7414 KOps/s | $\textbf{\color{#d91a1a}-7.88\\%}$ | | test_compile_indexing[slice-tensorclass-compile] | 0.3363ms | 0.1371ms | 7.2939 KOps/s | 7.3774 KOps/s | $\color{#d91a1a}-1.13\\%$ | | test_compile_indexing[slice-tensorclass-eager] | 57.3210μs | 23.9720μs | 41.7153 KOps/s | 44.4213 KOps/s | $\textbf{\color{#d91a1a}-6.09\\%}$ | | test_compile_indexing[slice-pytree-compile] | 0.3161ms | 0.1361ms | 7.3463 KOps/s | 7.5514 KOps/s | $\color{#d91a1a}-2.72\\%$ | | test_compile_indexing[slice-pytree-eager] | 0.2251ms | 23.9694μs | 41.7199 KOps/s | 44.9119 KOps/s | $\textbf{\color{#d91a1a}-7.11\\%}$ | | test_compile_indexing[int-tensordict-compile] | 0.2831ms | 0.1455ms | 6.8707 KOps/s | 7.0146 KOps/s | $\color{#d91a1a}-2.05\\%$ | | test_compile_indexing[int-tensordict-eager] | 0.4762ms | 28.0275μs | 35.6792 KOps/s 
| 37.6916 KOps/s | $\textbf{\color{#d91a1a}-5.34\\%}$ | | test_compile_indexing[int-tensorclass-compile] | 0.2848ms | 0.1366ms | 7.3181 KOps/s | 7.4073 KOps/s | $\color{#d91a1a}-1.20\\%$ | | test_compile_indexing[int-tensorclass-eager] | 46.7510μs | 24.0852μs | 41.5192 KOps/s | 44.7014 KOps/s | $\textbf{\color{#d91a1a}-7.12\\%}$ | | test_compile_indexing[int-pytree-compile] | 0.2837ms | 0.1363ms | 7.3342 KOps/s | 7.5358 KOps/s | $\color{#d91a1a}-2.67\\%$ | | test_compile_indexing[int-pytree-eager] | 89.4210μs | 23.8598μs | 41.9115 KOps/s | 44.8001 KOps/s | $\textbf{\color{#d91a1a}-6.45\\%}$ | | test_mod_add[eager] | 0.1864ms | 41.9933μs | 23.8133 KOps/s | 25.8989 KOps/s | $\textbf{\color{#d91a1a}-8.05\\%}$ | | test_mod_add[compile] | 0.1106ms | 75.2154μs | 13.2952 KOps/s | 14.3854 KOps/s | $\textbf{\color{#d91a1a}-7.58\\%}$ | | test_mod_add[compile-overhead] | 0.2598ms | 0.1491ms | 6.7071 KOps/s | 6.6820 KOps/s | $\color{#35bf28}+0.37\\%$ | | test_mod_wrap[eager] | 0.4037ms | 0.2683ms | 3.7277 KOps/s | 3.9145 KOps/s | $\color{#d91a1a}-4.77\\%$ | | test_mod_wrap[compile] | 0.4415ms | 0.2972ms | 3.3651 KOps/s | 3.4240 KOps/s | $\color{#d91a1a}-1.72\\%$ | | test_mod_wrap[compile-overhead] | 8.5468ms | 4.4022ms | 227.1612 Ops/s | 227.6520 Ops/s | $\color{#d91a1a}-0.22\\%$ | | test_mod_wrap_and_backward[eager] | 1.6014ms | 1.4382ms | 695.3220 Ops/s | 706.9091 Ops/s | $\color{#d91a1a}-1.64\\%$ | | test_mod_wrap_and_backward[compile] | 2.0009ms | 1.4637ms | 683.1979 Ops/s | 696.7020 Ops/s | $\color{#d91a1a}-1.94\\%$ | | test_mod_wrap_and_backward[compile-overhead] | 1.4872ms | 1.0067ms | 993.3635 Ops/s | 1.0050 KOps/s | $\color{#d91a1a}-1.16\\%$ | | test_seq_add[eager] | 0.2916ms | 0.1221ms | 8.1913 KOps/s | 8.8686 KOps/s | $\textbf{\color{#d91a1a}-7.64\\%}$ | | test_seq_add[compile] | 0.2564ms | 88.2827μs | 11.3273 KOps/s | 11.5862 KOps/s | $\color{#d91a1a}-2.24\\%$ | | test_seq_add[compile-overhead] | 0.2910ms | 0.1297ms | 7.7117 KOps/s | 8.1838 KOps/s | 
$\textbf{\color{#d91a1a}-5.77\\%}$ | | test_seq_wrap[eager] | 0.6014ms | 0.4520ms | 2.2123 KOps/s | 2.3651 KOps/s | $\textbf{\color{#d91a1a}-6.46\\%}$ | | test_seq_wrap[compile] | 0.4863ms | 0.3477ms | 2.8762 KOps/s | 3.0901 KOps/s | $\textbf{\color{#d91a1a}-6.92\\%}$ | | test_seq_wrap[compile-overhead] | 0.3082s | 0.1477s | 6.7712 Ops/s | 6.6901 Ops/s | $\color{#35bf28}+1.21\\%$ | | test_func_call_runtime[False-eager] | 0.9491ms | 0.7821ms | 1.2785 KOps/s | 1.3755 KOps/s | $\textbf{\color{#d91a1a}-7.05\\%}$ | | test_func_call_runtime[False-compile] | 0.9675ms | 0.8225ms | 1.2158 KOps/s | 1.2550 KOps/s | $\color{#d91a1a}-3.13\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4612ms | 0.3712ms | 2.6937 KOps/s | 2.7346 KOps/s | $\color{#d91a1a}-1.49\\%$ | | test_func_call_runtime[True-eager] | 1.2762ms | 0.9420ms | 1.0616 KOps/s | 1.0669 KOps/s | $\color{#d91a1a}-0.50\\%$ | | test_func_call_runtime[True-compile] | 1.0071ms | 0.8651ms | 1.1560 KOps/s | 1.1812 KOps/s | $\color{#d91a1a}-2.14\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.5297ms | 0.4127ms | 2.4231 KOps/s | 2.4631 KOps/s | $\color{#d91a1a}-1.63\\%$ | | test_func_call_cm_runtime[False-eager] | 0.8835ms | 0.7409ms | 1.3498 KOps/s | 1.3907 KOps/s | $\color{#d91a1a}-2.94\\%$ | | test_func_call_cm_runtime[False-compile] | 1.0007ms | 0.8568ms | 1.1671 KOps/s | 1.2489 KOps/s | $\textbf{\color{#d91a1a}-6.55\\%}$ | | test_func_call_cm_runtime[False-compile-overhead] | 0.4967ms | 0.3715ms | 2.6917 KOps/s | 2.7393 KOps/s | $\color{#d91a1a}-1.74\\%$ | | test_func_call_cm_runtime[True-eager] | 1.2233ms | 1.0531ms | 949.5605 Ops/s | 957.3900 Ops/s | $\color{#d91a1a}-0.82\\%$ | | test_func_call_cm_runtime[True-compile] | 1.1861ms | 1.0177ms | 982.5864 Ops/s | 998.6531 Ops/s | $\color{#d91a1a}-1.61\\%$ | | test_func_call_cm_runtime[True-compile-overhead] | 1.1836ms | 1.0179ms | 982.3950 Ops/s | 999.1158 Ops/s | $\color{#d91a1a}-1.67\\%$ | | test_distributed | 1.3315ms | 75.7076μs | 13.2087 KOps/s | 
14.4939 KOps/s | $\textbf{\color{#d91a1a}-8.87\\%}$ | | test_tdmodule | 36.4510μs | 16.7010μs | 59.8768 KOps/s | 59.2259 KOps/s | $\color{#35bf28}+1.10\\%$ | | test_tdmodule_dispatch | 50.3510μs | 33.6933μs | 29.6795 KOps/s | 29.0738 KOps/s | $\color{#35bf28}+2.08\\%$ | | test_tdseq | 33.3400μs | 17.5092μs | 57.1129 KOps/s | 55.5835 KOps/s | $\color{#35bf28}+2.75\\%$ | | test_tdseq_dispatch | 56.4010μs | 36.1323μs | 27.6761 KOps/s | 26.8352 KOps/s | $\color{#35bf28}+3.13\\%$ | | test_instantiation_functorch | 2.2482ms | 2.0846ms | 479.6975 Ops/s | 500.9655 Ops/s | $\color{#d91a1a}-4.25\\%$ | | test_instantiation_td | 2.0965ms | 1.3409ms | 745.7872 Ops/s | 758.5437 Ops/s | $\color{#d91a1a}-1.68\\%$ | | test_exec_functorch | 0.3269ms | 0.2299ms | 4.3491 KOps/s | 4.4523 KOps/s | $\color{#d91a1a}-2.32\\%$ | | test_exec_functional_call | 0.3679ms | 0.2194ms | 4.5580 KOps/s | 4.4163 KOps/s | $\color{#35bf28}+3.21\\%$ | | test_exec_td | 0.3681ms | 0.2192ms | 4.5630 KOps/s | 4.2675 KOps/s | $\textbf{\color{#35bf28}+6.92\\%}$ | | test_exec_td_decorator | 1.1326ms | 0.2759ms | 3.6239 KOps/s | 3.5181 KOps/s | $\color{#35bf28}+3.01\\%$ | | test_vmap_mlp_speed[True-True] | 0.8089ms | 0.6664ms | 1.5006 KOps/s | 1.5263 KOps/s | $\color{#d91a1a}-1.68\\%$ | | test_vmap_mlp_speed[True-False] | 0.7828ms | 0.6612ms | 1.5124 KOps/s | 1.5312 KOps/s | $\color{#d91a1a}-1.23\\%$ | | test_vmap_mlp_speed[False-True] | 0.7305ms | 0.5747ms | 1.7399 KOps/s | 1.6947 KOps/s | $\color{#35bf28}+2.67\\%$ | | test_vmap_mlp_speed[False-False] | 0.6976ms | 0.5762ms | 1.7355 KOps/s | 1.6795 KOps/s | $\color{#35bf28}+3.34\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.1793ms | 0.7161ms | 1.3965 KOps/s | 1.4255 KOps/s | $\color{#d91a1a}-2.03\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9039ms | 0.7143ms | 1.3999 KOps/s | 1.4248 KOps/s | $\color{#d91a1a}-1.75\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.7975ms | 0.6158ms | 1.6239 KOps/s | 1.6441 KOps/s | $\color{#d91a1a}-1.23\\%$ | 
| test_vmap_mlp_speed_decorator[False-False] | 0.7816ms | 0.6155ms | 1.6247 KOps/s | 1.6166 KOps/s | $\color{#35bf28}+0.50\\%$ | | test_vmap_transformer_speed[True-True] | 8.8860ms | 8.7132ms | 114.7687 Ops/s | 116.8480 Ops/s | $\color{#d91a1a}-1.78\\%$ | | test_vmap_transformer_speed[True-False] | 8.8058ms | 8.6807ms | 115.1980 Ops/s | 116.9009 Ops/s | $\color{#d91a1a}-1.46\\%$ | | test_vmap_transformer_speed[False-True] | 8.7589ms | 8.6071ms | 116.1831 Ops/s | 118.5365 Ops/s | $\color{#d91a1a}-1.99\\%$ | | test_vmap_transformer_speed[False-False] | 8.7582ms | 8.6108ms | 116.1332 Ops/s | 118.4918 Ops/s | $\color{#d91a1a}-1.99\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 20.4419ms | 20.3342ms | 49.1783 Ops/s | 50.1799 Ops/s | $\color{#d91a1a}-2.00\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 20.3956ms | 20.1972ms | 49.5118 Ops/s | 50.2657 Ops/s | $\color{#d91a1a}-1.50\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.2327ms | 20.0699ms | 49.8258 Ops/s | 50.5194 Ops/s | $\color{#d91a1a}-1.37\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 20.2707ms | 20.0482ms | 49.8798 Ops/s | 50.6917 Ops/s | $\color{#d91a1a}-1.60\\%$ | | test_to_module_speed[True] | 2.2048ms | 1.1493ms | 870.0761 Ops/s | 878.6347 Ops/s | $\color{#d91a1a}-0.97\\%$ | | test_to_module_speed[False] | 1.2235ms | 1.1295ms | 885.3218 Ops/s | 880.3989 Ops/s | $\color{#35bf28}+0.56\\%$ | | test_tc_init | 70.0810μs | 38.2572μs | 26.1388 KOps/s | 24.7356 KOps/s | $\textbf{\color{#35bf28}+5.67\\%}$ | | test_tc_init_nested | 0.1081ms | 79.4155μs | 12.5920 KOps/s | 11.7731 KOps/s | $\textbf{\color{#35bf28}+6.96\\%}$ | | test_tc_first_layer_tensor | 3.8268μs | 0.8419μs | 1.1878 MOps/s | 1.2526 MOps/s | $\textbf{\color{#d91a1a}-5.18\\%}$ | | test_tc_first_layer_nontensor | 24.7510μs | 2.7790μs | 359.8481 KOps/s | 394.5480 KOps/s | $\textbf{\color{#d91a1a}-8.79\\%}$ | | test_tc_second_layer_tensor | 7.2137μs | 1.7059μs | 586.1856 KOps/s | 615.6377 KOps/s | 
$\color{#d91a1a}-4.78\\%$ | | test_tc_second_layer_nontensor | 0.1054ms | 3.6605μs | 273.1880 KOps/s | 297.0952 KOps/s | $\textbf{\color{#d91a1a}-8.05\\%}$ | | test_unbind | 0.3349s | 12.5307ms | 79.8038 Ops/s | 83.9787 Ops/s | $\color{#d91a1a}-4.97\\%$ | | test_full_like | 0.7543ms | 0.5778ms | 1.7308 KOps/s | 1.7324 KOps/s | $\color{#d91a1a}-0.10\\%$ | | test_zeros_like | 0.3383ms | 0.1978ms | 5.0556 KOps/s | 5.0553 KOps/s | $+0.01\\%$ | | test_ones_like | 0.3423ms | 0.1976ms | 5.0606 KOps/s | 5.0564 KOps/s | $\color{#35bf28}+0.08\\%$ | | test_clone | 0.5429ms | 0.4153ms | 2.4077 KOps/s | 2.4144 KOps/s | $\color{#d91a1a}-0.28\\%$ | | test_squeeze | 30.1200μs | 11.6293μs | 85.9900 KOps/s | 90.1609 KOps/s | $\color{#d91a1a}-4.63\\%$ | | test_unsqueeze | 0.2527ms | 81.9667μs | 12.2001 KOps/s | 12.4515 KOps/s | $\color{#d91a1a}-2.02\\%$ | | test_split | 0.4280ms | 0.1769ms | 5.6540 KOps/s | 5.5142 KOps/s | $\color{#35bf28}+2.54\\%$ | | test_permute | 0.3266ms | 0.1903ms | 5.2556 KOps/s | 5.0555 KOps/s | $\color{#35bf28}+3.96\\%$ | | test_stack | 1.3655ms | 0.9153ms | 1.0926 KOps/s | 1.1336 KOps/s | $\color{#d91a1a}-3.62\\%$ | | test_cat | 1.3782ms | 1.2315ms | 812.0035 Ops/s | 812.1277 Ops/s | $\color{#d91a1a}-0.02\\%$ |
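
The `Change` column appears to be the relative difference between `Ops` and `Ops on Repo HEAD`, i.e. `(Ops - HEAD) / HEAD`, after both are converted to the same unit. The sketch below is an illustration rather than the bot's actual code: `UNIT_SCALE`, `parse_ops` and `percent_change` are hypothetical helpers, and the three sample rows are copied from the table above.

```python
# Multipliers for the throughput suffixes used in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '31.1493 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative change of the PR throughput against HEAD, in percent."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# (name, Ops, Ops on Repo HEAD) taken from rows of the table.
rows = [
    ("test_clone", "31.1493 KOps/s", "34.8782 KOps/s"),
    ("test_mod_wrap_and_backward[compile-overhead]", "993.3635 Ops/s", "1.0050 KOps/s"),
    ("test_membership_stacked_nested_last", "319.6477 KOps/s", "108.0223 KOps/s"),
]

for name, ops, head in rows:
    print(f"{name}: {percent_change(ops, head):+.2f}%")
```

The second sample row mixes `Ops/s` and `KOps/s`, which is why the values are normalized before dividing. Under this reading, a negative change means lower throughput than `HEAD`.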