pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] grad and data for tensorclasses #904

Closed. vmoens closed this 1 month ago.

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 24. Worsened: 8.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 51.0260μs | 21.1951μs | 47.1807 KOps/s | 46.3557 KOps/s | +1.78% |
| test_plain_set_stack_nested | 54.4310μs | 21.5088μs | 46.4926 KOps/s | 46.1894 KOps/s | +0.66% |
| test_plain_set_nested_inplace | 66.0230μs | 23.4562μs | 42.6327 KOps/s | 42.1527 KOps/s | +1.14% |
| test_plain_set_stack_nested_inplace | 79.0180μs | 23.4014μs | 42.7324 KOps/s | 42.4058 KOps/s | +0.77% |
| test_items | 29.3850μs | 2.6703μs | 374.4874 KOps/s | 384.9355 KOps/s | -2.71% |
| test_items_nested | 0.5178ms | 0.3658ms | 2.7335 KOps/s | 2.7672 KOps/s | -1.22% |
| test_items_nested_locked | 0.4740ms | 0.3648ms | 2.7409 KOps/s | 2.7505 KOps/s | -0.35% |
| test_items_nested_leaf | 0.1751ms | 87.4353μs | 11.4370 KOps/s | 11.5714 KOps/s | -1.16% |
| test_items_stack_nested | 0.5989ms | 0.3632ms | 2.7532 KOps/s | 2.7484 KOps/s | +0.18% |
| test_items_stack_nested_leaf | 0.1525ms | 88.0684μs | 11.3548 KOps/s | 11.3414 KOps/s | +0.12% |
| test_items_stack_nested_locked | 1.4865ms | 0.3666ms | 2.7275 KOps/s | 2.7348 KOps/s | -0.27% |
| test_keys | 29.5650μs | 3.8694μs | 258.4408 KOps/s | 250.0362 KOps/s | +3.36% |
| test_keys_nested | 0.2478ms | 0.1438ms | 6.9538 KOps/s | 6.9634 KOps/s | -0.14% |
| test_keys_nested_locked | 0.8173ms | 0.1498ms | 6.6751 KOps/s | 6.6394 KOps/s | +0.54% |
| test_keys_nested_leaf | 0.2116ms | 0.1226ms | 8.1594 KOps/s | 8.1674 KOps/s | -0.10% |
| test_keys_stack_nested | 0.2522ms | 0.1451ms | 6.8899 KOps/s | 6.9555 KOps/s | -0.94% |
| test_keys_stack_nested_leaf | 0.2143ms | 0.1229ms | 8.1343 KOps/s | 8.1141 KOps/s | +0.25% |
| test_keys_stack_nested_locked | 0.2411ms | 0.1494ms | 6.6944 KOps/s | 6.6516 KOps/s | +0.64% |
| test_values | 8.6060μs | 1.1520μs | 868.0416 KOps/s | 856.5357 KOps/s | +1.34% |
| test_values_nested | 89.9570μs | 50.1450μs | 19.9422 KOps/s | 19.8873 KOps/s | +0.28% |
| test_values_nested_locked | 0.1297ms | 50.4074μs | 19.8384 KOps/s | 19.9931 KOps/s | -0.77% |
| test_values_nested_leaf | 82.2340μs | 45.3495μs | 22.0510 KOps/s | 22.3714 KOps/s | -1.43% |
| test_values_stack_nested | 94.5470μs | 50.6952μs | 19.7257 KOps/s | 19.7640 KOps/s | -0.19% |
| test_values_stack_nested_leaf | 0.1070ms | 45.7251μs | 21.8698 KOps/s | 22.2294 KOps/s | -1.62% |
| test_values_stack_nested_locked | 0.1336ms | 50.6324μs | 19.7502 KOps/s | 19.7601 KOps/s | -0.05% |
| test_membership | 2.4485μs | 0.7288μs | 1.3722 MOps/s | 1.1091 MOps/s | **+23.72%** |
| test_membership_nested | 29.0350μs | 2.6812μs | 372.9733 KOps/s | 366.3534 KOps/s | +1.81% |
| test_membership_nested_leaf | 49.6520μs | 2.6898μs | 371.7808 KOps/s | 365.2147 KOps/s | +1.80% |
| test_membership_stacked_nested | 23.2440μs | 2.6834μs | 372.6587 KOps/s | 359.7256 KOps/s | +3.60% |
| test_membership_stacked_nested_leaf | 38.2210μs | 2.7370μs | 365.3602 KOps/s | 328.8534 KOps/s | **+11.10%** |
| test_membership_nested_last | 30.6070μs | 4.0451μs | 247.2103 KOps/s | 249.4387 KOps/s | -0.89% |
| test_membership_nested_leaf_last | 23.7240μs | 4.0440μs | 247.2775 KOps/s | 247.2145 KOps/s | +0.03% |
| test_membership_stacked_nested_last | 35.7370μs | 4.5713μs | 218.7583 KOps/s | 252.3663 KOps/s | **-13.32%** |
| test_membership_stacked_nested_leaf_last | 24.2450μs | 4.6727μs | 214.0097 KOps/s | 248.4814 KOps/s | **-13.87%** |
| test_nested_getleaf | 34.1540μs | 10.9142μs | 91.6240 KOps/s | 90.8065 KOps/s | +0.90% |
| test_nested_get | 41.3370μs | 10.5260μs | 95.0032 KOps/s | 97.2059 KOps/s | -2.27% |
| test_stacked_getleaf | 36.4580μs | 11.0031μs | 90.8832 KOps/s | 91.6695 KOps/s | -0.86% |
| test_stacked_get | 35.2860μs | 10.3175μs | 96.9229 KOps/s | 96.7558 KOps/s | +0.17% |
| test_nested_getitemleaf | 43.5710μs | 11.3649μs | 87.9899 KOps/s | 87.5971 KOps/s | +0.45% |
| test_nested_getitem | 46.4670μs | 10.5228μs | 95.0315 KOps/s | 95.7963 KOps/s | -0.80% |
| test_stacked_getitemleaf | 48.4410μs | 11.4198μs | 87.5670 KOps/s | 89.6079 KOps/s | -2.28% |
| test_stacked_getitem | 47.2780μs | 10.5456μs | 94.8266 KOps/s | 97.2471 KOps/s | -2.49% |
| test_lock_nested | 0.9992ms | 0.5075ms | 1.9703 KOps/s | 1.7058 KOps/s | **+15.50%** |
| test_lock_stack_nested | 0.8167ms | 0.4819ms | 2.0752 KOps/s | 2.0634 KOps/s | +0.57% |
| test_unlock_nested | 0.8084ms | 0.4281ms | 2.3357 KOps/s | 1.9573 KOps/s | **+19.33%** |
| test_unlock_stack_nested | 0.6438ms | 0.3975ms | 2.5157 KOps/s | 2.5124 KOps/s | +0.13% |
| test_flatten_speed | 0.2405ms | 0.1081ms | 9.2478 KOps/s | 9.4442 KOps/s | -2.08% |
| test_unflatten_speed | 0.5470ms | 0.4442ms | 2.2513 KOps/s | 2.2572 KOps/s | -0.26% |
| test_common_ops | 1.8211ms | 1.0812ms | 924.8665 Ops/s | 897.3385 Ops/s | +3.07% |
| test_creation | 92.2520μs | 2.4792μs | 403.3505 KOps/s | 396.9578 KOps/s | +1.61% |
| test_creation_empty | 54.6820μs | 17.4540μs | 57.2933 KOps/s | 53.9922 KOps/s | **+6.11%** |
| test_creation_nested_1 | 63.0080μs | 21.0167μs | 47.5813 KOps/s | 46.1294 KOps/s | +3.15% |
| test_creation_nested_2 | 57.7680μs | 24.7647μs | 40.3801 KOps/s | 38.9029 KOps/s | +3.80% |
| test_clone | 72.2440μs | 16.9607μs | 58.9598 KOps/s | 57.2911 KOps/s | +2.91% |
| test_getitem[int] | 0.9338ms | 12.6450μs | 79.0827 KOps/s | 78.5365 KOps/s | +0.70% |
| test_getitem[slice_int] | 0.1222ms | 31.8758μs | 31.3718 KOps/s | 30.8818 KOps/s | +1.59% |
| test_getitem[range] | 0.2701ms | 55.9735μs | 17.8656 KOps/s | 17.5680 KOps/s | +1.69% |
| test_getitem[tuple] | 0.1838ms | 26.2366μs | 38.1147 KOps/s | 37.7443 KOps/s | +0.98% |
| test_getitem[list] | 0.3431ms | 49.0621μs | 20.3823 KOps/s | 19.1984 KOps/s | **+6.17%** |
| test_setitem_dim[int] | 55.9240μs | 31.0047μs | 32.2531 KOps/s | 31.1486 KOps/s | +3.55% |
| test_setitem_dim[slice_int] | 0.1184ms | 66.9752μs | 14.9309 KOps/s | 14.1100 KOps/s | **+5.82%** |
| test_setitem_dim[range] | 0.1307ms | 87.5227μs | 11.4256 KOps/s | 11.0484 KOps/s | +3.41% |
| test_setitem_dim[tuple] | 85.1590μs | 54.7042μs | 18.2801 KOps/s | 17.3749 KOps/s | **+5.21%** |
| test_setitem | 0.1104ms | 28.1436μs | 35.5320 KOps/s | 33.2765 KOps/s | **+6.78%** |
| test_set | 0.1823ms | 27.3329μs | 36.5859 KOps/s | 34.6009 KOps/s | **+5.74%** |
| test_set_shared | 3.3364ms | 0.2165ms | 4.6200 KOps/s | 4.6547 KOps/s | -0.75% |
| test_update | 0.1985ms | 33.6879μs | 29.6843 KOps/s | 27.8567 KOps/s | **+6.56%** |
| test_update_nested | 0.1578ms | 43.2339μs | 23.1300 KOps/s | 22.0045 KOps/s | **+5.11%** |
| test_update__nested | 0.1581ms | 34.0066μs | 29.4061 KOps/s | 28.9460 KOps/s | +1.59% |
| test_set_nested | 0.1393ms | 30.1530μs | 33.1642 KOps/s | 31.3991 KOps/s | **+5.62%** |
| test_set_nested_new | 0.1797ms | 34.8941μs | 28.6582 KOps/s | 27.1920 KOps/s | **+5.39%** |
| test_select | 0.1221ms | 52.0430μs | 19.2149 KOps/s | 18.7216 KOps/s | +2.63% |
| test_select_nested | 0.1528ms | 60.7521μs | 16.4603 KOps/s | 16.7550 KOps/s | -1.76% |
| test_exclude_nested | 0.1570ms | 80.4873μs | 12.4243 KOps/s | 12.6809 KOps/s | -2.02% |
| test_empty[True] | 0.7512ms | 0.3436ms | 2.9103 KOps/s | 2.9749 KOps/s | -2.17% |
| test_empty[False] | 13.7390μs | 1.2425μs | 804.8024 KOps/s | 796.8952 KOps/s | +0.99% |
| test_unbind_speed | 0.5130ms | 0.3222ms | 3.1040 KOps/s | 3.1136 KOps/s | -0.31% |
| test_unbind_speed_stack0 | 0.7449ms | 0.3205ms | 3.1198 KOps/s | 3.1544 KOps/s | -1.10% |
| test_unbind_speed_stack1 | 83.6750ms | 0.8287ms | 1.2067 KOps/s | 1.3022 KOps/s | **-7.33%** |
| test_split | 76.4589ms | 2.2146ms | 451.5464 Ops/s | 442.7136 Ops/s | +2.00% |
| test_chunk | 78.5357ms | 2.2177ms | 450.9204 Ops/s | 410.3240 Ops/s | **+9.89%** |
| test_creation[device0] | 4.1083ms | 0.1223ms | 8.1791 KOps/s | 8.2124 KOps/s | -0.41% |
| test_creation_from_tensor | 0.2574ms | 0.1185ms | 8.4391 KOps/s | 8.3805 KOps/s | +0.70% |
| test_add_one[memmap_tensor0] | 0.1606ms | 7.8955μs | 126.6552 KOps/s | 124.8608 KOps/s | +1.44% |
| test_contiguous[memmap_tensor0] | 23.3440μs | 2.2221μs | 450.0332 KOps/s | 467.9648 KOps/s | -3.83% |
| test_stack[memmap_tensor0] | 78.0850μs | 5.9195μs | 168.9344 KOps/s | 169.4848 KOps/s | -0.32% |
| test_memmaptd_index | 1.2972ms | 0.4319ms | 2.3154 KOps/s | 2.3388 KOps/s | -1.00% |
| test_memmaptd_index_astensor | 0.7470ms | 0.5026ms | 1.9897 KOps/s | 1.9637 KOps/s | +1.32% |
| test_memmaptd_index_op | 1.4077ms | 1.0214ms | 979.0341 Ops/s | 950.8310 Ops/s | +2.97% |
| test_serialize_model | 0.2044s | 0.1407s | 7.1056 Ops/s | 7.8946 Ops/s | **-9.99%** |
| test_serialize_model_pickle | 0.4515s | 0.3966s | 2.5214 Ops/s | 2.4897 Ops/s | +1.27% |
| test_serialize_weights | 0.1298s | 0.1245s | 8.0341 Ops/s | 7.1043 Ops/s | **+13.09%** |
| test_serialize_weights_returnearly | 0.1852s | 0.1678s | 5.9585 Ops/s | 5.9026 Ops/s | +0.95% |
| test_serialize_weights_pickle | 1.0533s | 0.7427s | 1.3464 Ops/s | 2.4235 Ops/s | **-44.45%** |
| test_serialize_weights_filesystem | 0.1550s | 0.1434s | 6.9743 Ops/s | 6.9419 Ops/s | +0.47% |
| test_serialize_model_filesystem | 0.1548s | 0.1457s | 6.8647 Ops/s | 6.0947 Ops/s | **+12.63%** |
| test_reshape_pytree | 85.7710μs | 39.6912μs | 25.1945 KOps/s | 26.0755 KOps/s | -3.38% |
| test_reshape_td | 0.1050ms | 50.5104μs | 19.7979 KOps/s | 20.1431 KOps/s | -1.71% |
| test_view_pytree | 88.3160μs | 39.6711μs | 25.2073 KOps/s | 25.1947 KOps/s | +0.05% |
| test_view_td | 0.1440ms | 57.5327μs | 17.3814 KOps/s | 17.2109 KOps/s | +0.99% |
| test_unbind_pytree | 97.2110μs | 36.0554μs | 27.7351 KOps/s | 27.3301 KOps/s | +1.48% |
| test_unbind_td | 0.3761ms | 47.4005μs | 21.0968 KOps/s | 20.8849 KOps/s | +1.01% |
| test_split_pytree | 80.1400μs | 38.7093μs | 25.8336 KOps/s | 26.2727 KOps/s | -1.67% |
| test_split_td | 0.5560ms | 61.3841μs | 16.2909 KOps/s | 16.4323 KOps/s | -0.86% |
| test_add_pytree | 90.5190μs | 44.1528μs | 22.6486 KOps/s | 22.4159 KOps/s | +1.04% |
| test_add_td | 0.1481ms | 80.3172μs | 12.4506 KOps/s | 11.8108 KOps/s | **+5.42%** |
| test_distributed | 1.4906ms | 0.1315ms | 7.6017 KOps/s | 7.3497 KOps/s | +3.43% |
| test_tdmodule | 55.6340μs | 15.8907μs | 62.9297 KOps/s | 57.0194 KOps/s | **+10.37%** |
| test_tdmodule_dispatch | 57.8780μs | 33.9663μs | 29.4409 KOps/s | 27.1012 KOps/s | **+8.63%** |
| test_tdseq | 34.0540μs | 17.8646μs | 55.9766 KOps/s | 52.1978 KOps/s | **+7.24%** |
| test_tdseq_dispatch | 64.8910μs | 37.9470μs | 26.3525 KOps/s | 23.9626 KOps/s | **+9.97%** |
| test_instantiation_functorch | 1.8680ms | 1.5864ms | 630.3631 Ops/s | 623.8587 Ops/s | +1.04% |
| test_instantiation_td | 81.1004ms | 1.2600ms | 793.6748 Ops/s | 856.6215 Ops/s | **-7.35%** |
| test_exec_functorch | 0.3169ms | 0.1820ms | 5.4955 KOps/s | 5.3777 KOps/s | +2.19% |
| test_exec_functional_call | 0.3317ms | 0.1730ms | 5.7795 KOps/s | 5.8070 KOps/s | -0.47% |
| test_exec_td | 0.2820ms | 0.1733ms | 5.7687 KOps/s | 5.4209 KOps/s | **+6.42%** |
| test_exec_td_decorator | 1.0195ms | 0.2565ms | 3.8988 KOps/s | 3.8601 KOps/s | +1.00% |
| test_vmap_mlp_speed[True-True] | 1.0644ms | 0.6027ms | 1.6592 KOps/s | 1.6300 KOps/s | +1.79% |
| test_vmap_mlp_speed[True-False] | 0.8950ms | 0.5963ms | 1.6770 KOps/s | 1.6479 KOps/s | +1.77% |
| test_vmap_mlp_speed[False-True] | 0.7310ms | 0.4995ms | 2.0019 KOps/s | 1.9776 KOps/s | +1.23% |
| test_vmap_mlp_speed[False-False] | 0.7960ms | 0.4956ms | 2.0178 KOps/s | 1.9839 KOps/s | +1.71% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0688ms | 0.6886ms | 1.4523 KOps/s | 1.4249 KOps/s | +1.93% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0353ms | 0.6887ms | 1.4521 KOps/s | 1.4359 KOps/s | +1.13% |
| test_vmap_mlp_speed_decorator[False-True] | 1.0194ms | 0.5774ms | 1.7318 KOps/s | 1.7132 KOps/s | +1.08% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9246ms | 0.5804ms | 1.7229 KOps/s | 1.7162 KOps/s | +0.39% |
| test_to_module_speed[True] | 2.8180ms | 1.7979ms | 556.2123 Ops/s | 557.2429 Ops/s | -0.18% |
| test_to_module_speed[False] | 2.0787ms | 1.7650ms | 566.5763 Ops/s | 571.7137 Ops/s | -0.90% |
| test_tc_init | 88.7970μs | 44.0693μs | 22.6915 KOps/s | 23.5764 KOps/s | -3.75% |
| test_tc_init_nested | 0.1585ms | 87.0463μs | 11.4881 KOps/s | 11.4958 KOps/s | -0.07% |
| test_tc_first_layer_tensor | 34.0440μs | 9.2608μs | 107.9824 KOps/s | 111.7676 KOps/s | -3.39% |
| test_tc_first_layer_nontensor | 32.6210μs | 9.1976μs | 108.7246 KOps/s | 110.2235 KOps/s | -1.36% |
| test_tc_second_layer_tensor | 32.0700μs | 2.8612μs | 349.5059 KOps/s | 360.9619 KOps/s | -3.17% |
| test_tc_second_layer_nontensor | 37.0490μs | 10.3962μs | 96.1885 KOps/s | 99.4335 KOps/s | -3.26% |
| test_unbind | 97.3084ms | 12.8342ms | 77.9171 Ops/s | 73.0647 Ops/s | **+6.64%** |
| test_full_like | 11.7384ms | 7.9320ms | 126.0722 Ops/s | 134.4754 Ops/s | **-6.25%** |
| test_zeros_like | 11.5188ms | 7.6050ms | 131.4928 Ops/s | 143.3878 Ops/s | **-8.30%** |
| test_ones_like | 12.0563ms | 7.5870ms | 131.8052 Ops/s | 126.5863 Ops/s | +4.12% |
| test_clone | 12.8221ms | 8.8912ms | 112.4714 Ops/s | 111.5478 Ops/s | +0.83% |
| test_squeeze | 70.1910μs | 14.0237μs | 71.3080 KOps/s | 70.5200 KOps/s | +1.12% |
| test_unsqueeze | 0.2108ms | 97.0056μs | 10.3087 KOps/s | 10.1014 KOps/s | +2.05% |
| test_split | 0.4477ms | 0.2092ms | 4.7790 KOps/s | 4.8188 KOps/s | -0.83% |
| test_permute | 0.3574ms | 0.2290ms | 4.3674 KOps/s | 4.4706 KOps/s | -2.31% |
| test_stack | 28.5600ms | 24.4114ms | 40.9644 Ops/s | 40.4560 Ops/s | +1.26% |
| test_cat | 29.5394ms | 24.0248ms | 41.6236 Ops/s | 41.0165 Ops/s | +1.48% |

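
For readers checking the numbers: the Change column appears to be the relative difference between Ops and Ops on Repo `HEAD`. The sketch below recomputes it for a few CPU rows. The function names and the 5% cut-off are assumptions: the cut-off is inferred from which rows are shown in bold and from the Improved/Worsened counts, not taken from the benchmark workflow itself.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of this run's throughput versus the HEAD baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# Assumed cut-off, inferred from the bolded rows; the bot's real setting may differ.
THRESHOLD = 5.0


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label a change as improved, worsened, or unchanged relative to the cut-off."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table; units cancel in the ratio.
rows = {
    "test_plain_set_nested": (47.1807, 46.3557),
    "test_membership": (1.3722, 1.1091),
    "test_serialize_weights_pickle": (1.3464, 2.4235),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table shows rounded throughput values, a recomputed percentage can differ from the reported one in the last digit.
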
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 13. Worsened: 8.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 40.1010μs | 16.6648μs | 60.0066 KOps/s | 56.9594 KOps/s | **+5.35%** |
| test_plain_set_stack_nested | 0.1402ms | 16.7211μs | 59.8047 KOps/s | 56.7335 KOps/s | **+5.41%** |
| test_plain_set_nested_inplace | 38.8410μs | 17.9962μs | 55.5673 KOps/s | 53.1535 KOps/s | +4.54% |
| test_plain_set_stack_nested_inplace | 39.8310μs | 17.9466μs | 55.7208 KOps/s | 53.6446 KOps/s | +3.87% |
| test_items | 17.4100μs | 4.7546μs | 210.3225 KOps/s | 211.0559 KOps/s | -0.35% |
| test_items_nested | 0.4440ms | 0.4008ms | 2.4952 KOps/s | 2.5619 KOps/s | -2.60% |
| test_items_nested_locked | 0.4222ms | 0.3998ms | 2.5012 KOps/s | 2.5249 KOps/s | -0.94% |
| test_items_nested_leaf | 0.1060ms | 85.7015μs | 11.6684 KOps/s | 11.5463 KOps/s | +1.06% |
| test_items_stack_nested | 0.4458ms | 0.3936ms | 2.5408 KOps/s | 2.5276 KOps/s | +0.53% |
| test_items_stack_nested_leaf | 0.1016ms | 86.5591μs | 11.5528 KOps/s | 11.5199 KOps/s | +0.29% |
| test_items_stack_nested_locked | 0.4302ms | 0.4021ms | 2.4869 KOps/s | 2.5312 KOps/s | -1.75% |
| test_keys | 17.2800μs | 4.3639μs | 229.1506 KOps/s | 228.7743 KOps/s | +0.16% |
| test_keys_nested | 85.9730μs | 67.4913μs | 14.8167 KOps/s | 15.1929 KOps/s | -2.48% |
| test_keys_nested_locked | 0.9158ms | 72.3816μs | 13.8157 KOps/s | 13.5941 KOps/s | +1.63% |
| test_keys_nested_leaf | 76.7720μs | 56.9047μs | 17.5732 KOps/s | 17.7862 KOps/s | -1.20% |
| test_keys_stack_nested | 0.2399ms | 66.5337μs | 15.0300 KOps/s | 15.0114 KOps/s | +0.12% |
| test_keys_stack_nested_leaf | 0.2236ms | 56.2978μs | 17.7627 KOps/s | 17.3896 KOps/s | +2.15% |
| test_keys_stack_nested_locked | 0.2599ms | 71.9810μs | 13.8926 KOps/s | 13.8236 KOps/s | +0.50% |
| test_values | 62.4450μs | 1.7493μs | 571.6460 KOps/s | 562.6350 KOps/s | +1.60% |
| test_values_nested | 49.1610μs | 33.8985μs | 29.4998 KOps/s | 29.6296 KOps/s | -0.44% |
| test_values_nested_locked | 0.2309ms | 35.6621μs | 28.0410 KOps/s | 27.9598 KOps/s | +0.29% |
| test_values_nested_leaf | 52.3610μs | 30.1087μs | 33.2129 KOps/s | 33.0408 KOps/s | +0.52% |
| test_values_stack_nested | 57.6730μs | 34.6930μs | 28.8243 KOps/s | 29.1207 KOps/s | -1.02% |
| test_values_stack_nested_leaf | 0.1650ms | 30.8026μs | 32.4648 KOps/s | 32.6667 KOps/s | -0.62% |
| test_values_stack_nested_locked | 0.1367ms | 36.6219μs | 27.3061 KOps/s | 27.7118 KOps/s | -1.46% |
| test_membership | 1.3275μs | 0.5392μs | 1.8548 MOps/s | 1.8464 MOps/s | +0.45% |
| test_membership_nested | 15.2210μs | 2.0978μs | 476.6923 KOps/s | 480.6010 KOps/s | -0.81% |
| test_membership_nested_leaf | 10.2505μs | 2.0172μs | 495.7457 KOps/s | 492.4195 KOps/s | +0.68% |
| test_membership_stacked_nested | 19.5010μs | 2.0681μs | 483.5318 KOps/s | 478.4099 KOps/s | +1.07% |
| test_membership_stacked_nested_leaf | 15.1300μs | 2.0815μs | 480.4138 KOps/s | 480.5678 KOps/s | -0.03% |
| test_membership_nested_last | 27.4010μs | 2.9802μs | 335.5507 KOps/s | 330.7265 KOps/s | +1.46% |
| test_membership_nested_leaf_last | 27.6300μs | 2.9697μs | 336.7400 KOps/s | 330.8515 KOps/s | +1.78% |
| test_membership_stacked_nested_last | 28.2210μs | 9.1427μs | 109.3775 KOps/s | 288.0346 KOps/s | **-62.03%** |
| test_membership_stacked_nested_leaf_last | 25.1810μs | 9.2178μs | 108.4861 KOps/s | 290.7614 KOps/s | **-62.69%** |
| test_nested_getleaf | 26.3300μs | 8.0977μs | 123.4926 KOps/s | 124.3144 KOps/s | -0.66% |
| test_nested_get | 23.1710μs | 7.6213μs | 131.2110 KOps/s | 132.2257 KOps/s | -0.77% |
| test_stacked_getleaf | 24.9810μs | 8.0739μs | 123.8556 KOps/s | 123.5019 KOps/s | +0.29% |
| test_stacked_get | 29.6410μs | 7.5431μs | 132.5716 KOps/s | 132.4665 KOps/s | +0.08% |
| test_nested_getitemleaf | 64.6610μs | 8.1834μs | 122.1988 KOps/s | 122.3977 KOps/s | -0.16% |
| test_nested_getitem | 23.2510μs | 7.7221μs | 129.4992 KOps/s | 129.8298 KOps/s | -0.25% |
| test_stacked_getitemleaf | 24.3410μs | 8.2016μs | 121.9271 KOps/s | 121.4170 KOps/s | +0.42% |
| test_stacked_getitem | 31.0110μs | 7.7392μs | 129.2129 KOps/s | 129.1854 KOps/s | +0.02% |
| test_lock_nested | 1.0553ms | 0.4738ms | 2.1108 KOps/s | 2.1170 KOps/s | -0.29% |
| test_lock_stack_nested | 0.5442ms | 0.4218ms | 2.3707 KOps/s | 2.2934 KOps/s | +3.37% |
| test_unlock_nested | 0.8169ms | 0.3921ms | 2.5507 KOps/s | 2.5439 KOps/s | +0.26% |
| test_unlock_stack_nested | 0.5283ms | 0.3418ms | 2.9260 KOps/s | 2.8395 KOps/s | +3.05% |
| test_flatten_speed | 0.2164ms | 0.1049ms | 9.5337 KOps/s | 9.4190 KOps/s | +1.22% |
| test_unflatten_speed | 0.3125ms | 0.2914ms | 3.4314 KOps/s | 3.3808 KOps/s | +1.50% |
| test_common_ops | 1.5964ms | 1.3379ms | 747.4414 Ops/s | 725.8187 Ops/s | +2.98% |
| test_creation | 16.8510μs | 1.9599μs | 510.2184 KOps/s | 514.7404 KOps/s | -0.88% |
| test_creation_empty | 33.7910μs | 17.1420μs | 58.3362 KOps/s | 53.0736 KOps/s | **+9.92%** |
| test_creation_nested_1 | 0.1149ms | 19.1035μs | 52.3463 KOps/s | 47.7541 KOps/s | **+9.62%** |
| test_creation_nested_2 | 41.7710μs | 22.0749μs | 45.3003 KOps/s | 42.3061 KOps/s | **+7.08%** |
| test_clone | 0.1774ms | 30.3060μs | 32.9968 KOps/s | 32.7868 KOps/s | +0.64% |
| test_getitem[int] | 1.2204ms | 16.5929μs | 60.2668 KOps/s | 59.7231 KOps/s | +0.91% |
| test_getitem[slice_int] | 0.2035ms | 29.4975μs | 33.9012 KOps/s | 33.0570 KOps/s | +2.55% |
| test_getitem[range] | 0.2385ms | 0.1123ms | 8.9043 KOps/s | 8.8276 KOps/s | +0.87% |
| test_getitem[tuple] | 0.1704ms | 25.0368μs | 39.9411 KOps/s | 37.9400 KOps/s | **+5.27%** |
| test_getitem[list] | 0.2503ms | 0.1024ms | 9.7613 KOps/s | 9.1077 KOps/s | **+7.18%** |
| test_setitem_dim[int] | 0.1795ms | 53.7644μs | 18.5997 KOps/s | 16.9572 KOps/s | **+9.69%** |
| test_setitem_dim[slice_int] | 0.2281ms | 82.2584μs | 12.1568 KOps/s | 11.7789 KOps/s | +3.21% |
| test_setitem_dim[range] | 0.3050ms | 0.1467ms | 6.8177 KOps/s | 6.6574 KOps/s | +2.41% |
| test_setitem_dim[tuple] | 0.2371ms | 73.9016μs | 13.5315 KOps/s | 12.9827 KOps/s | +4.23% |
| test_setitem | 0.2289ms | 47.7231μs | 20.9542 KOps/s | 20.4589 KOps/s | +2.42% |
| test_set | 0.2172ms | 46.4533μs | 21.5270 KOps/s | 20.7360 KOps/s | +3.81% |
| test_set_shared | 0.4194ms | 54.6707μs | 18.2913 KOps/s | 17.7905 KOps/s | +2.81% |
| test_update | 0.2011ms | 51.0447μs | 19.5907 KOps/s | 17.6888 KOps/s | **+10.75%** |
| test_update_nested | 0.2461ms | 63.3124μs | 15.7947 KOps/s | 15.2540 KOps/s | +3.54% |
| test_update__nested | 0.2423ms | 65.6895μs | 15.2231 KOps/s | 14.7663 KOps/s | +3.09% |
| test_set_nested | 0.2234ms | 48.6427μs | 20.5581 KOps/s | 19.7251 KOps/s | +4.22% |
| test_set_nested_new | 0.2316ms | 52.2692μs | 19.1317 KOps/s | 18.3201 KOps/s | +4.43% |
| test_select | 0.2298ms | 68.3825μs | 14.6236 KOps/s | 14.2181 KOps/s | +2.85% |
| test_select_nested | 0.1777ms | 53.3003μs | 18.7616 KOps/s | 18.4679 KOps/s | +1.59% |
| test_exclude_nested | 0.1959ms | 72.6556μs | 13.7636 KOps/s | 13.5351 KOps/s | +1.69% |
| test_empty[True] | 0.3540ms | 0.3030ms | 3.3001 KOps/s | 3.3263 KOps/s | -0.79% |
| test_empty[False] | 2.2810μs | 0.9304μs | 1.0748 MOps/s | 1.0942 MOps/s | -1.77% |
| test_to | 0.1481ms | 38.5576μs | 25.9353 KOps/s | 26.1420 KOps/s | -0.79% |
| test_to_nonblocking | 0.1095ms | 24.7754μs | 40.3626 KOps/s | 42.0701 KOps/s | -4.06% |
| test_unbind_speed | 0.5051ms | 0.3078ms | 3.2485 KOps/s | 3.3027 KOps/s | -1.64% |
| test_unbind_speed_stack0 | 0.3999ms | 0.2946ms | 3.3944 KOps/s | 3.3024 KOps/s | +2.79% |
| test_unbind_speed_stack1 | 93.2608ms | 0.8276ms | 1.2083 KOps/s | 1.2804 KOps/s | **-5.63%** |
| test_split | 92.1612ms | 2.3225ms | 430.5648 Ops/s | 435.0392 Ops/s | -1.03% |
| test_chunk | 2.3739ms | 2.1285ms | 469.8204 Ops/s | 430.9545 Ops/s | **+9.02%** |
| test_creation[device0] | 0.2900ms | 0.1038ms | 9.6372 KOps/s | 9.5877 KOps/s | +0.52% |
| test_creation_from_tensor | 0.3145ms | 0.1060ms | 9.4317 KOps/s | 9.9745 KOps/s | **-5.44%** |
| test_add_one[memmap_tensor0] | 21.2510μs | 8.6594μs | 115.4819 KOps/s | 115.7026 KOps/s | -0.19% |
| test_contiguous[memmap_tensor0] | 0.1150ms | 2.1593μs | 463.1094 KOps/s | 461.2617 KOps/s | +0.40% |
| test_stack[memmap_tensor0] | 55.8410μs | 6.5297μs | 153.1463 KOps/s | 152.4144 KOps/s | +0.48% |
| test_memmaptd_index | 1.3826ms | 0.4227ms | 2.3659 KOps/s | 2.3942 KOps/s | -1.18% |
| test_memmaptd_index_astensor | 0.8574ms | 0.4873ms | 2.0523 KOps/s | 2.0741 KOps/s | -1.05% |
| test_memmaptd_index_op | 1.4819ms | 1.0346ms | 966.5912 Ops/s | 965.4060 Ops/s | +0.12% |
| test_serialize_model | 0.1009s | 97.0387ms | 10.3052 Ops/s | 10.0918 Ops/s | +2.11% |
| test_serialize_model_pickle | 1.3475s | 1.2375s | 0.8081 Ops/s | 0.8072 Ops/s | +0.11% |
| test_serialize_weights | 96.0035ms | 92.7856ms | 10.7775 Ops/s | 9.1239 Ops/s | **+18.12%** |
| test_serialize_weights_returnearly | 89.4512ms | 72.8553ms | 13.7258 Ops/s | 14.0488 Ops/s | -2.30% |
| test_serialize_weights_pickle | 1.3513s | 1.2237s | 0.8172 Ops/s | 0.8182 Ops/s | -0.13% |
| test_reshape_pytree | 0.1838ms | 38.8866μs | 25.7158 KOps/s | 25.8300 KOps/s | -0.44% |
| test_reshape_td | 84.6120μs | 45.8119μs | 21.8284 KOps/s | 21.8215 KOps/s | +0.03% |
| test_view_pytree | 0.2745ms | 38.6121μs | 25.8986 KOps/s | 26.1689 KOps/s | -1.03% |
| test_view_td | 0.2486ms | 54.0185μs | 18.5122 KOps/s | 18.4220 KOps/s | +0.49% |
| test_unbind_pytree | 0.1593ms | 36.6378μs | 27.2942 KOps/s | 26.2123 KOps/s | +4.13% |
| test_unbind_td | 0.3751ms | 45.5833μs | 21.9379 KOps/s | 21.9666 KOps/s | -0.13% |
| test_split_pytree | 0.3469ms | 51.8724μs | 19.2781 KOps/s | 19.4907 KOps/s | -1.09% |
| test_split_td | 91.0628ms | 70.1126μs | 14.2628 KOps/s | 14.3498 KOps/s | -0.61% |
| test_add_pytree | 0.2050ms | 58.5318μs | 17.0847 KOps/s | 16.6386 KOps/s | +2.68% |
| test_add_td | 0.4172ms | 0.1038ms | 9.6298 KOps/s | 9.6474 KOps/s | -0.18% |
| test_compile_add_one_nested[tensordict-compile] | 0.4137ms | 0.2062ms | 4.8489 KOps/s | 4.8347 KOps/s | +0.29% |
| test_compile_add_one_nested[tensordict-eager] | 0.3188ms | 0.1758ms | 5.6875 KOps/s | 5.7340 KOps/s | -0.81% |
| test_compile_add_one_nested[pytree-compile] | 0.2845ms | 0.1439ms | 6.9495 KOps/s | 6.9284 KOps/s | +0.30% |
| test_compile_add_one_nested[pytree-eager] | 0.3647ms | 0.1942ms | 5.1503 KOps/s | 5.2010 KOps/s | -0.98% |
| test_compile_copy_nested[tensordict-compile] | 0.1515ms | 21.5861μs | 46.3260 KOps/s | 45.8068 KOps/s | +1.13% |
| test_compile_copy_nested[tensordict-eager] | 0.1885ms | 48.4879μs | 20.6237 KOps/s | 20.6822 KOps/s | -0.28% |
| test_compile_copy_nested[pytree-compile] | 0.1598ms | 72.5545μs | 13.7828 KOps/s | 13.8260 KOps/s | -0.31% |
| test_compile_copy_nested[pytree-eager] | 0.1208ms | 59.5846μs | 16.7829 KOps/s | 16.6599 KOps/s | +0.74% |
| test_compile_add_one_flat[tensordict-compile] | 0.4343ms | 0.3242ms | 3.0849 KOps/s | 3.0942 KOps/s | -0.30% |
| test_compile_add_one_flat[tensordict-eager] | 0.3410ms | 0.2228ms | 4.4887 KOps/s | 4.4784 KOps/s | +0.23% |
| test_compile_add_one_flat[tensorclass-compile] | 0.2926ms | 0.1344ms | 7.4425 KOps/s | 7.7399 KOps/s | -3.84% |
| test_compile_add_one_flat[tensorclass-eager] | 0.2516ms | 66.5338μs | 15.0299 KOps/s | 15.7708 KOps/s | -4.70% |
| test_compile_add_one_flat[pytree-compile] | 0.4283ms | 0.3244ms | 3.0825 KOps/s | 3.1017 KOps/s | -0.62% |
| test_compile_add_one_flat[pytree-eager] | 0.8663ms | 0.6635ms | 1.5071 KOps/s | 1.6174 KOps/s | **-6.82%** |
| test_compile_add_self_flat[tensordict-eager] | 0.4752ms | 0.2758ms | 3.6259 KOps/s | 3.6601 KOps/s | -0.93% |
| test_compile_add_self_flat[tensordict-compile] | 0.4904ms | 0.3276ms | 3.0529 KOps/s | 3.0578 KOps/s | -0.16% |
| test_compile_add_self_flat[tensorclass-eager] | 0.2813ms | 79.7626μs | 12.5372 KOps/s | 12.6710 KOps/s | -1.06% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2977ms | 0.1340ms | 7.4638 KOps/s | 7.4508 KOps/s | +0.17% |
| test_compile_add_self_flat[pytree-eager] | 0.7553ms | 0.5441ms | 1.8378 KOps/s | 1.8762 KOps/s | -2.05% |
| test_compile_add_self_flat[pytree-compile] | 0.4458ms | 0.3221ms | 3.1042 KOps/s | 3.0826 KOps/s | +0.70% |
| test_compile_copy_flat[tensordict-compile] | 0.1419ms | 18.6080μs | 53.7404 KOps/s | 51.2897 KOps/s | +4.78% |
| test_compile_copy_flat[tensordict-eager] | 67.1420μs | 32.9510μs | 30.3481 KOps/s | 30.7431 KOps/s | -1.28% |
| test_compile_copy_flat[pytree-compile] | 0.1097ms | 74.9015μs | 13.3509 KOps/s | 13.0662 KOps/s | +2.18% |
| test_compile_copy_flat[pytree-eager] | 92.4730μs | 60.4134μs | 16.5526 KOps/s | 16.3672 KOps/s | +1.13% |
| test_compile_assign_and_add[tensordict-compile] | 2.7822ms | 0.9734ms | 1.0273 KOps/s | 1.0475 KOps/s | -1.93% |
| test_compile_assign_and_add[tensordict-eager] | 3.6439ms | 3.3066ms | 302.4240 Ops/s | 308.8549 Ops/s | -2.08% |
| test_compile_assign_and_add[pytree-compile] | 2.5985ms | 0.9305ms | 1.0747 KOps/s | 1.0666 KOps/s | +0.76% |
| test_compile_assign_and_add[pytree-eager] | 3.4319ms | 3.1719ms | 315.2719 Ops/s | 314.4910 Ops/s | +0.25% |
| test_compile_indexing[tensor-tensordict-compile] | 0.2813ms | 0.1128ms | 8.8664 KOps/s | 9.1573 KOps/s | -3.18% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2567ms | 66.7888μs | 14.9726 KOps/s | 16.2945 KOps/s | **-8.11%** |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2508ms | 0.1022ms | 9.7803 KOps/s | 9.8404 KOps/s | -0.61% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1952ms | 44.8062μs | 22.3184 KOps/s | 22.4887 KOps/s | -0.76% |
| test_compile_indexing[tensor-pytree-compile] | 0.2494ms | 0.1024ms | 9.7612 KOps/s | 9.3526 KOps/s | +4.37% |
| test_compile_indexing[tensor-pytree-eager] | 0.1923ms | 45.0072μs | 22.2187 KOps/s | 21.1833 KOps/s | +4.89% |
| test_compile_indexing[slice-tensordict-compile] | 0.2761ms | 0.1383ms | 7.2308 KOps/s | 7.2459 KOps/s | -0.21% |
| test_compile_indexing[slice-tensordict-eager] | 0.1999ms | 26.2953μs | 38.0295 KOps/s | 38.7162 KOps/s | -1.77% |
| test_compile_indexing[slice-tensorclass-compile] | 0.2741ms | 0.1294ms | 7.7304 KOps/s | 7.7279 KOps/s | +0.03% |
| test_compile_indexing[slice-tensorclass-eager] | 0.1351ms | 22.5439μs | 44.3578 KOps/s | 44.9007 KOps/s | -1.21% |
| test_compile_indexing[slice-pytree-compile] | 0.2802ms | 0.1294ms | 7.7274 KOps/s | 7.5000 KOps/s | +3.03% |
| test_compile_indexing[slice-pytree-eager] | 53.8010μs | 22.8799μs | 43.7065 KOps/s | 44.3448 KOps/s | -1.44% |
| test_compile_indexing[int-tensordict-compile] | 0.2874ms | 0.1381ms | 7.2410 KOps/s | 7.2132 KOps/s | +0.39% |
| test_compile_indexing[int-tensordict-eager] | 0.5210ms | 26.5643μs | 37.6445 KOps/s | 39.2206 KOps/s | -4.02% |
| test_compile_indexing[int-tensorclass-compile] | 0.2741ms | 0.1294ms | 7.7294 KOps/s | 7.6199 KOps/s | +1.44% |
| test_compile_indexing[int-tensorclass-eager] | 0.1382ms | 23.0498μs | 43.3844 KOps/s | 44.9965 KOps/s | -3.58% |
| test_compile_indexing[int-pytree-compile] | 0.2809ms | 0.1293ms | 7.7350 KOps/s | 7.4246 KOps/s | +4.18% |
| test_compile_indexing[int-pytree-eager] | 50.3610μs | 22.7091μs | 44.0352 KOps/s | 43.9705 KOps/s | +0.15% |
| test_mod_add[eager] | 0.1835ms | 37.4056μs | 26.7339 KOps/s | 26.4104 KOps/s | +1.22% |
| test_mod_add[compile] | 0.2450ms | 68.4863μs | 14.6015 KOps/s | 14.7263 KOps/s | -0.85% |
| test_mod_add[compile-overhead] | 0.2624ms | 0.1450ms | 6.8959 KOps/s | 6.6611 KOps/s | +3.52% |
| test_mod_wrap[eager] | 0.4181ms | 0.2506ms | 3.9911 KOps/s | 4.0149 KOps/s | -0.59% |
| test_mod_wrap[compile] | 0.4621ms | 0.2984ms | 3.3511 KOps/s | 3.3508 KOps/s | +0.01% |
| test_mod_wrap[compile-overhead] | 8.1936ms | 4.3655ms | 229.0689 Ops/s | 233.5367 Ops/s | -1.91% |
| test_mod_wrap_and_backward[eager] | 1.7401ms | 1.4185ms | 704.9905 Ops/s | 700.1131 Ops/s | +0.70% |
| test_mod_wrap_and_backward[compile] | 1.7857ms | 1.4743ms | 678.2800 Ops/s | 675.1183 Ops/s | +0.47% |
| test_mod_wrap_and_backward[compile-overhead] | 1.4662ms | 0.9955ms | 1.0046 KOps/s | 992.5771 Ops/s | +1.21% |
| test_seq_add[eager] | 0.2524ms | 0.1088ms | 9.1880 KOps/s | 8.9020 KOps/s | +3.21% |
| test_seq_add[compile] | 0.2637ms | 87.6502μs | 11.4090 KOps/s | 11.6768 KOps/s | -2.29% |
| test_seq_add[compile-overhead] | 0.3037ms | 0.1263ms | 7.9160 KOps/s | 8.2274 KOps/s | -3.78% |
| test_seq_wrap[eager] | 0.6386ms | 0.4416ms | 2.2643 KOps/s | 2.3535 KOps/s | -3.79% |
| test_seq_wrap[compile] | 1.5464ms | 0.3401ms | 2.9402 KOps/s | 3.0192 KOps/s | -2.62% |
| test_seq_wrap[compile-overhead] | 0.3066s | 0.1467s | 6.8159 Ops/s | 6.7561 Ops/s | +0.89% |
| test_func_call_runtime[False-eager] | 0.9729ms | 0.7410ms | 1.3495 KOps/s | 1.3618 KOps/s | |
$\color{#d91a1a}-0.90\\%$ | | test_func_call_runtime[False-compile] | 0.9901ms | 0.8278ms | 1.2080 KOps/s | 1.2049 KOps/s | $\color{#35bf28}+0.26\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.5346ms | 0.3713ms | 2.6935 KOps/s | 2.6920 KOps/s | $\color{#35bf28}+0.06\\%$ | | test_func_call_runtime[True-eager] | 1.2591ms | 0.9927ms | 1.0074 KOps/s | 1.0121 KOps/s | $\color{#d91a1a}-0.47\\%$ | | test_func_call_runtime[True-compile] | 1.0333ms | 0.8712ms | 1.1478 KOps/s | 1.1567 KOps/s | $\color{#d91a1a}-0.77\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.5792ms | 0.4128ms | 2.4225 KOps/s | 2.4325 KOps/s | $\color{#d91a1a}-0.41\\%$ | | test_distributed | 2.5607ms | 72.9434μs | 13.7093 KOps/s | 14.2064 KOps/s | $\color{#d91a1a}-3.50\\%$ | | test_tdmodule | 38.7410μs | 16.6809μs | 59.9489 KOps/s | 59.0896 KOps/s | $\color{#35bf28}+1.45\\%$ | | test_tdmodule_dispatch | 53.0410μs | 33.7473μs | 29.6320 KOps/s | 29.0043 KOps/s | $\color{#35bf28}+2.16\\%$ | | test_tdseq | 32.8810μs | 16.9850μs | 58.8753 KOps/s | 57.6200 KOps/s | $\color{#35bf28}+2.18\\%$ | | test_tdseq_dispatch | 54.2610μs | 35.9868μs | 27.7880 KOps/s | 27.2601 KOps/s | $\color{#35bf28}+1.94\\%$ | | test_instantiation_functorch | 2.2367ms | 2.0166ms | 495.8894 Ops/s | 504.9812 Ops/s | $\color{#d91a1a}-1.80\\%$ | | test_instantiation_td | 2.0427ms | 1.3104ms | 763.1180 Ops/s | 767.2721 Ops/s | $\color{#d91a1a}-0.54\\%$ | | test_exec_functorch | 0.3969ms | 0.2303ms | 4.3419 KOps/s | 4.5002 KOps/s | $\color{#d91a1a}-3.52\\%$ | | test_exec_functional_call | 0.4288ms | 0.2303ms | 4.3413 KOps/s | 4.6229 KOps/s | $\textbf{\color{#d91a1a}-6.09\\%}$ | | test_exec_td | 0.4263ms | 0.2313ms | 4.3240 KOps/s | 4.6336 KOps/s | $\textbf{\color{#d91a1a}-6.68\\%}$ | | test_exec_td_decorator | 0.5029ms | 0.3053ms | 3.2754 KOps/s | 3.4324 KOps/s | $\color{#d91a1a}-4.57\\%$ | | test_vmap_mlp_speed[True-True] | 1.1326ms | 0.6959ms | 1.4371 KOps/s | 1.5111 KOps/s | $\color{#d91a1a}-4.90\\%$ | | 
test_vmap_mlp_speed[True-False] | 0.8748ms | 0.6935ms | 1.4419 KOps/s | 1.5160 KOps/s | $\color{#d91a1a}-4.89\\%$ | | test_vmap_mlp_speed[False-True] | 0.7970ms | 0.6104ms | 1.6382 KOps/s | 1.7002 KOps/s | $\color{#d91a1a}-3.65\\%$ | | test_vmap_mlp_speed[False-False] | 0.7994ms | 0.6070ms | 1.6474 KOps/s | 1.6603 KOps/s | $\color{#d91a1a}-0.78\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.1838ms | 0.7565ms | 1.3219 KOps/s | 1.3534 KOps/s | $\color{#d91a1a}-2.32\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9612ms | 0.7567ms | 1.3215 KOps/s | 1.3580 KOps/s | $\color{#d91a1a}-2.69\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.9286ms | 0.6454ms | 1.5495 KOps/s | 1.5403 KOps/s | $\color{#35bf28}+0.60\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8328ms | 0.6400ms | 1.5625 KOps/s | 1.5365 KOps/s | $\color{#35bf28}+1.70\\%$ | | test_vmap_transformer_speed[True-True] | 9.2810ms | 8.6221ms | 115.9814 Ops/s | 116.6427 Ops/s | $\color{#d91a1a}-0.57\\%$ | | test_vmap_transformer_speed[True-False] | 8.7915ms | 8.5221ms | 117.3422 Ops/s | 116.9780 Ops/s | $\color{#35bf28}+0.31\\%$ | | test_vmap_transformer_speed[False-True] | 9.1360ms | 8.5545ms | 116.8972 Ops/s | 117.5954 Ops/s | $\color{#d91a1a}-0.59\\%$ | | test_vmap_transformer_speed[False-False] | 8.6737ms | 8.4344ms | 118.5621 Ops/s | 117.2465 Ops/s | $\color{#35bf28}+1.12\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 21.4650ms | 20.4838ms | 48.8192 Ops/s | 48.5833 Ops/s | $\color{#35bf28}+0.49\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 20.9730ms | 20.4156ms | 48.9822 Ops/s | 48.7020 Ops/s | $\color{#35bf28}+0.58\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.8915ms | 20.1900ms | 49.5294 Ops/s | 49.3271 Ops/s | $\color{#35bf28}+0.41\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 21.0686ms | 20.2677ms | 49.3395 Ops/s | 48.9703 Ops/s | $\color{#35bf28}+0.75\\%$ | | test_to_module_speed[True] | 1.6163ms | 1.4898ms | 671.2368 Ops/s | 
670.0949 Ops/s | $\color{#35bf28}+0.17\\%$ | | test_to_module_speed[False] | 1.5940ms | 1.4652ms | 682.4839 Ops/s | 678.1156 Ops/s | $\color{#35bf28}+0.64\\%$ | | test_tc_init | 56.6120μs | 36.9251μs | 27.0818 KOps/s | 25.2378 KOps/s | $\textbf{\color{#35bf28}+7.31\\%}$ | | test_tc_init_nested | 0.1845ms | 76.5208μs | 13.0683 KOps/s | 12.1407 KOps/s | $\textbf{\color{#35bf28}+7.64\\%}$ | | test_tc_first_layer_tensor | 19.8310μs | 3.9787μs | 251.3371 KOps/s | 251.4609 KOps/s | $\color{#d91a1a}-0.05\\%$ | | test_tc_first_layer_nontensor | 26.4600μs | 3.9895μs | 250.6607 KOps/s | 248.2448 KOps/s | $\color{#35bf28}+0.97\\%$ | | test_tc_second_layer_tensor | 6.1252μs | 1.3051μs | 766.1978 KOps/s | 776.4833 KOps/s | $\color{#d91a1a}-1.32\\%$ | | test_tc_second_layer_nontensor | 20.2700μs | 4.6085μs | 216.9886 KOps/s | 216.0545 KOps/s | $\color{#35bf28}+0.43\\%$ | | test_unbind | 0.3207s | 13.0766ms | 76.4727 Ops/s | 76.0125 Ops/s | $\color{#35bf28}+0.61\\%$ | | test_full_like | 0.7636ms | 0.5769ms | 1.7333 KOps/s | 1.7290 KOps/s | $\color{#35bf28}+0.24\\%$ | | test_zeros_like | 0.3487ms | 0.1979ms | 5.0531 KOps/s | 5.0469 KOps/s | $\color{#35bf28}+0.12\\%$ | | test_ones_like | 0.3595ms | 0.1979ms | 5.0520 KOps/s | 5.0510 KOps/s | $\color{#35bf28}+0.02\\%$ | | test_clone | 0.5687ms | 0.4143ms | 2.4136 KOps/s | 2.4034 KOps/s | $\color{#35bf28}+0.43\\%$ | | test_squeeze | 0.1366ms | 11.6507μs | 85.8314 KOps/s | 84.8725 KOps/s | $\color{#35bf28}+1.13\\%$ | | test_unsqueeze | 0.2810ms | 85.9941μs | 11.6287 KOps/s | 11.6640 KOps/s | $\color{#d91a1a}-0.30\\%$ | | test_split | 0.4912ms | 0.1850ms | 5.4044 KOps/s | 5.4564 KOps/s | $\color{#d91a1a}-0.95\\%$ | | test_permute | 0.3748ms | 0.2020ms | 4.9515 KOps/s | 5.0250 KOps/s | $\color{#d91a1a}-1.46\\%$ | | test_stack | 1.3801ms | 0.8985ms | 1.1130 KOps/s | 1.1143 KOps/s | $\color{#d91a1a}-0.12\\%$ | | test_cat | 1.3710ms | 1.2320ms | 811.6871 Ops/s | 811.6402 Ops/s | $+0.01\\%$ |