pytorch / tensordict

TensorDict is a tensor container dedicated to PyTorch.
MIT License

[Performance] Faster getattr in TC #912

Closed: vmoens closed this 1 month ago

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 35. Worsened: 8.
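The Improved and Worsened totals are consistent with counting only the rows whose change exceeds 5% in magnitude, which are the rows the bot renders in bold in the detailed CPU results. The 5% threshold is inferred from the data, not stated anywhere in this thread, and `classify` is a hypothetical helper name, so the following is only a sketch of that reading:

```python
# Assumption: a benchmark counts as improved/worsened only when its change
# exceeds 5% in magnitude. The threshold is inferred from the table and is
# not documented in this thread.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a relative throughput change given in percent."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# A few (name, change %) pairs copied from the CPU results.
sample = [
    ("test_plain_set_nested", 8.98),
    ("test_items", 5.00),
    ("test_keys", -1.10),
    ("test_membership_stacked_nested_last", -61.10),
    ("test_tc_first_layer_tensor", 537.99),
]

labels = {name: classify(change) for name, change in sample}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under this reading `test_items` at exactly +5.00% is not counted as improved, which matches it not being bold in the table.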

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 48.6710μs | 21.8041μs | 45.8630 KOps/s | 42.0835 KOps/s | **+8.98%** |
| test_plain_set_stack_nested | 73.9080μs | 22.4205μs | 44.6020 KOps/s | 41.2137 KOps/s | **+8.22%** |
| test_plain_set_nested_inplace | 61.5440μs | 23.7477μs | 42.1093 KOps/s | 38.3694 KOps/s | **+9.75%** |
| test_plain_set_stack_nested_inplace | 77.1750μs | 23.8726μs | 41.8890 KOps/s | 38.9293 KOps/s | **+7.60%** |
| test_items | 15.7700μs | 2.6087μs | 383.3266 KOps/s | 365.0836 KOps/s | +5.00% |
| test_items_nested | 0.6562ms | 0.3609ms | 2.7711 KOps/s | 2.6906 KOps/s | +2.99% |
| test_items_nested_locked | 0.5116ms | 0.3589ms | 2.7866 KOps/s | 2.6709 KOps/s | +4.33% |
| test_items_nested_leaf | 0.1653ms | 86.5919μs | 11.5484 KOps/s | 11.5234 KOps/s | +0.22% |
| test_items_stack_nested | 0.6381ms | 0.3600ms | 2.7778 KOps/s | 2.6400 KOps/s | **+5.22%** |
| test_items_stack_nested_leaf | 0.1875ms | 84.1767μs | 11.8798 KOps/s | 11.3617 KOps/s | +4.56% |
| test_items_stack_nested_locked | 0.6836ms | 0.3580ms | 2.7932 KOps/s | 2.6381 KOps/s | **+5.88%** |
| test_keys | 32.5810μs | 4.0754μs | 245.3767 KOps/s | 248.1053 KOps/s | -1.10% |
| test_keys_nested | 0.2651ms | 0.1426ms | 7.0147 KOps/s | 7.0809 KOps/s | -0.94% |
| test_keys_nested_locked | 0.6391ms | 0.1490ms | 6.7098 KOps/s | 6.7416 KOps/s | -0.47% |
| test_keys_nested_leaf | 0.2327ms | 0.1211ms | 8.2547 KOps/s | 8.2745 KOps/s | -0.24% |
| test_keys_stack_nested | 0.2905ms | 0.1415ms | 7.0665 KOps/s | 6.9854 KOps/s | +1.16% |
| test_keys_stack_nested_leaf | 0.2338ms | 0.1210ms | 8.2628 KOps/s | 8.3099 KOps/s | -0.57% |
| test_keys_stack_nested_locked | 0.3116ms | 0.1463ms | 6.8352 KOps/s | 6.8089 KOps/s | +0.39% |
| test_values | 6.5180μs | 1.1382μs | 878.5708 KOps/s | 848.9447 KOps/s | +3.49% |
| test_values_nested | 89.8370μs | 49.2322μs | 20.3119 KOps/s | 20.1397 KOps/s | +0.85% |
| test_values_nested_locked | 93.8550μs | 49.2429μs | 20.3075 KOps/s | 20.1731 KOps/s | +0.67% |
| test_values_nested_leaf | 87.9740μs | 44.1857μs | 22.6317 KOps/s | 21.9344 KOps/s | +3.18% |
| test_values_stack_nested | 92.7530μs | 50.3741μs | 19.8515 KOps/s | 19.8807 KOps/s | -0.15% |
| test_values_stack_nested_leaf | 0.1220ms | 43.9086μs | 22.7746 KOps/s | 21.7563 KOps/s | +4.68% |
| test_values_stack_nested_locked | 0.1178ms | 50.6410μs | 19.7469 KOps/s | 20.0131 KOps/s | -1.33% |
| test_membership | 14.6970μs | 0.8937μs | 1.1189 MOps/s | 1.0889 MOps/s | +2.75% |
| test_membership_nested | 24.9660μs | 2.6589μs | 376.0959 KOps/s | 371.7132 KOps/s | +1.18% |
| test_membership_nested_leaf | 34.9950μs | 2.6952μs | 371.0304 KOps/s | 370.7200 KOps/s | +0.08% |
| test_membership_stacked_nested | 22.4120μs | 2.6434μs | 378.2965 KOps/s | 377.1788 KOps/s | +0.30% |
| test_membership_stacked_nested_leaf | 51.9470μs | 2.6363μs | 379.3182 KOps/s | 376.7437 KOps/s | +0.68% |
| test_membership_nested_last | 27.0000μs | 3.9184μs | 255.2031 KOps/s | 249.0674 KOps/s | +2.46% |
| test_membership_nested_leaf_last | 31.6290μs | 3.9455μs | 253.4547 KOps/s | 250.6813 KOps/s | +1.11% |
| test_membership_stacked_nested_last | 38.6720μs | 12.9984μs | 76.9326 KOps/s | 197.7809 KOps/s | **-61.10%** |
| test_membership_stacked_nested_leaf_last | 37.9110μs | 13.0037μs | 76.9013 KOps/s | 195.3151 KOps/s | **-60.63%** |
| test_nested_getleaf | 31.2280μs | 10.6570μs | 93.8348 KOps/s | 91.0042 KOps/s | +3.11% |
| test_nested_get | 47.2680μs | 10.0479μs | 99.5229 KOps/s | 96.2575 KOps/s | +3.39% |
| test_stacked_getleaf | 38.2010μs | 10.5458μs | 94.8248 KOps/s | 90.6867 KOps/s | +4.56% |
| test_stacked_get | 41.2670μs | 10.0504μs | 99.4982 KOps/s | 96.0972 KOps/s | +3.54% |
| test_nested_getitemleaf | 34.5740μs | 11.0636μs | 90.3861 KOps/s | 87.5504 KOps/s | +3.24% |
| test_nested_getitem | 40.1850μs | 10.2017μs | 98.0230 KOps/s | 92.9702 KOps/s | **+5.43%** |
| test_stacked_getitemleaf | 34.8050μs | 10.9501μs | 91.3233 KOps/s | 87.4752 KOps/s | +4.40% |
| test_stacked_getitem | 46.9960μs | 10.1360μs | 98.6587 KOps/s | 94.5953 KOps/s | +4.30% |
| test_lock_nested | 0.9202ms | 0.5114ms | 1.9555 KOps/s | 1.6811 KOps/s | **+16.32%** |
| test_lock_stack_nested | 0.8634ms | 0.4621ms | 2.1641 KOps/s | 2.0208 KOps/s | **+7.09%** |
| test_unlock_nested | 0.8171ms | 0.4358ms | 2.2945 KOps/s | 2.2754 KOps/s | +0.84% |
| test_unlock_stack_nested | 0.6574ms | 0.3798ms | 2.6331 KOps/s | 2.4319 KOps/s | **+8.27%** |
| test_flatten_speed | 0.6057ms | 0.1057ms | 9.4566 KOps/s | 9.5258 KOps/s | -0.73% |
| test_unflatten_speed | 0.9702ms | 0.4503ms | 2.2207 KOps/s | 2.2320 KOps/s | -0.51% |
| test_common_ops | 8.4438ms | 1.1529ms | 867.3990 Ops/s | 838.7202 Ops/s | +3.42% |
| test_creation | 29.3640μs | 2.4223μs | 412.8381 KOps/s | 415.8317 KOps/s | -0.72% |
| test_creation_empty | 56.2050μs | 19.5912μs | 51.0434 KOps/s | 47.4745 KOps/s | **+7.52%** |
| test_creation_nested_1 | 56.4550μs | 23.4070μs | 42.7223 KOps/s | 40.3338 KOps/s | **+5.92%** |
| test_creation_nested_2 | 59.9420μs | 26.8919μs | 37.1859 KOps/s | 34.5251 KOps/s | **+7.71%** |
| test_clone | 90.2690μs | 17.2397μs | 58.0057 KOps/s | 55.9360 KOps/s | +3.70% |
| test_getitem[int] | 0.9472ms | 12.6721μs | 78.9132 KOps/s | 75.9085 KOps/s | +3.96% |
| test_getitem[slice_int] | 0.1283ms | 33.1432μs | 30.1721 KOps/s | 29.0740 KOps/s | +3.78% |
| test_getitem[range] | 0.3296ms | 57.2005μs | 17.4824 KOps/s | 16.8011 KOps/s | +4.05% |
| test_getitem[tuple] | 0.1334ms | 26.8538μs | 37.2386 KOps/s | 34.9187 KOps/s | **+6.64%** |
| test_getitem[list] | 0.2077ms | 51.8211μs | 19.2971 KOps/s | 17.9759 KOps/s | **+7.35%** |
| test_setitem_dim[int] | 73.4780μs | 34.9593μs | 28.6047 KOps/s | 25.8369 KOps/s | **+10.71%** |
| test_setitem_dim[slice_int] | 0.1244ms | 73.1941μs | 13.6623 KOps/s | 12.8187 KOps/s | **+6.58%** |
| test_setitem_dim[range] | 0.4780ms | 94.1842μs | 10.6175 KOps/s | 10.4480 KOps/s | +1.62% |
| test_setitem_dim[tuple] | 0.1752ms | 60.7475μs | 16.4616 KOps/s | 15.7619 KOps/s | +4.44% |
| test_setitem | 0.1648ms | 31.2242μs | 32.0265 KOps/s | 30.6977 KOps/s | +4.33% |
| test_set | 0.1330ms | 29.9841μs | 33.3510 KOps/s | 31.7764 KOps/s | +4.96% |
| test_set_shared | 1.2455ms | 0.2154ms | 4.6415 KOps/s | 4.4955 KOps/s | +3.25% |
| test_update | 1.0283ms | 38.4625μs | 25.9993 KOps/s | 24.9887 KOps/s | +4.04% |
| test_update_nested | 0.1857ms | 49.4839μs | 20.2086 KOps/s | 19.8088 KOps/s | +2.02% |
| test_update__nested | 0.1211ms | 35.2372μs | 28.3791 KOps/s | 28.3099 KOps/s | +0.24% |
| test_set_nested | 0.1384ms | 33.2334μs | 30.0902 KOps/s | 28.7830 KOps/s | +4.54% |
| test_set_nested_new | 0.1519ms | 38.2165μs | 26.1667 KOps/s | 25.0597 KOps/s | +4.42% |
| test_select | 0.1865ms | 55.5547μs | 18.0003 KOps/s | 17.4248 KOps/s | +3.30% |
| test_select_nested | 0.1084ms | 59.4218μs | 16.8289 KOps/s | 15.8404 KOps/s | **+6.24%** |
| test_exclude_nested | 0.1453ms | 79.1853μs | 12.6286 KOps/s | 12.0555 KOps/s | +4.75% |
| test_empty[True] | 0.7365ms | 0.3329ms | 3.0038 KOps/s | 2.9255 KOps/s | +2.68% |
| test_empty[False] | 11.8320μs | 1.2278μs | 814.4922 KOps/s | 800.1100 KOps/s | +1.80% |
| test_unbind_speed | 0.4551ms | 0.3223ms | 3.1026 KOps/s | 3.0217 KOps/s | +2.68% |
| test_unbind_speed_stack0 | 0.4085ms | 0.3059ms | 3.2689 KOps/s | 3.0614 KOps/s | **+6.78%** |
| test_unbind_speed_stack1 | 79.1735ms | 0.7954ms | 1.2572 KOps/s | 1.2751 KOps/s | -1.40% |
| test_split | 78.0785ms | 2.2659ms | 441.3236 Ops/s | 408.9659 Ops/s | **+7.91%** |
| test_chunk | 87.2521ms | 2.3089ms | 433.1145 Ops/s | 473.8634 Ops/s | **-8.60%** |
| test_creation[device0] | 0.2125ms | 0.1204ms | 8.3089 KOps/s | 8.3724 KOps/s | -0.76% |
| test_creation_from_tensor | 4.0493ms | 0.1223ms | 8.1796 KOps/s | 8.2944 KOps/s | -1.38% |
| test_add_one[memmap_tensor0] | 0.1494ms | 8.1132μs | 123.2555 KOps/s | 122.4335 KOps/s | +0.67% |
| test_contiguous[memmap_tensor0] | 22.0510μs | 2.1549μs | 464.0627 KOps/s | 459.1839 KOps/s | +1.06% |
| test_stack[memmap_tensor0] | 66.2240μs | 5.8649μs | 170.5069 KOps/s | 169.0129 KOps/s | +0.88% |
| test_memmaptd_index | 1.3899ms | 0.4400ms | 2.2727 KOps/s | 2.0212 KOps/s | **+12.44%** |
| test_memmaptd_index_astensor | 0.8002ms | 0.5161ms | 1.9376 KOps/s | 1.9399 KOps/s | -0.12% |
| test_memmaptd_index_op | 1.8814ms | 1.1045ms | 905.3749 Ops/s | 891.8623 Ops/s | +1.52% |
| test_serialize_model | 0.2000s | 0.1413s | 7.0783 Ops/s | 7.6933 Ops/s | **-7.99%** |
| test_serialize_model_pickle | 0.4506s | 0.3929s | 2.5452 Ops/s | 2.5054 Ops/s | +1.59% |
| test_serialize_weights | 0.1359s | 0.1255s | 7.9671 Ops/s | 6.9944 Ops/s | **+13.91%** |
| test_serialize_weights_returnearly | 0.2383s | 0.1786s | 5.5992 Ops/s | 5.8645 Ops/s | -4.52% |
| test_serialize_weights_pickle | 0.4477s | 0.3993s | 2.5043 Ops/s | 1.0777 Ops/s | **+132.37%** |
| test_serialize_weights_filesystem | 0.1506s | 0.1441s | 6.9381 Ops/s | 7.0086 Ops/s | -1.00% |
| test_serialize_model_filesystem | 0.2380s | 0.1674s | 5.9732 Ops/s | 6.4503 Ops/s | **-7.40%** |
| test_reshape_pytree | 0.1209ms | 39.1161μs | 25.5649 KOps/s | 26.1358 KOps/s | -2.18% |
| test_reshape_td | 0.1073ms | 50.2945μs | 19.8829 KOps/s | 19.9316 KOps/s | -0.24% |
| test_view_pytree | 86.4520μs | 38.9978μs | 25.6425 KOps/s | 25.9964 KOps/s | -1.36% |
| test_view_td | 0.1293ms | 56.5809μs | 17.6738 KOps/s | 17.7759 KOps/s | -0.57% |
| test_unbind_pytree | 86.4220μs | 35.8339μs | 27.9065 KOps/s | 28.1338 KOps/s | -0.81% |
| test_unbind_td | 0.4297ms | 48.7953μs | 20.4938 KOps/s | 20.0099 KOps/s | +2.42% |
| test_split_pytree | 0.2470ms | 39.2723μs | 25.4632 KOps/s | 26.1513 KOps/s | -2.63% |
| test_split_td | 0.5967ms | 62.8893μs | 15.9009 KOps/s | 15.6986 KOps/s | +1.29% |
| test_add_pytree | 0.1296ms | 45.1753μs | 22.1360 KOps/s | 22.4531 KOps/s | -1.41% |
| test_add_td | 0.4550ms | 89.7723μs | 11.1393 KOps/s | 10.8481 KOps/s | +2.68% |
| test_distributed | 0.5189ms | 0.1297ms | 7.7091 KOps/s | 7.4232 KOps/s | +3.85% |
| test_tdmodule | 35.9970μs | 17.1292μs | 58.3799 KOps/s | 51.5598 KOps/s | **+13.23%** |
| test_tdmodule_dispatch | 82.5240μs | 36.2267μs | 27.6040 KOps/s | 25.0474 KOps/s | **+10.21%** |
| test_tdseq | 39.2830μs | 18.9992μs | 52.6339 KOps/s | 47.2773 KOps/s | **+11.33%** |
| test_tdseq_dispatch | 72.6660μs | 40.6950μs | 24.5731 KOps/s | 22.4953 KOps/s | **+9.24%** |
| test_instantiation_functorch | 2.4234ms | 1.6169ms | 618.4583 Ops/s | 634.9100 Ops/s | -2.59% |
| test_instantiation_td | 1.7443ms | 1.1545ms | 866.1671 Ops/s | 859.8268 Ops/s | +0.74% |
| test_exec_functorch | 0.3300ms | 0.1824ms | 5.4819 KOps/s | 5.5015 KOps/s | -0.36% |
| test_exec_functional_call | 0.4175ms | 0.1752ms | 5.7062 KOps/s | 5.8246 KOps/s | -2.03% |
| test_exec_td | 0.3017ms | 0.1737ms | 5.7579 KOps/s | 5.6866 KOps/s | +1.25% |
| test_exec_td_decorator | 1.0032ms | 0.2587ms | 3.8662 KOps/s | 3.8898 KOps/s | -0.61% |
| test_vmap_mlp_speed[True-True] | 0.8915ms | 0.6206ms | 1.6114 KOps/s | 1.6139 KOps/s | -0.15% |
| test_vmap_mlp_speed[True-False] | 0.8413ms | 0.6161ms | 1.6230 KOps/s | 1.6185 KOps/s | +0.28% |
| test_vmap_mlp_speed[False-True] | 0.7085ms | 0.5075ms | 1.9705 KOps/s | 1.9801 KOps/s | -0.49% |
| test_vmap_mlp_speed[False-False] | 0.9183ms | 0.5079ms | 1.9688 KOps/s | 1.9842 KOps/s | -0.77% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0651ms | 0.7096ms | 1.4092 KOps/s | 1.4040 KOps/s | +0.37% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0238ms | 0.7105ms | 1.4076 KOps/s | 1.4025 KOps/s | +0.36% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7673ms | 0.5879ms | 1.7009 KOps/s | 1.7153 KOps/s | -0.84% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9061ms | 0.5861ms | 1.7061 KOps/s | 1.7211 KOps/s | -0.87% |
| test_to_module_speed[True] | 2.4130ms | 1.7747ms | 563.4902 Ops/s | 555.1349 Ops/s | +1.51% |
| test_to_module_speed[False] | 2.8264ms | 1.7543ms | 570.0194 Ops/s | 569.7421 Ops/s | +0.05% |
| test_tc_init | 0.1115ms | 45.5300μs | 21.9635 KOps/s | 21.0350 KOps/s | +4.41% |
| test_tc_init_nested | 0.1569ms | 90.1548μs | 11.0920 KOps/s | 10.5358 KOps/s | **+5.28%** |
| test_tc_first_layer_tensor | 19.4070μs | 1.4136μs | 707.3973 KOps/s | 110.8788 KOps/s | **+537.99%** |
| test_tc_first_layer_nontensor | 34.3240μs | 4.2217μs | 236.8737 KOps/s | 109.7121 KOps/s | **+115.90%** |
| test_tc_second_layer_tensor | 51.5280μs | 2.5840μs | 386.9912 KOps/s | 359.2722 KOps/s | **+7.72%** |
| test_tc_second_layer_nontensor | 37.8890μs | 5.4318μs | 184.1014 KOps/s | 99.3431 KOps/s | **+85.32%** |
| test_unbind | 0.1052s | 13.6731ms | 73.1363 Ops/s | 68.9392 Ops/s | **+6.09%** |
| test_full_like | 10.7494ms | 8.5776ms | 116.5825 Ops/s | 130.3227 Ops/s | **-10.54%** |
| test_zeros_like | 10.1934ms | 6.6972ms | 149.3161 Ops/s | 161.3835 Ops/s | **-7.48%** |
| test_ones_like | 15.4953ms | 7.2671ms | 137.6066 Ops/s | 150.7543 Ops/s | **-8.72%** |
| test_clone | 17.5536ms | 9.0322ms | 110.7154 Ops/s | 114.7542 Ops/s | -3.52% |
| test_squeeze | 63.1880μs | 13.0698μs | 76.5120 KOps/s | 69.2044 KOps/s | **+10.56%** |
| test_unsqueeze | 0.2627ms | 93.3567μs | 10.7116 KOps/s | 9.9494 KOps/s | **+7.66%** |
| test_split | 0.3562ms | 0.2019ms | 4.9523 KOps/s | 4.7357 KOps/s | +4.58% |
| test_permute | 0.3506ms | 0.2192ms | 4.5617 KOps/s | 4.3871 KOps/s | +3.98% |
| test_stack | 30.2920ms | 25.8650ms | 38.6623 Ops/s | 37.1753 Ops/s | +4.00% |
| test_cat | 28.6122ms | 25.3270ms | 39.4835 Ops/s | 39.3746 Ops/s | +0.28% |

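The columns in the CPU results are related by simple arithmetic: Ops appears to be the reciprocal of Mean, and Change the relative difference between Ops and Ops on Repo `HEAD`. These formulas are inferred from the reported numbers, not taken from the bot's source, and the function names are illustrative. A minimal sketch that reproduces a few rows:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (inferred: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def change_pct(ops_new: float, ops_head: float) -> float:
    """Relative throughput change versus HEAD, in percent (inferred formula)."""
    return (ops_new - ops_head) / ops_head * 100.0


# test_plain_set_nested: Mean 21.8041 μs, reported as 45.8630 KOps/s.
kops = ops_per_second(21.8041e-6) / 1e3
print(f"{kops:.3f} KOps/s")

# Both throughputs must be in the same unit (here KOps/s).
plain_set = change_pct(45.8630, 42.0835)       # table reports +8.98%
stacked_last = change_pct(76.9326, 197.7809)   # table reports -61.10%
first_layer = change_pct(707.3973, 110.8788)   # table reports +537.99%

for value in (plain_set, stacked_last, first_layer):
    print(f"{value:+.2f}%")
```

Because Change is computed on throughput, a row such as `test_tc_first_layer_tensor` at +537.99% corresponds to roughly a 6.4x speedup, while the -61.10% on `test_membership_stacked_nested_last` corresponds to about 2.6x slower.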
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 6. Worsened: 16.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1448ms | 17.4560μs | 57.2870 KOps/s | 57.7854 KOps/s | -0.86% |
| test_plain_set_stack_nested | 42.0310μs | 17.4978μs | 57.1501 KOps/s | 57.4298 KOps/s | -0.49% |
| test_plain_set_nested_inplace | 37.8910μs | 18.6186μs | 53.7097 KOps/s | 54.3742 KOps/s | -1.22% |
| test_plain_set_stack_nested_inplace | 39.6310μs | 18.5992μs | 53.7658 KOps/s | 54.3332 KOps/s | -1.04% |
| test_items | 21.8110μs | 4.6921μs | 213.1239 KOps/s | 211.1122 KOps/s | +0.95% |
| test_items_nested | 0.4159ms | 0.3870ms | 2.5838 KOps/s | 2.5356 KOps/s | +1.90% |
| test_items_nested_locked | 0.4569ms | 0.3867ms | 2.5857 KOps/s | 2.5203 KOps/s | +2.59% |
| test_items_nested_leaf | 0.1096ms | 85.4716μs | 11.6998 KOps/s | 11.6988 KOps/s | +0.01% |
| test_items_stack_nested | 0.4187ms | 0.3853ms | 2.5953 KOps/s | 2.5367 KOps/s | +2.31% |
| test_items_stack_nested_leaf | 0.1121ms | 86.2258μs | 11.5975 KOps/s | 11.4765 KOps/s | +1.05% |
| test_items_stack_nested_locked | 0.5862ms | 0.3852ms | 2.5960 KOps/s | 2.5459 KOps/s | +1.97% |
| test_keys | 70.0920μs | 4.3667μs | 229.0036 KOps/s | 228.5719 KOps/s | +0.19% |
| test_keys_nested | 0.1596ms | 67.3038μs | 14.8580 KOps/s | 14.7548 KOps/s | +0.70% |
| test_keys_nested_locked | 0.9373ms | 72.4470μs | 13.8032 KOps/s | 13.6871 KOps/s | +0.85% |
| test_keys_nested_leaf | 76.8710μs | 57.2386μs | 17.4707 KOps/s | 17.2843 KOps/s | +1.08% |
| test_keys_stack_nested | 87.6620μs | 66.2858μs | 15.0862 KOps/s | 14.7932 KOps/s | +1.98% |
| test_keys_stack_nested_leaf | 79.8710μs | 57.2106μs | 17.4793 KOps/s | 17.0457 KOps/s | +2.54% |
| test_keys_stack_nested_locked | 0.1424ms | 71.6311μs | 13.9604 KOps/s | 13.7115 KOps/s | +1.82% |
| test_values | 6.5770μs | 1.7829μs | 560.8788 KOps/s | 569.5654 KOps/s | -1.53% |
| test_values_nested | 48.7610μs | 33.5890μs | 29.7716 KOps/s | 29.1010 KOps/s | +2.30% |
| test_values_nested_locked | 0.1050ms | 35.1477μs | 28.4514 KOps/s | 27.6085 KOps/s | +3.05% |
| test_values_nested_leaf | 50.2010μs | 29.7067μs | 33.6625 KOps/s | 32.7833 KOps/s | +2.68% |
| test_values_stack_nested | 65.9510μs | 34.3419μs | 29.1189 KOps/s | 28.3499 KOps/s | +2.71% |
| test_values_stack_nested_leaf | 58.9610μs | 30.5719μs | 32.7098 KOps/s | 31.9955 KOps/s | +2.23% |
| test_values_stack_nested_locked | 55.3310μs | 36.0192μs | 27.7630 KOps/s | 27.0082 KOps/s | +2.79% |
| test_membership | 9.7411μs | 0.5435μs | 1.8398 MOps/s | 1.8284 MOps/s | +0.62% |
| test_membership_nested | 11.0150μs | 1.9649μs | 508.9406 KOps/s | 492.0799 KOps/s | +3.43% |
| test_membership_nested_leaf | 12.0150μs | 1.9373μs | 516.1894 KOps/s | 500.5772 KOps/s | +3.12% |
| test_membership_stacked_nested | 25.5800μs | 2.0227μs | 494.3976 KOps/s | 491.8671 KOps/s | +0.51% |
| test_membership_stacked_nested_leaf | 19.6300μs | 1.9894μs | 502.6536 KOps/s | 478.1496 KOps/s | **+5.12%** |
| test_membership_nested_last | 17.4300μs | 2.9625μs | 337.5531 KOps/s | 333.8223 KOps/s | +1.12% |
| test_membership_nested_leaf_last | 21.8110μs | 2.9760μs | 336.0201 KOps/s | 327.3755 KOps/s | +2.64% |
| test_membership_stacked_nested_last | 23.5310μs | 5.7986μs | 172.4546 KOps/s | 335.4675 KOps/s | **-48.59%** |
| test_membership_stacked_nested_leaf_last | 30.8600μs | 5.7250μs | 174.6733 KOps/s | 337.7372 KOps/s | **-48.28%** |
| test_nested_getleaf | 23.2900μs | 7.9894μs | 125.1655 KOps/s | 121.8224 KOps/s | +2.74% |
| test_nested_get | 23.0410μs | 7.5784μs | 131.9533 KOps/s | 128.9369 KOps/s | +2.34% |
| test_stacked_getleaf | 27.1710μs | 8.0157μs | 124.7552 KOps/s | 121.2312 KOps/s | +2.91% |
| test_stacked_get | 21.7700μs | 7.5383μs | 132.6553 KOps/s | 128.4753 KOps/s | +3.25% |
| test_nested_getitemleaf | 23.8400μs | 8.1563μs | 122.6040 KOps/s | 119.2744 KOps/s | +2.79% |
| test_nested_getitem | 77.3920μs | 7.6774μs | 130.2531 KOps/s | 126.2859 KOps/s | +3.14% |
| test_stacked_getitemleaf | 22.7510μs | 8.1798μs | 122.2518 KOps/s | 119.1400 KOps/s | +2.61% |
| test_stacked_getitem | 25.6900μs | 7.6980μs | 129.9040 KOps/s | 126.4465 KOps/s | +2.73% |
| test_lock_nested | 9.7904ms | 0.4815ms | 2.0769 KOps/s | 2.0993 KOps/s | -1.07% |
| test_lock_stack_nested | 0.4745ms | 0.4305ms | 2.3228 KOps/s | 2.2703 KOps/s | +2.31% |
| test_unlock_nested | 0.8755ms | 0.3912ms | 2.5563 KOps/s | 2.5270 KOps/s | +1.16% |
| test_unlock_stack_nested | 0.4015ms | 0.3486ms | 2.8683 KOps/s | 2.7842 KOps/s | +3.02% |
| test_flatten_speed | 89.8006ms | 0.1200ms | 8.3302 KOps/s | 9.4373 KOps/s | **-11.73%** |
| test_unflatten_speed | 0.4916ms | 0.2940ms | 3.4008 KOps/s | 3.3662 KOps/s | +1.03% |
| test_common_ops | 1.7900ms | 1.3375ms | 747.6581 Ops/s | 770.7100 Ops/s | -2.99% |
| test_creation | 25.6410μs | 1.9857μs | 503.6086 KOps/s | 495.0497 KOps/s | +1.73% |
| test_creation_empty | 38.7010μs | 18.5555μs | 53.8924 KOps/s | 56.3275 KOps/s | -4.32% |
| test_creation_nested_1 | 44.4500μs | 20.4046μs | 49.0086 KOps/s | 51.0109 KOps/s | -3.93% |
| test_creation_nested_2 | 54.2810μs | 23.5398μs | 42.4812 KOps/s | 43.9368 KOps/s | -3.31% |
| test_clone | 0.1805ms | 30.4638μs | 32.8259 KOps/s | 33.5671 KOps/s | -2.21% |
| test_getitem[int] | 1.0995ms | 16.8531μs | 59.3364 KOps/s | 58.6726 KOps/s | +1.13% |
| test_getitem[slice_int] | 0.1425ms | 28.2757μs | 35.3661 KOps/s | 35.0122 KOps/s | +1.01% |
| test_getitem[range] | 0.2425ms | 0.1160ms | 8.6177 KOps/s | 8.6257 KOps/s | -0.09% |
| test_getitem[tuple] | 0.1428ms | 24.8593μs | 40.2264 KOps/s | 40.0205 KOps/s | +0.51% |
| test_getitem[list] | 0.2733ms | 0.1092ms | 9.1607 KOps/s | 9.5332 KOps/s | -3.91% |
| test_setitem_dim[int] | 0.2231ms | 55.5856μs | 17.9903 KOps/s | 19.2672 KOps/s | **-6.63%** |
| test_setitem_dim[slice_int] | 99.3420μs | 78.3315μs | 12.7663 KOps/s | 13.2204 KOps/s | -3.44% |
| test_setitem_dim[range] | 0.2955ms | 0.1429ms | 6.9969 KOps/s | 7.1010 KOps/s | -1.47% |
| test_setitem_dim[tuple] | 0.2115ms | 70.6934μs | 14.1456 KOps/s | 14.6265 KOps/s | -3.29% |
| test_setitem | 0.1966ms | 44.1773μs | 22.6361 KOps/s | 23.5437 KOps/s | -3.86% |
| test_set | 0.2452ms | 44.1160μs | 22.6675 KOps/s | 23.9603 KOps/s | **-5.40%** |
| test_set_shared | 0.3655ms | 53.0638μs | 18.8452 KOps/s | 18.8471 KOps/s | -0.01% |
| test_update | 0.2117ms | 52.3054μs | 19.1185 KOps/s | 19.6189 KOps/s | -2.55% |
| test_update_nested | 0.2089ms | 60.2791μs | 16.5895 KOps/s | 16.9300 KOps/s | -2.01% |
| test_update__nested | 0.2297ms | 60.3755μs | 16.5630 KOps/s | 16.4942 KOps/s | +0.42% |
| test_set_nested | 0.1907ms | 45.2705μs | 22.0894 KOps/s | 22.2948 KOps/s | -0.92% |
| test_set_nested_new | 0.1986ms | 50.0894μs | 19.9643 KOps/s | 20.8168 KOps/s | -4.10% |
| test_select | 0.2427ms | 65.1414μs | 15.3512 KOps/s | 15.3468 KOps/s | +0.03% |
| test_select_nested | 0.5067ms | 53.2721μs | 18.7715 KOps/s | 18.9825 KOps/s | -1.11% |
| test_exclude_nested | 0.2083ms | 71.1282μs | 14.0591 KOps/s | 14.0967 KOps/s | -0.27% |
| test_empty[True] | 0.3396ms | 0.2938ms | 3.4032 KOps/s | 3.4012 KOps/s | +0.06% |
| test_empty[False] | 2.3811μs | 0.9296μs | 1.0757 MOps/s | 1.0819 MOps/s | -0.58% |
| test_to | 0.1105ms | 37.4204μs | 26.7234 KOps/s | 26.8881 KOps/s | -0.61% |
| test_to_nonblocking | 68.4610μs | 22.7505μs | 43.9550 KOps/s | 43.9953 KOps/s | -0.09% |
| test_unbind_speed | 0.3620ms | 0.3038ms | 3.2911 KOps/s | 3.2152 KOps/s | +2.36% |
| test_unbind_speed_stack0 | 0.3552ms | 0.2985ms | 3.3500 KOps/s | 3.3190 KOps/s | +0.93% |
| test_unbind_speed_stack1 | 90.2073ms | 0.7700ms | 1.2987 KOps/s | 1.2624 KOps/s | +2.88% |
| test_split | 91.5644ms | 2.3045ms | 433.9257 Ops/s | 437.3200 Ops/s | -0.78% |
| test_chunk | 2.2639ms | 2.1073ms | 474.5340 Ops/s | 436.9471 Ops/s | **+8.60%** |
| test_creation[device0] | 0.2517ms | 0.1033ms | 9.6843 KOps/s | 9.3995 KOps/s | +3.03% |
| test_creation_from_tensor | 0.2771ms | 0.1049ms | 9.5327 KOps/s | 9.6337 KOps/s | -1.05% |
| test_add_one[memmap_tensor0] | 76.6210μs | 8.6183μs | 116.0319 KOps/s | 116.8940 KOps/s | -0.74% |
| test_contiguous[memmap_tensor0] | 29.0400μs | 2.0198μs | 495.0914 KOps/s | 477.8760 KOps/s | +3.60% |
| test_stack[memmap_tensor0] | 37.8010μs | 6.6636μs | 150.0681 KOps/s | 150.1708 KOps/s | -0.07% |
| test_memmaptd_index | 93.2466ms | 0.4739ms | 2.1101 KOps/s | 2.4091 KOps/s | **-12.41%** |
| test_memmaptd_index_astensor | 0.7361ms | 0.4792ms | 2.0867 KOps/s | 2.1085 KOps/s | -1.03% |
| test_memmaptd_index_op | 1.4566ms | 1.0602ms | 943.2178 Ops/s | 967.7877 Ops/s | -2.54% |
| test_serialize_model | 0.1010s | 96.1436ms | 10.4011 Ops/s | 10.1004 Ops/s | +2.98% |
| test_serialize_model_pickle | 1.3731s | 1.2395s | 0.8068 Ops/s | 0.8059 Ops/s | +0.11% |
| test_serialize_weights | 0.1846s | 0.1025s | 9.7518 Ops/s | 10.3485 Ops/s | **-5.77%** |
| test_serialize_weights_returnearly | 85.5912ms | 73.2055ms | 13.6602 Ops/s | 12.0513 Ops/s | **+13.35%** |
| test_serialize_weights_pickle | 1.3538s | 1.2362s | 0.8089 Ops/s | 0.8086 Ops/s | +0.04% |
| test_reshape_pytree | 71.9410μs | 37.9285μs | 26.3654 KOps/s | 26.1394 KOps/s | +0.86% |
| test_reshape_td | 0.2399ms | 43.2227μs | 23.1360 KOps/s | 22.3790 KOps/s | +3.38% |
| test_view_pytree | 0.1324ms | 37.7598μs | 26.4832 KOps/s | 26.7579 KOps/s | -1.03% |
| test_view_td | 0.2435ms | 50.1365μs | 19.9455 KOps/s | 20.8594 KOps/s | -4.38% |
| test_unbind_pytree | 0.1540ms | 36.4155μs | 27.4608 KOps/s | 27.2233 KOps/s | +0.87% |
| test_unbind_td | 0.3932ms | 46.0573μs | 21.7121 KOps/s | 21.4417 KOps/s | +1.26% |
| test_split_pytree | 0.1473ms | 50.3570μs | 19.8582 KOps/s | 19.7133 KOps/s | +0.73% |
| test_split_td | 91.2343ms | 68.0815μs | 14.6883 KOps/s | 14.1837 KOps/s | +3.56% |
| test_add_pytree | 0.2084ms | 58.2988μs | 17.1530 KOps/s | 16.9559 KOps/s | +1.16% |
| test_add_td | 0.2978ms | 93.8740μs | 10.6526 KOps/s | 10.9451 KOps/s | -2.67% |
| test_compile_add_one_nested[tensordict-compile] | 0.4300ms | 0.2134ms | 4.6857 KOps/s | 4.7067 KOps/s | -0.45% |
| test_compile_add_one_nested[tensordict-eager] | 0.3219ms | 0.1749ms | 5.7171 KOps/s | 5.7126 KOps/s | +0.08% |
| test_compile_add_one_nested[pytree-compile] | 0.2962ms | 0.1459ms | 6.8559 KOps/s | 6.8681 KOps/s | -0.18% |
| test_compile_add_one_nested[pytree-eager] | 0.3617ms | 0.1910ms | 5.2360 KOps/s | 5.2366 KOps/s | -0.01% |
| test_compile_copy_nested[tensordict-compile] | 0.1482ms | 22.4414μs | 44.5605 KOps/s | 44.3211 KOps/s | +0.54% |
| test_compile_copy_nested[tensordict-eager] | 0.1722ms | 47.9677μs | 20.8474 KOps/s | 20.3978 KOps/s | +2.20% |
| test_compile_copy_nested[pytree-compile] | 0.2652ms | 73.2752μs | 13.6472 KOps/s | 13.6100 KOps/s | +0.27% |
| test_compile_copy_nested[pytree-eager] | 0.1215ms | 59.3726μs | 16.8428 KOps/s | 16.7066 KOps/s | +0.82% |
| test_compile_add_one_flat[tensordict-compile] | 0.4501ms | 0.3220ms | 3.1054 KOps/s | 3.1017 KOps/s | +0.12% |
| test_compile_add_one_flat[tensordict-eager] | 0.3774ms | 0.2221ms | 4.5032 KOps/s | 4.4821 KOps/s | +0.47% |
| test_compile_add_one_flat[tensorclass-compile] | 0.2779ms | 0.1343ms | 7.4485 KOps/s | 7.6230 KOps/s | -2.29% |
| test_compile_add_one_flat[tensorclass-eager] | 0.2543ms | 64.3272μs | 15.5455 KOps/s | 15.8714 KOps/s | -2.05% |
| test_compile_add_one_flat[pytree-compile] | 0.4557ms | 0.3197ms | 3.1279 KOps/s | 3.1244 KOps/s | +0.11% |
| test_compile_add_one_flat[pytree-eager] | 0.8129ms | 0.6214ms | 1.6092 KOps/s | 1.6260 KOps/s | -1.03% |
| test_compile_add_self_flat[tensordict-eager] | 0.4235ms | 0.2728ms | 3.6657 KOps/s | 3.6874 KOps/s | -0.59% |
| test_compile_add_self_flat[tensordict-compile] | 0.4540ms | 0.3206ms | 3.1195 KOps/s | 3.0842 KOps/s | +1.15% |
| test_compile_add_self_flat[tensorclass-eager] | 0.2490ms | 75.9112μs | 13.1733 KOps/s | 13.1021 KOps/s | +0.54% |
| test_compile_add_self_flat[tensorclass-compile] | 0.3003ms | 0.1363ms | 7.3362 KOps/s | 7.5740 KOps/s | -3.14% |
| test_compile_add_self_flat[pytree-eager] | 0.7050ms | 0.5363ms | 1.8648 KOps/s | 1.8758 KOps/s | -0.59% |
| test_compile_add_self_flat[pytree-compile] | 0.4647ms | 0.3193ms | 3.1315 KOps/s | 3.1135 KOps/s | +0.58% |
| test_compile_copy_flat[tensordict-compile] | 0.1565ms | 19.1317μs | 52.2691 KOps/s | 51.8794 KOps/s | +0.75% |
| test_compile_copy_flat[tensordict-eager] | 0.1055ms | 32.4791μs | 30.7890 KOps/s | 31.0075 KOps/s | -0.70% |
| test_compile_copy_flat[pytree-compile] | 0.2224ms | 77.4828μs | 12.9061 KOps/s | 12.9582 KOps/s | -0.40% |
| test_compile_copy_flat[pytree-eager] | 82.2710μs | 60.5991μs | 16.5019 KOps/s | 16.6389 KOps/s | -0.82% |
| test_compile_assign_and_add[tensordict-compile] | 2.4501ms | 0.9056ms | 1.1042 KOps/s | 1.0794 KOps/s | +2.30% |
| test_compile_assign_and_add[tensordict-eager] | 3.6655ms | 3.3402ms | 299.3816 Ops/s | 305.4258 Ops/s | -1.98% |
| test_compile_assign_and_add[pytree-compile] | 2.4824ms | 0.9007ms | 1.1102 KOps/s | 1.1093 KOps/s | +0.09% |
| test_compile_assign_and_add[pytree-eager] | 3.4201ms | 3.2565ms | 307.0777 Ops/s | 310.6060 Ops/s | -1.14% |
| test_compile_indexing[tensor-tensordict-compile] | 0.3101ms | 0.1124ms | 8.8950 KOps/s | 8.9228 KOps/s | -0.31% |
| test_compile_indexing[tensor-tensordict-eager] | 0.3349ms | 66.3147μs | 15.0796 KOps/s | 16.1036 KOps/s | **-6.36%** |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2547ms | 0.1052ms | 9.5091 KOps/s | 9.2384 KOps/s | +2.93% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2347ms | 48.2919μs | 20.7074 KOps/s | 21.1639 KOps/s | -2.16% |
| test_compile_indexing[tensor-pytree-compile] | 0.3180ms | 0.1104ms | 9.0566 KOps/s | 9.1778 KOps/s | -1.32% |
| test_compile_indexing[tensor-pytree-eager] | 0.2472ms | 49.5447μs | 20.1838 KOps/s | 21.2838 KOps/s | **-5.17%** |
| test_compile_indexing[slice-tensordict-compile] | 0.3187ms | 0.1454ms | 6.8776 KOps/s | 7.1426 KOps/s | -3.71% |
| test_compile_indexing[slice-tensordict-eager] | 0.1949ms | 26.6655μs | 37.5017 KOps/s | 38.0973 KOps/s | -1.56% |
| test_compile_indexing[slice-tensorclass-compile] | 0.3091ms | 0.1332ms | 7.5058 KOps/s | 7.5423 KOps/s | -0.48% |
| test_compile_indexing[slice-tensorclass-eager] | 0.1545ms | 22.5546μs | 44.3369 KOps/s | 44.8739 KOps/s | -1.20% |
| test_compile_indexing[slice-pytree-compile] | 0.3162ms | 0.1363ms | 7.3387 KOps/s | 7.6492 KOps/s | -4.06% |
| test_compile_indexing[slice-pytree-eager] | 0.1822ms | 23.2561μs | 42.9996 KOps/s | 45.5046 KOps/s | **-5.50%** |
| test_compile_indexing[int-tensordict-compile] | 0.3163ms | 0.1421ms | 7.0364 KOps/s | 7.1447 KOps/s | -1.51% |
| test_compile_indexing[int-tensordict-eager] | 0.4911ms | 26.1904μs | 38.1819 KOps/s | 38.0639 KOps/s | +0.31% |
| test_compile_indexing[int-tensorclass-compile] | 0.3325ms | 0.1336ms | 7.4868 KOps/s | 7.6525 KOps/s | -2.16% |
| test_compile_indexing[int-tensorclass-eager] | 0.1862ms | 22.6130μs | 44.2223 KOps/s | 45.3609 KOps/s | -2.51% |
| test_compile_indexing[int-pytree-compile] | 0.3038ms | 0.1323ms | 7.5585 KOps/s | 7.6380 KOps/s | -1.04% |
| test_compile_indexing[int-pytree-eager] | 0.1053ms | 22.4277μs | 44.5877 KOps/s | 45.2509 KOps/s | -1.47% |
| test_mod_add[eager] | 0.1915ms | 38.9899μs | 25.6477 KOps/s | 25.4349 KOps/s | +0.84% |
| test_mod_add[compile] | 0.2240ms | 69.5873μs | 14.3704 KOps/s | 13.9912 KOps/s | +2.71% |
| test_mod_add[compile-overhead] | 0.2631ms | 0.1465ms | 6.8260 KOps/s | 6.8381 KOps/s | -0.18% |
| test_mod_wrap[eager] | 0.4513ms | 0.2605ms | 3.8392 KOps/s | 3.8938 KOps/s | -1.40% |
| test_mod_wrap[compile] | 0.4920ms | 0.2976ms | 3.3601 KOps/s | 3.4177 KOps/s | -1.69% |
| test_mod_wrap[compile-overhead] | 8.5492ms | 4.5269ms | 220.8995 Ops/s | 218.6773 Ops/s | +1.02% |
| test_mod_wrap_and_backward[eager] | 1.6464ms | 1.4750ms | 677.9839 Ops/s | 728.5973 Ops/s | **-6.95%** |
| test_mod_wrap_and_backward[compile] | 1.6624ms | 1.4587ms | 685.5387 Ops/s | 744.7444 Ops/s | **-7.95%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.4342ms | 0.9890ms | 1.0112 KOps/s | 1.1121 KOps/s | **-9.08%** |
| test_seq_add[eager] | 0.2634ms | 0.1110ms | 9.0089 KOps/s | 9.0723 KOps/s | -0.70% |
| test_seq_add[compile] | 0.2881ms | 85.9749μs | 11.6313 KOps/s | 11.4856 KOps/s | +1.27% |
| test_seq_add[compile-overhead] | 0.2713ms | 0.1233ms | 8.1080 KOps/s | 8.0640 KOps/s | +0.54% |
| test_seq_wrap[eager] | 0.5621ms | 0.4285ms | 2.3337 KOps/s | 2.3197 KOps/s | +0.60% |
| test_seq_wrap[compile] | 0.4913ms | 0.3263ms | 3.0644 KOps/s | 3.1056 KOps/s | -1.33% |
| test_seq_wrap[compile-overhead] | 0.3107s | 0.1483s | 6.7425 Ops/s | 6.7474 Ops/s | -0.07% |
| test_func_call_runtime[False-eager] | 0.9371ms | 0.7478ms | 1.3372 KOps/s | 1.2788 KOps/s | 
$\color{#35bf28}+4.57\\%$ | | test_func_call_runtime[False-compile] | 0.9749ms | 0.8054ms | 1.2416 KOps/s | 1.2465 KOps/s | $\color{#d91a1a}-0.39\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.5904ms | 0.3614ms | 2.7670 KOps/s | 2.7658 KOps/s | $\color{#35bf28}+0.04\\%$ | | test_func_call_runtime[True-eager] | 1.1759ms | 1.0009ms | 999.0857 Ops/s | 1.0091 KOps/s | $\color{#d91a1a}-0.99\\%$ | | test_func_call_runtime[True-compile] | 1.0726ms | 0.8594ms | 1.1637 KOps/s | 1.1863 KOps/s | $\color{#d91a1a}-1.91\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.5175ms | 0.3998ms | 2.5013 KOps/s | 2.4981 KOps/s | $\color{#35bf28}+0.13\\%$ | | test_distributed | 0.2898ms | 72.1940μs | 13.8516 KOps/s | 13.8643 KOps/s | $\color{#d91a1a}-0.09\\%$ | | test_tdmodule | 33.4020μs | 16.7459μs | 59.7160 KOps/s | 60.5200 KOps/s | $\color{#d91a1a}-1.33\\%$ | | test_tdmodule_dispatch | 55.8420μs | 34.5223μs | 28.9668 KOps/s | 30.1247 KOps/s | $\color{#d91a1a}-3.84\\%$ | | test_tdseq | 34.4800μs | 17.7721μs | 56.2679 KOps/s | 57.4458 KOps/s | $\color{#d91a1a}-2.05\\%$ | | test_tdseq_dispatch | 57.2110μs | 36.5973μs | 27.3244 KOps/s | 28.0958 KOps/s | $\color{#d91a1a}-2.75\\%$ | | test_instantiation_functorch | 2.1541ms | 1.9938ms | 501.5499 Ops/s | 510.1778 Ops/s | $\color{#d91a1a}-1.69\\%$ | | test_instantiation_td | 1.9811ms | 1.2893ms | 775.5990 Ops/s | 780.6956 Ops/s | $\color{#d91a1a}-0.65\\%$ | | test_exec_functorch | 0.3818ms | 0.2234ms | 4.4767 KOps/s | 4.5099 KOps/s | $\color{#d91a1a}-0.74\\%$ | | test_exec_functional_call | 0.4270ms | 0.2380ms | 4.2025 KOps/s | 4.3371 KOps/s | $\color{#d91a1a}-3.10\\%$ | | test_exec_td | 0.4179ms | 0.2379ms | 4.2031 KOps/s | 4.3401 KOps/s | $\color{#d91a1a}-3.16\\%$ | | test_exec_td_decorator | 0.6906ms | 0.3092ms | 3.2342 KOps/s | 3.3664 KOps/s | $\color{#d91a1a}-3.93\\%$ | | test_vmap_mlp_speed[True-True] | 0.9464ms | 0.7096ms | 1.4093 KOps/s | 1.4046 KOps/s | $\color{#35bf28}+0.33\\%$ | | test_vmap_mlp_speed[True-False] 
| 0.8980ms | 0.7101ms | 1.4082 KOps/s | 1.4475 KOps/s | $\color{#d91a1a}-2.71\\%$ | | test_vmap_mlp_speed[False-True] | 0.8062ms | 0.6232ms | 1.6047 KOps/s | 1.6118 KOps/s | $\color{#d91a1a}-0.44\\%$ | | test_vmap_mlp_speed[False-False] | 0.7868ms | 0.6145ms | 1.6272 KOps/s | 1.6310 KOps/s | $\color{#d91a1a}-0.23\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.6157ms | 0.7708ms | 1.2973 KOps/s | 1.3286 KOps/s | $\color{#d91a1a}-2.36\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9661ms | 0.7635ms | 1.3098 KOps/s | 1.3275 KOps/s | $\color{#d91a1a}-1.34\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8616ms | 0.6607ms | 1.5136 KOps/s | 1.5284 KOps/s | $\color{#d91a1a}-0.96\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8705ms | 0.6628ms | 1.5086 KOps/s | 1.5171 KOps/s | $\color{#d91a1a}-0.56\\%$ | | test_vmap_transformer_speed[True-True] | 9.4534ms | 8.9352ms | 111.9168 Ops/s | 112.8171 Ops/s | $\color{#d91a1a}-0.80\\%$ | | test_vmap_transformer_speed[True-False] | 9.3495ms | 8.8881ms | 112.5094 Ops/s | 112.8667 Ops/s | $\color{#d91a1a}-0.32\\%$ | | test_vmap_transformer_speed[False-True] | 9.3508ms | 8.7897ms | 113.7690 Ops/s | 113.9762 Ops/s | $\color{#d91a1a}-0.18\\%$ | | test_vmap_transformer_speed[False-False] | 9.3123ms | 8.8427ms | 113.0880 Ops/s | 113.5419 Ops/s | $\color{#d91a1a}-0.40\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 22.1973ms | 21.3617ms | 46.8127 Ops/s | 47.2442 Ops/s | $\color{#d91a1a}-0.91\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 22.0253ms | 21.2414ms | 47.0779 Ops/s | 47.1787 Ops/s | $\color{#d91a1a}-0.21\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 22.0418ms | 21.1308ms | 47.3242 Ops/s | 47.6241 Ops/s | $\color{#d91a1a}-0.63\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 22.0134ms | 21.1707ms | 47.2352 Ops/s | 47.6039 Ops/s | $\color{#d91a1a}-0.77\\%$ | | test_to_module_speed[True] | 2.0201ms | 1.4782ms | 676.5197 Ops/s | 678.7254 Ops/s | 
$\color{#d91a1a}-0.32\\%$ | | test_to_module_speed[False] | 1.9041ms | 1.4484ms | 690.4274 Ops/s | 684.5486 Ops/s | $\color{#35bf28}+0.86\\%$ | | test_tc_init | 71.5220μs | 41.9417μs | 23.8426 KOps/s | 26.6940 KOps/s | $\textbf{\color{#d91a1a}-10.68\\%}$ | | test_tc_init_nested | 0.1268ms | 85.3691μs | 11.7138 KOps/s | 13.3336 KOps/s | $\textbf{\color{#d91a1a}-12.15\\%}$ | | test_tc_first_layer_tensor | 3.4085μs | 0.7763μs | 1.2882 MOps/s | 249.2205 KOps/s | $\textbf{\color{#35bf28}+416.90\\%}$ | | test_tc_first_layer_nontensor | 21.0610μs | 2.5207μs | 396.7203 KOps/s | 247.2793 KOps/s | $\textbf{\color{#35bf28}+60.43\\%}$ | | test_tc_second_layer_tensor | 6.7800μs | 1.5905μs | 628.7472 KOps/s | 762.5024 KOps/s | $\textbf{\color{#d91a1a}-17.54\\%}$ | | test_tc_second_layer_nontensor | 20.3810μs | 3.3755μs | 296.2494 KOps/s | 215.4490 KOps/s | $\textbf{\color{#35bf28}+37.50\\%}$ | | test_unbind | 0.3236s | 12.4508ms | 80.3159 Ops/s | 81.2533 Ops/s | $\color{#d91a1a}-1.15\\%$ | | test_full_like | 0.7670ms | 0.5798ms | 1.7246 KOps/s | 1.7303 KOps/s | $\color{#d91a1a}-0.33\\%$ | | test_zeros_like | 0.3457ms | 0.1979ms | 5.0524 KOps/s | 5.0446 KOps/s | $\color{#35bf28}+0.15\\%$ | | test_ones_like | 0.3478ms | 0.1978ms | 5.0556 KOps/s | 5.0504 KOps/s | $\color{#35bf28}+0.10\\%$ | | test_clone | 0.5628ms | 0.4151ms | 2.4090 KOps/s | 2.4135 KOps/s | $\color{#d91a1a}-0.19\\%$ | | test_squeeze | 0.1392ms | 11.0576μs | 90.4356 KOps/s | 86.1709 KOps/s | $\color{#35bf28}+4.95\\%$ | | test_unsqueeze | 0.2499ms | 80.4879μs | 12.4242 KOps/s | 12.1532 KOps/s | $\color{#35bf28}+2.23\\%$ | | test_split | 0.4379ms | 0.1803ms | 5.5469 KOps/s | 5.6601 KOps/s | $\color{#d91a1a}-2.00\\%$ | | test_permute | 0.3104ms | 0.1923ms | 5.2009 KOps/s | 5.2440 KOps/s | $\color{#d91a1a}-0.82\\%$ | | test_stack | 1.3337ms | 0.9009ms | 1.1101 KOps/s | 1.1179 KOps/s | $\color{#d91a1a}-0.70\\%$ | | test_cat | 1.3470ms | 1.2321ms | 811.6114 Ops/s | 811.4328 Ops/s | $\color{#35bf28}+0.02\\%$ |
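
The Change column in the rows above is consistent with the relative difference between the Ops and Ops on Repo `HEAD` columns, once both are converted to the same unit. A minimal sketch for checking a row by hand, assuming that formula; `parse_ops` and `percent_change` are hypothetical helper names and not part of the benchmark tooling:

```python
# Multipliers for the throughput units that appear in the table.
UNIT_FACTORS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.2882 MOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_FACTORS[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative change of the PR throughput against the HEAD throughput, in percent."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


# test_tc_init row: 23.8426 KOps/s on the PR vs 26.6940 KOps/s on HEAD
print(f"{percent_change('23.8426 KOps/s', '26.6940 KOps/s'):+.2f}%")  # -10.68%
```

Rows that mix units, such as `test_func_call_runtime[True-eager]` (Ops/s against KOps/s), need the unit conversion before the ratio is taken. Because the table shows rounded throughputs, a recomputed value can differ from the reported one in the last digit.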