pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] flatten_keys and unflatten_keys as context managers #908

Closed: vmoens closed this 2 months ago.

vmoens commented 2 months ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 2 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 11. Worsened: 35.
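Judging by which rows carry a bold Change value, the Improved/Worsened counts appear to tally only benchmarks whose throughput moved by more than 5% in either direction; counting the bolded CPU entries gives exactly 11 improvements and 35 regressions. The 5% threshold is inferred from the table, not taken from the bot's configuration, and `classify` / `THRESHOLD_PCT` are hypothetical names, so treat this as a sketch of that rule:

```python
# Hypothetical helper: classify benchmark changes the way the summary line
# appears to, treating |change| > 5% as significant. The threshold is an
# assumption inferred from which rows are bolded, not from the bot's source.
THRESHOLD_PCT = 5.0


def classify(changes_pct):
    """Return (improved, worsened) counts for a list of % changes."""
    improved = sum(1 for c in changes_pct if c > THRESHOLD_PCT)
    worsened = sum(1 for c in changes_pct if c < -THRESHOLD_PCT)
    return improved, worsened


# Change values from six CPU rows: test_plain_set_nested, test_items,
# test_items_nested, test_select, test_distributed, test_setitem_dim[slice_int]
sample = [-8.67, -2.65, 12.23, -5.34, 5.04, -4.85]
print(classify(sample))  # (2, 2)
```

Rows such as `test_setitem_dim[slice_int]` (-4.85%) fall just under the cut-off, which matches them being shown without bold.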

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 58.1780μs | 23.7191μs | 42.1601 KOps/s | 46.1613 KOps/s | **-8.67%** |
| test_plain_set_stack_nested | 89.1660μs | 24.2420μs | 41.2507 KOps/s | 45.3245 KOps/s | **-8.99%** |
| test_plain_set_nested_inplace | 58.1880μs | 26.2622μs | 38.0776 KOps/s | 42.1562 KOps/s | **-9.67%** |
| test_plain_set_stack_nested_inplace | 90.1870μs | 26.3970μs | 37.8830 KOps/s | 42.6573 KOps/s | **-11.19%** |
| test_items | 27.6120μs | 2.6748μs | 373.8531 KOps/s | 384.0378 KOps/s | -2.65% |
| test_items_nested | 0.5642ms | 0.3604ms | 2.7747 KOps/s | 2.4724 KOps/s | **+12.23%** |
| test_items_nested_locked | 1.6194ms | 0.3669ms | 2.7257 KOps/s | 2.4703 KOps/s | **+10.34%** |
| test_items_nested_leaf | 0.1697ms | 88.5528μs | 11.2927 KOps/s | 11.4767 KOps/s | -1.60% |
| test_items_stack_nested | 0.7509ms | 0.3644ms | 2.7445 KOps/s | 2.4731 KOps/s | **+10.98%** |
| test_items_stack_nested_leaf | 0.1859ms | 89.1378μs | 11.2186 KOps/s | 11.4825 KOps/s | -2.30% |
| test_items_stack_nested_locked | 0.5347ms | 0.3619ms | 2.7633 KOps/s | 2.4732 KOps/s | **+11.73%** |
| test_keys | 29.8560μs | 3.8857μs | 257.3547 KOps/s | 252.7135 KOps/s | +1.84% |
| test_keys_nested | 0.2474ms | 0.1463ms | 6.8337 KOps/s | 7.0264 KOps/s | -2.74% |
| test_keys_nested_locked | 1.8953ms | 0.1516ms | 6.5980 KOps/s | 6.7190 KOps/s | -1.80% |
| test_keys_nested_leaf | 0.2160ms | 0.1257ms | 7.9559 KOps/s | 8.1173 KOps/s | -1.99% |
| test_keys_stack_nested | 0.3350ms | 0.1476ms | 6.7773 KOps/s | 6.9496 KOps/s | -2.48% |
| test_keys_stack_nested_leaf | 0.2283ms | 0.1265ms | 7.9060 KOps/s | 8.1377 KOps/s | -2.85% |
| test_keys_stack_nested_locked | 0.2950ms | 0.1518ms | 6.5862 KOps/s | 6.6163 KOps/s | -0.46% |
| test_values | 8.5907μs | 1.1720μs | 853.2660 KOps/s | 858.1076 KOps/s | -0.56% |
| test_values_nested | 0.1352ms | 50.7656μs | 19.6984 KOps/s | 19.9607 KOps/s | -1.31% |
| test_values_nested_locked | 0.1056ms | 50.6496μs | 19.7435 KOps/s | 19.8773 KOps/s | -0.67% |
| test_values_nested_leaf | 0.1622ms | 46.0830μs | 21.7000 KOps/s | 21.9143 KOps/s | -0.98% |
| test_values_stack_nested | 0.1321ms | 52.2702μs | 19.1314 KOps/s | 19.8670 KOps/s | -3.70% |
| test_values_stack_nested_leaf | 0.3990ms | 47.0030μs | 21.2752 KOps/s | 21.5336 KOps/s | -1.20% |
| test_values_stack_nested_locked | 1.3022ms | 51.9997μs | 19.2309 KOps/s | 19.6109 KOps/s | -1.94% |
| test_membership | 25.2470μs | 0.9416μs | 1.0620 MOps/s | 1.0936 MOps/s | -2.89% |
| test_membership_nested | 52.2970μs | 2.7449μs | 364.3176 KOps/s | 365.2647 KOps/s | -0.26% |
| test_membership_nested_leaf | 30.3670μs | 2.7483μs | 363.8570 KOps/s | 365.1095 KOps/s | -0.34% |
| test_membership_stacked_nested | 30.1660μs | 2.7365μs | 365.4291 KOps/s | 358.2319 KOps/s | +2.01% |
| test_membership_stacked_nested_leaf | 32.9610μs | 2.7731μs | 360.6113 KOps/s | 365.7619 KOps/s | -1.41% |
| test_membership_nested_last | 31.2580μs | 4.1455μs | 241.2231 KOps/s | 247.1385 KOps/s | -2.39% |
| test_membership_nested_leaf_last | 23.4930μs | 4.1421μs | 241.4210 KOps/s | 248.8583 KOps/s | -2.99% |
| test_membership_stacked_nested_last | 60.0110μs | 7.3628μs | 135.8184 KOps/s | 249.6028 KOps/s | **-45.59%** |
| test_membership_stacked_nested_leaf_last | 43.8920μs | 7.2809μs | 137.3448 KOps/s | 244.8912 KOps/s | **-43.92%** |
| test_nested_getleaf | 67.7860μs | 11.2822μs | 88.6349 KOps/s | 92.6691 KOps/s | -4.35% |
| test_nested_get | 41.4870μs | 10.8258μs | 92.3722 KOps/s | 94.9473 KOps/s | -2.71% |
| test_stacked_getleaf | 66.1930μs | 11.1384μs | 89.7791 KOps/s | 91.9112 KOps/s | -2.32% |
| test_stacked_get | 0.1501ms | 10.6685μs | 93.7338 KOps/s | 97.5022 KOps/s | -3.86% |
| test_nested_getitemleaf | 35.1450μs | 11.8108μs | 84.6683 KOps/s | 88.6310 KOps/s | -4.47% |
| test_nested_getitem | 61.0540μs | 10.9855μs | 91.0291 KOps/s | 95.5686 KOps/s | -4.75% |
| test_stacked_getitemleaf | 54.5110μs | 11.8157μs | 84.6328 KOps/s | 88.3895 KOps/s | -4.25% |
| test_stacked_getitem | 32.4500μs | 10.9989μs | 90.9180 KOps/s | 94.8218 KOps/s | -4.12% |
| test_lock_nested | 3.0237ms | 0.5204ms | 1.9215 KOps/s | 1.6307 KOps/s | **+17.83%** |
| test_lock_stack_nested | 0.9242ms | 0.4717ms | 2.1201 KOps/s | 2.0329 KOps/s | +4.29% |
| test_unlock_nested | 0.8861ms | 0.4405ms | 2.2703 KOps/s | 2.3226 KOps/s | -2.25% |
| test_unlock_stack_nested | 0.4843ms | 0.3867ms | 2.5859 KOps/s | 2.4825 KOps/s | +4.17% |
| test_flatten_speed | 0.2600ms | 0.1065ms | 9.3879 KOps/s | 9.4682 KOps/s | -0.85% |
| test_unflatten_speed | 0.9999ms | 0.4557ms | 2.1946 KOps/s | 2.2194 KOps/s | -1.12% |
| test_common_ops | 2.1616ms | 1.2385ms | 807.4026 Ops/s | 860.4017 Ops/s | **-6.16%** |
| test_creation | 27.8720μs | 2.5529μs | 391.7152 KOps/s | 394.2538 KOps/s | -0.64% |
| test_creation_empty | 50.6230μs | 22.6964μs | 44.0599 KOps/s | 54.6708 KOps/s | **-19.41%** |
| test_creation_nested_1 | 73.4460μs | 26.5846μs | 37.6157 KOps/s | 43.8137 KOps/s | **-14.15%** |
| test_creation_nested_2 | 89.1350μs | 30.5361μs | 32.7481 KOps/s | 38.3054 KOps/s | **-14.51%** |
| test_clone | 0.1884ms | 18.7920μs | 53.2142 KOps/s | 57.3491 KOps/s | **-7.21%** |
| test_getitem[int] | 0.8472ms | 12.9166μs | 77.4199 KOps/s | 77.5472 KOps/s | -0.16% |
| test_getitem[slice_int] | 0.1325ms | 33.4538μs | 29.8920 KOps/s | 29.2700 KOps/s | +2.12% |
| test_getitem[range] | 0.3644ms | 58.4607μs | 17.1055 KOps/s | 16.8611 KOps/s | +1.45% |
| test_getitem[tuple] | 0.1488ms | 27.2523μs | 36.6942 KOps/s | 35.9902 KOps/s | +1.96% |
| test_getitem[list] | 0.4014ms | 53.2128μs | 18.7925 KOps/s | 18.2722 KOps/s | +2.85% |
| test_setitem_dim[int] | 82.0120μs | 38.6163μs | 25.8958 KOps/s | 29.4853 KOps/s | **-12.17%** |
| test_setitem_dim[slice_int] | 0.2008ms | 77.9955μs | 12.8213 KOps/s | 13.4741 KOps/s | -4.85% |
| test_setitem_dim[range] | 0.1457ms | 98.1801μs | 10.1854 KOps/s | 10.5783 KOps/s | -3.71% |
| test_setitem_dim[tuple] | 0.1130ms | 64.7763μs | 15.4378 KOps/s | 16.2156 KOps/s | -4.80% |
| test_setitem | 0.1518ms | 33.8906μs | 29.5067 KOps/s | 32.8850 KOps/s | **-10.27%** |
| test_set | 0.1811ms | 33.3063μs | 30.0244 KOps/s | 33.5873 KOps/s | **-10.61%** |
| test_set_shared | 3.5316ms | 0.2198ms | 4.5496 KOps/s | 4.4990 KOps/s | +1.13% |
| test_update | 0.1863ms | 41.9528μs | 23.8363 KOps/s | 25.9556 KOps/s | **-8.16%** |
| test_update_nested | 0.2400ms | 53.0165μs | 18.8621 KOps/s | 21.3819 KOps/s | **-11.78%** |
| test_update__nested | 0.1795ms | 36.5670μs | 27.3470 KOps/s | 28.1344 KOps/s | -2.80% |
| test_set_nested | 0.1444ms | 35.3866μs | 28.2593 KOps/s | 30.9158 KOps/s | **-8.59%** |
| test_set_nested_new | 0.1698ms | 41.3699μs | 24.1722 KOps/s | 26.0935 KOps/s | **-7.36%** |
| test_select | 0.2196ms | 59.0218μs | 16.9429 KOps/s | 17.8995 KOps/s | **-5.34%** |
| test_select_nested | 0.1543ms | 60.5580μs | 16.5131 KOps/s | 16.3527 KOps/s | +0.98% |
| test_exclude_nested | 0.1549ms | 80.0434μs | 12.4932 KOps/s | 12.1026 KOps/s | +3.23% |
| test_empty[True] | 0.6540ms | 0.3484ms | 2.8700 KOps/s | 2.8428 KOps/s | +0.96% |
| test_empty[False] | 12.5608μs | 1.3098μs | 763.4465 KOps/s | 765.9821 KOps/s | -0.33% |
| test_unbind_speed | 0.5683ms | 0.3246ms | 3.0809 KOps/s | 3.0742 KOps/s | +0.22% |
| test_unbind_speed_stack0 | 0.6171ms | 0.3117ms | 3.2085 KOps/s | 3.1002 KOps/s | +3.50% |
| test_unbind_speed_stack1 | 83.9489ms | 0.8068ms | 1.2395 KOps/s | 1.3740 KOps/s | **-9.79%** |
| test_split | 76.9395ms | 2.2894ms | 436.8004 Ops/s | 402.6555 Ops/s | **+8.48%** |
| test_chunk | 83.7467ms | 2.3126ms | 432.4090 Ops/s | 465.4370 Ops/s | **-7.10%** |
| test_creation[device0] | 0.2288ms | 0.1257ms | 7.9581 KOps/s | 8.0705 KOps/s | -1.39% |
| test_creation_from_tensor | 4.2419ms | 0.1253ms | 7.9830 KOps/s | 8.1521 KOps/s | -2.07% |
| test_add_one[memmap_tensor0] | 0.1941ms | 7.6097μs | 131.4115 KOps/s | 125.5856 KOps/s | +4.64% |
| test_contiguous[memmap_tensor0] | 45.3750μs | 2.2110μs | 452.2750 KOps/s | 445.9585 KOps/s | +1.42% |
| test_stack[memmap_tensor0] | 51.2850μs | 6.0022μs | 166.6042 KOps/s | 170.3582 KOps/s | -2.20% |
| test_memmaptd_index | 1.2056ms | 0.4527ms | 2.2090 KOps/s | 2.2917 KOps/s | -3.61% |
| test_memmaptd_index_astensor | 0.8194ms | 0.5277ms | 1.8949 KOps/s | 1.7773 KOps/s | **+6.62%** |
| test_memmaptd_index_op | 1.5221ms | 1.1468ms | 872.0148 Ops/s | 932.3414 Ops/s | **-6.47%** |
| test_serialize_model | 0.2095s | 0.1446s | 6.9148 Ops/s | 7.8287 Ops/s | **-11.67%** |
| test_serialize_model_pickle | 0.4748s | 0.3993s | 2.5043 Ops/s | 2.4851 Ops/s | +0.78% |
| test_serialize_weights | 0.1363s | 0.1287s | 7.7723 Ops/s | 6.9375 Ops/s | **+12.03%** |
| test_serialize_weights_returnearly | 0.1729s | 0.1658s | 6.0319 Ops/s | 6.0651 Ops/s | -0.55% |
| test_serialize_weights_pickle | 1.2485s | 0.9118s | 1.0968 Ops/s | 2.5457 Ops/s | **-56.92%** |
| test_serialize_weights_filesystem | 0.1510s | 0.1455s | 6.8706 Ops/s | 6.6925 Ops/s | +2.66% |
| test_serialize_model_filesystem | 0.1565s | 0.1453s | 6.8827 Ops/s | 5.8762 Ops/s | **+17.13%** |
| test_reshape_pytree | 0.2322ms | 41.8723μs | 23.8822 KOps/s | 24.4828 KOps/s | -2.45% |
| test_reshape_td | 0.1031ms | 50.8270μs | 19.6746 KOps/s | 19.7988 KOps/s | -0.63% |
| test_view_pytree | 96.9690μs | 39.6329μs | 25.2316 KOps/s | 25.2643 KOps/s | -0.13% |
| test_view_td | 0.1151ms | 56.9581μs | 17.5568 KOps/s | 17.7273 KOps/s | -0.96% |
| test_unbind_pytree | 84.4460μs | 36.5813μs | 27.3364 KOps/s | 28.0997 KOps/s | -2.72% |
| test_unbind_td | 78.5903ms | 56.8407μs | 17.5930 KOps/s | 20.5109 KOps/s | **-14.23%** |
| test_split_pytree | 82.8440μs | 39.4937μs | 25.3205 KOps/s | 25.9083 KOps/s | -2.27% |
| test_split_td | 0.1960ms | 62.6948μs | 15.9503 KOps/s | 16.3209 KOps/s | -2.27% |
| test_add_pytree | 0.1713ms | 44.1392μs | 22.6556 KOps/s | 22.2299 KOps/s | +1.92% |
| test_add_td | 0.1729ms | 92.8599μs | 10.7689 KOps/s | 12.0421 KOps/s | **-10.57%** |
| test_distributed | 0.7288ms | 0.1314ms | 7.6104 KOps/s | 7.2453 KOps/s | **+5.04%** |
| test_tdmodule | 48.3500μs | 18.8169μs | 53.1436 KOps/s | 59.1148 KOps/s | **-10.10%** |
| test_tdmodule_dispatch | 71.4320μs | 40.0347μs | 24.9783 KOps/s | 28.4952 KOps/s | **-12.34%** |
| test_tdseq | 45.6350μs | 20.6113μs | 48.5171 KOps/s | 53.3568 KOps/s | **-9.07%** |
| test_tdseq_dispatch | 79.1170μs | 44.3227μs | 22.5618 KOps/s | 25.7466 KOps/s | **-12.37%** |
| test_instantiation_functorch | 1.9402ms | 1.6524ms | 605.1854 Ops/s | 625.4436 Ops/s | -3.24% |
| test_instantiation_td | 2.6710ms | 1.2035ms | 830.8939 Ops/s | 865.8988 Ops/s | -4.04% |
| test_exec_functorch | 0.3422ms | 0.1883ms | 5.3111 KOps/s | 5.3426 KOps/s | -0.59% |
| test_exec_functional_call | 0.3026ms | 0.1746ms | 5.7269 KOps/s | 5.2064 KOps/s | **+10.00%** |
| test_exec_td | 0.3309ms | 0.1759ms | 5.6841 KOps/s | 5.6538 KOps/s | +0.54% |
| test_exec_td_decorator | 0.8462ms | 0.2671ms | 3.7444 KOps/s | 3.7769 KOps/s | -0.86% |
| test_vmap_mlp_speed[True-True] | 0.8685ms | 0.6103ms | 1.6386 KOps/s | 1.6544 KOps/s | -0.95% |
| test_vmap_mlp_speed[True-False] | 0.8962ms | 0.6088ms | 1.6426 KOps/s | 1.6547 KOps/s | -0.73% |
| test_vmap_mlp_speed[False-True] | 0.8464ms | 0.4963ms | 2.0149 KOps/s | 2.0091 KOps/s | +0.29% |
| test_vmap_mlp_speed[False-False] | 0.8695ms | 0.4980ms | 2.0080 KOps/s | 2.0158 KOps/s | -0.39% |
| test_vmap_mlp_speed_decorator[True-True] | 1.3637ms | 0.7033ms | 1.4218 KOps/s | 1.4226 KOps/s | -0.06% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9590ms | 0.7047ms | 1.4190 KOps/s | 1.4300 KOps/s | -0.77% |
| test_vmap_mlp_speed_decorator[False-True] | 0.9094ms | 0.5792ms | 1.7264 KOps/s | 1.7133 KOps/s | +0.77% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9885ms | 0.5789ms | 1.7276 KOps/s | 1.7247 KOps/s | +0.16% |
| test_to_module_speed[True] | 2.4630ms | 1.8500ms | 540.5429 Ops/s | 538.8634 Ops/s | +0.31% |
| test_to_module_speed[False] | 2.9039ms | 1.8352ms | 544.9109 Ops/s | 548.5593 Ops/s | -0.67% |
| test_tc_init | 96.9100μs | 46.0740μs | 21.7042 KOps/s | 23.1175 KOps/s | **-6.11%** |
| test_tc_init_nested | 0.1533ms | 93.2386μs | 10.7252 KOps/s | 11.2473 KOps/s | -4.64% |
| test_tc_first_layer_tensor | 46.4060μs | 9.3081μs | 107.4337 KOps/s | 106.1683 KOps/s | +1.19% |
| test_tc_first_layer_nontensor | 51.6060μs | 9.0913μs | 109.9957 KOps/s | 107.3081 KOps/s | +2.50% |
| test_tc_second_layer_tensor | 44.5520μs | 2.8137μs | 355.4093 KOps/s | 340.9049 KOps/s | +4.25% |
| test_tc_second_layer_nontensor | 44.8630μs | 10.3923μs | 96.2249 KOps/s | 92.8632 KOps/s | +3.62% |
| test_unbind | 0.1044s | 14.4627ms | 69.1435 Ops/s | 67.4599 Ops/s | +2.50% |
| test_full_like | 11.5202ms | 8.5969ms | 116.3215 Ops/s | 126.6579 Ops/s | **-8.16%** |
| test_zeros_like | 14.5601ms | 6.7715ms | 147.6776 Ops/s | 153.5228 Ops/s | -3.81% |
| test_ones_like | 13.4976ms | 7.8255ms | 127.7880 Ops/s | 128.7501 Ops/s | -0.75% |
| test_clone | 17.5014ms | 9.9139ms | 100.8685 Ops/s | 109.6009 Ops/s | **-7.97%** |
| test_squeeze | 60.6520μs | 14.8642μs | 67.2757 KOps/s | 64.5823 KOps/s | +4.17% |
| test_unsqueeze | 0.2137ms | 0.1001ms | 9.9909 KOps/s | 9.8931 KOps/s | +0.99% |
| test_split | 0.4498ms | 0.2070ms | 4.8305 KOps/s | 4.6123 KOps/s | +4.73% |
| test_permute | 0.4171ms | 0.2323ms | 4.3040 KOps/s | 4.3333 KOps/s | -0.68% |
| test_stack | 33.8198ms | 27.2829ms | 36.6530 Ops/s | 39.5636 Ops/s | **-7.36%** |
| test_cat | 33.5557ms | 27.2998ms | 36.6302 Ops/s | 40.4875 Ops/s | **-9.53%** |
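The columns in these tables are related by simple arithmetic: Ops is (up to rounding of the displayed Mean) the reciprocal of Mean, the mean time per call, and Change is the relative difference between Ops on this branch and Ops on the repo `HEAD`. A minimal sketch checking both against the `test_plain_set_nested` row; the function names are illustrative and not part of the benchmark tooling:

```python
def ops_per_second(mean_seconds):
    """Throughput implied by the mean time per call."""
    return 1.0 / mean_seconds


def change_pct(ops, ops_head):
    """Relative throughput change vs. the repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# test_plain_set_nested: Mean 23.7191 μs, Ops 42.1601 KOps/s, HEAD 46.1613 KOps/s
print(round(ops_per_second(23.7191e-6) / 1e3, 2))  # 42.16 (KOps/s)
print(round(change_pct(42.1601, 46.1613), 2))       # -8.67
```

The same formula reproduces the largest CPU regression: `test_serialize_weights_pickle` drops from 2.5457 to 1.0968 Ops/s, which works out to the -56.92% reported.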
github-actions[bot] commented 2 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 10. Worsened: 21.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 29.9400μs | 17.1672μs | 58.2508 KOps/s | 59.5848 KOps/s | -2.24% |
| test_plain_set_stack_nested | 36.7600μs | 17.3032μs | 57.7928 KOps/s | 59.3534 KOps/s | -2.63% |
| test_plain_set_nested_inplace | 43.5600μs | 18.5522μs | 53.9021 KOps/s | 53.2050 KOps/s | +1.31% |
| test_plain_set_stack_nested_inplace | 44.2210μs | 18.5265μs | 53.9767 KOps/s | 55.3320 KOps/s | -2.45% |
| test_items | 17.3710μs | 4.8390μs | 206.6551 KOps/s | 210.5973 KOps/s | -1.87% |
| test_items_nested | 0.4928ms | 0.3940ms | 2.5380 KOps/s | 2.4874 KOps/s | +2.03% |
| test_items_nested_locked | 0.4803ms | 0.3910ms | 2.5576 KOps/s | 2.4830 KOps/s | +3.00% |
| test_items_nested_leaf | 0.1183ms | 86.0946μs | 11.6151 KOps/s | 11.5808 KOps/s | +0.30% |
| test_items_stack_nested | 0.4881ms | 0.3907ms | 2.5595 KOps/s | 2.4941 KOps/s | +2.62% |
| test_items_stack_nested_leaf | 0.1318ms | 86.3261μs | 11.5840 KOps/s | 11.4835 KOps/s | +0.88% |
| test_items_stack_nested_locked | 0.4415ms | 0.3928ms | 2.5457 KOps/s | 2.5179 KOps/s | +1.10% |
| test_keys | 18.0600μs | 4.4067μs | 226.9287 KOps/s | 228.2812 KOps/s | -0.59% |
| test_keys_nested | 0.1013ms | 67.6305μs | 14.7862 KOps/s | 15.2294 KOps/s | -2.91% |
| test_keys_nested_locked | 0.7333ms | 73.1001μs | 13.6799 KOps/s | 13.6256 KOps/s | +0.40% |
| test_keys_nested_leaf | 91.2110μs | 57.6563μs | 17.3442 KOps/s | 17.5727 KOps/s | -1.30% |
| test_keys_stack_nested | 0.1089ms | 66.3441μs | 15.0729 KOps/s | 15.1544 KOps/s | -0.54% |
| test_keys_stack_nested_leaf | 93.6910μs | 56.9444μs | 17.5610 KOps/s | 17.7404 KOps/s | -1.01% |
| test_keys_stack_nested_locked | 0.1051ms | 72.3696μs | 13.8179 KOps/s | 13.7371 KOps/s | +0.59% |
| test_values | 7.3570μs | 1.7740μs | 563.6872 KOps/s | 566.9632 KOps/s | -0.58% |
| test_values_nested | 46.4100μs | 34.1181μs | 29.3099 KOps/s | 29.4578 KOps/s | -0.50% |
| test_values_nested_locked | 86.0320μs | 35.6650μs | 28.0387 KOps/s | 27.7520 KOps/s | +1.03% |
| test_values_nested_leaf | 61.5510μs | 30.3009μs | 33.0023 KOps/s | 33.1318 KOps/s | -0.39% |
| test_values_stack_nested | 61.4910μs | 34.8897μs | 28.6618 KOps/s | 28.6794 KOps/s | -0.06% |
| test_values_stack_nested_leaf | 59.3800μs | 31.0267μs | 32.2303 KOps/s | 32.3490 KOps/s | -0.37% |
| test_values_stack_nested_locked | 63.4010μs | 36.3459μs | 27.5134 KOps/s | 27.4367 KOps/s | +0.28% |
| test_membership | 3.1836μs | 0.5602μs | 1.7850 MOps/s | 1.8386 MOps/s | -2.92% |
| test_membership_nested | 13.3900μs | 2.0619μs | 485.0011 KOps/s | 477.9696 KOps/s | +1.47% |
| test_membership_nested_leaf | 11.2855μs | 2.0730μs | 482.3949 KOps/s | 494.3265 KOps/s | -2.41% |
| test_membership_stacked_nested | 33.8910μs | 2.1538μs | 464.2914 KOps/s | 487.1459 KOps/s | -4.69% |
| test_membership_stacked_nested_leaf | 15.7500μs | 2.1281μs | 469.8948 KOps/s | 482.3425 KOps/s | -2.58% |
| test_membership_nested_last | 16.7610μs | 3.1051μs | 322.0559 KOps/s | 332.1734 KOps/s | -3.05% |
| test_membership_nested_leaf_last | 55.7310μs | 3.1185μs | 320.6692 KOps/s | 334.2651 KOps/s | -4.07% |
| test_membership_stacked_nested_last | 15.9590μs | 3.1094μs | 321.6081 KOps/s | 334.4267 KOps/s | -3.83% |
| test_membership_stacked_nested_leaf_last | 19.9390μs | 3.0666μs | 326.0944 KOps/s | 331.2035 KOps/s | -1.54% |
| test_nested_getleaf | 30.2510μs | 8.0069μs | 124.8917 KOps/s | 123.8436 KOps/s | +0.85% |
| test_nested_get | 22.3000μs | 7.5595μs | 132.2833 KOps/s | 131.9271 KOps/s | +0.27% |
| test_stacked_getleaf | 24.4710μs | 8.0506μs | 124.2141 KOps/s | 124.1707 KOps/s | +0.03% |
| test_stacked_get | 20.7400μs | 7.5796μs | 131.9323 KOps/s | 131.7998 KOps/s | +0.10% |
| test_nested_getitemleaf | 67.8420μs | 8.1919μs | 122.0714 KOps/s | 122.1611 KOps/s | -0.07% |
| test_nested_getitem | 21.5510μs | 7.7569μs | 128.9174 KOps/s | 129.0864 KOps/s | -0.13% |
| test_stacked_getitemleaf | 31.0110μs | 8.2170μs | 121.6982 KOps/s | 120.8845 KOps/s | +0.67% |
| test_stacked_getitem | 23.1310μs | 7.7320μs | 129.3319 KOps/s | 129.5715 KOps/s | -0.18% |
| test_lock_nested | 9.9110ms | 0.4860ms | 2.0575 KOps/s | 2.0441 KOps/s | +0.66% |
| test_lock_stack_nested | 0.5309ms | 0.4401ms | 2.2724 KOps/s | 2.2356 KOps/s | +1.64% |
| test_unlock_nested | 0.8833ms | 0.4010ms | 2.4937 KOps/s | 2.4691 KOps/s | +1.00% |
| test_unlock_stack_nested | 0.3956ms | 0.3610ms | 2.7701 KOps/s | 2.7302 KOps/s | +1.46% |
| test_flatten_speed | 0.5127ms | 0.1054ms | 9.4868 KOps/s | 9.5619 KOps/s | -0.78% |
| test_unflatten_speed | 0.3524ms | 0.2981ms | 3.3551 KOps/s | 3.4187 KOps/s | -1.86% |
| test_common_ops | 1.7397ms | 1.4491ms | 690.0722 Ops/s | 740.5372 Ops/s | **-6.81%** |
| test_creation | 20.4910μs | 2.0504μs | 487.7073 KOps/s | 486.9862 KOps/s | +0.15% |
| test_creation_empty | 47.7500μs | 19.4260μs | 51.4774 KOps/s | 57.7403 KOps/s | **-10.85%** |
| test_creation_nested_1 | 61.8610μs | 21.6277μs | 46.2370 KOps/s | 50.9419 KOps/s | **-9.24%** |
| test_creation_nested_2 | 45.6400μs | 25.6232μs | 39.0272 KOps/s | 44.7713 KOps/s | **-12.83%** |
| test_clone | 70.4420μs | 34.5209μs | 28.9680 KOps/s | 31.6106 KOps/s | **-8.36%** |
| test_getitem[int] | 1.2943ms | 18.7900μs | 53.2199 KOps/s | 56.1460 KOps/s | **-5.21%** |
| test_getitem[slice_int] | 0.1559ms | 33.4563μs | 29.8897 KOps/s | 32.0479 KOps/s | **-6.73%** |
| test_getitem[range] | 0.2648ms | 0.1217ms | 8.2174 KOps/s | 8.4357 KOps/s | -2.59% |
| test_getitem[tuple] | 89.8689ms | 32.2947μs | 30.9648 KOps/s | 37.8489 KOps/s | **-18.19%** |
| test_getitem[list] | 0.2221ms | 0.1109ms | 9.0138 KOps/s | 9.2520 KOps/s | -2.57% |
| test_setitem_dim[int] | 77.8620μs | 56.4469μs | 17.7158 KOps/s | 18.7489 KOps/s | **-5.51%** |
| test_setitem_dim[slice_int] | 0.1194ms | 82.2609μs | 12.1564 KOps/s | 12.7534 KOps/s | -4.68% |
| test_setitem_dim[range] | 0.1950ms | 0.1573ms | 6.3586 KOps/s | 7.0187 KOps/s | **-9.40%** |
| test_setitem_dim[tuple] | 0.1157ms | 80.2920μs | 12.4545 KOps/s | 13.9791 KOps/s | **-10.91%** |
| test_setitem | 0.1015ms | 50.2773μs | 19.8897 KOps/s | 20.7713 KOps/s | -4.24% |
| test_set | 75.9720μs | 48.9120μs | 20.4449 KOps/s | 20.2884 KOps/s | +0.77% |
| test_set_shared | 0.3907ms | 56.7185μs | 17.6309 KOps/s | 17.9518 KOps/s | -1.79% |
| test_update | 0.1061ms | 54.4282μs | 18.3728 KOps/s | 18.5650 KOps/s | -1.03% |
| test_update_nested | 98.3420μs | 64.2952μs | 15.5533 KOps/s | 16.1121 KOps/s | -3.47% |
| test_update__nested | 0.1090ms | 70.7394μs | 14.1364 KOps/s | 15.5858 KOps/s | **-9.30%** |
| test_set_nested | 82.8210μs | 51.8144μs | 19.2997 KOps/s | 19.7579 KOps/s | -2.32% |
| test_set_nested_new | 91.1420μs | 56.3626μs | 17.7423 KOps/s | 17.7987 KOps/s | -0.32% |
| test_select | 95.0010μs | 71.1798μs | 14.0489 KOps/s | 13.8089 KOps/s | +1.74% |
| test_select_nested | 0.5327ms | 53.9960μs | 18.5199 KOps/s | 18.6973 KOps/s | -0.95% |
| test_exclude_nested | 0.1115ms | 72.3466μs | 13.8223 KOps/s | 14.0105 KOps/s | -1.34% |
| test_empty[True] | 0.3580ms | 0.2937ms | 3.4053 KOps/s | 3.4103 KOps/s | -0.15% |
| test_empty[False] | 3.2332μs | 0.9422μs | 1.0613 MOps/s | 1.0857 MOps/s | -2.24% |
| test_to | 64.6110μs | 38.1394μs | 26.2196 KOps/s | 25.9447 KOps/s | +1.06% |
| test_to_nonblocking | 60.9210μs | 24.7702μs | 40.3711 KOps/s | 41.5788 KOps/s | -2.90% |
| test_unbind_speed | 0.4001ms | 0.3098ms | 3.2278 KOps/s | 3.2110 KOps/s | +0.52% |
| test_unbind_speed_stack0 | 0.4061ms | 0.3043ms | 3.2865 KOps/s | 3.1908 KOps/s | +3.00% |
| test_unbind_speed_stack1 | 89.6915ms | 0.7714ms | 1.2963 KOps/s | 1.2597 KOps/s | +2.91% |
| test_split | 91.3963ms | 2.4072ms | 415.4244 Ops/s | 418.8806 Ops/s | -0.83% |
| test_chunk | 2.3313ms | 2.1768ms | 459.3999 Ops/s | 455.3524 Ops/s | +0.89% |
| test_creation[device0] | 0.1639ms | 0.1061ms | 9.4228 KOps/s | 8.6163 KOps/s | **+9.36%** |
| test_creation_from_tensor | 0.1843ms | 0.1093ms | 9.1460 KOps/s | 8.8596 KOps/s | +3.23% |
| test_add_one[memmap_tensor0] | 24.7010μs | 10.2545μs | 97.5177 KOps/s | 105.4630 KOps/s | **-7.53%** |
| test_contiguous[memmap_tensor0] | 24.2820μs | 2.2411μs | 446.2120 KOps/s | 428.8251 KOps/s | +4.05% |
| test_stack[memmap_tensor0] | 34.0310μs | 7.0374μs | 142.0981 KOps/s | 147.3508 KOps/s | -3.56% |
| test_memmaptd_index | 1.2117ms | 0.4449ms | 2.2478 KOps/s | 1.9127 KOps/s | **+17.52%** |
| test_memmaptd_index_astensor | 0.7995ms | 0.5068ms | 1.9731 KOps/s | 1.9564 KOps/s | +0.85% |
| test_memmaptd_index_op | 1.5554ms | 1.1064ms | 903.8498 Ops/s | 912.4362 Ops/s | -0.94% |
| test_serialize_model | 0.1008s | 96.8150ms | 10.3290 Ops/s | 9.9274 Ops/s | +4.04% |
| test_serialize_model_pickle | 1.3492s | 1.2362s | 0.8089 Ops/s | 0.8056 Ops/s | +0.41% |
| test_serialize_weights | 96.0737ms | 92.8765ms | 10.7670 Ops/s | 9.0808 Ops/s | **+18.57%** |
| test_serialize_weights_returnearly | 86.2683ms | 71.8502ms | 13.9179 Ops/s | 13.9211 Ops/s | -0.02% |
| test_serialize_weights_pickle | 1.3523s | 1.2368s | 0.8086 Ops/s | 0.8082 Ops/s | +0.04% |
| test_reshape_pytree | 0.1079ms | 40.3371μs | 24.7911 KOps/s | 24.8256 KOps/s | -0.14% |
| test_reshape_td | 75.0510μs | 45.3240μs | 22.0634 KOps/s | 21.3020 KOps/s | +3.57% |
| test_view_pytree | 64.2010μs | 39.5374μs | 25.2925 KOps/s | 25.1888 KOps/s | +0.41% |
| test_view_td | 74.5120μs | 50.4119μs | 19.8366 KOps/s | 18.6545 KOps/s | **+6.34%** |
| test_unbind_pytree | 85.7910μs | 38.0886μs | 26.2546 KOps/s | 26.2912 KOps/s | -0.14% |
| test_unbind_td | 0.4411ms | 46.9754μs | 21.2877 KOps/s | 21.0421 KOps/s | +1.17% |
| test_split_pytree | 81.0520μs | 51.6913μs | 19.3456 KOps/s | 18.8092 KOps/s | +2.85% |
| test_split_td | 90.1962ms | 71.1305μs | 14.0587 KOps/s | 13.6975 KOps/s | +2.64% |
| test_add_pytree | 0.1064ms | 61.9452μs | 16.1433 KOps/s | 16.2481 KOps/s | -0.64% |
| test_add_td | 0.1286ms | 98.0347μs | 10.2005 KOps/s | 10.3511 KOps/s | -1.45% |
| test_compile_add_one_nested[tensordict-compile] | 0.4158ms | 0.2162ms | 4.6249 KOps/s | 4.7560 KOps/s | -2.76% |
| test_compile_add_one_nested[tensordict-eager] | 0.2607ms | 0.1763ms | 5.6733 KOps/s | 5.7079 KOps/s | -0.61% |
| test_compile_add_one_nested[pytree-compile] | 0.2065ms | 0.1469ms | 6.8084 KOps/s | 6.7786 KOps/s | +0.44% |
| test_compile_add_one_nested[pytree-eager] | 0.2962ms | 0.2087ms | 4.7915 KOps/s | 4.9277 KOps/s | -2.76% |
| test_compile_copy_nested[tensordict-compile] | 63.4210μs | 22.0345μs | 45.3834 KOps/s | 44.3464 KOps/s | +2.34% |
| test_compile_copy_nested[tensordict-eager] | 83.8610μs | 49.8410μs | 20.0638 KOps/s | 20.3177 KOps/s | -1.25% |
| test_compile_copy_nested[pytree-compile] | 0.1317ms | 71.6547μs | 13.9558 KOps/s | 13.8481 KOps/s | +0.78% |
| test_compile_copy_nested[pytree-eager] | 94.3010μs | 59.5464μs | 16.7936 KOps/s | 16.7579 KOps/s | +0.21% |
| test_compile_add_one_flat[tensordict-compile] | 0.4265ms | 0.3297ms | 3.0329 KOps/s | 2.9834 KOps/s | +1.66% |
| test_compile_add_one_flat[tensordict-eager] | 0.3163ms | 0.2247ms | 4.4504 KOps/s | 4.4946 KOps/s | -0.98% |
| test_compile_add_one_flat[tensorclass-compile] | 0.1842ms | 0.1323ms | 7.5613 KOps/s | 7.5286 KOps/s | +0.43% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1231ms | 64.5316μs | 15.4963 KOps/s | 14.8986 KOps/s | +4.01% |
| test_compile_add_one_flat[pytree-compile] | 0.4412ms | 0.3302ms | 3.0283 KOps/s | 3.0059 KOps/s | +0.74% |
| test_compile_add_one_flat[pytree-eager] | 0.7996ms | 0.7143ms | 1.4000 KOps/s | 1.4867 KOps/s | **-5.83%** |
| test_compile_add_self_flat[tensordict-eager] | 0.3668ms | 0.2731ms | 3.6622 KOps/s | 3.6774 KOps/s | -0.41% |
| test_compile_add_self_flat[tensordict-compile] | 0.3697ms | 0.3339ms | 2.9948 KOps/s | 2.9484 KOps/s | +1.58% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1606ms | 81.4643μs | 12.2753 KOps/s | 12.3979 KOps/s | -0.99% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2375ms | 0.1334ms | 7.4966 KOps/s | 7.3800 KOps/s | +1.58% |
| test_compile_add_self_flat[pytree-eager] | 0.6703ms | 0.6034ms | 1.6571 KOps/s | 1.7706 KOps/s | **-6.41%** |
| test_compile_add_self_flat[pytree-compile] | 0.3789ms | 0.3302ms | 3.0286 KOps/s | 2.9747 KOps/s | +1.81% |
| test_compile_copy_flat[tensordict-compile] | 39.1300μs | 18.0088μs | 55.5284 KOps/s | 53.2394 KOps/s | +4.30% |
| test_compile_copy_flat[tensordict-eager] | 55.5920μs | 31.9031μs | 31.3449 KOps/s | 31.0879 KOps/s | +0.83% |
| test_compile_copy_flat[pytree-compile] | 0.1144ms | 75.7340μs | 13.2041 KOps/s | 12.9987 KOps/s | +1.58% |
| test_compile_copy_flat[pytree-eager] | 0.1079ms | 61.2577μs | 16.3245 KOps/s | 16.3116 KOps/s | +0.08% |
| test_compile_assign_and_add[tensordict-compile] | 2.5585ms | 0.9449ms | 1.0584 KOps/s | 1.0302 KOps/s | +2.73% |
| test_compile_assign_and_add[tensordict-eager] | 3.7584ms | 3.5013ms | 285.6047 Ops/s | 288.0124 Ops/s | -0.84% |
| test_compile_assign_and_add[pytree-compile] | 2.5375ms | 0.9284ms | 1.0771 KOps/s | 1.0537 KOps/s | +2.22% |
| test_compile_assign_and_add[pytree-eager] | 3.6325ms | 3.5590ms | 280.9814 Ops/s | 266.6459 Ops/s | **+5.38%** |
| test_compile_indexing[tensor-tensordict-compile] | 0.1750ms | 0.1162ms | 8.6053 KOps/s | 8.8592 KOps/s | -2.87% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2646ms | 69.5397μs | 14.3803 KOps/s | 15.2643 KOps/s | **-5.79%** |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1481ms | 0.1063ms | 9.4087 KOps/s | 9.2088 KOps/s | +2.17% |
| test_compile_indexing[tensor-tensorclass-eager] | 94.7020μs | 51.3754μs | 19.4646 KOps/s | 20.0017 KOps/s | -2.69% |
| test_compile_indexing[tensor-pytree-compile] | 0.1696ms | 0.1095ms | 9.1309 KOps/s | 9.4028 KOps/s | -2.89% |
| test_compile_indexing[tensor-pytree-eager] | 0.1024ms | 51.0700μs | 19.5810 KOps/s | 21.2301 KOps/s | **-7.77%** |
| test_compile_indexing[slice-tensordict-compile] | 0.1857ms | 0.1423ms | 7.0292 KOps/s | 6.9265 KOps/s | +1.48% |
| test_compile_indexing[slice-tensordict-eager] | 0.1961ms | 28.1407μs | 35.5358 KOps/s | 36.5568 KOps/s | -2.79% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1955ms | 0.1399ms | 7.1504 KOps/s | 7.4161 KOps/s | -3.58% |
| test_compile_indexing[slice-tensorclass-eager] | 53.8220μs | 23.4505μs | 42.6429 KOps/s | 43.0689 KOps/s | -0.99% |
| test_compile_indexing[slice-pytree-compile] | 0.2120ms | 0.1391ms | 7.1909 KOps/s | 7.3987 KOps/s | -2.81% |
| test_compile_indexing[slice-pytree-eager] | 68.0720μs | 22.8886μs | 43.6899 KOps/s | 38.0757 KOps/s | **+14.74%** |
| test_compile_indexing[int-tensordict-compile] | 0.1966ms | 0.1426ms | 7.0140 KOps/s | 6.9585 KOps/s | +0.80% |
| test_compile_indexing[int-tensordict-eager] | 0.4902ms | 27.5081μs | 36.3529 KOps/s | 37.2961 KOps/s | -2.53% |
| test_compile_indexing[int-tensorclass-compile] | 0.1813ms | 0.1370ms | 7.3009 KOps/s | 7.4025 KOps/s | -1.37% |
| test_compile_indexing[int-tensorclass-eager] | 54.3420μs | 22.9528μs | 43.5676 KOps/s | 43.8891 KOps/s | -0.73% |
| test_compile_indexing[int-pytree-compile] | 0.1785ms | 0.1345ms | 7.4368 KOps/s | 7.4213 KOps/s | +0.21% |
| test_compile_indexing[int-pytree-eager] | 0.4249ms | 23.1225μs | 43.2479 KOps/s | 43.4714 KOps/s | -0.51% |
| test_mod_add[eager] | 75.8620μs | 39.1542μs | 25.5400 KOps/s | 25.4189 KOps/s | +0.48% |
| test_mod_add[compile] | 0.1153ms | 69.2739μs | 14.4354 KOps/s | 14.2549 KOps/s | +1.27% |
| test_mod_add[compile-overhead] | 0.2601ms | 0.1470ms | 6.8026 KOps/s | 6.5826 KOps/s | +3.34% |
| test_mod_wrap[eager] | 0.3573ms | 0.2641ms | 3.7858 KOps/s | 3.5741 KOps/s | **+5.92%** |
| test_mod_wrap[compile] | 1.1915ms | 0.3019ms | 3.3128 KOps/s | 3.2846 KOps/s | +0.86% |
| test_mod_wrap[compile-overhead] | 7.9664ms | 4.1814ms | 239.1568 Ops/s | 232.2283 Ops/s | +2.98% |
| test_mod_wrap_and_backward[eager] | 1.6531ms | 1.4701ms | 680.2049 Ops/s | 719.2259 Ops/s | **-5.43%** |
| test_mod_wrap_and_backward[compile] | 1.5685ms | 1.4791ms | 676.0729 Ops/s | 719.2027 Ops/s | **-6.00%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.8434ms | 1.0766ms | 928.8570 Ops/s | 1.0931 KOps/s | **-15.03%** |
| test_seq_add[eager] | 0.1695ms | 0.1187ms | 8.4250 KOps/s | 8.7126 KOps/s | -3.30% |
| test_seq_add[compile] | 0.1424ms | 85.4684μs | 11.7002 KOps/s | 11.0175 KOps/s | **+6.20%** |
| test_seq_add[compile-overhead] | 0.1743ms | 0.1247ms | 8.0180 KOps/s | 8.0427 KOps/s | -0.31% |
| test_seq_wrap[eager] | 0.5117ms | 0.4547ms | 2.1991 KOps/s | 2.2946 KOps/s | -4.16% |
| test_seq_wrap[compile] | 1.5351ms | 0.3412ms | 2.9310 KOps/s | 2.9383 KOps/s | -0.25% |
| test_seq_wrap[compile-overhead] | 0.3153s | 0.1466s | 
6.8211 Ops/s | 6.8498 Ops/s | $\color{#d91a1a}-0.42\\%$ | | test_func_call_runtime[False-eager] | 0.8157ms | 0.7672ms | 1.3035 KOps/s | 1.2927 KOps/s | $\color{#35bf28}+0.83\\%$ | | test_func_call_runtime[False-compile] | 0.8686ms | 0.8331ms | 1.2003 KOps/s | 1.1434 KOps/s | $\color{#35bf28}+4.98\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4061ms | 0.3672ms | 2.7232 KOps/s | 2.6106 KOps/s | $\color{#35bf28}+4.32\\%$ | | test_func_call_runtime[True-eager] | 1.0710ms | 1.0225ms | 978.0136 Ops/s | 935.5915 Ops/s | $\color{#35bf28}+4.53\\%$ | | test_func_call_runtime[True-compile] | 0.9315ms | 0.8733ms | 1.1451 KOps/s | 1.0745 KOps/s | $\textbf{\color{#35bf28}+6.57\\%}$ | | test_func_call_runtime[True-compile-overhead] | 0.4771ms | 0.4106ms | 2.4355 KOps/s | 2.4168 KOps/s | $\color{#35bf28}+0.77\\%$ | | test_distributed | 1.3943ms | 72.4989μs | 13.7933 KOps/s | 13.6576 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_tdmodule | 31.6810μs | 16.3799μs | 61.0504 KOps/s | 59.0048 KOps/s | $\color{#35bf28}+3.47\\%$ | | test_tdmodule_dispatch | 51.3800μs | 35.0219μs | 28.5536 KOps/s | 29.3602 KOps/s | $\color{#d91a1a}-2.75\\%$ | | test_tdseq | 34.0410μs | 17.2777μs | 57.8780 KOps/s | 56.6158 KOps/s | $\color{#35bf28}+2.23\\%$ | | test_tdseq_dispatch | 59.2210μs | 36.8534μs | 27.1346 KOps/s | 27.3391 KOps/s | $\color{#d91a1a}-0.75\\%$ | | test_instantiation_functorch | 2.2366ms | 2.0805ms | 480.6550 Ops/s | 481.7805 Ops/s | $\color{#d91a1a}-0.23\\%$ | | test_instantiation_td | 2.0650ms | 1.3351ms | 749.0341 Ops/s | 753.1071 Ops/s | $\color{#d91a1a}-0.54\\%$ | | test_exec_functorch | 0.2968ms | 0.2440ms | 4.0977 KOps/s | 4.0644 KOps/s | $\color{#35bf28}+0.82\\%$ | | test_exec_functional_call | 0.3257ms | 0.2476ms | 4.0387 KOps/s | 4.1251 KOps/s | $\color{#d91a1a}-2.09\\%$ | | test_exec_td | 0.3116ms | 0.2456ms | 4.0725 KOps/s | 4.1153 KOps/s | $\color{#d91a1a}-1.04\\%$ | | test_exec_td_decorator | 0.9922ms | 0.3235ms | 3.0910 KOps/s | 3.1437 KOps/s | 
$\color{#d91a1a}-1.67\\%$ | | test_vmap_mlp_speed[True-True] | 0.8876ms | 0.7190ms | 1.3909 KOps/s | 1.3876 KOps/s | $\color{#35bf28}+0.23\\%$ | | test_vmap_mlp_speed[True-False] | 0.8847ms | 0.7249ms | 1.3795 KOps/s | 1.3925 KOps/s | $\color{#d91a1a}-0.94\\%$ | | test_vmap_mlp_speed[False-True] | 0.7016ms | 0.6401ms | 1.5622 KOps/s | 1.5819 KOps/s | $\color{#d91a1a}-1.25\\%$ | | test_vmap_mlp_speed[False-False] | 0.7795ms | 0.6410ms | 1.5601 KOps/s | 1.6435 KOps/s | $\textbf{\color{#d91a1a}-5.07\\%}$ | | test_vmap_mlp_speed_decorator[True-True] | 0.9316ms | 0.8002ms | 1.2498 KOps/s | 1.2596 KOps/s | $\color{#d91a1a}-0.78\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 1.2944ms | 0.8009ms | 1.2486 KOps/s | 1.2576 KOps/s | $\color{#d91a1a}-0.72\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8411ms | 0.6955ms | 1.4378 KOps/s | 1.4344 KOps/s | $\color{#35bf28}+0.24\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8612ms | 0.6944ms | 1.4400 KOps/s | 1.4290 KOps/s | $\color{#35bf28}+0.77\\%$ | | test_vmap_transformer_speed[True-True] | 9.5928ms | 9.2634ms | 107.9521 Ops/s | 108.9365 Ops/s | $\color{#d91a1a}-0.90\\%$ | | test_vmap_transformer_speed[True-False] | 9.5695ms | 9.4334ms | 106.0063 Ops/s | 107.3416 Ops/s | $\color{#d91a1a}-1.24\\%$ | | test_vmap_transformer_speed[False-True] | 9.5046ms | 9.2689ms | 107.8878 Ops/s | 109.9350 Ops/s | $\color{#d91a1a}-1.86\\%$ | | test_vmap_transformer_speed[False-False] | 9.5658ms | 9.2677ms | 107.9012 Ops/s | 109.2632 Ops/s | $\color{#d91a1a}-1.25\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 22.6707ms | 22.0302ms | 45.3921 Ops/s | 46.6724 Ops/s | $\color{#d91a1a}-2.74\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 22.9037ms | 22.0268ms | 45.3992 Ops/s | 46.9301 Ops/s | $\color{#d91a1a}-3.26\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 22.3543ms | 21.3331ms | 46.8755 Ops/s | 47.1282 Ops/s | $\color{#d91a1a}-0.54\\%$ | | test_vmap_transformer_speed_decorator[False-False] 
| 22.4319ms | 21.3375ms | 46.8658 Ops/s | 47.2086 Ops/s | $\color{#d91a1a}-0.73\\%$ | | test_to_module_speed[True] | 3.0566ms | 1.5128ms | 661.0208 Ops/s | 670.0223 Ops/s | $\color{#d91a1a}-1.34\\%$ | | test_to_module_speed[False] | 2.0011ms | 1.5071ms | 663.5138 Ops/s | 674.2669 Ops/s | $\color{#d91a1a}-1.59\\%$ | | test_tc_init | 69.9520μs | 37.8319μs | 26.4327 KOps/s | 27.0531 KOps/s | $\color{#d91a1a}-2.29\\%$ | | test_tc_init_nested | 0.1091ms | 78.7286μs | 12.7019 KOps/s | 13.2962 KOps/s | $\color{#d91a1a}-4.47\\%$ | | test_tc_first_layer_tensor | 0.1007ms | 4.0390μs | 247.5886 KOps/s | 252.7337 KOps/s | $\color{#d91a1a}-2.04\\%$ | | test_tc_first_layer_nontensor | 29.6620μs | 4.0408μs | 247.4743 KOps/s | 250.1938 KOps/s | $\color{#d91a1a}-1.09\\%$ | | test_tc_second_layer_tensor | 7.6775μs | 1.2929μs | 773.4486 KOps/s | 777.2255 KOps/s | $\color{#d91a1a}-0.49\\%$ | | test_tc_second_layer_nontensor | 26.5310μs | 4.5971μs | 217.5303 KOps/s | 218.2170 KOps/s | $\color{#d91a1a}-0.31\\%$ | | test_unbind | 0.3152s | 12.7995ms | 78.1281 Ops/s | 77.4408 Ops/s | $\color{#35bf28}+0.89\\%$ | | test_full_like | 0.6567ms | 0.5788ms | 1.7277 KOps/s | 1.7315 KOps/s | $\color{#d91a1a}-0.22\\%$ | | test_zeros_like | 0.2652ms | 0.1976ms | 5.0615 KOps/s | 5.0544 KOps/s | $\color{#35bf28}+0.14\\%$ | | test_ones_like | 0.2299ms | 0.1974ms | 5.0650 KOps/s | 5.0630 KOps/s | $\color{#35bf28}+0.04\\%$ | | test_clone | 0.4477ms | 0.4149ms | 2.4099 KOps/s | 2.4086 KOps/s | $\color{#35bf28}+0.05\\%$ | | test_squeeze | 36.2910μs | 12.0781μs | 82.7947 KOps/s | 82.1742 KOps/s | $\color{#35bf28}+0.76\\%$ | | test_unsqueeze | 0.2648ms | 83.2469μs | 12.0125 KOps/s | 11.4343 KOps/s | $\textbf{\color{#35bf28}+5.06\\%}$ | | test_split | 0.4465ms | 0.1853ms | 5.3969 KOps/s | 5.3988 KOps/s | $\color{#d91a1a}-0.03\\%$ | | test_permute | 0.2453ms | 0.1949ms | 5.1309 KOps/s | 4.9707 KOps/s | $\color{#35bf28}+3.22\\%$ | | test_stack | 1.2499ms | 0.9094ms | 1.0996 KOps/s | 1.1159 KOps/s | 
$\color{#d91a1a}-1.46\\%$ | | test_cat | 1.2523ms | 1.2317ms | 811.8628 Ops/s | 811.9302 Ops/s | $-0.01\\%$ |
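
The Change column appears to be the relative throughput difference between this PR (Ops) and the repository head (Ops on Repo `HEAD`). For example, test_add_td gives (10.2005 - 10.3511) / 10.3511 ≈ -1.45%. Some rows mix units: test_mod_wrap_and_backward[compile-overhead] compares 928.8570 Ops/s against 1.0931 KOps/s, so both values have to be normalized before dividing. The sketch below reproduces that arithmetic under these assumptions. The helper names are hypothetical and are not part of tensordict or the benchmark bot. The 5% bolding threshold is inferred from the table itself (+5.06% is bold, +4.98% is not), not taken from any documented rule.

```python
# Hypothetical helpers for reading the benchmark table above.
# They are not part of tensordict or of the bot that produced the table.

# Units observed in the Ops columns, expressed in operations per second.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(cell: str) -> float:
    """Convert a cell such as '42.1601 KOps/s' or '928.8570 Ops/s' to ops/s."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of the PR run versus HEAD, in percent."""
    new = to_ops_per_second(ops)
    old = to_ops_per_second(ops_head)
    return (new - old) / old * 100.0


def looks_significant(change: float, threshold: float = 5.0) -> bool:
    """Assumed bolding rule inferred from the table, not a documented one."""
    return abs(change) >= threshold


if __name__ == "__main__":
    rows = {
        "test_add_td": ("10.2005 KOps/s", "10.3511 KOps/s"),
        "test_mod_wrap_and_backward[compile-overhead]": ("928.8570 Ops/s", "1.0931 KOps/s"),
        "test_cat": ("811.8628 Ops/s", "811.9302 Ops/s"),
    }
    for name, (ops, head) in rows.items():
        change = percent_change(ops, head)
        marker = " (significant)" if looks_significant(change) else ""
        print(f"{name}: {change:+.2f}%{marker}")
```

Because the figures are rounded to four decimals in the table, recomputed changes may differ from the reported ones in the last digit.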