pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] Refactor map and map_iter #869

Closed: vmoens closed this pull request 1 month ago

vmoens commented 1 month ago

The overlap between `map` and `map_iter` is the cause of the flaky tests.
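The PR description does not say how the two functions were refactored. As a generic illustration only (not tensordict's code), one common way to remove overlap between an eager and a lazy mapping API is to implement the eager one by draining the lazy one, so worker-pool creation and shutdown live in exactly one place. `map_iter` and `map_all` below are hypothetical, thread-based stand-ins; they are not `TensorDict.map` / `TensorDict.map_iter`, which have their own signatures.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_iter(
    fn: Callable[[T], R],
    items: Iterable[T],
    num_workers: int = 2,
    executor: Optional[ThreadPoolExecutor] = None,
) -> Iterator[R]:
    """Hypothetical lazy map: yield fn(item) for each item, in input order.

    This is the only function that creates or tears down a worker pool.
    """
    owns_executor = executor is None
    if owns_executor:
        executor = ThreadPoolExecutor(max_workers=num_workers)
    try:
        # Executor.map submits every task up front and yields results in order.
        yield from executor.map(fn, items)
    finally:
        # Shut down only a pool this call created; a caller-supplied pool stays usable.
        if owns_executor:
            executor.shutdown(wait=True)


def map_all(
    fn: Callable[[T], R],
    items: Iterable[T],
    num_workers: int = 2,
    executor: Optional[ThreadPoolExecutor] = None,
) -> List[R]:
    """Hypothetical eager map with no pool logic of its own: it drains map_iter."""
    return list(map_iter(fn, items, num_workers=num_workers, executor=executor))


print(map_all(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]
```

Threads are used only to keep the sketch self-contained. The point of the design is that the two entry points cannot drift apart, because pool handling exists in a single code path; the thread does not state whether PR #869 follows this exact pattern.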

github-actions[bot] commented 1 month ago

⚠ **Warning**: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 14. Worsened: 2.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 40.2050μs | 16.0296μs | 62.3847 KOps/s | 60.3721 KOps/s | +3.33% |
| test_plain_set_stack_nested | 44.4230μs | 16.2506μs | 61.5360 KOps/s | 59.8859 KOps/s | +2.76% |
| test_plain_set_nested_inplace | 81.9440μs | 18.4507μs | 54.1985 KOps/s | 52.3307 KOps/s | +3.57% |
| test_plain_set_stack_nested_inplace | 0.1917ms | 18.3871μs | 54.3858 KOps/s | 53.2159 KOps/s | +2.20% |
| test_items | 38.7930μs | 2.6735μs | 374.0348 KOps/s | 392.8386 KOps/s | -4.79% |
| test_items_nested | 0.5375ms | 0.2676ms | 3.7364 KOps/s | 3.7269 KOps/s | +0.25% |
| test_items_nested_locked | 1.5699ms | 0.2743ms | 3.6460 KOps/s | 3.6994 KOps/s | -1.44% |
| test_items_nested_leaf | 0.1711ms | 78.9013μs | 12.6741 KOps/s | 12.4320 KOps/s | +1.95% |
| test_items_stack_nested | 0.3269ms | 0.2700ms | 3.7031 KOps/s | 3.7100 KOps/s | -0.19% |
| test_items_stack_nested_leaf | 0.1427ms | 81.6996μs | 12.2400 KOps/s | 12.5037 KOps/s | -2.11% |
| test_items_stack_nested_locked | 0.3268ms | 0.2707ms | 3.6942 KOps/s | 3.6817 KOps/s | +0.34% |
| test_keys | 43.2510μs | 3.8804μs | 257.7039 KOps/s | 265.2674 KOps/s | -2.85% |
| test_keys_nested | 0.2471ms | 0.1393ms | 7.1775 KOps/s | 7.0805 KOps/s | +1.37% |
| test_keys_nested_locked | 0.7724ms | 0.1475ms | 6.7812 KOps/s | 6.8241 KOps/s | -0.63% |
| test_keys_nested_leaf | 0.5161ms | 0.1207ms | 8.2828 KOps/s | 8.3612 KOps/s | -0.94% |
| test_keys_stack_nested | 0.2567ms | 0.1371ms | 7.2942 KOps/s | 7.1343 KOps/s | +2.24% |
| test_keys_stack_nested_leaf | 0.2075ms | 0.1170ms | 8.5503 KOps/s | 8.3611 KOps/s | +2.26% |
| test_keys_stack_nested_locked | 0.2544ms | 0.1440ms | 6.9439 KOps/s | 6.8923 KOps/s | +0.75% |
| test_values | 34.1543μs | 1.1514μs | 868.5143 KOps/s | 842.5309 KOps/s | +3.08% |
| test_values_nested | 0.1023ms | 50.7431μs | 19.7071 KOps/s | 19.1713 KOps/s | +2.79% |
| test_values_nested_locked | 0.1315ms | 50.8426μs | 19.6685 KOps/s | 19.0947 KOps/s | +3.01% |
| test_values_nested_leaf | 78.1370μs | 45.5881μs | 21.9355 KOps/s | 21.2577 KOps/s | +3.19% |
| test_values_stack_nested | 0.2213ms | 50.8947μs | 19.6484 KOps/s | 19.0470 KOps/s | +3.16% |
| test_values_stack_nested_leaf | 98.7150μs | 45.8792μs | 21.7964 KOps/s | 21.7823 KOps/s | +0.06% |
| test_values_stack_nested_locked | 0.1235ms | 51.5422μs | 19.4016 KOps/s | 19.0433 KOps/s | +1.88% |
| test_membership | 12.5240μs | 1.3379μs | 747.4486 KOps/s | 606.4361 KOps/s | **+23.25%** |
| test_membership_nested | 36.6490μs | 3.4656μs | 288.5476 KOps/s | 276.1171 KOps/s | +4.50% |
| test_membership_nested_leaf | 25.1970μs | 3.5199μs | 284.1026 KOps/s | 284.1747 KOps/s | -0.03% |
| test_membership_stacked_nested | 18.6350μs | 3.4889μs | 286.6258 KOps/s | 285.3704 KOps/s | +0.44% |
| test_membership_stacked_nested_leaf | 26.6100μs | 3.5109μs | 284.8298 KOps/s | 287.5884 KOps/s | -0.96% |
| test_membership_nested_last | 48.5110μs | 4.2591μs | 234.7936 KOps/s | 237.7730 KOps/s | -1.25% |
| test_membership_nested_leaf_last | 21.3910μs | 4.3131μs | 231.8496 KOps/s | 233.6968 KOps/s | -0.79% |
| test_membership_stacked_nested_last | 44.1430μs | 4.2654μs | 234.4467 KOps/s | 73.0217 KOps/s | **+221.06%** |
| test_membership_stacked_nested_leaf_last | 30.7080μs | 4.2953μs | 232.8124 KOps/s | 72.4214 KOps/s | **+221.47%** |
| test_nested_getleaf | 54.2620μs | 10.7093μs | 93.3765 KOps/s | 90.3978 KOps/s | +3.30% |
| test_nested_get | 61.7360μs | 10.1787μs | 98.2439 KOps/s | 94.5283 KOps/s | +3.93% |
| test_stacked_getleaf | 36.8590μs | 10.5528μs | 94.7619 KOps/s | 92.2870 KOps/s | +2.68% |
| test_stacked_get | 54.6120μs | 9.9420μs | 100.5835 KOps/s | 96.7536 KOps/s | +3.96% |
| test_nested_getitemleaf | 57.4180μs | 11.2996μs | 88.4985 KOps/s | 86.0878 KOps/s | +2.80% |
| test_nested_getitem | 35.7310μs | 10.4486μs | 95.7066 KOps/s | 92.1256 KOps/s | +3.89% |
| test_stacked_getitemleaf | 52.6440μs | 11.0821μs | 90.2359 KOps/s | 87.1729 KOps/s | +3.51% |
| test_stacked_getitem | 69.4600μs | 10.3176μs | 96.9220 KOps/s | 92.9217 KOps/s | +4.31% |
| test_lock_nested | 52.8618ms | 0.3846ms | 2.6004 KOps/s | 2.9989 KOps/s | **-13.29%** |
| test_lock_stack_nested | 0.5196ms | 0.3007ms | 3.3250 KOps/s | 3.4193 KOps/s | -2.76% |
| test_unlock_nested | 0.7918ms | 0.3361ms | 2.9755 KOps/s | 2.9807 KOps/s | -0.17% |
| test_unlock_stack_nested | 0.4697ms | 0.3086ms | 3.2405 KOps/s | 3.3178 KOps/s | -2.33% |
| test_flatten_speed | 0.2251ms | 99.7408μs | 10.0260 KOps/s | 9.9930 KOps/s | +0.33% |
| test_unflatten_speed | 0.9371ms | 0.4122ms | 2.4261 KOps/s | 2.3896 KOps/s | +1.52% |
| test_common_ops | 3.9027ms | 0.7381ms | 1.3549 KOps/s | 1.3502 KOps/s | +0.35% |
| test_creation | 43.4120μs | 1.9401μs | 515.4446 KOps/s | 508.5992 KOps/s | +1.35% |
| test_creation_empty | 30.3970μs | 9.5418μs | 104.8018 KOps/s | 98.3620 KOps/s | **+6.55%** |
| test_creation_nested_1 | 40.7970μs | 12.3333μs | 81.0811 KOps/s | 76.1622 KOps/s | **+6.46%** |
| test_creation_nested_2 | 52.1080μs | 15.6850μs | 63.7551 KOps/s | 61.3272 KOps/s | +3.96% |
| test_clone | 0.1365ms | 13.0810μs | 76.4469 KOps/s | 74.7007 KOps/s | +2.34% |
| test_getitem[int] | 34.3550μs | 11.3041μs | 88.4635 KOps/s | 90.7973 KOps/s | -2.57% |
| test_getitem[slice_int] | 64.3610μs | 22.3023μs | 44.8385 KOps/s | 45.4824 KOps/s | -1.42% |
| test_getitem[range] | 85.1800μs | 58.8706μs | 16.9864 KOps/s | 17.2768 KOps/s | -1.68% |
| test_getitem[tuple] | 83.9680μs | 18.8731μs | 52.9854 KOps/s | 54.6910 KOps/s | -3.12% |
| test_getitem[list] | 94.0870μs | 39.6739μs | 25.2055 KOps/s | 25.3066 KOps/s | -0.40% |
| test_setitem_dim[int] | 56.8470μs | 32.1501μs | 31.1041 KOps/s | 30.3178 KOps/s | +2.59% |
| test_setitem_dim[slice_int] | 0.1082ms | 57.8940μs | 17.2730 KOps/s | 16.8333 KOps/s | +2.61% |
| test_setitem_dim[range] | 0.1284ms | 80.3903μs | 12.4393 KOps/s | 12.3194 KOps/s | +0.97% |
| test_setitem_dim[tuple] | 99.9880μs | 49.2368μs | 20.3100 KOps/s | 20.3609 KOps/s | -0.25% |
| test_setitem | 0.1145ms | 21.7336μs | 46.0118 KOps/s | 48.5129 KOps/s | **-5.16%** |
| test_set | 68.9900μs | 19.0437μs | 52.5107 KOps/s | 51.7031 KOps/s | +1.56% |
| test_set_shared | 1.6119ms | 0.1649ms | 6.0630 KOps/s | 5.9520 KOps/s | +1.87% |
| test_update | 0.1473ms | 21.6167μs | 46.2605 KOps/s | 45.6578 KOps/s | +1.32% |
| test_update_nested | 78.2270μs | 30.0457μs | 33.2826 KOps/s | 32.5096 KOps/s | +2.38% |
| test_update__nested | 89.3780μs | 24.6818μs | 40.5157 KOps/s | 39.5795 KOps/s | +2.37% |
| test_set_nested | 61.0950μs | 21.2386μs | 47.0840 KOps/s | 47.5805 KOps/s | -1.04% |
| test_set_nested_new | 59.5930μs | 25.0746μs | 39.8810 KOps/s | 39.8420 KOps/s | +0.10% |
| test_select | 0.1152ms | 39.6690μs | 25.2086 KOps/s | 24.9037 KOps/s | +1.22% |
| test_select_nested | 0.1250ms | 56.3345μs | 17.7511 KOps/s | 17.4869 KOps/s | +1.51% |
| test_exclude_nested | 0.1920ms | 0.1168ms | 8.5651 KOps/s | 8.4276 KOps/s | +1.63% |
| test_empty[True] | 0.7368ms | 0.3933ms | 2.5426 KOps/s | 2.4772 KOps/s | +2.64% |
| test_empty[False] | 4.9172μs | 1.0224μs | 978.1018 KOps/s | 920.0529 KOps/s | **+6.31%** |
| test_unbind_speed | 4.3448ms | 0.2427ms | 4.1195 KOps/s | 4.0501 KOps/s | +1.71% |
| test_unbind_speed_stack0 | 0.4345ms | 0.2394ms | 4.1770 KOps/s | 4.2067 KOps/s | -0.71% |
| test_unbind_speed_stack1 | 65.2206ms | 0.7063ms | 1.4158 KOps/s | 1.4563 KOps/s | -2.78% |
| test_split | 69.9213ms | 1.5796ms | 633.0865 Ops/s | 634.3451 Ops/s | -0.20% |
| test_chunk | 66.6385ms | 1.5711ms | 636.4839 Ops/s | 631.1496 Ops/s | +0.85% |
| test_creation[device0] | 0.1899ms | 94.0081μs | 10.6374 KOps/s | 10.7150 KOps/s | -0.72% |
| test_creation_from_tensor | 3.7297ms | 98.1817μs | 10.1852 KOps/s | 10.3174 KOps/s | -1.28% |
| test_add_one[memmap_tensor0] | 77.1350μs | 5.4002μs | 185.1771 KOps/s | 176.7622 KOps/s | +4.76% |
| test_contiguous[memmap_tensor0] | 12.3130μs | 0.6303μs | 1.5864 MOps/s | 1.5924 MOps/s | -0.38% |
| test_stack[memmap_tensor0] | 22.3320μs | 3.5294μs | 283.3312 KOps/s | 272.8818 KOps/s | +3.83% |
| test_memmaptd_index | 1.0616ms | 0.2626ms | 3.8081 KOps/s | 3.9196 KOps/s | -2.85% |
| test_memmaptd_index_astensor | 0.8637ms | 0.3349ms | 2.9863 KOps/s | 3.0351 KOps/s | -1.61% |
| test_memmaptd_index_op | 0.9820ms | 0.5974ms | 1.6739 KOps/s | 1.6088 KOps/s | +4.04% |
| test_serialize_model | 0.1314s | 0.1222s | 8.1812 Ops/s | 7.1584 Ops/s | **+14.29%** |
| test_serialize_model_pickle | 0.4369s | 0.4076s | 2.4531 Ops/s | 2.5214 Ops/s | -2.71% |
| test_serialize_weights | 0.1275s | 0.1232s | 8.1194 Ops/s | 8.0377 Ops/s | +1.02% |
| test_serialize_weights_returnearly | 0.1831s | 0.1627s | 6.1464 Ops/s | 6.2450 Ops/s | -1.58% |
| test_serialize_weights_pickle | 0.4581s | 0.3993s | 2.5043 Ops/s | 2.3183 Ops/s | **+8.02%** |
| test_serialize_weights_filesystem | 0.1510s | 0.1427s | 7.0067 Ops/s | 6.9216 Ops/s | +1.23% |
| test_serialize_model_filesystem | 0.2117s | 0.1613s | 6.1978 Ops/s | 6.0463 Ops/s | +2.51% |
| test_reshape_pytree | 56.0150μs | 26.1789μs | 38.1987 KOps/s | 37.5226 KOps/s | +1.80% |
| test_reshape_td | 94.5980μs | 34.0707μs | 29.3507 KOps/s | 25.9769 KOps/s | **+12.99%** |
| test_view_pytree | 72.3660μs | 26.0578μs | 38.3762 KOps/s | 38.9289 KOps/s | -1.42% |
| test_view_td | 0.1199ms | 39.9308μs | 25.0434 KOps/s | 26.2762 KOps/s | -4.69% |
| test_unbind_pytree | 88.0250μs | 29.4012μs | 34.0122 KOps/s | 33.4746 KOps/s | +1.61% |
| test_unbind_td | 0.4148ms | 36.3730μs | 27.4929 KOps/s | 27.0072 KOps/s | +1.80% |
| test_split_pytree | 78.1770μs | 29.2864μs | 34.1455 KOps/s | 34.1630 KOps/s | -0.05% |
| test_split_td | 0.1200ms | 38.6188μs | 25.8941 KOps/s | 24.8488 KOps/s | +4.21% |
| test_add_pytree | 78.8380μs | 35.0809μs | 28.5056 KOps/s | 27.9624 KOps/s | +1.94% |
| test_add_td | 0.1705ms | 52.7683μs | 18.9508 KOps/s | 17.5645 KOps/s | **+7.89%** |
| test_distributed | 0.2342ms | 0.1293ms | 7.7341 KOps/s | 7.5694 KOps/s | +2.18% |
| test_tdmodule | 43.8420μs | 16.8699μs | 59.2772 KOps/s | 55.9550 KOps/s | **+5.94%** |
| test_tdmodule_dispatch | 58.6310μs | 33.9560μs | 29.4499 KOps/s | 28.4372 KOps/s | +3.56% |
| test_tdseq | 36.6990μs | 19.9850μs | 50.0374 KOps/s | 48.3172 KOps/s | +3.56% |
| test_tdseq_dispatch | 69.8620μs | 39.5082μs | 25.3112 KOps/s | 24.9890 KOps/s | +1.29% |
| test_instantiation_functorch | 1.5140ms | 1.3206ms | 757.2305 Ops/s | 752.6308 Ops/s | +0.61% |
| test_instantiation_td | 1.7512ms | 1.0140ms | 986.2272 Ops/s | 975.9591 Ops/s | +1.05% |
| test_exec_functorch | 0.2958ms | 0.1627ms | 6.1447 KOps/s | 6.2008 KOps/s | -0.90% |
| test_exec_functional_call | 0.2931ms | 0.1467ms | 6.8174 KOps/s | 6.6178 KOps/s | +3.02% |
| test_exec_td | 0.2090ms | 0.1430ms | 6.9936 KOps/s | 6.8677 KOps/s | +1.83% |
| test_exec_td_decorator | 0.4934ms | 0.2253ms | 4.4376 KOps/s | 4.4607 KOps/s | -0.52% |
| test_vmap_mlp_speed[True-True] | 1.1909ms | 0.4751ms | 2.1050 KOps/s | 2.0573 KOps/s | +2.32% |
| test_vmap_mlp_speed[True-False] | 0.7176ms | 0.4739ms | 2.1103 KOps/s | 1.5035 KOps/s | **+40.36%** |
| test_vmap_mlp_speed[False-True] | 0.5571ms | 0.3845ms | 2.6011 KOps/s | 2.5222 KOps/s | +3.13% |
| test_vmap_mlp_speed[False-False] | 0.7035ms | 0.3848ms | 2.5989 KOps/s | 2.5140 KOps/s | +3.38% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9348ms | 0.5498ms | 1.8189 KOps/s | 1.7746 KOps/s | +2.49% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9839ms | 0.5531ms | 1.8081 KOps/s | 1.7972 KOps/s | +0.61% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7327ms | 0.4483ms | 2.2305 KOps/s | 2.1945 KOps/s | +1.64% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7529ms | 0.4504ms | 2.2203 KOps/s | 2.1735 KOps/s | +2.15% |
| test_to_module_speed[True] | 2.2343ms | 1.6672ms | 599.7928 Ops/s | 582.2181 Ops/s | +3.02% |
| test_to_module_speed[False] | 2.6455ms | 1.6500ms | 606.0710 Ops/s | 592.9358 Ops/s | +2.22% |
| test_tc_init | 0.1072ms | 54.8124μs | 18.2441 KOps/s | 17.2247 KOps/s | **+5.92%** |
| test_tc_init_nested | 0.1917ms | 0.1132ms | 8.8356 KOps/s | 8.5377 KOps/s | +3.49% |
| test_tc_first_layer_tensor | 41.0380μs | 8.4160μs | 118.8214 KOps/s | 121.2852 KOps/s | -2.03% |
| test_tc_first_layer_nontensor | 55.5980μs | 8.2654μs | 120.9862 KOps/s | 122.7308 KOps/s | -1.42% |
| test_tc_second_layer_tensor | 24.0250μs | 2.4346μs | 410.7457 KOps/s | 399.3967 KOps/s | +2.84% |
| test_tc_second_layer_nontensor | 27.3310μs | 9.2078μs | 108.6034 KOps/s | 107.3908 KOps/s | +1.13% |
| test_unbind | 94.4853ms | 14.5026ms | 68.9533 Ops/s | 67.2542 Ops/s | +2.53% |
| test_full_like | 8.6098ms | 7.1176ms | 140.4960 Ops/s | 133.2581 Ops/s | **+5.43%** |
| test_zeros_like | 15.1177ms | 7.2147ms | 138.6053 Ops/s | 138.2517 Ops/s | +0.26% |
| test_ones_like | 12.8273ms | 7.8086ms | 128.0639 Ops/s | 129.6134 Ops/s | -1.20% |
| test_clone | 12.4970ms | 9.3310ms | 107.1699 Ops/s | 102.5962 Ops/s | +4.46% |
| test_squeeze | 64.2210μs | 12.6183μs | 79.2499 KOps/s | 77.5438 KOps/s | +2.20% |
| test_unsqueeze | 0.2121ms | 95.7199μs | 10.4471 KOps/s | 10.2155 KOps/s | +2.27% |
| test_split | 0.5870ms | 0.2722ms | 3.6734 KOps/s | 3.6747 KOps/s | -0.03% |
| test_permute | 0.3689ms | 0.2244ms | 4.4565 KOps/s | 4.5203 KOps/s | -1.41% |
| test_stack | 32.1951ms | 25.0368ms | 39.9412 Ops/s | 39.1871 Ops/s | +1.92% |
| test_cat | 30.4995ms | 24.7438ms | 40.4142 Ops/s | 39.2038 Ops/s | +3.09% |
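The Change column matches the relative difference between the two throughput columns: for `test_plain_set_nested`, 62.3847 / 60.3721 - 1 ≈ +3.33%. The Improved/Worsened totals also line up with the bolded rows, each of which changes by more than 5% in magnitude (the largest non-bolded change is -4.79%). The sketch below reproduces that arithmetic under two assumptions: the formula above, and a 5% threshold inferred from the table rather than documented by the benchmark bot. The helper names are hypothetical. Throughput strings are normalized to Ops/s first, because a few rows mix units (for example, the GPU run's `test_instantiation_td` compares 1.0339 KOps/s with 939.9115 Ops/s).

```python
from typing import Dict, Iterable, Tuple

# Unit multipliers for the throughput strings used in the table.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_s(text: str) -> float:
    """Parse a string such as '62.3847 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def pct_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput change of the PR vs. HEAD, in percent (positive = faster)."""
    return (to_ops_per_s(ops_pr) / to_ops_per_s(ops_head) - 1.0) * 100.0


def summarize(rows: Iterable[Tuple[str, str]], threshold: float = 5.0) -> Dict[str, int]:
    """Count rows whose change exceeds +/- threshold percent (threshold is assumed)."""
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for ops_pr, ops_head in rows:
        change = pct_change(ops_pr, ops_head)
        if change > threshold:
            counts["improved"] += 1
        elif change < -threshold:
            counts["worsened"] += 1
        else:
            counts["unchanged"] += 1
    return counts


rows = [
    ("62.3847 KOps/s", "60.3721 KOps/s"),    # test_plain_set_nested
    ("747.4486 KOps/s", "606.4361 KOps/s"),  # test_membership
    ("2.6004 KOps/s", "2.9989 KOps/s"),      # test_lock_nested
]
print([round(pct_change(a, b), 2) for a, b in rows])  # [3.33, 23.25, -13.29]
print(summarize(rows))  # {'improved': 1, 'worsened': 1, 'unchanged': 1}
```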
github-actions[bot] commented 1 month ago

⚠ **Warning**: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 11. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 70.2420μs | 11.7529μs | 85.0856 KOps/s | 85.5506 KOps/s | -0.54% |
| test_plain_set_stack_nested | 27.6510μs | 11.9143μs | 83.9328 KOps/s | 84.7610 KOps/s | -0.98% |
| test_plain_set_nested_inplace | 35.6000μs | 13.1393μs | 76.1076 KOps/s | 76.8623 KOps/s | -0.98% |
| test_plain_set_stack_nested_inplace | 46.1100μs | 13.0626μs | 76.5545 KOps/s | 76.7667 KOps/s | -0.28% |
| test_items | 0.5451ms | 4.6232μs | 216.3014 KOps/s | 214.9728 KOps/s | +0.62% |
| test_items_nested | 0.4581ms | 0.3399ms | 2.9420 KOps/s | 2.9565 KOps/s | -0.49% |
| test_items_nested_locked | 0.4433ms | 0.3481ms | 2.8724 KOps/s | 2.9176 KOps/s | -1.55% |
| test_items_nested_leaf | 0.1157ms | 82.4374μs | 12.1304 KOps/s | 12.1095 KOps/s | +0.17% |
| test_items_stack_nested | 0.4706ms | 0.3491ms | 2.8642 KOps/s | 2.9249 KOps/s | -2.07% |
| test_items_stack_nested_leaf | 0.1902ms | 84.3830μs | 11.8507 KOps/s | 11.8080 KOps/s | +0.36% |
| test_items_stack_nested_locked | 0.4612ms | 0.3456ms | 2.8933 KOps/s | 2.9104 KOps/s | -0.59% |
| test_keys | 21.0000μs | 4.3488μs | 229.9482 KOps/s | 230.6873 KOps/s | -0.32% |
| test_keys_nested | 0.1685ms | 67.3015μs | 14.8585 KOps/s | 14.4734 KOps/s | +2.66% |
| test_keys_nested_locked | 0.7058ms | 74.3354μs | 13.4525 KOps/s | 13.2924 KOps/s | +1.21% |
| test_keys_nested_leaf | 98.8720μs | 59.4710μs | 16.8149 KOps/s | 16.7847 KOps/s | +0.18% |
| test_keys_stack_nested | 0.1794ms | 68.4939μs | 14.5998 KOps/s | 14.5054 KOps/s | +0.65% |
| test_keys_stack_nested_leaf | 0.1343ms | 59.3168μs | 16.8586 KOps/s | 16.6814 KOps/s | +1.06% |
| test_keys_stack_nested_locked | 0.1748ms | 73.8609μs | 13.5390 KOps/s | 13.5291 KOps/s | +0.07% |
| test_values | 35.5273μs | 1.8222μs | 548.7980 KOps/s | 547.7677 KOps/s | +0.19% |
| test_values_nested | 0.1459ms | 35.4016μs | 28.2473 KOps/s | 27.5973 KOps/s | +2.36% |
| test_values_nested_locked | 0.1473ms | 37.1403μs | 26.9249 KOps/s | 26.3752 KOps/s | +2.08% |
| test_values_nested_leaf | 0.1378ms | 31.5939μs | 31.6517 KOps/s | 31.2272 KOps/s | +1.36% |
| test_values_stack_nested | 64.3410μs | 35.9116μs | 27.8462 KOps/s | 27.3183 KOps/s | +1.93% |
| test_values_stack_nested_leaf | 0.1523ms | 32.3292μs | 30.9318 KOps/s | 30.6358 KOps/s | +0.97% |
| test_values_stack_nested_locked | 0.1650ms | 37.7468μs | 26.4923 KOps/s | 26.0801 KOps/s | +1.58% |
| test_membership | 16.1560μs | 0.7238μs | 1.3815 MOps/s | 1.3885 MOps/s | -0.50% |
| test_membership_nested | 18.1700μs | 2.4938μs | 401.0024 KOps/s | 398.5812 KOps/s | +0.61% |
| test_membership_nested_leaf | 0.1110ms | 2.4824μs | 402.8296 KOps/s | 401.7282 KOps/s | +0.27% |
| test_membership_stacked_nested | 23.5010μs | 2.5124μs | 398.0286 KOps/s | 403.7040 KOps/s | -1.41% |
| test_membership_stacked_nested_leaf | 0.1199ms | 2.4686μs | 405.0804 KOps/s | 402.1437 KOps/s | +0.73% |
| test_membership_nested_last | 16.7210μs | 3.0065μs | 332.6090 KOps/s | 326.7222 KOps/s | +1.80% |
| test_membership_nested_leaf_last | 21.1710μs | 2.9964μs | 333.7357 KOps/s | 328.7370 KOps/s | +1.52% |
| test_membership_stacked_nested_last | 18.7410μs | 3.0106μs | 332.1630 KOps/s | 287.1113 KOps/s | **+15.69%** |
| test_membership_stacked_nested_leaf_last | 20.9610μs | 2.9970μs | 333.6713 KOps/s | 288.7028 KOps/s | **+15.58%** |
| test_nested_getleaf | 0.1221ms | 8.4563μs | 118.2553 KOps/s | 117.6818 KOps/s | +0.49% |
| test_nested_get | 21.7710μs | 7.9098μs | 126.4250 KOps/s | 125.1518 KOps/s | +1.02% |
| test_stacked_getleaf | 31.8710μs | 8.4868μs | 117.8304 KOps/s | 117.9763 KOps/s | -0.12% |
| test_stacked_get | 27.5900μs | 7.9078μs | 126.4580 KOps/s | 125.3607 KOps/s | +0.88% |
| test_nested_getitemleaf | 0.1208ms | 8.9334μs | 111.9391 KOps/s | 114.0462 KOps/s | -1.85% |
| test_nested_getitem | 27.2400μs | 8.1563μs | 122.6049 KOps/s | 120.7180 KOps/s | +1.56% |
| test_stacked_getitemleaf | 26.0810μs | 8.6990μs | 114.9559 KOps/s | 114.6592 KOps/s | +0.26% |
| test_stacked_getitem | 0.1109ms | 8.0997μs | 123.4609 KOps/s | 122.2968 KOps/s | +0.95% |
| test_lock_nested | 62.0524ms | 0.4036ms | 2.4776 KOps/s | 2.4910 KOps/s | -0.54% |
| test_lock_stack_nested | 0.3470ms | 0.2935ms | 3.4067 KOps/s | 3.3872 KOps/s | +0.58% |
| test_unlock_nested | 64.5078ms | 0.4047ms | 2.4710 KOps/s | 2.4757 KOps/s | -0.19% |
| test_unlock_stack_nested | 0.4337ms | 0.3035ms | 3.2945 KOps/s | 3.2753 KOps/s | +0.58% |
| test_flatten_speed | 0.4354ms | 0.1003ms | 9.9690 KOps/s | 9.4818 KOps/s | **+5.14%** |
| test_unflatten_speed | 0.3947ms | 0.2865ms | 3.4899 KOps/s | 3.3943 KOps/s | +2.82% |
| test_common_ops | 1.0512ms | 0.5282ms | 1.8934 KOps/s | 1.8535 KOps/s | +2.15% |
| test_creation | 36.0410μs | 1.5567μs | 642.3725 KOps/s | 645.4920 KOps/s | -0.48% |
| test_creation_empty | 24.7800μs | 5.9958μs | 166.7825 KOps/s | 164.0949 KOps/s | +1.64% |
| test_creation_nested_1 | 0.1250ms | 7.6660μs | 130.4468 KOps/s | 127.0859 KOps/s | +2.64% |
| test_creation_nested_2 | 27.7910μs | 9.9752μs | 100.2485 KOps/s | 101.8971 KOps/s | -1.62% |
| test_clone | 0.1052ms | 11.0820μs | 90.2367 KOps/s | 88.5485 KOps/s | +1.91% |
| test_getitem[int] | 29.9010μs | 10.6744μs | 93.6819 KOps/s | 95.6069 KOps/s | -2.01% |
| test_getitem[slice_int] | 0.1355ms | 21.1121μs | 47.3662 KOps/s | 50.1015 KOps/s | **-5.46%** |
| test_getitem[range] | 67.2910μs | 48.6648μs | 20.5487 KOps/s | 21.1395 KOps/s | -2.79% |
| test_getitem[tuple] | 45.9700μs | 18.5387μs | 53.9412 KOps/s | 55.6722 KOps/s | -3.11% |
| test_getitem[list] | 0.1675ms | 33.9139μs | 29.4865 KOps/s | 30.1333 KOps/s | -2.15% |
| test_setitem_dim[int] | 39.7200μs | 23.0680μs | 43.3500 KOps/s | 41.2410 KOps/s | **+5.11%** |
| test_setitem_dim[slice_int] | 67.4010μs | 45.1822μs | 22.1326 KOps/s | 21.7485 KOps/s | +1.77% |
| test_setitem_dim[range] | 86.4510μs | 62.2622μs | 16.0611 KOps/s | 15.7868 KOps/s | +1.74% |
| test_setitem_dim[tuple] | 57.6110μs | 38.3600μs | 26.0688 KOps/s | 25.1830 KOps/s | +3.52% |
| test_setitem | 40.2010μs | 14.0800μs | 71.0226 KOps/s | 66.7522 KOps/s | **+6.40%** |
| test_set | 0.1300ms | 13.8264μs | 72.3252 KOps/s | 71.1701 KOps/s | +1.62% |
| test_set_shared | 1.9747ms | 0.1007ms | 9.9309 KOps/s | 10.1278 KOps/s | -1.94% |
| test_update | 0.1342ms | 15.3651μs | 65.0824 KOps/s | 62.7038 KOps/s | +3.79% |
| test_update_nested | 0.1367ms | 20.3062μs | 49.2460 KOps/s | 47.7263 KOps/s | +3.18% |
| test_update__nested | 62.5220μs | 21.1886μs | 47.1953 KOps/s | 46.2299 KOps/s | +2.09% |
| test_set_nested | 58.7710μs | 14.9981μs | 66.6749 KOps/s | 64.8774 KOps/s | +2.77% |
| test_set_nested_new | 55.4210μs | 17.5316μs | 57.0398 KOps/s | 55.8230 KOps/s | +2.18% |
| test_select | 0.1530ms | 29.4320μs | 33.9766 KOps/s | 33.8110 KOps/s | +0.49% |
| test_select_nested | 0.2152ms | 51.2806μs | 19.5006 KOps/s | 19.1384 KOps/s | +1.89% |
| test_exclude_nested | 0.2130ms | 0.1057ms | 9.4631 KOps/s | 9.1633 KOps/s | +3.27% |
| test_empty[True] | 0.4411ms | 0.3390ms | 2.9500 KOps/s | 2.9253 KOps/s | +0.84% |
| test_empty[False] | 10.7412μs | 0.8211μs | 1.2179 MOps/s | 1.2207 MOps/s | -0.22% |
| test_to | 86.4010μs | 59.0460μs | 16.9359 KOps/s | 16.8295 KOps/s | +0.63% |
| test_to_nonblocking | 0.1618ms | 35.0874μs | 28.5003 KOps/s | 27.5372 KOps/s | +3.50% |
| test_unbind_speed | 1.6537ms | 0.2576ms | 3.8816 KOps/s | 3.9742 KOps/s | -2.33% |
| test_unbind_speed_stack0 | 0.3794ms | 0.2592ms | 3.8586 KOps/s | 3.8424 KOps/s | +0.42% |
| test_unbind_speed_stack1 | 80.1951ms | 0.7955ms | 1.2570 KOps/s | 1.2666 KOps/s | -0.75% |
| test_split | 80.2861ms | 1.6360ms | 611.2546 Ops/s | 606.4193 Ops/s | +0.80% |
| test_chunk | 80.4653ms | 1.6377ms | 610.5974 Ops/s | 600.1816 Ops/s | +1.74% |
| test_creation[device0] | 0.1354ms | 59.2540μs | 16.8765 KOps/s | 17.8511 KOps/s | **-5.46%** |
| test_creation_from_tensor | 0.1946ms | 55.2520μs | 18.0989 KOps/s | 18.2486 KOps/s | -0.82% |
| test_add_one[memmap_tensor0] | 0.1268ms | 7.2954μs | 137.0731 KOps/s | 136.4699 KOps/s | +0.44% |
| test_contiguous[memmap_tensor0] | 12.8600μs | 0.6392μs | 1.5643 MOps/s | 1.4948 MOps/s | +4.65% |
| test_stack[memmap_tensor0] | 27.7510μs | 4.5473μs | 219.9107 KOps/s | 215.3346 KOps/s | +2.13% |
| test_memmaptd_index | 0.4826ms | 0.2620ms | 3.8162 KOps/s | 3.7099 KOps/s | +2.86% |
| test_memmaptd_index_astensor | 0.6209ms | 0.3276ms | 3.0524 KOps/s | 3.0214 KOps/s | +1.02% |
| test_memmaptd_index_op | 1.0095ms | 0.5902ms | 1.6943 KOps/s | 1.6639 KOps/s | +1.82% |
| test_serialize_model | 96.4740ms | 91.3134ms | 10.9513 Ops/s | 10.4853 Ops/s | +4.44% |
| test_serialize_model_pickle | 1.6682s | 1.3924s | 0.7182 Ops/s | 0.8078 Ops/s | **-11.09%** |
| test_serialize_weights | 92.8061ms | 89.0794ms | 11.2259 Ops/s | 9.5571 Ops/s | **+17.46%** |
| test_serialize_weights_returnearly | 0.2602s | 74.9090ms | 13.3495 Ops/s | 13.4300 Ops/s | -0.60% |
| test_serialize_weights_pickle | 1.3547s | 1.2495s | 0.8003 Ops/s | 0.8090 Ops/s | -1.08% |
| test_reshape_pytree | 69.0010μs | 26.0907μs | 38.3278 KOps/s | 38.1426 KOps/s | +0.49% |
| test_reshape_td | 0.2259ms | 31.4982μs | 31.7478 KOps/s | 31.9818 KOps/s | -0.73% |
| test_view_pytree | 0.1317ms | 25.1888μs | 39.7001 KOps/s | 39.4838 KOps/s | +0.55% |
| test_view_td | 0.2509ms | 35.4063μs | 28.2436 KOps/s | 28.3932 KOps/s | -0.53% |
| test_unbind_pytree | 0.1165ms | 31.9175μs | 31.3308 KOps/s | 31.3685 KOps/s | -0.12% |
| test_unbind_td | 0.5488ms | 39.9263μs | 25.0462 KOps/s | 25.1465 KOps/s | -0.40% |
| test_split_pytree | 0.1264ms | 33.5092μs | 29.8425 KOps/s | 29.4084 KOps/s | +1.48% |
| test_split_td | 0.2492ms | 37.1813μs | 26.8952 KOps/s | 26.8509 KOps/s | +0.17% |
| test_add_pytree | 0.1492ms | 37.6194μs | 26.5820 KOps/s | 25.9678 KOps/s | +2.37% |
| test_add_td | 86.4910μs | 46.2051μs | 21.6426 KOps/s | 20.4138 KOps/s | **+6.02%** |
| test_distributed | 0.3192ms | 67.5578μs | 14.8021 KOps/s | 13.9387 KOps/s | **+6.19%** |
| test_tdmodule | 0.1184ms | 14.3639μs | 69.6188 KOps/s | 70.7826 KOps/s | -1.64% |
| test_tdmodule_dispatch | 41.7410μs | 26.4813μs | 37.7625 KOps/s | 37.5571 KOps/s | +0.55% |
| test_tdseq | 42.3100μs | 15.1855μs | 65.8522 KOps/s | 68.0072 KOps/s | -3.17% |
| test_tdseq_dispatch | 47.3510μs | 29.0177μs | 34.4617 KOps/s | 35.1515 KOps/s | -1.96% |
| test_instantiation_functorch | 1.6922ms | 1.3741ms | 727.7566 Ops/s | 722.3537 Ops/s | +0.75% |
| test_instantiation_td | 1.5103ms | 0.9672ms | 1.0339 KOps/s | 939.9115 Ops/s | **+10.00%** |
| test_exec_functorch | 0.1826ms | 0.1474ms | 6.7843 KOps/s | 6.6426 KOps/s | +2.13% |
| test_exec_functional_call | 0.3459ms | 0.1381ms | 7.2398 KOps/s | 7.2059 KOps/s | +0.47% |
| test_exec_td | 0.1807ms | 0.1348ms | 7.4165 KOps/s | 7.3788 KOps/s | +0.51% |
| test_exec_td_decorator | 82.7394ms | 0.2316ms | 4.3180 KOps/s | 4.8188 KOps/s | **-10.39%** |
| test_vmap_mlp_speed[True-True] | 0.9255ms | 0.5801ms | 1.7240 KOps/s | 1.7176 KOps/s | +0.37% |
| test_vmap_mlp_speed[True-False] | 1.4219ms | 0.5875ms | 1.7020 KOps/s | 1.7057 KOps/s | -0.21% |
| test_vmap_mlp_speed[False-True] | 0.7243ms | 0.5142ms | 1.9450 KOps/s | 1.9005 KOps/s | +2.34% |
| test_vmap_mlp_speed[False-False] | 0.6992ms | 0.5280ms | 1.8941 KOps/s | 1.9301 KOps/s | -1.87% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9496ms | 0.6442ms | 1.5522 KOps/s | 1.5448 KOps/s | +0.48% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8401ms | 0.6457ms | 1.5487 KOps/s | 1.5464 KOps/s | +0.15% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8644ms | 0.5999ms | 1.6670 KOps/s | 1.7314 KOps/s | -3.72% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8747ms | 0.5883ms | 1.6997 KOps/s | 1.7288 KOps/s | -1.68% |
| test_vmap_transformer_speed[True-True] | 8.6884ms | 7.8449ms | 127.4721 Ops/s | 129.2342 Ops/s | -1.36% |
| test_vmap_transformer_speed[True-False] | 8.2348ms | 7.8450ms | 127.4697 Ops/s | 129.2543 Ops/s | -1.38% |
| test_vmap_transformer_speed[False-True] | 7.8986ms | 7.6550ms | 130.6328 Ops/s | 129.6122 Ops/s | +0.79% |
| test_vmap_transformer_speed[False-False] | 7.9860ms | 7.6766ms | 130.2654 Ops/s | 129.2557 Ops/s | +0.78% |
| test_vmap_transformer_speed_decorator[True-True] | 19.5491ms | 19.0355ms | 52.5334 Ops/s | 52.5874 Ops/s | -0.10% |
| test_vmap_transformer_speed_decorator[True-False] | 19.0637ms | 18.8045ms | 53.1786 Ops/s | 52.8899 Ops/s | +0.55% |
| test_vmap_transformer_speed_decorator[False-True] | 19.3170ms | 18.7646ms | 53.2918 Ops/s | 52.9531 Ops/s | +0.64% |
| test_vmap_transformer_speed_decorator[False-False] | 19.7400ms | 18.7707ms | 53.2744 Ops/s | 53.1633 Ops/s | +0.21% |
| test_to_module_speed[True] | 1.7102ms | 1.4939ms | 669.4049 Ops/s | 675.4141 Ops/s | -0.89% |
| test_to_module_speed[False] | 1.6879ms | 1.4649ms | 682.6462 Ops/s | 690.4117 Ops/s | -1.12% |
| test_tc_init | 89.7210μs | 46.8609μs | 21.3397 KOps/s | 21.0841 KOps/s | +1.21% |
| test_tc_init_nested | 0.3101ms | 92.3822μs | 10.8246 KOps/s | 10.4871 KOps/s | +3.22% |
| test_tc_first_layer_tensor | 21.4200μs | 3.7141μs | 269.2411 KOps/s | 268.5733 KOps/s | +0.25% |
| test_tc_first_layer_nontensor | 0.2068ms | 3.7314μs | 267.9963 KOps/s | 264.7157 KOps/s | +1.24% |
| test_tc_second_layer_tensor | 49.8932μs | 1.2105μs | 826.1051 KOps/s | 774.7455 KOps/s | **+6.63%** |
| test_tc_second_layer_nontensor | 74.8210μs | 4.2885μs | 233.1815 KOps/s | 234.6328 KOps/s | -0.62% |
| test_unbind | 0.1139s | 13.8247ms | 72.3344 Ops/s | 68.6721 Ops/s | **+5.33%** |
| test_full_like | 14.4686ms | 14.0305ms | 71.2735 Ops/s | 72.6760 Ops/s | -1.93% |
| test_zeros_like | 8.9009ms | 8.1317ms | 122.9753 Ops/s | 123.7403 Ops/s | -0.62% |
| test_ones_like | 8.8935ms | 8.1068ms | 123.3539 Ops/s | 123.0359 Ops/s | +0.26% |
| test_clone | 10.1072ms | 9.7745ms | 102.3075 Ops/s | 101.3743 Ops/s | +0.92% |
| test_squeeze | 0.1474ms | 10.6231μs | 94.1341 KOps/s | 94.2299 KOps/s | -0.10% |
| test_unsqueeze | 0.2833ms | 84.9757μs | 11.7681 KOps/s | 12.0852 KOps/s | -2.62% |
| test_split | 3.6460ms | 3.1325ms | 319.2371 Ops/s | 325.8140 Ops/s | -2.02% |
| test_permute | 0.3701ms | 0.1953ms | 5.1201 KOps/s | 5.1157 KOps/s | +0.09% |
| test_stack | 28.1064ms | 27.7319ms | 36.0595 Ops/s | 34.9128 Ops/s | +3.28% |
| test_cat | 27.8200ms | 27.5239ms | 36.3321 Ops/s | 35.1627 Ops/s | +3.33% |