pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License
808 stars, 66 forks

[Refactor] Remove `_run_checks` from `__init__` #843

Status: Closed (closed by vmoens 2 months ago)

github-actions[bot] commented 2 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 7. Worsened: 24.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 35.5760μs | 16.2917μs | 61.3808 KOps/s | 63.9503 KOps/s | -4.02% |
| test_plain_set_stack_nested | 59.0810μs | 16.7796μs | 59.5963 KOps/s | 63.5154 KOps/s | **-6.17%** |
| test_plain_set_nested_inplace | 93.6460μs | 18.7067μs | 53.4567 KOps/s | 56.1229 KOps/s | -4.75% |
| test_plain_set_stack_nested_inplace | 0.3306ms | 19.2878μs | 51.8461 KOps/s | 56.4593 KOps/s | **-8.17%** |
| test_items | 0.1649ms | 2.7109μs | 368.8748 KOps/s | 373.4862 KOps/s | -1.23% |
| test_items_nested | 0.4403ms | 0.2750ms | 3.6363 KOps/s | 3.6405 KOps/s | -0.11% |
| test_items_nested_locked | 1.0292ms | 0.2778ms | 3.5996 KOps/s | 3.6124 KOps/s | -0.35% |
| test_items_nested_leaf | 0.1509ms | 80.0285μs | 12.4956 KOps/s | 12.6756 KOps/s | -1.42% |
| test_items_stack_nested | 1.4119ms | 0.2824ms | 3.5414 KOps/s | 3.5765 KOps/s | -0.98% |
| test_items_stack_nested_leaf | 0.1638ms | 81.8633μs | 12.2155 KOps/s | 12.6613 KOps/s | -3.52% |
| test_items_stack_nested_locked | 0.4597ms | 0.2782ms | 3.5951 KOps/s | 3.5815 KOps/s | +0.38% |
| test_keys | 28.7240μs | 3.8321μs | 260.9502 KOps/s | 258.7869 KOps/s | +0.84% |
| test_keys_nested | 0.2771ms | 0.1400ms | 7.1404 KOps/s | 7.1877 KOps/s | -0.66% |
| test_keys_nested_locked | 0.7709ms | 0.1443ms | 6.9294 KOps/s | 6.8731 KOps/s | +0.82% |
| test_keys_nested_leaf | 0.2440ms | 0.1200ms | 8.3326 KOps/s | 8.4673 KOps/s | -1.59% |
| test_keys_stack_nested | 0.2667ms | 0.1389ms | 7.1988 KOps/s | 7.2350 KOps/s | -0.50% |
| test_keys_stack_nested_leaf | 0.2346ms | 0.1168ms | 8.5624 KOps/s | 8.5591 KOps/s | +0.04% |
| test_keys_stack_nested_locked | 0.2759ms | 0.1430ms | 6.9941 KOps/s | 7.0006 KOps/s | -0.09% |
| test_values | 6.0592μs | 1.2190μs | 820.3720 KOps/s | 878.0777 KOps/s | **-6.57%** |
| test_values_nested | 91.1310μs | 50.3862μs | 19.8467 KOps/s | 19.6291 KOps/s | +1.11% |
| test_values_nested_locked | 0.1043ms | 50.0045μs | 19.9982 KOps/s | 19.6537 KOps/s | +1.75% |
| test_values_nested_leaf | 1.7468ms | 45.9350μs | 21.7699 KOps/s | 21.9416 KOps/s | -0.78% |
| test_values_stack_nested | 0.1025ms | 50.9564μs | 19.6246 KOps/s | 19.2771 KOps/s | +1.80% |
| test_values_stack_nested_leaf | 95.3390μs | 45.5471μs | 21.9553 KOps/s | 21.8465 KOps/s | +0.50% |
| test_values_stack_nested_locked | 97.6340μs | 50.7818μs | 19.6921 KOps/s | 19.2052 KOps/s | +2.54% |
| test_membership | 33.4430μs | 1.3813μs | 723.9653 KOps/s | 740.5928 KOps/s | -2.25% |
| test_membership_nested | 29.5050μs | 3.5286μs | 283.3983 KOps/s | 291.7891 KOps/s | -2.88% |
| test_membership_nested_leaf | 28.9140μs | 3.5447μs | 282.1112 KOps/s | 286.1673 KOps/s | -1.42% |
| test_membership_stacked_nested | 27.7320μs | 3.5246μs | 283.7168 KOps/s | 293.2549 KOps/s | -3.25% |
| test_membership_stacked_nested_leaf | 22.6130μs | 3.5330μs | 283.0435 KOps/s | 286.5922 KOps/s | -1.24% |
| test_membership_nested_last | 29.3550μs | 4.3000μs | 232.5555 KOps/s | 235.0121 KOps/s | -1.05% |
| test_membership_nested_leaf_last | 29.9560μs | 4.3222μs | 231.3618 KOps/s | 237.0276 KOps/s | -2.39% |
| test_membership_stacked_nested_last | 40.7170μs | 4.3334μs | 230.7647 KOps/s | 237.8379 KOps/s | -2.97% |
| test_membership_stacked_nested_leaf_last | 26.9710μs | 4.2883μs | 233.1944 KOps/s | 236.7567 KOps/s | -1.50% |
| test_nested_getleaf | 47.1080μs | 10.5740μs | 94.5715 KOps/s | 92.3728 KOps/s | +2.38% |
| test_nested_get | 49.1520μs | 10.0336μs | 99.6655 KOps/s | 97.6651 KOps/s | +2.05% |
| test_stacked_getleaf | 39.3740μs | 10.6065μs | 94.2815 KOps/s | 93.3179 KOps/s | +1.03% |
| test_stacked_get | 34.5750μs | 9.9907μs | 100.0927 KOps/s | 99.8927 KOps/s | +0.20% |
| test_nested_getitemleaf | 57.1870μs | 11.1590μs | 89.6136 KOps/s | 88.2549 KOps/s | +1.54% |
| test_nested_getitem | 52.6590μs | 10.1768μs | 98.2625 KOps/s | 96.1874 KOps/s | +2.16% |
| test_stacked_getitemleaf | 65.0920μs | 11.1400μs | 89.7663 KOps/s | 89.0873 KOps/s | +0.76% |
| test_stacked_getitem | 36.9590μs | 10.2409μs | 97.6478 KOps/s | 97.5005 KOps/s | +0.15% |
| test_lock_nested | 0.8744ms | 0.3360ms | 2.9761 KOps/s | 2.9233 KOps/s | +1.80% |
| test_lock_stack_nested | 0.5632ms | 0.3038ms | 3.2921 KOps/s | 3.2679 KOps/s | +0.74% |
| test_unlock_nested | 0.7513ms | 0.3366ms | 2.9708 KOps/s | 2.8479 KOps/s | +4.32% |
| test_unlock_stack_nested | 0.4705ms | 0.3092ms | 3.2346 KOps/s | 3.1761 KOps/s | +1.84% |
| test_flatten_speed | 0.5965ms | 99.5208μs | 10.0482 KOps/s | 10.0088 KOps/s | +0.39% |
| test_unflatten_speed | 0.8993ms | 0.4178ms | 2.3933 KOps/s | 2.3634 KOps/s | +1.27% |
| test_common_ops | 3.0222ms | 0.7256ms | 1.3781 KOps/s | 1.4845 KOps/s | **-7.17%** |
| test_creation | 32.9520μs | 1.9522μs | 512.2505 KOps/s | 509.6016 KOps/s | +0.52% |
| test_creation_empty | 31.4790μs | 10.0522μs | 99.4811 KOps/s | 122.3387 KOps/s | **-18.68%** |
| test_creation_nested_1 | 39.0840μs | 12.9283μs | 77.3499 KOps/s | 92.3064 KOps/s | **-16.20%** |
| test_creation_nested_2 | 51.9570μs | 16.3381μs | 61.2065 KOps/s | 70.5053 KOps/s | **-13.19%** |
| test_clone | 0.1042ms | 13.2032μs | 75.7391 KOps/s | 74.2141 KOps/s | +2.05% |
| test_getitem[int] | 34.8450μs | 11.1903μs | 89.3630 KOps/s | 86.8984 KOps/s | +2.84% |
| test_getitem[slice_int] | 70.0010μs | 22.5618μs | 44.3227 KOps/s | 44.0908 KOps/s | +0.53% |
| test_getitem[range] | 76.3730μs | 57.0086μs | 17.5412 KOps/s | 16.8471 KOps/s | +4.12% |
| test_getitem[tuple] | 55.3340μs | 18.5531μs | 53.8994 KOps/s | 52.8615 KOps/s | +1.96% |
| test_getitem[list] | 0.1028ms | 40.2801μs | 24.8261 KOps/s | 25.2007 KOps/s | -1.49% |
| test_setitem_dim[int] | 55.1930μs | 33.8692μs | 29.5253 KOps/s | 31.7981 KOps/s | **-7.15%** |
| test_setitem_dim[slice_int] | 0.1043ms | 61.1811μs | 16.3449 KOps/s | 17.2482 KOps/s | **-5.24%** |
| test_setitem_dim[range] | 0.1281ms | 84.8939μs | 11.7794 KOps/s | 12.5689 KOps/s | **-6.28%** |
| test_setitem_dim[tuple] | 83.2160μs | 50.3551μs | 19.8590 KOps/s | 21.5231 KOps/s | **-7.73%** |
| test_setitem | 50.7060μs | 19.9359μs | 50.1607 KOps/s | 52.1624 KOps/s | -3.84% |
| test_set | 70.9210μs | 19.1040μs | 52.3451 KOps/s | 55.1652 KOps/s | **-5.11%** |
| test_set_shared | 1.4511ms | 0.1443ms | 6.9305 KOps/s | 6.8912 KOps/s | +0.57% |
| test_update | 0.1289ms | 22.0608μs | 45.3293 KOps/s | 51.0009 KOps/s | **-11.12%** |
| test_update_nested | 78.5780μs | 30.8919μs | 32.3709 KOps/s | 35.9814 KOps/s | **-10.03%** |
| test_update__nested | 68.7890μs | 24.7133μs | 40.4640 KOps/s | 40.1429 KOps/s | +0.80% |
| test_set_nested | 0.1008ms | 21.4961μs | 46.5201 KOps/s | 50.3723 KOps/s | **-7.65%** |
| test_set_nested_new | 67.1760μs | 25.1895μs | 39.6990 KOps/s | 41.2566 KOps/s | -3.78% |
| test_select | 93.1950μs | 40.6558μs | 24.5967 KOps/s | 24.7427 KOps/s | -0.59% |
| test_select_nested | 0.1167ms | 57.5285μs | 17.3827 KOps/s | 16.3817 KOps/s | **+6.11%** |
| test_exclude_nested | 0.2210ms | 0.1178ms | 8.4872 KOps/s | 8.2005 KOps/s | +3.50% |
| test_empty[True] | 0.5513ms | 0.3951ms | 2.5313 KOps/s | 2.4592 KOps/s | +2.93% |
| test_empty[False] | 7.4360μs | 1.0849μs | 921.7156 KOps/s | 858.2779 KOps/s | **+7.39%** |
| test_unbind_speed | 1.6962ms | 0.2431ms | 4.1143 KOps/s | 3.8515 KOps/s | **+6.83%** |
| test_unbind_speed_stack0 | 0.4567ms | 0.2484ms | 4.0262 KOps/s | 4.0142 KOps/s | +0.30% |
| test_unbind_speed_stack1 | 64.8664ms | 0.7116ms | 1.4053 KOps/s | 1.3939 KOps/s | +0.82% |
| test_split | 66.3857ms | 1.5878ms | 629.7952 Ops/s | 610.1197 Ops/s | +3.22% |
| test_chunk | 70.5214ms | 1.6292ms | 613.7904 Ops/s | 596.5007 Ops/s | +2.90% |
| test_creation[device0] | 0.1812ms | 85.4147μs | 11.7076 KOps/s | 11.4864 KOps/s | +1.93% |
| test_creation_from_tensor | 2.9725ms | 88.1753μs | 11.3410 KOps/s | 11.4962 KOps/s | -1.35% |
| test_add_one[memmap_tensor0] | 68.3180μs | 5.4127μs | 184.7519 KOps/s | 180.4886 KOps/s | +2.36% |
| test_contiguous[memmap_tensor0] | 14.1470μs | 0.6406μs | 1.5610 MOps/s | 1.5504 MOps/s | +0.68% |
| test_stack[memmap_tensor0] | 24.6460μs | 3.5230μs | 283.8459 KOps/s | 260.5583 KOps/s | **+8.94%** |
| test_memmaptd_index | 0.5962ms | 0.2625ms | 3.8090 KOps/s | 3.8360 KOps/s | -0.70% |
| test_memmaptd_index_astensor | 0.5682ms | 0.3296ms | 3.0342 KOps/s | 2.9661 KOps/s | +2.30% |
| test_memmaptd_index_op | 2.5382ms | 0.6081ms | 1.6444 KOps/s | 1.7261 KOps/s | -4.74% |
| test_serialize_model | 0.1670s | 0.1041s | 9.6071 Ops/s | 9.0795 Ops/s | **+5.81%** |
| test_serialize_model_pickle | 0.4498s | 0.3763s | 2.6573 Ops/s | 2.5873 Ops/s | +2.71% |
| test_serialize_weights | 0.1638s | 0.1044s | 9.5764 Ops/s | 9.4960 Ops/s | +0.85% |
| test_serialize_weights_returnearly | 0.1342s | 0.1191s | 8.3943 Ops/s | 8.4030 Ops/s | -0.10% |
| test_serialize_weights_pickle | 0.8568s | 0.5085s | 1.9665 Ops/s | 2.4241 Ops/s | **-18.88%** |
| test_serialize_weights_filesystem | 0.1564s | 0.1015s | 9.8531 Ops/s | 9.7564 Ops/s | +0.99% |
| test_serialize_model_filesystem | 98.2677ms | 93.2379ms | 10.7253 Ops/s | 10.1973 Ops/s | **+5.18%** |
| test_reshape_pytree | 57.5080μs | 25.6521μs | 38.9831 KOps/s | 38.9806 KOps/s | +0.01% |
| test_reshape_td | 96.7820μs | 34.0252μs | 29.3900 KOps/s | 29.3485 KOps/s | +0.14% |
| test_view_pytree | 61.8260μs | 25.6221μs | 39.0288 KOps/s | 39.1964 KOps/s | -0.43% |
| test_view_td | 82.3240μs | 38.5033μs | 25.9718 KOps/s | 26.2880 KOps/s | -1.20% |
| test_unbind_pytree | 70.0120μs | 29.5474μs | 33.8439 KOps/s | 33.5525 KOps/s | +0.87% |
| test_unbind_td | 0.3834ms | 36.8136μs | 27.1639 KOps/s | 26.1077 KOps/s | +4.05% |
| test_split_pytree | 65.0820μs | 29.5559μs | 33.8342 KOps/s | 34.0460 KOps/s | -0.62% |
| test_split_td | 0.1216ms | 40.3557μs | 24.7797 KOps/s | 24.3652 KOps/s | +1.70% |
| test_add_pytree | 83.2960μs | 35.2641μs | 28.3574 KOps/s | 28.4349 KOps/s | -0.27% |
| test_add_td | 0.1171ms | 54.0301μs | 18.5082 KOps/s | 20.4571 KOps/s | **-9.53%** |
| test_distributed | 0.2382ms | 0.1022ms | 9.7812 KOps/s | 9.7361 KOps/s | +0.46% |
| test_tdmodule | 41.0870μs | 18.2013μs | 54.9411 KOps/s | 59.1108 KOps/s | **-7.05%** |
| test_tdmodule_dispatch | 60.7040μs | 35.3090μs | 28.3214 KOps/s | 29.4347 KOps/s | -3.78% |
| test_tdseq | 45.2850μs | 20.5470μs | 48.6689 KOps/s | 51.2473 KOps/s | **-5.03%** |
| test_tdseq_dispatch | 69.4000μs | 40.2404μs | 24.8507 KOps/s | 26.4869 KOps/s | **-6.18%** |
| test_instantiation_functorch | 1.5390ms | 1.3339ms | 749.6884 Ops/s | 743.0078 Ops/s | +0.90% |
| test_instantiation_td | 68.1281ms | 1.1283ms | 886.3158 Ops/s | 908.4473 Ops/s | -2.44% |
| test_exec_functorch | 0.3010ms | 0.1597ms | 6.2623 KOps/s | 6.2356 KOps/s | +0.43% |
| test_exec_functional_call | 0.2787ms | 0.1475ms | 6.7812 KOps/s | 6.8602 KOps/s | -1.15% |
| test_exec_td | 0.2860ms | 0.1448ms | 6.9078 KOps/s | 7.0944 KOps/s | -2.63% |
| test_exec_td_decorator | 0.7266ms | 0.2207ms | 4.5300 KOps/s | 4.5784 KOps/s | -1.06% |
| test_vmap_mlp_speed[True-True] | 0.7320ms | 0.4908ms | 2.0375 KOps/s | 2.0425 KOps/s | -0.25% |
| test_vmap_mlp_speed[True-False] | 0.6730ms | 0.4863ms | 2.0564 KOps/s | 2.0740 KOps/s | -0.85% |
| test_vmap_mlp_speed[False-True] | 0.6884ms | 0.3973ms | 2.5170 KOps/s | 2.5135 KOps/s | +0.14% |
| test_vmap_mlp_speed[False-False] | 0.7819ms | 0.4005ms | 2.4968 KOps/s | 2.4881 KOps/s | +0.35% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2159ms | 0.5629ms | 1.7764 KOps/s | 1.7876 KOps/s | -0.62% |
| test_vmap_mlp_speed_decorator[True-False] | 2.4098ms | 0.5728ms | 1.7459 KOps/s | 1.7998 KOps/s | -2.99% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7151ms | 0.4616ms | 2.1666 KOps/s | 2.1712 KOps/s | -0.21% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6919ms | 0.4604ms | 2.1718 KOps/s | 2.1649 KOps/s | +0.32% |
| test_to_module_speed[True] | 2.5182ms | 1.6523ms | 605.2281 Ops/s | 588.8556 Ops/s | +2.78% |
| test_to_module_speed[False] | 3.1248ms | 1.6307ms | 613.2331 Ops/s | 599.3089 Ops/s | +2.32% |
| test_tc_init | 61.5850μs | 28.2471μs | 35.4019 KOps/s | 42.8330 KOps/s | **-17.35%** |
| test_tc_init_nested | 0.1370ms | 56.0339μs | 17.8463 KOps/s | 19.7570 KOps/s | **-9.67%** |
| test_tc_first_layer_tensor | 2.0083μs | 0.6796μs | 1.4714 MOps/s | 1.3826 MOps/s | **+6.42%** |
| test_tc_first_layer_nontensor | 2.3050μs | 0.6790μs | 1.4727 MOps/s | 1.4485 MOps/s | +1.67% |
| test_tc_second_layer_tensor | 24.0150μs | 1.8164μs | 550.5351 KOps/s | 527.5405 KOps/s | +4.36% |
| test_tc_second_layer_nontensor | 14.8277μs | 1.5248μs | 655.8179 KOps/s | 641.2365 KOps/s | +2.27% |
| test_unbind | 81.8774ms | 7.3272ms | 136.4781 Ops/s | 142.3885 Ops/s | -4.15% |
| test_full_like | 17.3456ms | 10.1812ms | 98.2200 Ops/s | 135.6269 Ops/s | **-27.58%** |
| test_zeros_like | 11.3164ms | 5.8201ms | 171.8188 Ops/s | 185.6708 Ops/s | **-7.46%** |
| test_ones_like | 13.7723ms | 6.2463ms | 160.0959 Ops/s | 161.2244 Ops/s | -0.70% |
| test_clone | 12.0721ms | 7.7127ms | 129.6561 Ops/s | 129.7747 Ops/s | -0.09% |
| test_squeeze | 65.2820μs | 14.1348μs | 70.7473 KOps/s | 71.1446 KOps/s | -0.56% |
| test_unsqueeze | 0.2221ms | 60.5870μs | 16.5052 KOps/s | 16.6542 KOps/s | -0.89% |
| test_split | 0.2076ms | 0.1125ms | 8.8916 KOps/s | 8.8995 KOps/s | -0.09% |
| test_permute | 0.2177ms | 0.1291ms | 7.7465 KOps/s | 7.7891 KOps/s | -0.55% |
| test_stack | 27.0478ms | 21.9701ms | 45.5163 Ops/s | 44.9185 Ops/s | +1.33% |
| test_cat | 29.6998ms | 21.9525ms | 45.5529 Ops/s | 44.5830 Ops/s | +2.18% |

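
The Change column appears to be the relative difference between the PR's throughput (Ops) and the throughput on `HEAD`. The entries marked in bold coincide with changes larger than 5% in magnitude, and their counts match the Improved and Worsened totals (7 and 24 for CPU). A minimal Python sketch under those assumptions follows; the 5% threshold and the function names are inferred from the table for illustration and are not taken from the benchmark workflow itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change. The 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# Rows copied from the CPU table: (name, Ops, Ops on HEAD), both in KOps/s.
rows = [
    ("test_plain_set_nested", 61.3808, 63.9503),
    ("test_creation_empty", 99.4811, 122.3387),
    ("test_select_nested", 17.3827, 16.3817),
]

for name, ops, head in rows:
    pct = relative_change(ops, head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Both throughput values must be in the same unit before dividing; a few rows mix KOps/s and MOps/s (or Ops/s and KOps/s) between the two columns.
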
github-actions[bot] commented 2 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 25. Worsened: 8.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1054ms | 12.3205μs | 81.1654 KOps/s | 84.4475 KOps/s | -3.89% |
| test_plain_set_stack_nested | 32.5310μs | 12.4415μs | 80.3764 KOps/s | 83.4680 KOps/s | -3.70% |
| test_plain_set_nested_inplace | 0.1192ms | 13.8102μs | 72.4105 KOps/s | 75.4174 KOps/s | -3.99% |
| test_plain_set_stack_nested_inplace | 31.6410μs | 13.8545μs | 72.1788 KOps/s | 75.8325 KOps/s | -4.82% |
| test_items | 24.9010μs | 4.5824μs | 218.2274 KOps/s | 210.2807 KOps/s | +3.78% |
| test_items_nested | 0.3754ms | 0.3423ms | 2.9210 KOps/s | 2.9504 KOps/s | -1.00% |
| test_items_nested_locked | 0.3927ms | 0.3521ms | 2.8398 KOps/s | 2.9250 KOps/s | -2.91% |
| test_items_nested_leaf | 95.8120μs | 82.1363μs | 12.1749 KOps/s | 12.1853 KOps/s | -0.09% |
| test_items_stack_nested | 0.4363ms | 0.3470ms | 2.8822 KOps/s | 2.9404 KOps/s | -1.98% |
| test_items_stack_nested_leaf | 98.9310μs | 83.0537μs | 12.0404 KOps/s | 12.1986 KOps/s | -1.30% |
| test_items_stack_nested_locked | 0.4445ms | 0.3492ms | 2.8639 KOps/s | 2.9153 KOps/s | -1.76% |
| test_keys | 20.3510μs | 4.3356μs | 230.6472 KOps/s | 230.8098 KOps/s | -0.07% |
| test_keys_nested | 84.7510μs | 69.1943μs | 14.4521 KOps/s | 14.8144 KOps/s | -2.45% |
| test_keys_nested_locked | 0.8360ms | 74.7819μs | 13.3722 KOps/s | 13.7540 KOps/s | -2.78% |
| test_keys_nested_leaf | 74.1310μs | 60.1818μs | 16.6163 KOps/s | 16.8741 KOps/s | -1.53% |
| test_keys_stack_nested | 0.1879ms | 68.9785μs | 14.4973 KOps/s | 15.2547 KOps/s | -4.97% |
| test_keys_stack_nested_leaf | 80.2720μs | 57.6020μs | 17.3605 KOps/s | 17.5925 KOps/s | -1.32% |
| test_keys_stack_nested_locked | 95.2120μs | 74.1113μs | 13.4932 KOps/s | 13.8143 KOps/s | -2.32% |
| test_values | 6.3600μs | 1.7990μs | 555.8573 KOps/s | 556.8301 KOps/s | -0.17% |
| test_values_nested | 0.1426ms | 35.7366μs | 27.9825 KOps/s | 28.6165 KOps/s | -2.22% |
| test_values_nested_locked | 58.8110μs | 37.0613μs | 26.9823 KOps/s | 27.2928 KOps/s | -1.14% |
| test_values_nested_leaf | 48.3410μs | 31.5166μs | 31.7293 KOps/s | 32.2533 KOps/s | -1.62% |
| test_values_stack_nested | 0.2065ms | 35.5779μs | 28.1073 KOps/s | 28.1334 KOps/s | -0.09% |
| test_values_stack_nested_leaf | 0.2263ms | 32.1321μs | 31.1215 KOps/s | 32.0012 KOps/s | -2.75% |
| test_values_stack_nested_locked | 0.1180ms | 37.3646μs | 26.7633 KOps/s | 26.7385 KOps/s | +0.09% |
| test_membership | 3.0343μs | 0.7381μs | 1.3548 MOps/s | 1.4097 MOps/s | -3.90% |
| test_membership_nested | 17.6200μs | 2.5537μs | 391.5899 KOps/s | 385.3331 KOps/s | +1.62% |
| test_membership_nested_leaf | 19.2320μs | 2.5980μs | 384.9045 KOps/s | 385.5847 KOps/s | -0.18% |
| test_membership_stacked_nested | 25.3210μs | 2.5604μs | 390.5703 KOps/s | 382.1911 KOps/s | +2.19% |
| test_membership_stacked_nested_leaf | 19.3400μs | 2.5558μs | 391.2672 KOps/s | 389.4248 KOps/s | +0.47% |
| test_membership_nested_last | 95.4110μs | 3.1159μs | 320.9299 KOps/s | 321.4702 KOps/s | -0.17% |
| test_membership_nested_leaf_last | 20.4110μs | 3.1005μs | 322.5235 KOps/s | 318.1129 KOps/s | +1.39% |
| test_membership_stacked_nested_last | 30.3520μs | 3.1118μs | 321.3576 KOps/s | 102.2696 KOps/s | **+214.23%** |
| test_membership_stacked_nested_leaf_last | 0.1427ms | 3.1397μs | 318.5023 KOps/s | 101.4019 KOps/s | **+214.10%** |
| test_nested_getleaf | 25.0500μs | 8.4351μs | 118.5518 KOps/s | 120.1229 KOps/s | -1.31% |
| test_nested_get | 0.1634ms | 7.8723μs | 127.0272 KOps/s | 128.1837 KOps/s | -0.90% |
| test_stacked_getleaf | 0.1876ms | 8.4484μs | 118.3657 KOps/s | 119.3448 KOps/s | -0.82% |
| test_stacked_get | 0.1870ms | 7.9020μs | 126.5498 KOps/s | 127.5033 KOps/s | -0.75% |
| test_nested_getitemleaf | 0.1943ms | 8.6712μs | 115.3248 KOps/s | 116.7531 KOps/s | -1.22% |
| test_nested_getitem | 0.1841ms | 8.0739μs | 123.8556 KOps/s | 124.7826 KOps/s | -0.74% |
| test_stacked_getitemleaf | 0.1919ms | 8.6370μs | 115.7814 KOps/s | 117.5585 KOps/s | -1.51% |
| test_stacked_getitem | 22.6810μs | 8.0391μs | 124.3925 KOps/s | 125.0298 KOps/s | -0.51% |
| test_lock_nested | 61.6632ms | 0.4051ms | 2.4684 KOps/s | 2.4201 KOps/s | +2.00% |
| test_lock_stack_nested | 0.4181ms | 0.2915ms | 3.4305 KOps/s | 3.3062 KOps/s | +3.76% |
| test_unlock_nested | 63.3505ms | 0.3999ms | 2.5006 KOps/s | 2.3870 KOps/s | +4.76% |
| test_unlock_stack_nested | 0.4164ms | 0.2994ms | 3.3396 KOps/s | 3.2149 KOps/s | +3.88% |
| test_flatten_speed | 0.5132ms | 0.1029ms | 9.7139 KOps/s | 9.8853 KOps/s | -1.73% |
| test_unflatten_speed | 0.3180ms | 0.2967ms | 3.3699 KOps/s | 3.4329 KOps/s | -1.83% |
| test_common_ops | 1.0274ms | 0.5525ms | 1.8099 KOps/s | 1.7538 KOps/s | +3.20% |
| test_creation | 20.0800μs | 1.6174μs | 618.2773 KOps/s | 626.1634 KOps/s | -1.26% |
| test_creation_empty | 22.8300μs | 7.1454μs | 139.9502 KOps/s | 147.7475 KOps/s | **-5.28%** |
| test_creation_nested_1 | 27.5800μs | 8.8680μs | 112.7645 KOps/s | 116.1541 KOps/s | -2.92% |
| test_creation_nested_2 | 32.4210μs | 10.9540μs | 91.2910 KOps/s | 93.1308 KOps/s | -1.98% |
| test_clone | 97.9220μs | 11.8826μs | 84.1570 KOps/s | 76.7796 KOps/s | **+9.61%** |
| test_getitem[int] | 25.5810μs | 10.8455μs | 92.2043 KOps/s | 87.5244 KOps/s | **+5.35%** |
| test_getitem[slice_int] | 0.1523ms | 21.1017μs | 47.3896 KOps/s | 45.5933 KOps/s | +3.94% |
| test_getitem[range] | 67.7110μs | 49.4730μs | 20.2130 KOps/s | 20.0856 KOps/s | +0.63% |
| test_getitem[tuple] | 39.5910μs | 18.3851μs | 54.3920 KOps/s | 51.3167 KOps/s | **+5.99%** |
| test_getitem[list] | 0.1691ms | 33.9860μs | 29.4239 KOps/s | 28.1049 KOps/s | +4.69% |
| test_setitem_dim[int] | 53.4610μs | 29.6050μs | 33.7781 KOps/s | 36.7714 KOps/s | **-8.14%** |
| test_setitem_dim[slice_int] | 0.1723ms | 50.9635μs | 19.6219 KOps/s | 20.7680 KOps/s | **-5.52%** |
| test_setitem_dim[range] | 86.9410μs | 67.0836μs | 14.9068 KOps/s | 15.2242 KOps/s | -2.08% |
| test_setitem_dim[tuple] | 0.1465ms | 42.7212μs | 23.4076 KOps/s | 24.2776 KOps/s | -3.58% |
| test_setitem | 41.3710μs | 15.5260μs | 64.4080 KOps/s | 59.4785 KOps/s | **+8.29%** |
| test_set | 0.1652ms | 14.9818μs | 66.7477 KOps/s | 61.6522 KOps/s | **+8.26%** |
| test_set_shared | 1.8044ms | 99.0431μs | 10.0966 KOps/s | 9.8727 KOps/s | +2.27% |
| test_update | 0.1388ms | 17.3731μs | 57.5604 KOps/s | 55.0756 KOps/s | +4.51% |
| test_update_nested | 70.2610μs | 23.0550μs | 43.3744 KOps/s | 42.5917 KOps/s | +1.84% |
| test_update__nested | 0.1155ms | 23.1032μs | 43.2841 KOps/s | 40.5003 KOps/s | **+6.87%** |
| test_set_nested | 58.9810μs | 16.1902μs | 61.7656 KOps/s | 57.8917 KOps/s | **+6.69%** |
| test_set_nested_new | 58.0810μs | 18.6095μs | 53.7360 KOps/s | 50.0982 KOps/s | **+7.26%** |
| test_select | 68.6410μs | 31.0666μs | 32.1889 KOps/s | 30.1536 KOps/s | **+6.75%** |
| test_select_nested | 0.6731ms | 51.0178μs | 19.6010 KOps/s | 18.5592 KOps/s | **+5.61%** |
| test_exclude_nested | 0.1274ms | 0.1060ms | 9.4304 KOps/s | 9.0864 KOps/s | +3.79% |
| test_empty[True] | 0.3626ms | 0.3439ms | 2.9077 KOps/s | 2.8970 KOps/s | +0.37% |
| test_empty[False] | 2.3681μs | 0.7949μs | 1.2581 MOps/s | 1.0425 MOps/s | **+20.68%** |
| test_to | 89.8320μs | 60.0960μs | 16.6400 KOps/s | 12.6320 KOps/s | **+31.73%** |
| test_to_nonblocking | 0.1882ms | 36.8023μs | 27.1722 KOps/s | 15.7843 KOps/s | **+72.15%** |
| test_unbind_speed | 0.3848ms | 0.2522ms | 3.9644 KOps/s | 3.6694 KOps/s | **+8.04%** |
| test_unbind_speed_stack0 | 0.4365ms | 0.2536ms | 3.9426 KOps/s | 3.7259 KOps/s | **+5.82%** |
| test_unbind_speed_stack1 | 80.3425ms | 0.7816ms | 1.2795 KOps/s | 1.2257 KOps/s | +4.39% |
| test_split | 80.9181ms | 1.7195ms | 581.5723 Ops/s | 553.3714 Ops/s | **+5.10%** |
| test_chunk | 1.7900ms | 1.5850ms | 630.9189 Ops/s | 601.1827 Ops/s | +4.95% |
| test_creation[device0] | 0.2061ms | 57.4640μs | 17.4022 KOps/s | 17.1146 KOps/s | +1.68% |
| test_creation_from_tensor | 0.2269ms | 55.2570μs | 18.0973 KOps/s | 18.3337 KOps/s | -1.29% |
| test_add_one[memmap_tensor0] | 95.9210μs | 7.3471μs | 136.1077 KOps/s | 126.4104 KOps/s | **+7.67%** |
| test_contiguous[memmap_tensor0] | 18.8800μs | 0.6699μs | 1.4928 MOps/s | 1.4951 MOps/s | -0.15% |
| test_stack[memmap_tensor0] | 30.1810μs | 5.0470μs | 198.1387 KOps/s | 177.6881 KOps/s | **+11.51%** |
| test_memmaptd_index | 80.3292ms | 0.3836ms | 2.6069 KOps/s | 3.1752 KOps/s | **-17.90%** |
| test_memmaptd_index_astensor | 0.6285ms | 0.3446ms | 2.9018 KOps/s | 2.5870 KOps/s | **+12.17%** |
| test_memmaptd_index_op | 0.9169ms | 0.6275ms | 1.5936 KOps/s | 1.4400 KOps/s | **+10.66%** |
| test_serialize_model | 99.3285ms | 94.8956ms | 10.5379 Ops/s | 10.0873 Ops/s | +4.47% |
| test_serialize_model_pickle | 1.3471s | 1.2358s | 0.8092 Ops/s | 0.8086 Ops/s | +0.07% |
| test_serialize_weights | 0.1792s | 0.1024s | 9.7631 Ops/s | 9.1499 Ops/s | **+6.70%** |
| test_serialize_weights_returnearly | 0.2853s | 88.4362ms | 11.3076 Ops/s | 14.5608 Ops/s | **-22.34%** |
| test_serialize_weights_pickle | 1.3495s | 1.2353s | 0.8095 Ops/s | 0.8088 Ops/s | +0.09% |
| test_reshape_pytree | 56.4610μs | 26.5299μs | 37.6934 KOps/s | 37.8630 KOps/s | -0.45% |
| test_reshape_td | 0.1555ms | 31.2715μs | 31.9780 KOps/s | 31.5759 KOps/s | +1.27% |
| test_view_pytree | 0.1735ms | 26.3138μs | 38.0029 KOps/s | 38.4678 KOps/s | -1.21% |
| test_view_td | 67.1910μs | 35.5268μs | 28.1478 KOps/s | 26.8138 KOps/s | +4.98% |
| test_unbind_pytree | 73.7220μs | 31.5785μs | 31.6671 KOps/s | 30.7892 KOps/s | +2.85% |
| test_unbind_td | 0.4406ms | 39.4998μs | 25.3166 KOps/s | 23.7083 KOps/s | **+6.78%** |
| test_split_pytree | 0.1556ms | 35.9342μs | 27.8286 KOps/s | 26.7679 KOps/s | +3.96% |
| test_split_td | 0.1804ms | 40.0559μs | 24.9651 KOps/s | 23.8012 KOps/s | +4.89% |
| test_add_pytree | 0.1736ms | 40.3366μs | 24.7914 KOps/s | 24.8877 KOps/s | -0.39% |
| test_add_td | 0.1876ms | 52.0289μs | 19.2201 KOps/s | 20.1399 KOps/s | -4.57% |
| test_distributed | 2.3497ms | 75.9076μs | 13.1739 KOps/s | 10.5989 KOps/s | **+24.29%** |
| test_tdmodule | 28.3310μs | 14.0226μs | 71.3134 KOps/s | 73.5420 KOps/s | -3.03% |
| test_tdmodule_dispatch | 0.1611ms | 27.7106μs | 36.0873 KOps/s | 36.8450 KOps/s | -2.06% |
| test_tdseq | 25.5610μs | 15.5234μs | 64.4188 KOps/s | 63.7447 KOps/s | +1.06% |
| test_tdseq_dispatch | 46.3110μs | 30.2733μs | 33.0324 KOps/s | 33.0776 KOps/s | -0.14% |
| test_instantiation_functorch | 1.5993ms | 1.4296ms | 699.5156 Ops/s | 687.7319 Ops/s | +1.71% |
| test_instantiation_td | 1.4540ms | 0.9827ms | 1.0176 KOps/s | 998.7197 Ops/s | +1.89% |
| test_exec_functorch | 0.2896ms | 0.1454ms | 6.8789 KOps/s | 6.5852 KOps/s | +4.46% |
| test_exec_functional_call | 0.3389ms | 0.1377ms | 7.2618 KOps/s | 7.1251 KOps/s | +1.92% |
| test_exec_td | 0.2962ms | 0.1353ms | 7.3884 KOps/s | 7.1945 KOps/s | +2.69% |
| test_exec_td_decorator | 0.5822ms | 0.2043ms | 4.8949 KOps/s | 4.7032 KOps/s | +4.07% |
| test_vmap_mlp_speed[True-True] | 0.7658ms | 0.5746ms | 1.7405 KOps/s | 1.7254 KOps/s | +0.87% |
| test_vmap_mlp_speed[True-False] | 0.7437ms | 0.5781ms | 1.7298 KOps/s | 1.7233 KOps/s | +0.38% |
| test_vmap_mlp_speed[False-True] | 0.6992ms | 0.5139ms | 1.9459 KOps/s | 1.9396 KOps/s | +0.33% |
| test_vmap_mlp_speed[False-False] | 0.6983ms | 0.5156ms | 1.9393 KOps/s | 1.9446 KOps/s | -0.27% |
| test_vmap_mlp_speed_decorator[True-True] | 1.5814ms | 0.6369ms | 1.5701 KOps/s | 1.5511 KOps/s | +1.22% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8117ms | 0.6378ms | 1.5678 KOps/s | 1.5646 KOps/s | +0.21% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7312ms | 0.5732ms | 1.7447 KOps/s | 1.7547 KOps/s | -0.57% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7840ms | 0.5630ms | 1.7762 KOps/s | 1.7564 KOps/s | +1.12% |
| test_vmap_transformer_speed[True-True] | 8.0988ms | 7.6740ms | 130.3098 Ops/s | 128.4828 Ops/s | +1.42% |
| test_vmap_transformer_speed[True-False] | 7.8927ms | 7.6349ms | 130.9772 Ops/s | 128.6842 Ops/s | +1.78% |
| test_vmap_transformer_speed[False-True] | 7.9596ms | 7.5923ms | 131.7122 Ops/s | 128.8564 Ops/s | +2.22% |
| test_vmap_transformer_speed[False-False] | 8.0866ms | 7.6460ms | 130.7877 Ops/s | 129.1978 Ops/s | +1.23% |
| test_vmap_transformer_speed_decorator[True-True] | 19.4575ms | 18.7361ms | 53.3729 Ops/s | 52.6646 Ops/s | +1.34% |
| test_vmap_transformer_speed_decorator[True-False] | 19.5924ms | 18.7688ms | 53.2800 Ops/s | 52.6122 Ops/s | +1.27% |
| test_vmap_transformer_speed_decorator[False-True] | 19.4503ms | 18.6653ms | 53.5755 Ops/s | 52.8831 Ops/s | +1.31% |
| test_vmap_transformer_speed_decorator[False-False] | 19.5610ms | 18.6410ms | 53.6451 Ops/s | 52.8930 Ops/s | +1.42% |
| test_to_module_speed[True] | 1.7952ms | 1.4798ms | 675.7497 Ops/s | 659.9122 Ops/s | +2.40% |
| test_to_module_speed[False] | 1.5738ms | 1.4727ms | 679.0476 Ops/s | 670.8579 Ops/s | +1.22% |
| test_tc_init | 96.8930μs | 22.4361μs | 44.5710 KOps/s | 47.6597 KOps/s | **-6.48%** |
| test_tc_init_nested | 71.1620μs | 43.5710μs | 22.9511 KOps/s | 22.5499 KOps/s | +1.78% |
| test_tc_first_layer_tensor | 0.7630μs | 0.3606μs | 2.7731 MOps/s | 2.8104 MOps/s | -1.33% |
| test_tc_first_layer_nontensor | 4.9516μs | 0.3953μs | 2.5300 MOps/s | 2.5959 MOps/s | -2.54% |
| test_tc_second_layer_tensor | 15.6500μs | 1.0815μs | 924.6219 KOps/s | 1.0106 MOps/s | **-8.51%** |
| test_tc_second_layer_nontensor | 4.2400μs | 0.8374μs | 1.1942 MOps/s | 1.2103 MOps/s | -1.33% |
| test_unbind | 0.1124s | 8.3827ms | 119.2934 Ops/s | 164.3177 Ops/s | **-27.40%** |
| test_full_like | 14.5557ms | 13.8155ms | 72.3824 Ops/s | 71.2242 Ops/s | +1.63% |
| test_zeros_like | 8.6442ms | 7.9935ms | 125.1009 Ops/s | 124.2787 Ops/s | +0.66% |
| test_ones_like | 8.5396ms | 8.0704ms | 123.9090 Ops/s | 123.3552 Ops/s | +0.45% |
| test_clone | 10.7874ms | 9.9587ms | 100.4149 Ops/s | 100.6387 Ops/s | -0.22% |
| test_squeeze | 0.1436ms | 11.3146μs | 88.3815 KOps/s | 89.6299 KOps/s | -1.39% |
| test_unsqueeze | 0.1856ms | 51.7905μs | 19.3085 KOps/s | 18.9434 KOps/s | +1.93% |
| test_split | 0.2583ms | 99.6921μs | 10.0309 KOps/s | 9.8180 KOps/s | +2.17% |
| test_permute | 0.1936ms | 0.1099ms | 9.1019 KOps/s | 8.8721 KOps/s | +2.59% |
| test_stack | 29.2820ms | 28.4766ms | 35.1165 Ops/s | 35.3405 Ops/s | -0.63% |
| test_cat | 28.9358ms | 28.3193ms | 35.3116 Ops/s | 35.3486 Ops/s | -0.10% |
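
In both tables the Ops column is consistent with the reciprocal of the Mean time (for example, a mean of 16.2917μs corresponds to about 61.38 KOps/s). The sketch below illustrates that relationship for the time strings used above; the helper names `parse_seconds` and `ops_per_second` are hypothetical and are not part of the benchmark tooling.

```python
# Multipliers for the time suffixes that appear in the tables.
TIME_UNITS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_seconds(text: str) -> float:
    """Convert a string such as '16.2917μs' or '0.7116ms' to seconds."""
    # Check the two-character suffixes first so "ms" is not read as "s".
    for suffix in ("μs", "ms", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * TIME_UNITS[suffix]
    raise ValueError(f"unrecognized time unit in {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean time per operation."""
    return 1.0 / parse_seconds(mean)


for mean in ("16.2917μs", "0.7116ms", "0.1041s"):
    print(f"{mean} -> {ops_per_second(mean):.1f} Ops/s")
```

Because the Mean values are rounded to four decimals, the recomputed throughput only matches the reported Ops to about three or four significant digits.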