pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License
808 stars, 66 forks

[Versioning] Make dependence on uint16 optional for older PT versions #839

Closed: vmoens closed this pull request 2 months ago.
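The pull request's diff is not reproduced on this page. As an illustration only, one common way to make code tolerate PyTorch builds that do not define `torch.uint16` is to look dtypes up with `getattr` and skip any that a given build lacks. The helper name, the dtype list and the stand-in modules below are hypothetical and are not taken from the PR. With a real install you would pass the `torch` module itself.

```python
from types import SimpleNamespace


def available_dtypes(torch_module, names=("uint8", "int16", "uint16", "float32")):
    """Map dtype names to dtype objects, skipping names this build does not define."""
    found = {}
    for name in names:
        dtype = getattr(torch_module, name, None)  # None on builds without this dtype
        if dtype is not None:
            found[name] = dtype
    return found


# Hypothetical stand-ins for an older and a newer torch module,
# so the sketch runs without PyTorch installed.
old_torch = SimpleNamespace(uint8="u8", int16="i16", float32="f32")
new_torch = SimpleNamespace(uint8="u8", int16="i16", uint16="u16", float32="f32")

print(sorted(available_dtypes(old_torch)))  # ['float32', 'int16', 'uint8']
print(sorted(available_dtypes(new_torch)))  # ['float32', 'int16', 'uint16', 'uint8']
```

With PyTorch installed, code that needs `uint16` could check `"uint16" in available_dtypes(torch)` before using it, instead of referencing `torch.uint16` unconditionally at import time.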

github-actions[bot] commented 2 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 5. Worsened: 15. Only changes of about 5% or more, shown in bold below, are counted.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 67.0050μs | 16.8648μs | 59.2950 KOps/s | 59.8942 KOps/s | -1.00% |
| test_plain_set_stack_nested | 46.6570μs | 17.1354μs | 58.3586 KOps/s | 59.0690 KOps/s | -1.20% |
| test_plain_set_nested_inplace | 50.7650μs | 19.1894μs | 52.1121 KOps/s | 52.2937 KOps/s | -0.35% |
| test_plain_set_stack_nested_inplace | 51.8560μs | 19.2597μs | 51.9218 KOps/s | 52.9405 KOps/s | -1.92% |
| test_items | 28.2230μs | 2.5348μs | 394.5059 KOps/s | 389.8393 KOps/s | +1.20% |
| test_items_nested | 1.3238ms | 0.2784ms | 3.5922 KOps/s | 3.6841 KOps/s | -2.50% |
| test_items_nested_locked | 0.5158ms | 0.2794ms | 3.5790 KOps/s | 3.6366 KOps/s | -1.58% |
| test_items_nested_leaf | 0.1309ms | 82.3603μs | 12.1418 KOps/s | 12.9537 KOps/s | **-6.27%** |
| test_items_stack_nested | 1.3927ms | 0.2833ms | 3.5302 KOps/s | 3.5897 KOps/s | -1.66% |
| test_items_stack_nested_leaf | 0.1338ms | 83.7034μs | 11.9469 KOps/s | 12.3368 KOps/s | -3.16% |
| test_items_stack_nested_locked | 0.5135ms | 0.2819ms | 3.5468 KOps/s | 3.6345 KOps/s | -2.41% |
| test_keys | 25.7480μs | 4.0403μs | 247.5075 KOps/s | 263.6170 KOps/s | **-6.11%** |
| test_keys_nested | 0.2778ms | 0.1395ms | 7.1701 KOps/s | 7.2703 KOps/s | -1.38% |
| test_keys_nested_locked | 0.7752ms | 0.1437ms | 6.9602 KOps/s | 7.0205 KOps/s | -0.86% |
| test_keys_nested_leaf | 0.2131ms | 0.1185ms | 8.4375 KOps/s | 8.5917 KOps/s | -1.79% |
| test_keys_stack_nested | 0.2661ms | 0.1406ms | 7.1117 KOps/s | 7.3238 KOps/s | -2.90% |
| test_keys_stack_nested_leaf | 0.2117ms | 0.1180ms | 8.4720 KOps/s | 8.5164 KOps/s | -0.52% |
| test_keys_stack_nested_locked | 0.2756ms | 0.1443ms | 6.9316 KOps/s | 7.0330 KOps/s | -1.44% |
| test_values | 6.7525μs | 1.1828μs | 845.4539 KOps/s | 749.8874 KOps/s | **+12.74%** |
| test_values_nested | 0.1146ms | 51.4992μs | 19.4178 KOps/s | 19.7387 KOps/s | -1.63% |
| test_values_nested_locked | 0.1043ms | 51.4016μs | 19.4546 KOps/s | 19.7155 KOps/s | -1.32% |
| test_values_nested_leaf | 0.1093ms | 46.6645μs | 21.4296 KOps/s | 21.6827 KOps/s | -1.17% |
| test_values_stack_nested | 0.1006ms | 52.9444μs | 18.8877 KOps/s | 19.2500 KOps/s | -1.88% |
| test_values_stack_nested_leaf | 96.0590μs | 46.3596μs | 21.5705 KOps/s | 21.7884 KOps/s | -1.00% |
| test_values_stack_nested_locked | 96.6310μs | 52.3531μs | 19.1011 KOps/s | 19.4548 KOps/s | -1.82% |
| test_membership | 14.0460μs | 1.3597μs | 735.4710 KOps/s | 745.5273 KOps/s | -1.35% |
| test_membership_nested | 32.1500μs | 3.5974μs | 277.9812 KOps/s | 285.7855 KOps/s | -2.73% |
| test_membership_nested_leaf | 44.5230μs | 3.6010μs | 277.6984 KOps/s | 287.9084 KOps/s | -3.55% |
| test_membership_stacked_nested | 28.8540μs | 3.5280μs | 283.4460 KOps/s | 285.2103 KOps/s | -0.62% |
| test_membership_stacked_nested_leaf | 28.8040μs | 3.5492μs | 281.7555 KOps/s | 273.5672 KOps/s | +2.99% |
| test_membership_nested_last | 43.1300μs | 4.3112μs | 231.9541 KOps/s | 235.5704 KOps/s | -1.54% |
| test_membership_nested_leaf_last | 38.7520μs | 4.3937μs | 227.6000 KOps/s | 233.0469 KOps/s | -2.34% |
| test_membership_stacked_nested_last | 31.6700μs | 5.4955μs | 181.9675 KOps/s | 208.1668 KOps/s | **-12.59%** |
| test_membership_stacked_nested_leaf_last | 28.7040μs | 5.5397μs | 180.5137 KOps/s | 207.7108 KOps/s | **-13.09%** |
| test_nested_getleaf | 36.0180μs | 10.6605μs | 93.8043 KOps/s | 93.9865 KOps/s | -0.19% |
| test_nested_get | 37.1100μs | 10.0546μs | 99.4573 KOps/s | 98.4257 KOps/s | +1.05% |
| test_stacked_getleaf | 41.7080μs | 10.5907μs | 94.4222 KOps/s | 96.0176 KOps/s | -1.66% |
| test_stacked_get | 33.1320μs | 9.9345μs | 100.6589 KOps/s | 100.7476 KOps/s | -0.09% |
| test_nested_getitemleaf | 48.5910μs | 11.3334μs | 88.2349 KOps/s | 90.0631 KOps/s | -2.03% |
| test_nested_getitem | 33.7540μs | 10.4182μs | 95.9856 KOps/s | 96.9118 KOps/s | -0.96% |
| test_stacked_getitemleaf | 34.9150μs | 11.2157μs | 89.1605 KOps/s | 90.0522 KOps/s | -0.99% |
| test_stacked_getitem | 36.9890μs | 10.2549μs | 97.5144 KOps/s | 96.3338 KOps/s | +1.23% |
| test_lock_nested | 51.6710ms | 0.4011ms | 2.4933 KOps/s | 2.9030 KOps/s | **-14.11%** |
| test_lock_stack_nested | 0.5233ms | 0.3138ms | 3.1865 KOps/s | 3.1945 KOps/s | -0.25% |
| test_unlock_nested | 0.7877ms | 0.3547ms | 2.8194 KOps/s | 2.8376 KOps/s | -0.64% |
| test_unlock_stack_nested | 0.5141ms | 0.3218ms | 3.1077 KOps/s | 3.1211 KOps/s | -0.43% |
| test_flatten_speed | 0.2361ms | 0.1011ms | 9.8928 KOps/s | 10.3738 KOps/s | -4.64% |
| test_unflatten_speed | 0.7340ms | 0.4178ms | 2.3937 KOps/s | 2.3710 KOps/s | +0.96% |
| test_common_ops | 5.1147ms | 0.7345ms | 1.3615 KOps/s | 1.4039 KOps/s | -3.02% |
| test_creation | 70.4220μs | 1.8813μs | 531.5594 KOps/s | 520.4760 KOps/s | +2.13% |
| test_creation_empty | 34.6440μs | 9.9490μs | 100.5122 KOps/s | 101.5181 KOps/s | -0.99% |
| test_creation_nested_1 | 43.1510μs | 13.3146μs | 75.1056 KOps/s | 77.1244 KOps/s | -2.62% |
| test_creation_nested_2 | 46.2860μs | 16.1408μs | 61.9550 KOps/s | 62.4120 KOps/s | -0.73% |
| test_clone | 0.1378ms | 13.9857μs | 71.5017 KOps/s | 73.0856 KOps/s | -2.17% |
| test_getitem[int] | 61.7360μs | 11.5879μs | 86.2968 KOps/s | 86.8573 KOps/s | -0.65% |
| test_getitem[slice_int] | 84.2180μs | 23.3518μs | 42.8232 KOps/s | 42.6087 KOps/s | +0.50% |
| test_getitem[range] | 87.8640μs | 60.7276μs | 16.4670 KOps/s | 15.1783 KOps/s | **+8.49%** |
| test_getitem[tuple] | 60.4330μs | 19.6418μs | 50.9117 KOps/s | 51.6798 KOps/s | -1.49% |
| test_getitem[list] | 0.1547ms | 42.6347μs | 23.4551 KOps/s | 24.4805 KOps/s | -4.19% |
| test_setitem_dim[int] | 74.8400μs | 35.4342μs | 28.2213 KOps/s | 28.4875 KOps/s | -0.93% |
| test_setitem_dim[slice_int] | 0.1077ms | 62.3024μs | 16.0508 KOps/s | 16.0860 KOps/s | -0.22% |
| test_setitem_dim[range] | 0.1444ms | 84.9323μs | 11.7741 KOps/s | 11.8165 KOps/s | -0.36% |
| test_setitem_dim[tuple] | 0.1041ms | 51.4750μs | 19.4269 KOps/s | 19.2810 KOps/s | +0.76% |
| test_setitem | 67.1450μs | 20.4165μs | 48.9799 KOps/s | 49.9832 KOps/s | -2.01% |
| test_set | 77.9250μs | 21.0762μs | 47.4470 KOps/s | 51.0931 KOps/s | **-7.14%** |
| test_set_shared | 1.7843ms | 0.1439ms | 6.9493 KOps/s | 6.9197 KOps/s | +0.43% |
| test_update | 0.2303ms | 21.7794μs | 45.9149 KOps/s | 46.7350 KOps/s | -1.75% |
| test_update_nested | 0.1294ms | 30.3877μs | 32.9080 KOps/s | 32.6705 KOps/s | +0.73% |
| test_update__nested | 81.7930μs | 26.6272μs | 37.5555 KOps/s | 39.5408 KOps/s | **-5.02%** |
| test_set_nested | 59.1710μs | 21.7828μs | 45.9079 KOps/s | 47.6891 KOps/s | -3.74% |
| test_set_nested_new | 77.8550μs | 26.1986μs | 38.1700 KOps/s | 39.5521 KOps/s | -3.49% |
| test_select | 1.3463ms | 41.6038μs | 24.0362 KOps/s | 24.7735 KOps/s | -2.98% |
| test_select_nested | 0.1458ms | 61.7957μs | 16.1824 KOps/s | 16.5480 KOps/s | -2.21% |
| test_exclude_nested | 0.2627ms | 0.1231ms | 8.1252 KOps/s | 8.0146 KOps/s | +1.38% |
| test_empty[True] | 0.6115ms | 0.4095ms | 2.4421 KOps/s | 2.4994 KOps/s | -2.29% |
| test_empty[False] | 9.0595μs | 1.1829μs | 845.3536 KOps/s | 877.8495 KOps/s | -3.70% |
| test_unbind_speed | 0.3364ms | 0.2641ms | 3.7862 KOps/s | 3.9466 KOps/s | -4.06% |
| test_unbind_speed_stack0 | 0.5373ms | 0.2599ms | 3.8469 KOps/s | 3.9703 KOps/s | -3.11% |
| test_unbind_speed_stack1 | 76.9480ms | 0.7492ms | 1.3347 KOps/s | 1.3818 KOps/s | -3.41% |
| test_split | 76.9533ms | 1.6516ms | 605.4872 Ops/s | 615.6309 Ops/s | -1.65% |
| test_chunk | 77.8419ms | 1.6416ms | 609.1779 Ops/s | 622.2475 Ops/s | -2.10% |
| test_creation[device0] | 0.2763ms | 86.9637μs | 11.4991 KOps/s | 11.8075 KOps/s | -2.61% |
| test_creation_from_tensor | 4.1597ms | 86.6974μs | 11.5344 KOps/s | 11.5706 KOps/s | -0.31% |
| test_add_one[memmap_tensor0] | 0.1108ms | 5.2683μs | 189.8133 KOps/s | 181.5287 KOps/s | +4.56% |
| test_contiguous[memmap_tensor0] | 11.9220μs | 0.6323μs | 1.5815 MOps/s | 1.5785 MOps/s | +0.19% |
| test_stack[memmap_tensor0] | 29.1350μs | 3.7695μs | 265.2873 KOps/s | 273.0128 KOps/s | -2.83% |
| test_memmaptd_index | 0.9241ms | 0.2568ms | 3.8943 KOps/s | 3.8183 KOps/s | +1.99% |
| test_memmaptd_index_astensor | 0.7322ms | 0.3340ms | 2.9944 KOps/s | 2.9876 KOps/s | +0.23% |
| test_memmaptd_index_op | 1.5497ms | 0.6383ms | 1.5667 KOps/s | 1.6117 KOps/s | -2.79% |
| test_serialize_model | 0.1797s | 0.1165s | 8.5865 Ops/s | 8.4624 Ops/s | +1.47% |
| test_serialize_model_pickle | 0.4511s | 0.3809s | 2.6251 Ops/s | 2.6577 Ops/s | -1.23% |
| test_serialize_weights | 0.1841s | 0.1145s | 8.7354 Ops/s | 9.0917 Ops/s | -3.92% |
| test_serialize_weights_returnearly | 0.2005s | 0.1391s | 7.1916 Ops/s | 7.8219 Ops/s | **-8.06%** |
| test_serialize_weights_pickle | 0.6027s | 0.4544s | 2.2005 Ops/s | 2.5286 Ops/s | **-12.98%** |
| test_serialize_weights_filesystem | 0.1847s | 0.1048s | 9.5432 Ops/s | 9.9219 Ops/s | -3.82% |
| test_serialize_model_filesystem | 0.1029s | 96.9521ms | 10.3144 Ops/s | 10.0895 Ops/s | +2.23% |
| test_reshape_pytree | 0.2170ms | 29.4652μs | 33.9384 KOps/s | 38.0725 KOps/s | **-10.86%** |
| test_reshape_td | 92.9340μs | 35.6631μs | 28.0402 KOps/s | 28.9218 KOps/s | -3.05% |
| test_view_pytree | 66.2030μs | 25.6213μs | 39.0300 KOps/s | 38.0934 KOps/s | +2.46% |
| test_view_td | 94.0760μs | 41.0493μs | 24.3610 KOps/s | 25.2112 KOps/s | -3.37% |
| test_unbind_pytree | 81.6030μs | 30.2108μs | 33.1008 KOps/s | 33.1734 KOps/s | -0.22% |
| test_unbind_td | 0.3952ms | 38.9826μs | 25.6525 KOps/s | 26.0359 KOps/s | -1.47% |
| test_split_pytree | 71.4240μs | 29.9588μs | 33.3792 KOps/s | 33.8251 KOps/s | -1.32% |
| test_split_td | 0.5079ms | 41.7375μs | 23.9593 KOps/s | 24.2670 KOps/s | -1.27% |
| test_add_pytree | 82.9050μs | 35.5218μs | 28.1517 KOps/s | 28.1083 KOps/s | +0.15% |
| test_add_td | 0.1173ms | 55.7661μs | 17.9320 KOps/s | 17.5140 KOps/s | +2.39% |
| test_distributed | 0.2153ms | 0.1035ms | 9.6599 KOps/s | 9.5538 KOps/s | +1.11% |
| test_tdmodule | 0.1163ms | 17.7609μs | 56.3034 KOps/s | 56.8530 KOps/s | -0.97% |
| test_tdmodule_dispatch | 51.1660μs | 35.0985μs | 28.4912 KOps/s | 28.4482 KOps/s | +0.15% |
| test_tdseq | 39.4340μs | 20.3532μs | 49.1323 KOps/s | 48.6791 KOps/s | +0.93% |
| test_tdseq_dispatch | 64.4800μs | 39.5035μs | 25.3142 KOps/s | 25.1130 KOps/s | +0.80% |
| test_instantiation_functorch | 1.6474ms | 1.3489ms | 741.3332 Ops/s | 731.9937 Ops/s | +1.28% |
| test_instantiation_td | 71.2573ms | 1.1171ms | 895.1544 Ops/s | 957.6510 Ops/s | **-6.53%** |
| test_exec_functorch | 0.2963ms | 0.1615ms | 6.1918 KOps/s | 5.7540 KOps/s | **+7.61%** |
| test_exec_functional_call | 0.3401ms | 0.1528ms | 6.5465 KOps/s | 6.5671 KOps/s | -0.31% |
| test_exec_td | 0.2890ms | 0.1498ms | 6.6769 KOps/s | 6.7181 KOps/s | -0.61% |
| test_exec_td_decorator | 1.0283ms | 0.2273ms | 4.3994 KOps/s | 4.3798 KOps/s | +0.45% |
| test_vmap_mlp_speed[True-True] | 0.7713ms | 0.4871ms | 2.0530 KOps/s | 2.0485 KOps/s | +0.22% |
| test_vmap_mlp_speed[True-False] | 0.7723ms | 0.4829ms | 2.0707 KOps/s | 2.0757 KOps/s | -0.24% |
| test_vmap_mlp_speed[False-True] | 0.8456ms | 0.3986ms | 2.5089 KOps/s | 2.5416 KOps/s | -1.29% |
| test_vmap_mlp_speed[False-False] | 0.6718ms | 0.3948ms | 2.5330 KOps/s | 2.5432 KOps/s | -0.40% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1316ms | 0.5656ms | 1.7681 KOps/s | 1.7926 KOps/s | -1.37% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8625ms | 0.5564ms | 1.7973 KOps/s | 1.6514 KOps/s | **+8.83%** |
| test_vmap_mlp_speed_decorator[False-True] | 0.7741ms | 0.4631ms | 2.1593 KOps/s | 2.1653 KOps/s | -0.27% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6769ms | 0.4588ms | 2.1798 KOps/s | 2.1742 KOps/s | +0.26% |
| test_to_module_speed[True] | 2.4293ms | 1.7233ms | 580.2845 Ops/s | 578.2675 Ops/s | +0.35% |
| test_to_module_speed[False] | 1.8423ms | 1.7045ms | 586.6718 Ops/s | 584.3855 Ops/s | +0.39% |
| test_tc_init | 59.0000μs | 27.0659μs | 36.9469 KOps/s | 36.9003 KOps/s | +0.13% |
| test_tc_init_nested | 0.1175ms | 58.3503μs | 17.1379 KOps/s | 17.7078 KOps/s | -3.22% |
| test_tc_first_layer_tensor | 5.2256μs | 0.7076μs | 1.4132 MOps/s | 1.4708 MOps/s | -3.92% |
| test_tc_first_layer_nontensor | 6.9830μs | 0.7031μs | 1.4224 MOps/s | 1.5025 MOps/s | **-5.33%** |
| test_tc_second_layer_tensor | 19.9470μs | 1.8572μs | 538.4584 KOps/s | 543.0257 KOps/s | -0.84% |
| test_tc_second_layer_nontensor | 18.6350μs | 1.6783μs | 595.8322 KOps/s | 611.2751 KOps/s | -2.53% |
| test_unbind | 85.0842ms | 7.8232ms | 127.8249 Ops/s | 158.3942 Ops/s | **-19.30%** |
| test_full_like | 15.5765ms | 10.9436ms | 91.3775 Ops/s | 89.4241 Ops/s | +2.18% |
| test_zeros_like | 11.7763ms | 5.9091ms | 169.2294 Ops/s | 180.2557 Ops/s | **-6.12%** |
| test_ones_like | 11.9560ms | 6.2815ms | 159.1964 Ops/s | 149.6545 Ops/s | **+6.38%** |
| test_clone | 12.3878ms | 8.0635ms | 124.0159 Ops/s | 120.8141 Ops/s | +2.65% |
| test_squeeze | 68.0170μs | 15.4435μs | 64.7523 KOps/s | 70.0548 KOps/s | **-7.57%** |
| test_unsqueeze | 0.1326ms | 61.9170μs | 16.1506 KOps/s | 16.4681 KOps/s | -1.93% |
| test_split | 0.2614ms | 0.1164ms | 8.5891 KOps/s | 9.0168 KOps/s | -4.74% |
| test_permute | 0.2169ms | 0.1269ms | 7.8831 KOps/s | 7.9618 KOps/s | -0.99% |
| test_stack | 26.4938ms | 22.6102ms | 44.2278 Ops/s | 42.8289 Ops/s | +3.27% |
| test_cat | 28.3176ms | 23.5450ms | 42.4718 Ops/s | 41.2069 Ops/s | +3.07% |

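The Ops and Change columns can be recomputed from the other columns. In the rows above, Ops is the reciprocal of Mean (for `test_values`, 1 / 1.1828 μs ≈ 845.45 KOps/s), and Change is the relative difference between Ops and Ops on `HEAD`. The sketch below assumes those two relationships (checked by hand against a few rows) and a ±5% cut-off for what counts as improved or worsened; the function names are hypothetical and are not part of the benchmark bot.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (the Ops column)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus the HEAD run, in percent (positive = faster)."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the bold rows."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# A few CPU rows copied from the table: (name, Ops, Ops on HEAD), same unit per row.
rows = [
    ("test_values", 845.4539, 749.8874),
    ("test_unbind", 127.8249, 158.3942),
    ("test_plain_set_nested", 59.2950, 59.8942),
]

for name, ops, ops_head in rows:
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
# test_values: +12.74% (improved)
# test_unbind: -19.30% (worsened)
# test_plain_set_nested: -1.00% (unchanged)
```

Because Change is computed on throughput, a negative value means the benchmark ran slower than on `HEAD`.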
github-actions[bot] commented 2 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 13. Worsened: 21.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.4891ms | 13.0107μs | 76.8595 KOps/s | 85.5200 KOps/s | **-10.13%** |
| test_plain_set_stack_nested | 26.6600μs | 13.2475μs | 75.4860 KOps/s | 83.2685 KOps/s | **-9.35%** |
| test_plain_set_nested_inplace | 36.1410μs | 14.5911μs | 68.5348 KOps/s | 76.1418 KOps/s | **-9.99%** |
| test_plain_set_stack_nested_inplace | 36.7010μs | 14.7151μs | 67.9573 KOps/s | 75.0822 KOps/s | **-9.49%** |
| test_items | 28.5610μs | 4.6686μs | 214.1975 KOps/s | 210.0242 KOps/s | +1.99% |
| test_items_nested | 0.3697ms | 0.3474ms | 2.8787 KOps/s | 2.9398 KOps/s | -2.08% |
| test_items_nested_locked | 0.3788ms | 0.3496ms | 2.8603 KOps/s | 2.8551 KOps/s | +0.18% |
| test_items_nested_leaf | 98.5920μs | 83.0731μs | 12.0376 KOps/s | 12.1168 KOps/s | -0.65% |
| test_items_stack_nested | 0.4279ms | 0.3500ms | 2.8575 KOps/s | 2.8692 KOps/s | -0.41% |
| test_items_stack_nested_leaf | 0.1170ms | 84.9727μs | 11.7685 KOps/s | 12.0043 KOps/s | -1.96% |
| test_items_stack_nested_locked | 0.4084ms | 0.3531ms | 2.8320 KOps/s | 2.8456 KOps/s | -0.48% |
| test_keys | 18.4910μs | 4.3319μs | 230.8465 KOps/s | 228.5917 KOps/s | +0.99% |
| test_keys_nested | 89.5110μs | 67.3139μs | 14.8558 KOps/s | 14.9079 KOps/s | -0.35% |
| test_keys_nested_locked | 2.0515ms | 72.1334μs | 13.8632 KOps/s | 13.7998 KOps/s | +0.46% |
| test_keys_nested_leaf | 77.2310μs | 57.7595μs | 17.3132 KOps/s | 17.3554 KOps/s | -0.24% |
| test_keys_stack_nested | 85.3220μs | 66.8613μs | 14.9563 KOps/s | 14.8990 KOps/s | +0.38% |
| test_keys_stack_nested_leaf | 71.9320μs | 57.5488μs | 17.3766 KOps/s | 17.4245 KOps/s | -0.28% |
| test_keys_stack_nested_locked | 90.9020μs | 71.5040μs | 13.9852 KOps/s | 14.0757 KOps/s | -0.64% |
| test_values | 8.9103μs | 1.8332μs | 545.5052 KOps/s | 547.2125 KOps/s | -0.31% |
| test_values_nested | 58.1810μs | 35.1982μs | 28.4106 KOps/s | 28.3767 KOps/s | +0.12% |
| test_values_nested_locked | 54.9710μs | 37.3491μs | 26.7744 KOps/s | 26.9181 KOps/s | -0.53% |
| test_values_nested_leaf | 49.5310μs | 31.1744μs | 32.0776 KOps/s | 31.6190 KOps/s | +1.45% |
| test_values_stack_nested | 63.5410μs | 36.1570μs | 27.6572 KOps/s | 28.0559 KOps/s | -1.42% |
| test_values_stack_nested_leaf | 54.9510μs | 32.1215μs | 31.1318 KOps/s | 31.4489 KOps/s | -1.01% |
| test_values_stack_nested_locked | 58.8610μs | 38.2336μs | 26.1550 KOps/s | 26.8997 KOps/s | -2.77% |
| test_membership | 3.3257μs | 0.7666μs | 1.3045 MOps/s | 1.2805 MOps/s | +1.87% |
| test_membership_nested | 26.9610μs | 2.6165μs | 382.1954 KOps/s | 376.4177 KOps/s | +1.53% |
| test_membership_nested_leaf | 19.1810μs | 2.6026μs | 384.2257 KOps/s | 377.6353 KOps/s | +1.75% |
| test_membership_stacked_nested | 27.2300μs | 2.6397μs | 378.8358 KOps/s | 374.7874 KOps/s | +1.08% |
| test_membership_stacked_nested_leaf | 20.1500μs | 2.6206μs | 381.5923 KOps/s | 380.1463 KOps/s | +0.38% |
| test_membership_nested_last | 34.5210μs | 3.2040μs | 312.1121 KOps/s | 316.1625 KOps/s | -1.28% |
| test_membership_nested_leaf_last | 18.8910μs | 3.1431μs | 318.1587 KOps/s | 317.6761 KOps/s | +0.15% |
| test_membership_stacked_nested_last | 24.4000μs | 3.6204μs | 276.2147 KOps/s | 316.8087 KOps/s | **-12.81%** |
| test_membership_stacked_nested_leaf_last | 35.7610μs | 3.5815μs | 279.2120 KOps/s | 315.8200 KOps/s | **-11.59%** |
| test_nested_getleaf | 25.3210μs | 8.3656μs | 119.5370 KOps/s | 118.8599 KOps/s | +0.57% |
| test_nested_get | 66.7210μs | 7.8663μs | 127.1238 KOps/s | 126.2368 KOps/s | +0.70% |
| test_stacked_getleaf | 38.7100μs | 8.3950μs | 119.1180 KOps/s | 118.7018 KOps/s | +0.35% |
| test_stacked_get | 24.8300μs | 7.8653μs | 127.1400 KOps/s | 126.5972 KOps/s | +0.43% |
| test_nested_getitemleaf | 33.6710μs | 8.5226μs | 117.3357 KOps/s | 116.2076 KOps/s | +0.97% |
| test_nested_getitem | 27.6410μs | 8.0374μs | 124.4189 KOps/s | 123.1656 KOps/s | +1.02% |
| test_stacked_getitemleaf | 25.8800μs | 8.6042μs | 116.2228 KOps/s | 116.3390 KOps/s | -0.10% |
| test_stacked_getitem | 69.3010μs | 8.0502μs | 124.2211 KOps/s | 123.3771 KOps/s | +0.68% |
| test_lock_nested | 58.8089ms | 0.4045ms | 2.4721 KOps/s | 2.4583 KOps/s | +0.56% |
| test_lock_stack_nested | 0.3357ms | 0.3002ms | 3.3314 KOps/s | 3.2686 KOps/s | +1.92% |
| test_unlock_nested | 60.5803ms | 0.4090ms | 2.4452 KOps/s | 2.4345 KOps/s | +0.44% |
| test_unlock_stack_nested | 0.3323ms | 0.3096ms | 3.2300 KOps/s | 3.1897 KOps/s | +1.26% |
| test_flatten_speed | 0.3275ms | 0.1020ms | 9.8008 KOps/s | 9.8776 KOps/s | -0.78% |
| test_unflatten_speed | 0.3812ms | 0.2924ms | 3.4201 KOps/s | 3.3580 KOps/s | +1.85% |
| test_common_ops | 1.0498ms | 0.5947ms | 1.6816 KOps/s | 1.8181 KOps/s | **-7.51%** |
| test_creation | 32.7410μs | 1.6435μs | 608.4693 KOps/s | 598.5097 KOps/s | +1.66% |
| test_creation_empty | 47.8000μs | 9.3411μs | 107.0535 KOps/s | 159.5561 KOps/s | **-32.91%** |
| test_creation_nested_1 | 26.4200μs | 11.0006μs | 90.9041 KOps/s | 123.1820 KOps/s | **-26.20%** |
| test_creation_nested_2 | 37.2300μs | 13.2102μs | 75.6991 KOps/s | 98.2999 KOps/s | **-22.99%** |
| test_clone | 67.0810μs | 11.8231μs | 84.5801 KOps/s | 82.5357 KOps/s | +2.48% |
| test_getitem[int] | 27.6510μs | 10.7732μs | 92.8232 KOps/s | 88.4376 KOps/s | +4.96% |
| test_getitem[slice_int] | 43.9210μs | 20.2375μs | 49.4133 KOps/s | 45.4156 KOps/s | **+8.80%** |
| test_getitem[range] | 66.3310μs | 47.6148μs | 21.0019 KOps/s | 20.1027 KOps/s | +4.47% |
| test_getitem[tuple] | 52.7010μs | 18.6755μs | 53.5460 KOps/s | 51.4430 KOps/s | +4.09% |
| test_getitem[list] | 0.1225ms | 34.6743μs | 28.8398 KOps/s | 29.1697 KOps/s | -1.13% |
| test_setitem_dim[int] | 46.7410μs | 30.2173μs | 33.0937 KOps/s | 36.2960 KOps/s | **-8.82%** |
| test_setitem_dim[slice_int] | 68.4910μs | 50.9178μs | 19.6395 KOps/s | 20.8365 KOps/s | **-5.74%** |
| test_setitem_dim[range] | 95.4210μs | 67.0238μs | 14.9201 KOps/s | 14.5953 KOps/s | +2.22% |
| test_setitem_dim[tuple] | 61.7210μs | 44.6601μs | 22.3913 KOps/s | 22.8518 KOps/s | -2.01% |
| test_setitem | 52.4310μs | 16.8832μs | 59.2306 KOps/s | 60.9400 KOps/s | -2.80% |
| test_set | 44.2800μs | 16.2710μs | 61.4589 KOps/s | 65.0818 KOps/s | **-5.57%** |
| test_set_shared | 1.5914ms | 0.1003ms | 9.9691 KOps/s | 9.7335 KOps/s | +2.42% |
| test_update | 63.1810μs | 19.3420μs | 51.7010 KOps/s | 59.3498 KOps/s | **-12.89%** |
| test_update_nested | 74.9620μs | 24.9168μs | 40.1336 KOps/s | 44.1944 KOps/s | **-9.19%** |
| test_update__nested | 56.7310μs | 22.7054μs | 44.0423 KOps/s | 42.5907 KOps/s | +3.41% |
| test_set_nested | 51.3810μs | 17.2335μs | 58.0267 KOps/s | 59.1974 KOps/s | -1.98% |
| test_set_nested_new | 61.8410μs | 20.5082μs | 48.7609 KOps/s | 50.1372 KOps/s | -2.74% |
| test_select | 76.5710μs | 35.1322μs | 28.4639 KOps/s | 29.8689 KOps/s | -4.70% |
| test_select_nested | 0.1358ms | 54.3243μs | 18.4080 KOps/s | 17.9277 KOps/s | +2.68% |
| test_exclude_nested | 0.1425ms | 0.1140ms | 8.7688 KOps/s | 8.9164 KOps/s | -1.66% |
| test_empty[True] | 0.3839ms | 0.3541ms | 2.8239 KOps/s | 2.8096 KOps/s | +0.51% |
| test_empty[False] | 2.7290μs | 0.9292μs | 1.0762 MOps/s | 1.0676 MOps/s | +0.80% |
| test_to | 0.1056ms | 77.5163μs | 12.9005 KOps/s | 12.7773 KOps/s | +0.96% |
| test_to_nonblocking | 98.2720μs | 62.2692μs | 16.0593 KOps/s | 15.7991 KOps/s | +1.65% |
| test_unbind_speed | 1.5093ms | 0.2673ms | 3.7418 KOps/s | 3.7382 KOps/s | +0.10% |
| test_unbind_speed_stack0 | 0.2954ms | 0.2641ms | 3.7868 KOps/s | 3.7611 KOps/s | +0.69% |
| test_unbind_speed_stack1 | 76.1570ms | 0.8012ms | 1.2481 KOps/s | 1.2235 KOps/s | +2.02% |
| test_split | 76.5115ms | 1.6619ms | 601.7078 Ops/s | 576.7270 Ops/s | +4.33% |
| test_chunk | 76.3607ms | 1.6553ms | 604.1194 Ops/s | 624.4696 Ops/s | -3.26% |
| test_creation[device0] | 0.1262ms | 56.6970μs | 17.6376 KOps/s | 17.1585 KOps/s | +2.79% |
| test_creation_from_tensor | 0.1880ms | 53.5673μs | 18.6681 KOps/s | 17.8588 KOps/s | +4.53% |
| test_add_one[memmap_tensor0] | 0.1022ms | 6.9029μs | 144.8668 KOps/s | 133.0137 KOps/s | **+8.91%** |
| test_contiguous[memmap_tensor0] | 11.1100μs | 0.6689μs | 1.4949 MOps/s | 1.4404 MOps/s | +3.78% |
| test_stack[memmap_tensor0] | 29.9110μs | 4.7147μs | 212.1034 KOps/s | 185.7259 KOps/s | **+14.20%** |
| test_memmaptd_index | 1.0705ms | 0.2860ms | 3.4967 KOps/s | 2.5693 KOps/s | **+36.10%** |
| test_memmaptd_index_astensor | 0.6239ms | 0.3602ms | 2.7760 KOps/s | 2.7081 KOps/s | +2.51% |
| test_memmaptd_index_op | 1.1310ms | 0.6663ms | 1.5009 KOps/s | 1.5518 KOps/s | -3.28% |
| test_serialize_model | 0.1817s | 0.1103s | 9.0685 Ops/s | 9.4588 Ops/s | -4.13% |
| test_serialize_model_pickle | 1.3650s | 1.2377s | 0.8080 Ops/s | 0.8069 Ops/s | +0.13% |
| test_serialize_weights | 0.1792s | 0.1086s | 9.2069 Ops/s | 8.9095 Ops/s | +3.34% |
| test_serialize_weights_returnearly | 0.2613s | 0.1046s | 9.5573 Ops/s | 9.5788 Ops/s | -0.22% |
| test_serialize_weights_pickle | 1.3557s | 1.2480s | 0.8013 Ops/s | 0.8091 Ops/s | -0.97% |
| test_reshape_pytree | 50.9710μs | 25.9799μs | 38.4913 KOps/s | 37.6632 KOps/s | +2.20% |
| test_reshape_td | 51.8210μs | 31.5654μs | 31.6802 KOps/s | 31.4193 KOps/s | +0.83% |
| test_view_pytree | 0.2586ms | 25.7939μs | 38.7689 KOps/s | 37.7408 KOps/s | +2.72% |
| test_view_td | 60.5210μs | 36.6015μs | 27.3213 KOps/s | 26.9948 KOps/s | +1.21% |
| test_unbind_pytree | 0.2329ms | 31.4852μs | 31.7610 KOps/s | 30.6563 KOps/s | +3.60% |
| test_unbind_td | 0.4453ms | 40.0078μs | 24.9951 KOps/s | 24.0011 KOps/s | +4.14% |
| test_split_pytree | 53.7010μs | 34.8674μs | 28.6801 KOps/s | 26.6072 KOps/s | **+7.79%** |
| test_split_td | 0.1061ms | 38.8299μs | 25.7534 KOps/s | 25.2110 KOps/s | +2.15% |
| test_add_pytree | 0.2486ms | 37.0713μs | 26.9750 KOps/s | 25.6884 KOps/s | **+5.01%** |
| test_add_td | 83.4010μs | 50.5449μs | 19.7844 KOps/s | 20.1252 KOps/s | -1.69% |
| test_distributed | 0.2899ms | 66.0367μs | 15.1431 KOps/s | 14.6122 KOps/s | +3.63% |
| test_tdmodule | 32.8300μs | 15.4412μs | 64.7618 KOps/s | 73.8558 KOps/s | **-12.31%** |
| test_tdmodule_dispatch | 47.9010μs | 30.5355μs | 32.7487 KOps/s | 36.5899 KOps/s | **-10.50%** |
| test_tdseq | 34.4600μs | 17.7622μs | 56.2993 KOps/s | 64.0086 KOps/s | **-12.04%** |
| test_tdseq_dispatch | 64.6710μs | 33.7406μs | 29.6379 KOps/s | 32.2822 KOps/s | **-8.19%** |
| test_instantiation_functorch | 1.7781ms | 1.5470ms | 646.4192 Ops/s | 610.0646 Ops/s | **+5.96%** |
| test_instantiation_td | 1.5615ms | 1.0509ms | 951.5867 Ops/s | 925.0336 Ops/s | +2.87% |
| test_exec_functorch | 0.2057ms | 0.1496ms | 6.6826 KOps/s | 6.4936 KOps/s | +2.91% |
| test_exec_functional_call | 0.3398ms | 0.1395ms | 7.1665 KOps/s | 6.9329 KOps/s | +3.37% |
| test_exec_td | 0.1709ms | 0.1384ms | 7.2250 KOps/s | 6.9758 KOps/s | +3.57% |
| test_exec_td_decorator | 0.7794ms | 0.2129ms | 4.6964 KOps/s | 4.6183 KOps/s | +1.69% |
| test_vmap_mlp_speed[True-True] | 0.8617ms | 0.5890ms | 1.6977 KOps/s | 1.7055 KOps/s | -0.46% |
| test_vmap_mlp_speed[True-False] | 0.8185ms | 0.5847ms | 1.7103 KOps/s | 1.6648 KOps/s | +2.73% |
| test_vmap_mlp_speed[False-True] | 0.7309ms | 0.5141ms | 1.9451 KOps/s | 1.8398 KOps/s | **+5.72%** |
| test_vmap_mlp_speed[False-False] | 0.7210ms | 0.5135ms | 1.9473 KOps/s | 1.8400 KOps/s | **+5.83%** |
| test_vmap_mlp_speed_decorator[True-True] | 0.8445ms | 0.6491ms | 1.5407 KOps/s | 1.3562 KOps/s | **+13.60%** |
| test_vmap_mlp_speed_decorator[True-False] | 0.9289ms | 0.6471ms | 1.5454 KOps/s | 1.4893 KOps/s | +3.77% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8554ms | 0.5847ms | 1.7102 KOps/s | 1.6490 KOps/s | +3.71% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7950ms | 0.5706ms | 1.7525 KOps/s | 1.6521 KOps/s | **+6.08%** |
| test_vmap_transformer_speed[True-True] | 7.8635ms | 7.6356ms | 130.9659 Ops/s | 128.0857 Ops/s | +2.25% |
| test_vmap_transformer_speed[True-False] | 8.0721ms | 7.6168ms | 131.2895 Ops/s | 125.2724 Ops/s | +4.80% |
| test_vmap_transformer_speed[False-True] | 7.7256ms | 7.5313ms | 132.7784 Ops/s | 126.7172 Ops/s | +4.78% |
| test_vmap_transformer_speed[False-False] | 7.7883ms | 7.5282ms | 132.8339 Ops/s | 127.7647 Ops/s | +3.97% |
| test_vmap_transformer_speed_decorator[True-True] | 19.4684ms | 18.8277ms | 53.1131 Ops/s | 52.6337 Ops/s | +0.91% |
| test_vmap_transformer_speed_decorator[True-False] | 19.2824ms | 18.6161ms | 53.7169 Ops/s | 52.3633 Ops/s | +2.59% |
| test_vmap_transformer_speed_decorator[False-True] | 18.7638ms | 18.4193ms | 54.2909 Ops/s | 53.0988 Ops/s | +2.25% |
| test_vmap_transformer_speed_decorator[False-False] | 18.9552ms | 18.3476ms | 54.5030 Ops/s | 53.0869 Ops/s | +2.67% |
| test_to_module_speed[True] | 2.9451ms | 1.5524ms | 644.1564 Ops/s | 643.5436 Ops/s | +0.10% |
| test_to_module_speed[False] | 1.9805ms | 1.5332ms | 652.2299 Ops/s | 651.5712 Ops/s | +0.10% |
| test_tc_init | 44.4110μs | 27.2293μs | 36.7252 KOps/s | 48.5333 KOps/s | **-24.33%** |
| test_tc_init_nested | 0.2594ms | 55.4505μs | 18.0341 KOps/s | 22.9071 KOps/s | **-21.27%** |
| test_tc_first_layer_tensor | 5.1906μs | 0.3556μs | 2.8123 MOps/s | 2.7490 MOps/s | +2.30% |
| test_tc_first_layer_nontensor | 3.2062μs | 0.3879μs | 2.5778 MOps/s | 2.5570 MOps/s | +0.81% |
| test_tc_second_layer_tensor | 44.6008μs | 0.9649μs | 1.0363 MOps/s | 936.2695 KOps/s | **+10.69%** |
| test_tc_second_layer_nontensor | 10.9822μs | 0.7963μs | 1.2558 MOps/s | 1.2241 MOps/s | +2.59% |
| test_unbind | 0.1124s | 6.7782ms | 147.5325 Ops/s | 141.3793 Ops/s | +4.35% |
| test_full_like | 11.6937ms | 11.0402ms | 90.5782 Ops/s | 75.6435 Ops/s | **+19.74%** |
| test_zeros_like | 8.2667ms | 7.7743ms | 128.6297 Ops/s | 127.0539 Ops/s | +1.24% |
| test_ones_like | 8.5316ms | 7.8983ms | 126.6099 Ops/s | 128.3566 Ops/s | -1.36% |
| test_clone | 9.4258ms | 9.1493ms | 109.2978 Ops/s | 109.0509 Ops/s | +0.23% |
| test_squeeze | 60.9410μs | 11.3117μs | 88.4038 KOps/s | 88.7251 KOps/s | -0.36% |
| test_unsqueeze | 96.3110μs | 51.8558μs | 19.2843 KOps/s | 18.7260 KOps/s | +2.98% |
| test_split | 0.8669ms | 97.4739μs | 10.2592 KOps/s | 9.9372 KOps/s | +3.24% |
| test_permute | 0.1524ms | 0.1091ms | 9.1669 KOps/s | 8.7706 KOps/s | +4.52% |
| test_stack | 26.9726ms | 26.4589ms | 37.7945 Ops/s | 37.7957 Ops/s | -0.00% |
| test_cat | 26.4930ms | 26.3789ms | 37.9091 Ops/s | 37.8367 Ops/s | +0.19% |