pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Multithreaded pin_memory #845

Closed: vmoens closed this 4 months ago

github-actions[bot] commented 4 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 17. Worsened: 8.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.7150μs | 16.9825μs | 58.8840 KOps/s | 60.2123 KOps/s | -2.21% |
| test_plain_set_stack_nested | 38.4020μs | 17.2206μs | 58.0699 KOps/s | 58.0892 KOps/s | -0.03% |
| test_plain_set_nested_inplace | 56.5270μs | 19.3548μs | 51.6667 KOps/s | 52.0042 KOps/s | -0.65% |
| test_plain_set_stack_nested_inplace | 54.9940μs | 19.3689μs | 51.6292 KOps/s | 51.6595 KOps/s | -0.06% |
| test_items | 27.5420μs | 2.5737μs | 388.5455 KOps/s | 369.1807 KOps/s | **+5.25%** |
| test_items_nested | 0.5306ms | 0.2800ms | 3.5717 KOps/s | 3.6059 KOps/s | -0.95% |
| test_items_nested_locked | 1.5598ms | 0.2805ms | 3.5655 KOps/s | 3.5758 KOps/s | -0.29% |
| test_items_nested_leaf | 0.1396ms | 77.1734μs | 12.9578 KOps/s | 12.6440 KOps/s | +2.48% |
| test_items_stack_nested | 0.5439ms | 0.2847ms | 3.5119 KOps/s | 3.6144 KOps/s | -2.84% |
| test_items_stack_nested_leaf | 0.1534ms | 77.5597μs | 12.8933 KOps/s | 12.8312 KOps/s | +0.48% |
| test_items_stack_nested_locked | 1.4112ms | 0.2820ms | 3.5459 KOps/s | 3.5848 KOps/s | -1.08% |
| test_keys | 26.7400μs | 3.8261μs | 261.3622 KOps/s | 241.0064 KOps/s | **+8.45%** |
| test_keys_nested | 0.2365ms | 0.1387ms | 7.2118 KOps/s | 7.1821 KOps/s | +0.41% |
| test_keys_nested_locked | 0.7224ms | 0.1439ms | 6.9504 KOps/s | 6.8711 KOps/s | +1.15% |
| test_keys_nested_leaf | 0.2079ms | 0.1176ms | 8.5067 KOps/s | 8.3943 KOps/s | +1.34% |
| test_keys_stack_nested | 0.4000ms | 0.1393ms | 7.1804 KOps/s | 7.2913 KOps/s | -1.52% |
| test_keys_stack_nested_leaf | 0.2067ms | 0.1150ms | 8.6952 KOps/s | 8.7050 KOps/s | -0.11% |
| test_keys_stack_nested_locked | 0.3070ms | 0.1411ms | 7.0878 KOps/s | 7.2241 KOps/s | -1.89% |
| test_values | 7.1185μs | 1.1230μs | 890.5063 KOps/s | 824.2288 KOps/s | **+8.04%** |
| test_values_nested | 97.8220μs | 49.7808μs | 20.0881 KOps/s | 19.7933 KOps/s | +1.49% |
| test_values_nested_locked | 0.1136ms | 50.5527μs | 19.7813 KOps/s | 19.7138 KOps/s | +0.34% |
| test_values_nested_leaf | 0.1876ms | 45.2647μs | 22.0923 KOps/s | 21.6869 KOps/s | +1.87% |
| test_values_stack_nested | 0.1137ms | 51.4401μs | 19.4401 KOps/s | 19.3372 KOps/s | +0.53% |
| test_values_stack_nested_leaf | 86.3220μs | 45.2098μs | 22.1191 KOps/s | 22.0799 KOps/s | +0.18% |
| test_values_stack_nested_locked | 0.1165ms | 51.1047μs | 19.5677 KOps/s | 19.2543 KOps/s | +1.63% |
| test_membership | 17.0520μs | 1.3517μs | 739.8350 KOps/s | 754.6117 KOps/s | -1.96% |
| test_membership_nested | 23.7950μs | 3.4558μs | 289.3695 KOps/s | 290.4025 KOps/s | -0.36% |
| test_membership_nested_leaf | 38.2120μs | 3.4827μs | 287.1372 KOps/s | 290.3730 KOps/s | -1.11% |
| test_membership_stacked_nested | 41.2600μs | 3.4516μs | 289.7214 KOps/s | 290.7116 KOps/s | -0.34% |
| test_membership_stacked_nested_leaf | 30.6480μs | 3.4649μs | 288.6085 KOps/s | 279.4433 KOps/s | +3.28% |
| test_membership_nested_last | 48.1510μs | 4.1808μs | 239.1866 KOps/s | 236.7337 KOps/s | +1.04% |
| test_membership_nested_leaf_last | 29.3050μs | 4.2175μs | 237.1055 KOps/s | 239.7360 KOps/s | -1.10% |
| test_membership_stacked_nested_last | 47.6000μs | 13.1296μs | 76.1637 KOps/s | 67.8608 KOps/s | **+12.24%** |
| test_membership_stacked_nested_leaf_last | 83.3870μs | 13.2153μs | 75.6697 KOps/s | 74.7421 KOps/s | +1.24% |
| test_nested_getleaf | 37.6510μs | 10.6125μs | 94.2287 KOps/s | 91.6459 KOps/s | +2.82% |
| test_nested_get | 42.6180μs | 9.9716μs | 100.2849 KOps/s | 97.4957 KOps/s | +2.86% |
| test_stacked_getleaf | 42.0990μs | 10.7478μs | 93.0423 KOps/s | 92.6171 KOps/s | +0.46% |
| test_stacked_get | 39.9750μs | 10.0226μs | 99.7744 KOps/s | 98.6609 KOps/s | +1.13% |
| test_nested_getitemleaf | 34.9750μs | 11.2965μs | 88.5230 KOps/s | 87.5099 KOps/s | +1.16% |
| test_nested_getitem | 50.9760μs | 10.3818μs | 96.3221 KOps/s | 95.8519 KOps/s | +0.49% |
| test_stacked_getitemleaf | 28.2830μs | 11.2104μs | 89.2033 KOps/s | 88.2264 KOps/s | +1.11% |
| test_stacked_getitem | 36.4580μs | 10.1396μs | 98.6237 KOps/s | 95.5732 KOps/s | +3.19% |
| test_lock_nested | 0.8412ms | 0.3322ms | 3.0106 KOps/s | 2.9555 KOps/s | +1.86% |
| test_lock_stack_nested | 0.4266ms | 0.2871ms | 3.4832 KOps/s | 3.3711 KOps/s | +3.33% |
| test_unlock_nested | 0.7198ms | 0.3337ms | 2.9963 KOps/s | 2.9216 KOps/s | +2.56% |
| test_unlock_stack_nested | 0.3658ms | 0.2947ms | 3.3933 KOps/s | 3.2980 KOps/s | +2.89% |
| test_flatten_speed | 0.4922ms | 97.3024μs | 10.2772 KOps/s | 10.2416 KOps/s | +0.35% |
| test_unflatten_speed | 0.7905ms | 0.3991ms | 2.5055 KOps/s | 2.4146 KOps/s | +3.76% |
| test_common_ops | 1.3431ms | 0.7050ms | 1.4184 KOps/s | 1.3851 KOps/s | +2.40% |
| test_creation | 35.7870μs | 1.8376μs | 544.1966 KOps/s | 519.1031 KOps/s | +4.83% |
| test_creation_empty | 34.6050μs | 11.2641μs | 88.7779 KOps/s | 95.8369 KOps/s | **-7.37%** |
| test_creation_nested_1 | 31.3690μs | 13.9345μs | 71.7643 KOps/s | 76.0731 KOps/s | **-5.66%** |
| test_creation_nested_2 | 56.8460μs | 17.3330μs | 57.6933 KOps/s | 60.2492 KOps/s | -4.24% |
| test_clone | 0.2366ms | 12.9786μs | 77.0497 KOps/s | 74.7033 KOps/s | +3.14% |
| test_getitem[int] | 31.3180μs | 11.1101μs | 90.0083 KOps/s | 87.2952 KOps/s | +3.11% |
| test_getitem[slice_int] | 49.8340μs | 22.2909μs | 44.8614 KOps/s | 44.1323 KOps/s | +1.65% |
| test_getitem[range] | 78.1360μs | 57.5218μs | 17.3847 KOps/s | 16.8792 KOps/s | +3.00% |
| test_getitem[tuple] | 49.6130μs | 18.6060μs | 53.7461 KOps/s | 51.1735 KOps/s | **+5.03%** |
| test_getitem[list] | 0.1133ms | 39.6582μs | 25.2155 KOps/s | 24.4715 KOps/s | +3.04% |
| test_setitem_dim[int] | 55.3940μs | 34.6227μs | 28.8828 KOps/s | 27.2058 KOps/s | **+6.16%** |
| test_setitem_dim[slice_int] | 0.1118ms | 62.4124μs | 16.0225 KOps/s | 15.6983 KOps/s | +2.07% |
| test_setitem_dim[range] | 0.1461ms | 84.4994μs | 11.8344 KOps/s | 11.6154 KOps/s | +1.89% |
| test_setitem_dim[tuple] | 88.4560μs | 50.0759μs | 19.9697 KOps/s | 19.2878 KOps/s | +3.54% |
| test_setitem | 56.9870μs | 20.2211μs | 49.4534 KOps/s | 48.9281 KOps/s | +1.07% |
| test_set | 58.9510μs | 19.5654μs | 51.1107 KOps/s | 50.2502 KOps/s | +1.71% |
| test_set_shared | 2.8338ms | 0.1394ms | 7.1746 KOps/s | 6.8913 KOps/s | +4.11% |
| test_update | 0.1084ms | 22.8973μs | 43.6733 KOps/s | 44.4757 KOps/s | -1.80% |
| test_update_nested | 0.1219ms | 30.7103μs | 32.5624 KOps/s | 31.4112 KOps/s | +3.66% |
| test_update__nested | 64.8420μs | 24.3715μs | 41.0315 KOps/s | 38.8960 KOps/s | **+5.49%** |
| test_set_nested | 71.1140μs | 21.1196μs | 47.3494 KOps/s | 45.2210 KOps/s | +4.71% |
| test_set_nested_new | 71.9550μs | 24.7419μs | 40.4172 KOps/s | 37.7533 KOps/s | **+7.06%** |
| test_select | 87.6950μs | 39.1115μs | 25.5679 KOps/s | 24.3402 KOps/s | **+5.04%** |
| test_select_nested | 0.1103ms | 56.7587μs | 17.6184 KOps/s | 17.1059 KOps/s | +3.00% |
| test_exclude_nested | 0.2243ms | 0.1176ms | 8.5025 KOps/s | 8.3161 KOps/s | +2.24% |
| test_empty[True] | 0.4642ms | 0.3931ms | 2.5441 KOps/s | 2.5144 KOps/s | +1.18% |
| test_empty[False] | 9.6476μs | 1.0121μs | 988.0831 KOps/s | 969.6335 KOps/s | +1.90% |
| test_unbind_speed | 1.6603ms | 0.2463ms | 4.0598 KOps/s | 3.9804 KOps/s | +2.00% |
| test_unbind_speed_stack0 | 0.5415ms | 0.2339ms | 4.2762 KOps/s | 4.1547 KOps/s | +2.93% |
| test_unbind_speed_stack1 | 71.4742ms | 0.6881ms | 1.4533 KOps/s | 1.4254 KOps/s | +1.96% |
| test_split | 64.4586ms | 1.5682ms | 637.6854 Ops/s | 621.5054 Ops/s | +2.60% |
| test_chunk | 64.7956ms | 1.5751ms | 634.8777 Ops/s | 621.1651 Ops/s | +2.21% |
| test_creation[device0] | 0.1736ms | 82.8236μs | 12.0738 KOps/s | 11.9370 KOps/s | +1.15% |
| test_creation_from_tensor | 3.9885ms | 85.8929μs | 11.6424 KOps/s | 11.8378 KOps/s | -1.65% |
| test_add_one[memmap_tensor0] | 0.1119ms | 5.3561μs | 186.7020 KOps/s | 174.1264 KOps/s | **+7.22%** |
| test_contiguous[memmap_tensor0] | 17.3320μs | 0.6364μs | 1.5714 MOps/s | 1.5329 MOps/s | +2.51% |
| test_stack[memmap_tensor0] | 22.4720μs | 3.5166μs | 284.3628 KOps/s | 265.2966 KOps/s | **+7.19%** |
| test_memmaptd_index | 0.9818ms | 0.2581ms | 3.8739 KOps/s | 3.8474 KOps/s | +0.69% |
| test_memmaptd_index_astensor | 0.5829ms | 0.3319ms | 3.0132 KOps/s | 3.0108 KOps/s | +0.08% |
| test_memmaptd_index_op | 1.5667ms | 0.6321ms | 1.5821 KOps/s | 1.6063 KOps/s | -1.50% |
| test_serialize_model | 0.1639s | 0.1034s | 9.6718 Ops/s | 9.2039 Ops/s | **+5.08%** |
| test_serialize_model_pickle | 0.4605s | 0.3785s | 2.6418 Ops/s | 2.6108 Ops/s | +1.19% |
| test_serialize_weights | 99.9073ms | 93.7466ms | 10.6670 Ops/s | 9.3249 Ops/s | **+14.39%** |
| test_serialize_weights_returnearly | 0.1239s | 0.1178s | 8.4869 Ops/s | 8.1553 Ops/s | +4.07% |
| test_serialize_weights_pickle | 1.0077s | 0.6139s | 1.6289 Ops/s | 1.4790 Ops/s | **+10.14%** |
| test_serialize_weights_filesystem | 0.1031s | 92.0231ms | 10.8668 Ops/s | 10.0775 Ops/s | **+7.83%** |
| test_serialize_model_filesystem | 92.2867ms | 90.9482ms | 10.9953 Ops/s | 10.6789 Ops/s | +2.96% |
| test_reshape_pytree | 56.0860μs | 25.4642μs | 39.2709 KOps/s | 38.5802 KOps/s | +1.79% |
| test_reshape_td | 79.7600μs | 34.1715μs | 29.2641 KOps/s | 28.3500 KOps/s | +3.22% |
| test_view_pytree | 86.6020μs | 25.2771μs | 39.5615 KOps/s | 38.4743 KOps/s | +2.83% |
| test_view_td | 82.6260μs | 39.4821μs | 25.3279 KOps/s | 24.2937 KOps/s | +4.26% |
| test_unbind_pytree | 78.2870μs | 29.7339μs | 33.6317 KOps/s | 32.7725 KOps/s | +2.62% |
| test_unbind_td | 0.4058ms | 36.3479μs | 27.5119 KOps/s | 26.6991 KOps/s | +3.04% |
| test_split_pytree | 64.1200μs | 28.9972μs | 34.4861 KOps/s | 33.4489 KOps/s | +3.10% |
| test_split_td | 0.1239ms | 40.8030μs | 24.5080 KOps/s | 24.6248 KOps/s | -0.47% |
| test_add_pytree | 76.2630μs | 35.2662μs | 28.3558 KOps/s | 27.2195 KOps/s | +4.17% |
| test_add_td | 0.1151ms | 56.8263μs | 17.5975 KOps/s | 17.0460 KOps/s | +3.23% |
| test_distributed | 0.2457ms | 0.1033ms | 9.6791 KOps/s | 9.5667 KOps/s | +1.18% |
| test_tdmodule | 70.8130μs | 18.1778μs | 55.0121 KOps/s | 54.8839 KOps/s | +0.23% |
| test_tdmodule_dispatch | 66.4750μs | 36.4624μs | 27.4255 KOps/s | 27.8740 KOps/s | -1.61% |
| test_tdseq | 53.7900μs | 22.1279μs | 45.1918 KOps/s | 47.6169 KOps/s | **-5.09%** |
| test_tdseq_dispatch | 69.4010μs | 42.9739μs | 23.2699 KOps/s | 24.2006 KOps/s | -3.85% |
| test_instantiation_functorch | 2.3692ms | 1.3349ms | 749.1221 Ops/s | 750.9378 Ops/s | -0.24% |
| test_instantiation_td | 1.6230ms | 1.0393ms | 962.1728 Ops/s | 962.7417 Ops/s | -0.06% |
| test_exec_functorch | 0.2443ms | 0.1617ms | 6.1841 KOps/s | 6.1251 KOps/s | +0.96% |
| test_exec_functional_call | 0.3870ms | 0.1497ms | 6.6817 KOps/s | 6.5460 KOps/s | +2.07% |
| test_exec_td | 0.3158ms | 0.1453ms | 6.8803 KOps/s | 6.8929 KOps/s | -0.18% |
| test_exec_td_decorator | 0.9250ms | 0.2256ms | 4.4321 KOps/s | 4.4753 KOps/s | -0.97% |
| test_vmap_mlp_speed[True-True] | 0.7806ms | 0.4886ms | 2.0467 KOps/s | 2.0161 KOps/s | +1.52% |
| test_vmap_mlp_speed[True-False] | 0.6133ms | 0.4853ms | 2.0605 KOps/s | 2.0355 KOps/s | +1.23% |
| test_vmap_mlp_speed[False-True] | 0.6066ms | 0.3924ms | 2.5482 KOps/s | 2.4944 KOps/s | +2.16% |
| test_vmap_mlp_speed[False-False] | 0.5428ms | 0.3909ms | 2.5580 KOps/s | 2.4938 KOps/s | +2.57% |
| test_vmap_mlp_speed_decorator[True-True] | 0.8090ms | 0.5697ms | 1.7553 KOps/s | 1.7601 KOps/s | -0.28% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8741ms | 0.5631ms | 1.7759 KOps/s | 1.7712 KOps/s | +0.27% |
| test_vmap_mlp_speed_decorator[False-True] | 0.6746ms | 0.4567ms | 2.1896 KOps/s | 2.1535 KOps/s | +1.68% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7357ms | 0.4605ms | 2.1714 KOps/s | 2.1595 KOps/s | +0.55% |
| test_to_module_speed[True] | 3.8687ms | 1.7629ms | 567.2586 Ops/s | 593.2751 Ops/s | -4.39% |
| test_to_module_speed[False] | 1.9091ms | 1.6557ms | 603.9622 Ops/s | 599.2072 Ops/s | +0.79% |
| test_tc_init | 82.8750μs | 33.0893μs | 30.2212 KOps/s | 34.1735 KOps/s | **-11.57%** |
| test_tc_init_nested | 0.1353ms | 66.7711μs | 14.9765 KOps/s | 16.6040 KOps/s | **-9.80%** |
| test_tc_first_layer_tensor | 5.1997μs | 0.7178μs | 1.3932 MOps/s | 1.4017 MOps/s | -0.61% |
| test_tc_first_layer_nontensor | 13.7388μs | 0.7131μs | 1.4023 MOps/s | 1.4350 MOps/s | -2.28% |
| test_tc_second_layer_tensor | 45.5050μs | 2.1588μs | 463.2144 KOps/s | 532.2423 KOps/s | **-12.97%** |
| test_tc_second_layer_nontensor | 0.1783ms | 1.8330μs | 545.5525 KOps/s | 636.1980 KOps/s | **-14.25%** |
| test_unbind | 79.3580ms | 8.0725ms | 123.8779 Ops/s | 195.5613 Ops/s | **-36.66%** |
| test_full_like | 15.3517ms | 10.3178ms | 96.9201 Ops/s | 94.7362 Ops/s | +2.31% |
| test_zeros_like | 11.8738ms | 5.6997ms | 175.4493 Ops/s | 172.5648 Ops/s | +1.67% |
| test_ones_like | 11.8520ms | 5.9366ms | 168.4473 Ops/s | 153.7278 Ops/s | **+9.58%** |
| test_clone | 13.3101ms | 7.6074ms | 131.4505 Ops/s | 124.1707 Ops/s | **+5.86%** |
| test_squeeze | 66.0650μs | 14.2111μs | 70.3678 KOps/s | 70.2974 KOps/s | +0.10% |
| test_unsqueeze | 0.1116ms | 60.1043μs | 16.6378 KOps/s | 16.4516 KOps/s | +1.13% |
| test_split | 0.1853ms | 0.1122ms | 8.9163 KOps/s | 8.8842 KOps/s | +0.36% |
| test_permute | 0.1966ms | 0.1279ms | 7.8192 KOps/s | 7.8556 KOps/s | -0.46% |
| test_stack | 29.7541ms | 22.0951ms | 45.2589 Ops/s | 45.6501 Ops/s | -0.86% |
| test_cat | 28.2445ms | 21.8182ms | 45.8332 Ops/s | 46.3530 Ops/s | -1.12% |

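
For readers checking the CPU numbers by hand: the Ops column appears to be the reciprocal of the Mean time, and the Change column appears to be the relative difference between Ops and Ops on Repo `HEAD`. The sketch below reproduces the `test_plain_set_nested` row under that assumption. The helper names `ops_per_second` and `relative_change` are hypothetical and are not part of the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean wall time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to HEAD (assumed formula)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the CPU row for test_plain_set_nested.
mean_s = 16.9825e-6   # Mean: 16.9825 microseconds
ops_pr = 58.8840e3    # Ops: 58.8840 KOps/s
ops_head = 60.2123e3  # Ops on Repo HEAD: 60.2123 KOps/s

print(f"ops from mean: {ops_per_second(mean_s) / 1e3:.3f} KOps/s")
print(f"change: {relative_change(ops_pr, ops_head):+.2f}%")
```

Because the table rounds Mean to four decimals, the recomputed throughput can differ from the reported Ops in the last digit.
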
github-actions[bot] commented 4 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 6. Worsened: 25.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 23.2300μs | 13.5009μs | 74.0690 KOps/s | 79.7305 KOps/s | **-7.10%** |
| test_plain_set_stack_nested | 29.9300μs | 13.5867μs | 73.6017 KOps/s | 79.7532 KOps/s | **-7.71%** |
| test_plain_set_nested_inplace | 37.2800μs | 14.9287μs | 66.9851 KOps/s | 72.3811 KOps/s | **-7.45%** |
| test_plain_set_stack_nested_inplace | 37.7610μs | 14.8423μs | 67.3752 KOps/s | 72.2516 KOps/s | **-6.75%** |
| test_items | 68.2810μs | 4.6061μs | 217.1043 KOps/s | 215.7461 KOps/s | +0.63% |
| test_items_nested | 0.3975ms | 0.3443ms | 2.9042 KOps/s | 2.9211 KOps/s | -0.58% |
| test_items_nested_locked | 0.3821ms | 0.3450ms | 2.8987 KOps/s | 2.9040 KOps/s | -0.18% |
| test_items_nested_leaf | 0.1069ms | 83.0972μs | 12.0341 KOps/s | 12.0377 KOps/s | -0.03% |
| test_items_stack_nested | 0.3914ms | 0.3453ms | 2.8960 KOps/s | 2.8826 KOps/s | +0.46% |
| test_items_stack_nested_leaf | 0.1032ms | 83.8661μs | 11.9238 KOps/s | 11.8980 KOps/s | +0.22% |
| test_items_stack_nested_locked | 0.3832ms | 0.3452ms | 2.8970 KOps/s | 2.8810 KOps/s | +0.55% |
| test_keys | 21.6700μs | 4.4625μs | 224.0878 KOps/s | 229.9429 KOps/s | -2.55% |
| test_keys_nested | 95.8320μs | 69.7386μs | 14.3393 KOps/s | 14.2832 KOps/s | +0.39% |
| test_keys_nested_locked | 0.7569ms | 75.8903μs | 13.1769 KOps/s | 13.2772 KOps/s | -0.76% |
| test_keys_nested_leaf | 86.3200μs | 60.2934μs | 16.5856 KOps/s | 16.5511 KOps/s | +0.21% |
| test_keys_stack_nested | 90.3320μs | 68.8435μs | 14.5257 KOps/s | 14.4669 KOps/s | +0.41% |
| test_keys_stack_nested_leaf | 79.7710μs | 58.5217μs | 17.0877 KOps/s | 17.0260 KOps/s | +0.36% |
| test_keys_stack_nested_locked | 99.3610μs | 75.0481μs | 13.3248 KOps/s | 13.3211 KOps/s | +0.03% |
| test_values | 7.8937μs | 1.8063μs | 553.6026 KOps/s | 529.6890 KOps/s | +4.51% |
| test_values_nested | 59.0910μs | 36.0102μs | 27.7699 KOps/s | 28.2614 KOps/s | -1.74% |
| test_values_nested_locked | 60.8300μs | 37.5930μs | 26.6007 KOps/s | 26.9789 KOps/s | -1.40% |
| test_values_nested_leaf | 48.6200μs | 31.6005μs | 31.6450 KOps/s | 32.0946 KOps/s | -1.40% |
| test_values_stack_nested | 60.3010μs | 36.9553μs | 27.0597 KOps/s | 27.4829 KOps/s | -1.54% |
| test_values_stack_nested_leaf | 58.3000μs | 32.7568μs | 30.5280 KOps/s | 31.3720 KOps/s | -2.69% |
| test_values_stack_nested_locked | 68.2310μs | 38.3246μs | 26.0929 KOps/s | 26.5984 KOps/s | -1.90% |
| test_membership | 3.8160μs | 0.7310μs | 1.3679 MOps/s | 1.3671 MOps/s | +0.06% |
| test_membership_nested | 18.1600μs | 2.5947μs | 385.3936 KOps/s | 394.2526 KOps/s | -2.25% |
| test_membership_nested_leaf | 28.7500μs | 2.5273μs | 395.6819 KOps/s | 387.7185 KOps/s | +2.05% |
| test_membership_stacked_nested | 20.9300μs | 2.5500μs | 392.1621 KOps/s | 385.8764 KOps/s | +1.63% |
| test_membership_stacked_nested_leaf | 20.2400μs | 2.5239μs | 396.2114 KOps/s | 394.1269 KOps/s | +0.53% |
| test_membership_nested_last | 17.8100μs | 3.0611μs | 326.6840 KOps/s | 325.8655 KOps/s | +0.25% |
| test_membership_nested_leaf_last | 20.1300μs | 3.0880μs | 323.8363 KOps/s | 321.7783 KOps/s | +0.64% |
| test_membership_stacked_nested_last | 35.8210μs | 3.8245μs | 261.4699 KOps/s | 323.4304 KOps/s | **-19.16%** |
| test_membership_stacked_nested_leaf_last | 17.1410μs | 3.8275μs | 261.2695 KOps/s | 324.3371 KOps/s | **-19.45%** |
| test_nested_getleaf | 35.4410μs | 8.3727μs | 119.4358 KOps/s | 119.3407 KOps/s | +0.08% |
| test_nested_get | 29.7090μs | 7.8398μs | 127.5541 KOps/s | 126.8077 KOps/s | +0.59% |
| test_stacked_getleaf | 24.5010μs | 8.3896μs | 119.1958 KOps/s | 119.8073 KOps/s | -0.51% |
| test_stacked_get | 33.1910μs | 7.8317μs | 127.6869 KOps/s | 126.7992 KOps/s | +0.70% |
| test_nested_getitemleaf | 26.0100μs | 8.5519μs | 116.9327 KOps/s | 117.1240 KOps/s | -0.16% |
| test_nested_getitem | 30.7410μs | 8.0225μs | 124.6487 KOps/s | 123.9981 KOps/s | +0.52% |
| test_stacked_getitemleaf | 35.3910μs | 8.5870μs | 116.4549 KOps/s | 117.4720 KOps/s | -0.87% |
| test_stacked_getitem | 23.5400μs | 8.0561μs | 124.1297 KOps/s | 123.8654 KOps/s | +0.21% |
| test_lock_nested | 59.3044ms | 0.4033ms | 2.4794 KOps/s | 2.4987 KOps/s | -0.77% |
| test_lock_stack_nested | 0.3288ms | 0.2950ms | 3.3900 KOps/s | 3.3494 KOps/s | +1.21% |
| test_unlock_nested | 62.3646ms | 0.4029ms | 2.4819 KOps/s | 2.4757 KOps/s | +0.25% |
| test_unlock_stack_nested | 0.3426ms | 0.3044ms | 3.2857 KOps/s | 3.2487 KOps/s | +1.14% |
| test_flatten_speed | 0.4389ms | 0.1011ms | 9.8909 KOps/s | 9.8570 KOps/s | +0.34% |
| test_unflatten_speed | 0.3537ms | 0.2885ms | 3.4667 KOps/s | 3.4096 KOps/s | +1.67% |
| test_common_ops | 1.0620ms | 0.6053ms | 1.6522 KOps/s | 1.7458 KOps/s | **-5.36%** |
| test_creation | 38.6710μs | 1.5824μs | 631.9662 KOps/s | 613.5279 KOps/s | +3.01% |
| test_creation_empty | 25.6900μs | 9.7778μs | 102.2724 KOps/s | 128.6314 KOps/s | **-20.49%** |
| test_creation_nested_1 | 0.2079ms | 11.5460μs | 86.6100 KOps/s | 105.3102 KOps/s | **-17.76%** |
| test_creation_nested_2 | 43.4310μs | 13.7773μs | 72.5832 KOps/s | 86.0265 KOps/s | **-15.63%** |
| test_clone | 71.1210μs | 11.2527μs | 88.8678 KOps/s | 86.0384 KOps/s | +3.29% |
| test_getitem[int] | 27.6010μs | 10.5118μs | 95.1312 KOps/s | 94.3058 KOps/s | +0.88% |
| test_getitem[slice_int] | 36.4210μs | 20.2367μs | 49.4151 KOps/s | 48.3068 KOps/s | +2.29% |
| test_getitem[range] | 63.9410μs | 45.8282μs | 21.8206 KOps/s | 18.9578 KOps/s | **+15.10%** |
| test_getitem[tuple] | 51.6310μs | 18.5366μs | 53.9472 KOps/s | 53.8428 KOps/s | +0.19% |
| test_getitem[list] | 0.1387ms | 32.2706μs | 30.9880 KOps/s | 29.6387 KOps/s | +4.55% |
| test_setitem_dim[int] | 73.7410μs | 30.2379μs | 33.0711 KOps/s | 35.6167 KOps/s | **-7.15%** |
| test_setitem_dim[slice_int] | 0.1205ms | 52.0360μs | 19.2174 KOps/s | 20.4472 KOps/s | **-6.01%** |
| test_setitem_dim[range] | 90.9720μs | 68.8112μs | 14.5325 KOps/s | 14.8873 KOps/s | -2.38% |
| test_setitem_dim[tuple] | 65.6310μs | 45.4781μs | 21.9886 KOps/s | 23.5788 KOps/s | **-6.74%** |
| test_setitem | 56.2110μs | 16.2705μs | 61.4609 KOps/s | 64.4521 KOps/s | -4.64% |
| test_set | 72.5310μs | 15.8292μs | 63.1746 KOps/s | 66.9114 KOps/s | **-5.58%** |
| test_set_shared | 1.8884ms | 98.1491μs | 10.1886 KOps/s | 10.3213 KOps/s | -1.29% |
| test_update | 82.6810μs | 20.0031μs | 49.9923 KOps/s | 56.5121 KOps/s | **-11.54%** |
| test_update_nested | 66.8100μs | 24.2515μs | 41.2346 KOps/s | 44.1431 KOps/s | **-6.59%** |
| test_update__nested | 55.0210μs | 21.6203μs | 46.2528 KOps/s | 44.9732 KOps/s | +2.85% |
| test_set_nested | 48.3800μs | 16.8589μs | 59.3160 KOps/s | 62.1749 KOps/s | -4.60% |
| test_set_nested_new | 64.2810μs | 19.4425μs | 51.4338 KOps/s | 53.4287 KOps/s | -3.73% |
| test_select | 64.7710μs | 32.1672μs | 31.0876 KOps/s | 31.0784 KOps/s | +0.03% |
| test_select_nested | 0.7326ms | 53.1999μs | 18.7970 KOps/s | 19.1865 KOps/s | -2.03% |
| test_exclude_nested | 0.1383ms | 0.1076ms | 9.2912 KOps/s | 9.1128 KOps/s | +1.96% |
| test_empty[True] | 1.1571ms | 0.3533ms | 2.8303 KOps/s | 2.8932 KOps/s | -2.17% |
| test_empty[False] | 2.7271μs | 0.8095μs | 1.2353 MOps/s | 1.2220 MOps/s | +1.09% |
| test_to | 86.4710μs | 56.6419μs | 17.6548 KOps/s | 16.9841 KOps/s | +3.95% |
| test_to_nonblocking | 64.2310μs | 34.3615μs | 29.1024 KOps/s | 28.7565 KOps/s | +1.20% |
| test_unbind_speed | 0.9329ms | 0.2542ms | 3.9338 KOps/s | 3.8131 KOps/s | +3.17% |
| test_unbind_speed_stack0 | 0.3248ms | 0.2557ms | 3.9114 KOps/s | 3.8220 KOps/s | +2.34% |
| test_unbind_speed_stack1 | 76.6261ms | 0.7781ms | 1.2851 KOps/s | 1.2675 KOps/s | +1.39% |
| test_split | 76.9916ms | 1.6960ms | 589.6193 Ops/s | 587.7299 Ops/s | +0.32% |
| test_chunk | 77.0890ms | 1.6922ms | 590.9470 Ops/s | 584.8451 Ops/s | +1.04% |
| test_creation[device0] | 0.1316ms | 56.0388μs | 17.8448 KOps/s | 17.6323 KOps/s | +1.21% |
| test_creation_from_tensor | 0.1307ms | 53.6730μs | 18.6314 KOps/s | 18.9802 KOps/s | -1.84% |
| test_add_one[memmap_tensor0] | 89.5120μs | 6.9456μs | 143.9765 KOps/s | 139.5688 KOps/s | +3.16% |
| test_contiguous[memmap_tensor0] | 29.8310μs | 0.6396μs | 1.5635 MOps/s | 1.5697 MOps/s | -0.40% |
| test_stack[memmap_tensor0] | 39.0500μs | 4.7143μs | 212.1221 KOps/s | 209.0847 KOps/s | +1.45% |
| test_memmaptd_index | 1.3427ms | 0.2782ms | 3.5946 KOps/s | 3.5444 KOps/s | +1.42% |
| test_memmaptd_index_astensor | 0.6586ms | 0.3392ms | 2.9480 KOps/s | 2.9079 KOps/s | +1.38% |
| test_memmaptd_index_op | 1.0957ms | 0.6606ms | 1.5138 KOps/s | 1.5927 KOps/s | -4.96% |
| test_serialize_model | 98.4940ms | 94.7150ms | 10.5580 Ops/s | 10.1545 Ops/s | +3.97% |
| test_serialize_model_pickle | 1.3713s | 1.2387s | 0.8073 Ops/s | 0.8061 Ops/s | +0.15% |
| test_serialize_weights | 0.1065s | 95.9888ms | 10.4179 Ops/s | 10.3116 Ops/s | +1.03% |
| test_serialize_weights_returnearly | 0.2787s | 86.3635ms | 11.5790 Ops/s | 11.9285 Ops/s | -2.93% |
| test_serialize_weights_pickle | 1.3482s | 1.2476s | 0.8015 Ops/s | 0.8015 Ops/s | +0.00% |
| test_reshape_pytree | 0.1655ms | 27.3332μs | 36.5855 KOps/s | 38.6401 KOps/s | **-5.32%** |
| test_reshape_td | 54.1320μs | 31.6407μs | 31.6048 KOps/s | 31.9607 KOps/s | -1.11% |
| test_view_pytree | 47.5210μs | 25.5809μs | 39.0916 KOps/s | 38.7684 KOps/s | +0.83% |
| test_view_td | 69.0910μs | 36.2133μs | 27.6142 KOps/s | 27.8108 KOps/s | -0.71% |
| test_unbind_pytree | 58.7510μs | 31.6118μs | 31.6338 KOps/s | 31.5532 KOps/s | +0.26% |
| test_unbind_td | 0.4690ms | 39.6674μs | 25.2096 KOps/s | 25.1190 KOps/s | +0.36% |
| test_split_pytree | 0.1364ms | 34.7962μs | 28.7387 KOps/s | 29.5474 KOps/s | -2.74% |
| test_split_td | 0.5158ms | 38.3647μs | 26.0656 KOps/s | 25.6026 KOps/s | +1.81% |
| test_add_pytree | 64.8810μs | 37.9345μs | 26.3612 KOps/s | 26.1558 KOps/s | +0.79% |
| test_add_td | 78.9510μs | 53.8714μs | 18.5627 KOps/s | 21.1086 KOps/s | **-12.06%** |
| test_distributed | 1.9378ms | 71.6007μs | 13.9663 KOps/s | 9.9933 KOps/s | **+39.76%** |
| test_tdmodule | 34.3510μs | 15.8169μs | 63.2235 KOps/s | 68.2671 KOps/s | **-7.39%** |
| test_tdmodule_dispatch | 48.0300μs | 31.2357μs | 32.0147 KOps/s | 34.8535 KOps/s | **-8.15%** |
| test_tdseq | 34.8400μs | 17.1607μs | 58.2727 KOps/s | 61.8208 KOps/s | **-5.74%** |
| test_tdseq_dispatch | 85.0610μs | 33.6560μs | 29.7124 KOps/s | 31.7940 KOps/s | **-6.55%** |
| test_instantiation_functorch | 1.5911ms | 1.4435ms | 692.7772 Ops/s | 691.3964 Ops/s | +0.20% |
| test_instantiation_td | 1.5348ms | 1.0147ms | 985.5335 Ops/s | 908.4744 Ops/s | **+8.48%** |
| test_exec_functorch | 0.2337ms | 0.1456ms | 6.8676 KOps/s | 6.8106 KOps/s | +0.84% |
| test_exec_functional_call | 0.1835ms | 0.1369ms | 7.3051 KOps/s | 7.1147 KOps/s | +2.68% |
| test_exec_td | 0.1660ms | 0.1351ms | 7.4006 KOps/s | 7.2857 KOps/s | +1.58% |
| test_exec_td_decorator | 0.6658ms | 0.2092ms | 4.7805 KOps/s | 4.6699 KOps/s | +2.37% |
| test_vmap_mlp_speed[True-True] | 0.7237ms | 0.5834ms | 1.7142 KOps/s | 1.7311 KOps/s | -0.98% |
| test_vmap_mlp_speed[True-False] | 0.6564ms | 0.5768ms | 1.7338 KOps/s | 1.6740 KOps/s | +3.57% |
| test_vmap_mlp_speed[False-True] | 0.6237ms | 0.5068ms | 1.9731 KOps/s | 1.8739 KOps/s | **+5.29%** |
| test_vmap_mlp_speed[False-False] | 0.7346ms | 0.5187ms | 1.9280 KOps/s | 1.8741 KOps/s | +2.88% |
| test_vmap_mlp_speed_decorator[True-True] | 1.3446ms | 0.6434ms | 1.5543 KOps/s | 1.4979 KOps/s | +3.76% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8122ms | 0.6616ms | 1.5116 KOps/s | 1.5314 KOps/s | -1.29% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8436ms | 0.5810ms | 1.7213 KOps/s | 1.7278 KOps/s | -0.38% |
| test_vmap_mlp_speed_decorator[False-False] | 1.0181ms | 0.5933ms | 1.6856 KOps/s | 1.5757 KOps/s | **+6.97%** |
| test_vmap_transformer_speed[True-True] | 7.8147ms | 7.6059ms | 131.4766 Ops/s | 130.3027 Ops/s | +0.90% |
| test_vmap_transformer_speed[True-False] | 7.9336ms | 7.6299ms | 131.0628 Ops/s | 130.6402 Ops/s | +0.32% |
| test_vmap_transformer_speed[False-True] | 7.8667ms | 7.5727ms | 132.0534 Ops/s | 131.6662 Ops/s | +0.29% |
| test_vmap_transformer_speed[False-False] | 7.8767ms | 7.5420ms | 132.5915 Ops/s | 131.3789 Ops/s | +0.92% |
| test_vmap_transformer_speed_decorator[True-True] | 18.8973ms | 18.5177ms | 54.0023 Ops/s | 53.4708 Ops/s | +0.99% |
| test_vmap_transformer_speed_decorator[True-False] | 18.7291ms | 18.4780ms | 54.1183 Ops/s | 53.5223 Ops/s | +1.11% |
| test_vmap_transformer_speed_decorator[False-True] | 19.1473ms | 18.4736ms | 54.1312 Ops/s | 53.7482 Ops/s | +0.71% |
| test_vmap_transformer_speed_decorator[False-False] | 18.5628ms | 18.3671ms | 54.4450 Ops/s | 53.9024 Ops/s | +1.01% |
| test_to_module_speed[True] | 2.0797ms | 1.5318ms | 652.8271 Ops/s | 653.2114 Ops/s | -0.06% |
| test_to_module_speed[False] | 1.7173ms | 1.5167ms | 659.3235 Ops/s | 659.2992 Ops/s | +0.00% |
| test_tc_init | 45.9400μs | 28.2338μs | 35.4185 KOps/s | 42.3074 KOps/s | **-16.28%** |
| test_tc_init_nested | 0.2690ms | 56.8837μs | 17.5797 KOps/s | 21.2839 KOps/s | **-17.40%** |
| test_tc_first_layer_tensor | 5.1486μs | 0.3611μs | 2.7693 MOps/s | 2.8170 MOps/s | -1.69% |
| test_tc_first_layer_nontensor | 14.6166μs | 0.3831μs | 2.6102 MOps/s | 2.5944 MOps/s | +0.61% |
| test_tc_second_layer_tensor | 14.7200μs | 1.0560μs | 946.9763 KOps/s | 1.0324 MOps/s | **-8.28%** |
| test_tc_second_layer_nontensor | 37.3690μs | 0.8384μs | 1.1927 MOps/s | 1.2238 MOps/s | -2.54% |
| test_unbind | 5.1691ms | 4.9123ms | 203.5698 Ops/s | 122.6676 Ops/s | **+65.95%** |
| test_full_like | 13.3077ms | 13.1025ms | 76.3212 Ops/s | 75.8604 Ops/s | +0.61% |
| test_zeros_like | 8.0279ms | 7.7905ms | 128.3608 Ops/s | 128.2013 Ops/s | +0.12% |
| test_ones_like | 8.0617ms | 7.8143ms | 127.9711 Ops/s | 127.2815 Ops/s | +0.54% |
| test_clone | 11.8084ms | 9.2790ms | 107.7698 Ops/s | 106.3068 Ops/s | +1.38% |
| test_squeeze | 0.2108ms | 10.6649μs | 93.7658 KOps/s | 91.4837 KOps/s | +2.49% |
| test_unsqueeze | 0.1614ms | 50.3915μs | 19.8446 KOps/s | 19.2129 KOps/s | +3.29% |
| test_split | 0.1385ms | 96.3711μs | 10.3766 KOps/s | 10.3452 KOps/s | +0.30% |
| test_permute | 0.3359ms | 0.1083ms | 9.2303 KOps/s | 9.2697 KOps/s | -0.43% |
| test_stack | 28.6565ms | 27.0428ms | 36.9784 Ops/s | 37.1746 Ops/s | -0.53% |
| test_cat | 27.0416ms | 26.9025ms | 37.1713 Ops/s | 37.2354 Ops/s | -0.17% |
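
The Improved and Worsened totals in both reports match the number of emphasized rows, and every emphasized row differs from `HEAD` by more than 5% in magnitude. The sketch below tallies a few GPU rows on that basis. The 5% cutoff is inferred from the data rather than taken from the bot's configuration, and `parse_ops` and `tally` are hypothetical helpers. Throughput cells mix `Ops/s`, `KOps/s` and `MOps/s`, so units are normalized before comparing.

```python
# Multipliers for the throughput units that appear in the tables.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '946.9763 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def tally(rows, threshold=5.0):
    """Count rows whose throughput moved by more than `threshold` percent.

    The 5% default is an assumption inferred from the reported tables.
    """
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        _name, _max, _mean, ops, ops_head = cells[:5]
        head = parse_ops(ops_head)
        change = (parse_ops(ops) - head) / head * 100.0
        if change > threshold:
            counts["improved"] += 1
        elif change < -threshold:
            counts["worsened"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Four rows copied from the GPU results (Change column omitted on purpose).
rows = [
    "| test_plain_set_nested | 23.2300μs | 13.5009μs | 74.0690 KOps/s | 79.7305 KOps/s |",
    "| test_items | 68.2810μs | 4.6061μs | 217.1043 KOps/s | 215.7461 KOps/s |",
    "| test_tc_second_layer_tensor | 14.7200μs | 1.0560μs | 946.9763 KOps/s | 1.0324 MOps/s |",
    "| test_unbind | 5.1691ms | 4.9123ms | 203.5698 Ops/s | 122.6676 Ops/s |",
]
print(tally(rows))
```

The `test_tc_second_layer_tensor` row shows why the unit conversion matters: its two throughput cells use different prefixes, so comparing the raw numbers would give a meaningless result.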