pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License
803 stars, 65 forks

[Quality] zip-strict when possible #886

Closed: vmoens closed this pull request 1 month ago.

vmoens commented 1 month ago

Description

Describe your changes in detail.

Motivation and Context

Why is this change required? What problem does it solve? If it fixes an open issue, please link to the issue here. You can use the syntax `close #15213` if this solves the issue #15213.

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Checklist

Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!
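The description above leaves the template unfilled, so the title is the only statement of the change. Assuming "zip-strict" refers to the `strict=True` flag that Python 3.10 added to the built-in `zip` (PEP 618), and "when possible" means using it only on interpreters that support it, a minimal sketch could look like the following. The helper name `zip_strict` and the example keys are hypothetical and are not taken from tensordict's code.

```python
import sys
from typing import Any, Iterable, Iterator, Tuple


def zip_strict(*iterables: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical helper: zip that checks lengths when the interpreter allows it.

    On Python 3.10+ this delegates to ``zip(..., strict=True)`` (PEP 618), which
    raises ``ValueError`` when one iterable is exhausted before the others.
    On older interpreters it falls back to the plain built-in ``zip``, which
    silently stops at the shortest iterable.
    """
    if sys.version_info >= (3, 10):
        return zip(*iterables, strict=True)
    return zip(*iterables)


# Hypothetical usage: pairing keys with values that must line up one-to-one.
keys = ["obs", "action", "reward"]
values = [1, 2, 3]
print(dict(zip_strict(keys, values)))  # {'obs': 1, 'action': 2, 'reward': 3}
```

The length mismatch is reported lazily: the `ValueError` surfaces while the result is being iterated, not when `zip_strict` is called.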

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 19. Worsened: 9.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 37.0690μs | 17.3290μs | 57.7069 KOps/s | 56.8121 KOps/s | +1.57% |
| test_plain_set_stack_nested | 34.5550μs | 17.6516μs | 56.6521 KOps/s | 55.8475 KOps/s | +1.44% |
| test_plain_set_nested_inplace | 76.0220μs | 19.1295μs | 52.2753 KOps/s | 51.1032 KOps/s | +2.29% |
| test_plain_set_stack_nested_inplace | 64.0590μs | 19.1134μs | 52.3192 KOps/s | 50.9084 KOps/s | +2.77% |
| test_items | 31.7200μs | 2.6141μs | 382.5422 KOps/s | 315.6673 KOps/s | **+21.19%** |
| test_items_nested | 0.5159ms | 0.3654ms | 2.7369 KOps/s | 2.7019 KOps/s | +1.29% |
| test_items_nested_locked | 0.9281ms | 0.3678ms | 2.7192 KOps/s | 2.7399 KOps/s | -0.75% |
| test_items_nested_leaf | 0.1581ms | 88.0219μs | 11.3608 KOps/s | 11.6996 KOps/s | -2.90% |
| test_items_stack_nested | 0.4556ms | 0.3679ms | 2.7184 KOps/s | 2.7179 KOps/s | +0.02% |
| test_items_stack_nested_leaf | 0.1551ms | 87.4050μs | 11.4410 KOps/s | 11.9028 KOps/s | -3.88% |
| test_items_stack_nested_locked | 0.7448ms | 0.3660ms | 2.7324 KOps/s | 2.6989 KOps/s | +1.24% |
| test_keys | 44.4830μs | 3.8905μs | 257.0389 KOps/s | 259.5611 KOps/s | -0.97% |
| test_keys_nested | 0.2456ms | 0.1437ms | 6.9578 KOps/s | 6.9272 KOps/s | +0.44% |
| test_keys_nested_locked | 0.7143ms | 0.1486ms | 6.7303 KOps/s | 6.6631 KOps/s | +1.01% |
| test_keys_nested_leaf | 0.2206ms | 0.1221ms | 8.1899 KOps/s | 8.1363 KOps/s | +0.66% |
| test_keys_stack_nested | 0.2690ms | 0.1417ms | 7.0566 KOps/s | 6.9935 KOps/s | +0.90% |
| test_keys_stack_nested_leaf | 0.2104ms | 0.1220ms | 8.1939 KOps/s | 8.1170 KOps/s | +0.95% |
| test_keys_stack_nested_locked | 0.2523ms | 0.1473ms | 6.7869 KOps/s | 6.7118 KOps/s | +1.12% |
| test_values | 6.0392μs | 1.1446μs | 873.6376 KOps/s | 862.3896 KOps/s | +1.30% |
| test_values_nested | 93.4750μs | 48.5162μs | 20.6117 KOps/s | 20.0532 KOps/s | +2.78% |
| test_values_nested_locked | 0.2126ms | 49.4295μs | 20.2308 KOps/s | 20.2430 KOps/s | -0.06% |
| test_values_nested_leaf | 88.9860μs | 43.6787μs | 22.8944 KOps/s | 22.3397 KOps/s | +2.48% |
| test_values_stack_nested | 0.1136ms | 50.1616μs | 19.9356 KOps/s | 19.6490 KOps/s | +1.46% |
| test_values_stack_nested_leaf | 94.0160μs | 44.0131μs | 22.7205 KOps/s | 22.9235 KOps/s | -0.89% |
| test_values_stack_nested_locked | 91.1310μs | 49.8000μs | 20.0803 KOps/s | 19.7143 KOps/s | +1.86% |
| test_membership | 20.7090μs | 0.9040μs | 1.1062 MOps/s | 1.3595 MOps/s | **-18.63%** |
| test_membership_nested | 95.8690μs | 2.6740μs | 373.9663 KOps/s | 363.8171 KOps/s | +2.79% |
| test_membership_nested_leaf | 0.1075ms | 2.7201μs | 367.6388 KOps/s | 367.2169 KOps/s | +0.11% |
| test_membership_stacked_nested | 17.8330μs | 2.6760μs | 373.6963 KOps/s | 371.1984 KOps/s | +0.67% |
| test_membership_stacked_nested_leaf | 27.4810μs | 2.6756μs | 373.7453 KOps/s | 373.3663 KOps/s | +0.10% |
| test_membership_nested_last | 28.8740μs | 3.9466μs | 253.3812 KOps/s | 250.8092 KOps/s | +1.03% |
| test_membership_nested_leaf_last | 48.8310μs | 3.9912μs | 250.5539 KOps/s | 249.7861 KOps/s | +0.31% |
| test_membership_stacked_nested_last | 45.5150μs | 7.4855μs | 133.5913 KOps/s | 78.0405 KOps/s | **+71.18%** |
| test_membership_stacked_nested_leaf_last | 51.1250μs | 7.4638μs | 133.9798 KOps/s | 77.8508 KOps/s | **+72.10%** |
| test_nested_getleaf | 90.1780μs | 10.9785μs | 91.0869 KOps/s | 93.5408 KOps/s | -2.62% |
| test_nested_get | 92.5030μs | 10.3791μs | 96.3473 KOps/s | 98.0410 KOps/s | -1.73% |
| test_stacked_getleaf | 57.3270μs | 10.7885μs | 92.6912 KOps/s | 94.1451 KOps/s | -1.54% |
| test_stacked_get | 0.1165ms | 10.2268μs | 97.7818 KOps/s | 99.1930 KOps/s | -1.42% |
| test_nested_getitemleaf | 45.9760μs | 11.2394μs | 88.9728 KOps/s | 88.8566 KOps/s | +0.13% |
| test_nested_getitem | 58.4310μs | 10.3497μs | 96.6210 KOps/s | 96.7850 KOps/s | -0.17% |
| test_stacked_getitemleaf | 36.6290μs | 11.1799μs | 89.4463 KOps/s | 89.6279 KOps/s | -0.20% |
| test_stacked_getitem | 0.1130ms | 10.3801μs | 96.3380 KOps/s | 96.8040 KOps/s | -0.48% |
| test_lock_nested | 7.7477ms | 0.4589ms | 2.1790 KOps/s | 2.3014 KOps/s | **-5.32%** |
| test_lock_stack_nested | 0.6402ms | 0.4253ms | 2.3511 KOps/s | 2.5029 KOps/s | **-6.07%** |
| test_unlock_nested | 0.9485ms | 0.3753ms | 2.6645 KOps/s | 2.3764 KOps/s | **+12.13%** |
| test_unlock_stack_nested | 0.5840ms | 0.3415ms | 2.9280 KOps/s | 3.2011 KOps/s | **-8.53%** |
| test_flatten_speed | 0.5113ms | 0.1046ms | 9.5633 KOps/s | 9.5300 KOps/s | +0.35% |
| test_unflatten_speed | 0.7922ms | 0.4360ms | 2.2936 KOps/s | 2.2859 KOps/s | +0.34% |
| test_common_ops | 3.9043ms | 0.7677ms | 1.3026 KOps/s | 1.2495 KOps/s | +4.25% |
| test_creation | 17.0720μs | 2.2874μs | 437.1686 KOps/s | 419.6640 KOps/s | +4.17% |
| test_creation_empty | 56.6560μs | 10.8741μs | 91.9618 KOps/s | 81.4158 KOps/s | **+12.95%** |
| test_creation_nested_1 | 37.5900μs | 13.6705μs | 73.1503 KOps/s | 66.2487 KOps/s | **+10.42%** |
| test_creation_nested_2 | 56.4450μs | 17.4735μs | 57.2294 KOps/s | 51.8890 KOps/s | **+10.29%** |
| test_clone | 54.0610μs | 12.9801μs | 77.0412 KOps/s | 75.4281 KOps/s | +2.14% |
| test_getitem[int] | 1.4342ms | 11.6170μs | 86.0810 KOps/s | 87.0297 KOps/s | -1.09% |
| test_getitem[slice_int] | 61.7860μs | 23.0358μs | 43.4106 KOps/s | 41.9444 KOps/s | +3.50% |
| test_getitem[range] | 0.1707ms | 44.6882μs | 22.3772 KOps/s | 21.8887 KOps/s | +2.23% |
| test_getitem[tuple] | 56.7660μs | 19.3660μs | 51.6368 KOps/s | 50.6477 KOps/s | +1.95% |
| test_getitem[list] | 0.1385ms | 39.8028μs | 25.1239 KOps/s | 25.0657 KOps/s | +0.23% |
| test_setitem_dim[int] | 67.0660μs | 31.9243μs | 31.3241 KOps/s | 29.1948 KOps/s | **+7.29%** |
| test_setitem_dim[slice_int] | 0.1234ms | 59.3989μs | 16.8353 KOps/s | 15.8791 KOps/s | **+6.02%** |
| test_setitem_dim[range] | 0.2539ms | 80.5777μs | 12.4104 KOps/s | 12.1582 KOps/s | +2.07% |
| test_setitem_dim[tuple] | 0.1258ms | 49.4420μs | 20.2257 KOps/s | 19.6839 KOps/s | +2.75% |
| test_setitem | 83.0760μs | 19.8281μs | 50.4335 KOps/s | 47.9498 KOps/s | **+5.18%** |
| test_set | 62.2060μs | 19.2778μs | 51.8730 KOps/s | 49.7980 KOps/s | +4.17% |
| test_set_shared | 2.1341ms | 0.1690ms | 5.9174 KOps/s | 5.8637 KOps/s | +0.92% |
| test_update | 0.1254ms | 21.6728μs | 46.1407 KOps/s | 42.2694 KOps/s | **+9.16%** |
| test_update_nested | 0.1137ms | 31.4665μs | 31.7798 KOps/s | 30.5107 KOps/s | +4.16% |
| test_update__nested | 89.9680μs | 25.1956μs | 39.6894 KOps/s | 38.9448 KOps/s | +1.91% |
| test_set_nested | 74.9610μs | 20.9875μs | 47.6475 KOps/s | 45.3430 KOps/s | **+5.08%** |
| test_set_nested_new | 0.1017ms | 25.4677μs | 39.2655 KOps/s | 37.5898 KOps/s | +4.46% |
| test_select | 1.1055ms | 41.4450μs | 24.1283 KOps/s | 23.6854 KOps/s | +1.87% |
| test_select_nested | 0.1437ms | 60.4240μs | 16.5497 KOps/s | 16.6680 KOps/s | -0.71% |
| test_exclude_nested | 0.1493ms | 80.8066μs | 12.3752 KOps/s | 12.4339 KOps/s | -0.47% |
| test_empty[True] | 0.4730ms | 0.3394ms | 2.9461 KOps/s | 2.9286 KOps/s | +0.60% |
| test_empty[False] | 11.4365μs | 1.2579μs | 794.9763 KOps/s | 790.9410 KOps/s | +0.51% |
| test_unbind_speed | 0.4407ms | 0.2761ms | 3.6221 KOps/s | 3.8192 KOps/s | **-5.16%** |
| test_unbind_speed_stack0 | 0.5426ms | 0.2706ms | 3.6950 KOps/s | 3.9977 KOps/s | **-7.57%** |
| test_unbind_speed_stack1 | 77.9960ms | 0.7482ms | 1.3365 KOps/s | 1.5017 KOps/s | **-11.00%** |
| test_split | 74.3686ms | 1.6216ms | 616.6583 Ops/s | 620.3471 Ops/s | -0.59% |
| test_chunk | 76.5228ms | 1.6338ms | 612.0874 Ops/s | 615.2821 Ops/s | -0.52% |
| test_creation[device0] | 0.2470ms | 94.3248μs | 10.6017 KOps/s | 10.7157 KOps/s | -1.06% |
| test_creation_from_tensor | 6.2757ms | 96.9229μs | 10.3175 KOps/s | 10.4235 KOps/s | -1.02% |
| test_add_one[memmap_tensor0] | 0.1289ms | 5.5247μs | 181.0064 KOps/s | 173.4741 KOps/s | +4.34% |
| test_contiguous[memmap_tensor0] | 9.9990μs | 0.6442μs | 1.5522 MOps/s | 1.5025 MOps/s | +3.31% |
| test_stack[memmap_tensor0] | 41.7280μs | 3.5895μs | 278.5931 KOps/s | 244.5993 KOps/s | **+13.90%** |
| test_memmaptd_index | 0.9536ms | 0.2565ms | 3.8990 KOps/s | 3.8980 KOps/s | +0.03% |
| test_memmaptd_index_astensor | 0.7612ms | 0.3355ms | 2.9804 KOps/s | 3.0270 KOps/s | -1.54% |
| test_memmaptd_index_op | 0.9234ms | 0.6232ms | 1.6046 KOps/s | 1.5832 KOps/s | +1.36% |
| test_serialize_model | 0.1309s | 0.1267s | 7.8943 Ops/s | 7.2718 Ops/s | **+8.56%** |
| test_serialize_model_pickle | 0.4318s | 0.3915s | 2.5544 Ops/s | 2.4981 Ops/s | +2.26% |
| test_serialize_weights | 0.2030s | 0.1329s | 7.5264 Ops/s | 8.1777 Ops/s | **-7.96%** |
| test_serialize_weights_returnearly | 0.1714s | 0.1634s | 6.1210 Ops/s | 5.6416 Ops/s | **+8.50%** |
| test_serialize_weights_pickle | 0.4466s | 0.4054s | 2.4667 Ops/s | 2.5558 Ops/s | -3.49% |
| test_serialize_weights_filesystem | 0.2183s | 0.1510s | 6.6203 Ops/s | 7.1232 Ops/s | **-7.06%** |
| test_serialize_model_filesystem | 0.1588s | 0.1541s | 6.4912 Ops/s | 6.5804 Ops/s | -1.36% |
| test_reshape_pytree | 60.0720μs | 25.9863μs | 38.4818 KOps/s | 37.9728 KOps/s | +1.34% |
| test_reshape_td | 70.9130μs | 33.6123μs | 29.7510 KOps/s | 29.2814 KOps/s | +1.60% |
| test_view_pytree | 77.6040μs | 25.8110μs | 38.7432 KOps/s | 39.7079 KOps/s | -2.43% |
| test_view_td | 85.8410μs | 38.8507μs | 25.7396 KOps/s | 25.6394 KOps/s | +0.39% |
| test_unbind_pytree | 87.2230μs | 29.4482μs | 33.9579 KOps/s | 34.0649 KOps/s | -0.31% |
| test_unbind_td | 0.3676ms | 40.1298μs | 24.9191 KOps/s | 26.2163 KOps/s | -4.95% |
| test_split_pytree | 62.0160μs | 29.3035μs | 34.1257 KOps/s | 33.4812 KOps/s | +1.92% |
| test_split_td | 0.4763ms | 40.7686μs | 24.5287 KOps/s | 25.0091 KOps/s | -1.92% |
| test_add_pytree | 0.1053ms | 34.7074μs | 28.8123 KOps/s | 28.4227 KOps/s | +1.37% |
| test_add_td | 0.1673ms | 54.3890μs | 18.3861 KOps/s | 17.1877 KOps/s | **+6.97%** |
| test_distributed | 0.2571ms | 0.1280ms | 7.8099 KOps/s | 7.6415 KOps/s | +2.20% |
| test_tdmodule | 35.0650μs | 16.5195μs | 60.5344 KOps/s | 57.4175 KOps/s | **+5.43%** |
| test_tdmodule_dispatch | 61.3950μs | 34.4181μs | 29.0544 KOps/s | 27.3034 KOps/s | **+6.41%** |
| test_tdseq | 46.4470μs | 18.5916μs | 53.7877 KOps/s | 52.5285 KOps/s | +2.40% |
| test_tdseq_dispatch | 62.5060μs | 38.6941μs | 25.8437 KOps/s | 24.9152 KOps/s | +3.73% |
| test_instantiation_functorch | 1.5614ms | 1.3143ms | 760.8634 Ops/s | 750.5978 Ops/s | +1.37% |
| test_instantiation_td | 1.9247ms | 1.0179ms | 982.3796 Ops/s | 968.5522 Ops/s | +1.43% |
| test_exec_functorch | 0.2890ms | 0.1727ms | 5.7909 KOps/s | 6.0288 KOps/s | -3.95% |
| test_exec_functional_call | 0.2310ms | 0.1499ms | 6.6714 KOps/s | 6.5148 KOps/s | +2.40% |
| test_exec_td | 0.2919ms | 0.1467ms | 6.8188 KOps/s | 6.4777 KOps/s | **+5.27%** |
| test_exec_td_decorator | 0.8096ms | 0.2338ms | 4.2776 KOps/s | 4.2180 KOps/s | +1.41% |
| test_vmap_mlp_speed[True-True] | 0.7757ms | 0.4918ms | 2.0334 KOps/s | 2.0146 KOps/s | +0.93% |
| test_vmap_mlp_speed[True-False] | 0.6492ms | 0.4855ms | 2.0598 KOps/s | 1.9938 KOps/s | +3.31% |
| test_vmap_mlp_speed[False-True] | 0.7801ms | 0.4012ms | 2.4925 KOps/s | 2.4950 KOps/s | -0.10% |
| test_vmap_mlp_speed[False-False] | 0.6826ms | 0.3992ms | 2.5053 KOps/s | 2.4824 KOps/s | +0.92% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1723ms | 0.5816ms | 1.7195 KOps/s | 1.7167 KOps/s | +0.16% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9049ms | 0.5810ms | 1.7211 KOps/s | 1.7187 KOps/s | +0.14% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7470ms | 0.4762ms | 2.0999 KOps/s | 2.1077 KOps/s | -0.37% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8129ms | 0.4764ms | 2.0993 KOps/s | 2.1096 KOps/s | -0.49% |
| test_to_module_speed[True] | 80.6200ms | 1.9675ms | 508.2655 Ops/s | 504.9593 Ops/s | +0.65% |
| test_to_module_speed[False] | 2.3666ms | 1.7604ms | 568.0534 Ops/s | 556.0623 Ops/s | +2.16% |
| test_tc_init | 71.3540μs | 37.9127μs | 26.3764 KOps/s | 26.4086 KOps/s | -0.12% |
| test_tc_init_nested | 0.1468ms | 75.9635μs | 13.1642 KOps/s | 12.7990 KOps/s | +2.85% |
| test_tc_first_layer_tensor | 32.2600μs | 8.1342μs | 122.9370 KOps/s | 121.4800 KOps/s | +1.20% |
| test_tc_first_layer_nontensor | 54.0310μs | 8.0771μs | 123.8069 KOps/s | 121.7196 KOps/s | +1.71% |
| test_tc_second_layer_tensor | 26.3990μs | 2.5010μs | 399.8368 KOps/s | 399.4364 KOps/s | +0.10% |
| test_tc_second_layer_nontensor | 34.3150μs | 9.1460μs | 109.3378 KOps/s | 108.0159 KOps/s | +1.22% |
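Two relationships can be read off the table: the Ops column is the reciprocal of the Mean time per call (a mean of 17.3290 μs gives about 57.71 KOps/s), and Change is the relative difference between Ops and Ops on Repo `HEAD`, so a positive value means the PR branch is faster. The sketch below reproduces both for the `test_items` and `test_membership` rows; the helper names `ops_per_second` and `change_pct` are hypothetical and not part of the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def change_pct(ops: float, ops_head: float) -> float:
    """Relative throughput change versus repo HEAD, in percent (positive = faster).

    Both arguments must be in the same unit (KOps/s, MOps/s, ...); the unit cancels.
    """
    return (ops / ops_head - 1.0) * 100.0


# test_items (CPU): Mean 2.6141 μs; 382.5422 KOps/s on the PR vs 315.6673 KOps/s on HEAD.
print(f"{ops_per_second(2.6141e-6) / 1e3:.1f} KOps/s")  # 382.5 KOps/s
print(f"{change_pct(382.5422, 315.6673):+.2f}%")          # +21.19%

# test_membership (CPU): 1.1062 MOps/s on the PR vs 1.3595 MOps/s on HEAD.
print(f"{change_pct(1.1062, 1.3595):+.2f}%")               # -18.63%
```

Because Mean is itself rounded to four decimals, recomputing Ops from it agrees with the table only to about four significant figures.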
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 21. Worsened: 5.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 61.3010μs | 12.7271μs | 78.5727 KOps/s | 78.1830 KOps/s | +0.50% |
| test_plain_set_stack_nested | 30.3710μs | 12.7426μs | 78.4769 KOps/s | 78.1624 KOps/s | +0.40% |
| test_plain_set_nested_inplace | 36.4400μs | 13.8219μs | 72.3489 KOps/s | 72.6391 KOps/s | -0.40% |
| test_plain_set_stack_nested_inplace | 49.3410μs | 13.8116μs | 72.4031 KOps/s | 72.6925 KOps/s | -0.40% |
| test_items | 15.5500μs | 4.7439μs | 210.7991 KOps/s | 213.3090 KOps/s | -1.18% |
| test_items_nested | 0.4199ms | 0.3958ms | 2.5263 KOps/s | 2.5261 KOps/s | +0.01% |
| test_items_nested_locked | 0.4184ms | 0.3995ms | 2.5029 KOps/s | 2.5132 KOps/s | -0.41% |
| test_items_nested_leaf | 0.1050ms | 86.7907μs | 11.5220 KOps/s | 11.5487 KOps/s | -0.23% |
| test_items_stack_nested | 0.4461ms | 0.4003ms | 2.4982 KOps/s | 2.5186 KOps/s | -0.81% |
| test_items_stack_nested_leaf | 0.1050ms | 86.8122μs | 11.5191 KOps/s | 11.4498 KOps/s | +0.61% |
| test_items_stack_nested_locked | 0.4187ms | 0.4000ms | 2.4997 KOps/s | 2.5356 KOps/s | -1.41% |
| test_keys | 20.2800μs | 4.3528μs | 229.7371 KOps/s | 228.0155 KOps/s | +0.76% |
| test_keys_nested | 0.1104ms | 68.6897μs | 14.5582 KOps/s | 14.4448 KOps/s | +0.79% |
| test_keys_nested_locked | 2.5945ms | 75.3306μs | 13.2748 KOps/s | 13.2610 KOps/s | +0.10% |
| test_keys_nested_leaf | 80.4220μs | 59.3896μs | 16.8380 KOps/s | 17.1480 KOps/s | -1.81% |
| test_keys_stack_nested | 87.5220μs | 68.5327μs | 14.5916 KOps/s | 14.4974 KOps/s | +0.65% |
| test_keys_stack_nested_leaf | 0.1295ms | 57.6814μs | 17.3366 KOps/s | 16.7417 KOps/s | +3.55% |
| test_keys_stack_nested_locked | 98.0520μs | 74.2407μs | 13.4697 KOps/s | 13.3522 KOps/s | +0.88% |
| test_values | 8.2733μs | 1.7784μs | 562.3011 KOps/s | 570.1860 KOps/s | -1.38% |
| test_values_nested | 57.3410μs | 34.6581μs | 28.8533 KOps/s | 28.6196 KOps/s | +0.82% |
| test_values_nested_locked | 58.0310μs | 36.3510μs | 27.5096 KOps/s | 27.0421 KOps/s | +1.73% |
| test_values_nested_leaf | 50.6110μs | 30.4774μs | 32.8112 KOps/s | 32.2663 KOps/s | +1.69% |
| test_values_stack_nested | 59.7110μs | 35.6381μs | 28.0598 KOps/s | 27.7746 KOps/s | +1.03% |
| test_values_stack_nested_leaf | 69.9620μs | 31.5410μs | 31.7048 KOps/s | 31.2768 KOps/s | +1.37% |
| test_values_stack_nested_locked | 63.6710μs | 37.1850μs | 26.8925 KOps/s | 26.3561 KOps/s | +2.04% |
| test_membership | 3.5741μs | 0.5406μs | 1.8499 MOps/s | 1.8608 MOps/s | -0.59% |
| test_membership_nested | 29.1910μs | 2.0935μs | 477.6685 KOps/s | 477.3930 KOps/s | +0.06% |
| test_membership_nested_leaf | 13.3850μs | 2.0642μs | 484.4403 KOps/s | 492.2864 KOps/s | -1.59% |
| test_membership_stacked_nested | 33.6200μs | 2.0680μs | 483.5498 KOps/s | 469.5312 KOps/s | +2.99% |
| test_membership_stacked_nested_leaf | 21.2100μs | 2.1020μs | 475.7464 KOps/s | 480.6952 KOps/s | -1.03% |
| test_membership_nested_last | 22.9500μs | 3.0141μs | 331.7699 KOps/s | 330.5309 KOps/s | +0.37% |
| test_membership_nested_leaf_last | 32.9810μs | 2.9697μs | 336.7311 KOps/s | 330.3411 KOps/s | +1.93% |
| test_membership_stacked_nested_last | 20.0100μs | 3.4084μs | 293.3931 KOps/s | 289.6682 KOps/s | +1.29% |
| test_membership_stacked_nested_leaf_last | 35.0600μs | 3.4198μs | 292.4158 KOps/s | 285.4982 KOps/s | +2.42% |
| test_nested_getleaf | 24.0100μs | 7.9655μs | 125.5414 KOps/s | 123.7768 KOps/s | +1.43% |
| test_nested_get | 31.6010μs | 7.5209μs | 132.9630 KOps/s | 131.6561 KOps/s | +0.99% |
| test_stacked_getleaf | 35.0200μs | 8.0052μs | 124.9183 KOps/s | 123.7503 KOps/s | +0.94% |
| test_stacked_get | 19.6510μs | 7.5090μs | 133.1734 KOps/s | 131.6385 KOps/s | +1.17% |
| test_nested_getitemleaf | 19.5710μs | 8.1808μs | 122.2371 KOps/s | 121.3716 KOps/s | +0.71% |
| test_nested_getitem | 71.2510μs | 7.6814μs | 130.1850 KOps/s | 129.5107 KOps/s | +0.52% |
| test_stacked_getitemleaf | 28.2700μs | 8.1738μs | 122.3421 KOps/s | 121.8509 KOps/s | +0.40% |
| test_stacked_getitem | 35.4210μs | 7.7067μs | 129.7579 KOps/s | 129.6215 KOps/s | +0.11% |
| test_lock_nested | 7.1708ms | 0.4300ms | 2.3255 KOps/s | 2.3382 KOps/s | -0.54% |
| test_lock_stack_nested | 0.4511ms | 0.3882ms | 2.5759 KOps/s | 2.5563 KOps/s | +0.76% |
| test_unlock_nested | 89.0052ms | 0.4305ms | 2.3229 KOps/s | 2.8917 KOps/s | **-19.67%** |
| test_unlock_stack_nested | 0.3342ms | 0.3083ms | 3.2439 KOps/s | 3.2365 KOps/s | +0.23% |
| test_flatten_speed | 0.4132ms | 0.1072ms | 9.3315 KOps/s | 9.2724 KOps/s | +0.64% |
| test_unflatten_speed | 0.3483ms | 0.2980ms | 3.3556 KOps/s | 3.3946 KOps/s | -1.15% |
| test_common_ops | 1.0069ms | 0.5944ms | 1.6824 KOps/s | 1.4615 KOps/s | **+15.11%** |
| test_creation | 33.9010μs | 1.8636μs | 536.6002 KOps/s | 538.6355 KOps/s | -0.38% |
| test_creation_empty | 24.1200μs | 9.2822μs | 107.7334 KOps/s | 114.0508 KOps/s | **-5.54%** |
| test_creation_nested_1 | 28.3600μs | 11.3933μs | 87.7707 KOps/s | 93.8001 KOps/s | **-6.43%** |
| test_creation_nested_2 | 39.5110μs | 13.7165μs | 72.9051 KOps/s | 75.9646 KOps/s | -4.03% |
| test_clone | 81.2210μs | 10.9800μs | 91.0744 KOps/s | 84.3014 KOps/s | **+8.03%** |
| test_getitem[int] | 24.5000μs | 10.0140μs | 99.8603 KOps/s | 93.4186 KOps/s | **+6.90%** |
| test_getitem[slice_int] | 37.4700μs | 19.4824μs | 51.3283 KOps/s | 47.2723 KOps/s | **+8.58%** |
| test_getitem[range] | 0.1838ms | 36.7314μs | 27.2247 KOps/s | 26.1211 KOps/s | +4.22% |
| test_getitem[tuple] | 39.1710μs | 17.2961μs | 57.8164 KOps/s | 54.4397 KOps/s | **+6.20%** |
| test_getitem[list] | 0.1626ms | 32.0948μs | 31.1577 KOps/s | 29.5281 KOps/s | **+5.52%** |
| test_setitem_dim[int] | 41.3310μs | 25.6785μs | 38.9430 KOps/s | 38.3443 KOps/s | +1.56% |
| test_setitem_dim[slice_int] | 64.5010μs | 46.6600μs | 21.4317 KOps/s | 21.3729 KOps/s | +0.28% |
| test_setitem_dim[range] | 85.1220μs | 63.1304μs | 15.8402 KOps/s | 15.7883 KOps/s | +0.33% |
| test_setitem_dim[tuple] | 57.4710μs | 40.3032μs | 24.8120 KOps/s | 24.8793 KOps/s | -0.27% |
| test_setitem | 83.8420μs | 16.0054μs | 62.4789 KOps/s | 58.3748 KOps/s | **+7.03%** |
| test_set | 62.6910μs | 15.4501μs | 64.7247 KOps/s | 60.4534 KOps/s | **+7.07%** |
| test_set_shared | 2.7345ms | 96.2525μs | 10.3893 KOps/s | 9.9768 KOps/s | +4.13% |
| test_update | 0.1004ms | 18.7464μs | 53.3436 KOps/s | 52.1691 KOps/s | +2.25% |
| test_update_nested | 84.5920μs | 24.2282μs | 41.2743 KOps/s | 40.0837 KOps/s | +2.97% |
| test_update__nested | 83.5710μs | 21.7241μs | 46.0318 KOps/s | 42.8795 KOps/s | **+7.35%** |
| test_set_nested | 79.9520μs | 16.4909μs | 60.6395 KOps/s | 55.4827 KOps/s | **+9.29%** |
| test_set_nested_new | 65.7210μs | 19.4917μs | 51.3038 KOps/s | 48.3337 KOps/s | **+6.15%** |
| test_select | 78.7620μs | 31.5020μs | 31.7440 KOps/s | 29.8713 KOps/s | **+6.27%** |
| test_select_nested | 1.0682ms | 52.3718μs | 19.0942 KOps/s | 18.6780 KOps/s | +2.23% |
| test_exclude_nested | 0.1052ms | 72.5911μs | 13.7758 KOps/s | 13.6159 KOps/s | +1.17% |
| test_empty[True] | 0.3375ms | 0.3031ms | 3.2989 KOps/s | 3.3907 KOps/s | -2.71% |
| test_empty[False] | 3.0761μs | 0.9105μs | 1.0983 MOps/s | 1.0865 MOps/s | +1.08% |
| test_to | 87.3710μs | 58.0870μs | 17.2156 KOps/s | 16.1679 KOps/s | **+6.48%** |
| test_to_nonblocking | 63.2910μs | 34.6336μs | 28.8737 KOps/s | 27.6241 KOps/s | +4.52% |
| test_unbind_speed | 0.2997ms | 0.2647ms | 3.7779 KOps/s | 3.7823 KOps/s | -0.12% |
| test_unbind_speed_stack0 | 0.2991ms | 0.2645ms | 3.7809 KOps/s | 3.8372 KOps/s | -1.46% |
| test_unbind_speed_stack1 | 92.4756ms | 0.7868ms | 1.2709 KOps/s | 1.3986 KOps/s | **-9.13%** |
| test_split | 89.8904ms | 1.5876ms | 629.9008 Ops/s | 609.7816 Ops/s | +3.30% |
| test_chunk | 1.4773ms | 1.4386ms | 695.1167 Ops/s | 668.4066 Ops/s | +4.00% |
| test_creation[device0] | 0.1284ms | 55.0862μs | 18.1534 KOps/s | 17.5925 KOps/s | +3.19% |
| test_creation_from_tensor | 0.1417ms | 52.3676μs | 19.0958 KOps/s | 18.5618 KOps/s | +2.88% |
| test_add_one[memmap_tensor0] | 91.7710μs | 6.9745μs | 143.3793 KOps/s | 132.2033 KOps/s | **+8.45%** |
| test_contiguous[memmap_tensor0] | 26.5300μs | 0.5960μs | 1.6780 MOps/s | 1.7077 MOps/s | -1.74% |
| test_stack[memmap_tensor0] | 36.6710μs | 4.4206μs | 226.2138 KOps/s | 202.0893 KOps/s | **+11.94%** |
| test_memmaptd_index | 1.1161ms | 0.2586ms | 3.8676 KOps/s | 3.6425 KOps/s | **+6.18%** |
| test_memmaptd_index_astensor | 0.5856ms | 0.3211ms | 3.1139 KOps/s | 2.9399 KOps/s | **+5.92%** |
| test_memmaptd_index_op | 0.9139ms | 0.6147ms | 1.6268 KOps/s | 1.5188 KOps/s | **+7.11%** |
| test_serialize_model | 95.3120ms | 91.4266ms | 10.9377 Ops/s | 10.4348 Ops/s | +4.82% |
| test_serialize_model_pickle | 1.3623s | 1.2387s | 0.8073 Ops/s | 0.8085 Ops/s | -0.16% |
| test_serialize_weights | 92.5481ms | 88.6782ms | 11.2767 Ops/s | 10.7017 Ops/s | **+5.37%** |
| test_serialize_weights_returnearly | 0.1918s | 73.8125ms | 13.5478 Ops/s | 13.5108 Ops/s | +0.27% |
| test_serialize_weights_pickle | 1.3514s | 1.2480s | 0.8013 Ops/s | 0.8009 Ops/s | +0.05% |
| test_reshape_pytree | 52.7710μs | 25.3705μs | 39.4159 KOps/s | 38.4223 KOps/s | +2.59% |
| test_reshape_td | 87.6810μs | 29.7972μs | 33.5603 KOps/s | 32.0842 KOps/s | +4.60% |
| test_view_pytree | 0.1243ms | 24.9976μs | 40.0039 KOps/s | 38.9885 KOps/s | +2.60% |
| test_view_td | 0.1813ms | 36.5021μs | 27.3957 KOps/s | 27.1818 KOps/s | +0.79% |
| test_unbind_pytree | 46.9910μs | 30.4263μs | 32.8663 KOps/s | 31.9954 KOps/s | +2.72% |
| test_unbind_td | 0.5033ms | 39.8765μs | 25.0774 KOps/s | 24.8415 KOps/s | +0.95% |
| test_split_pytree | 49.7410μs | 33.0925μs | 30.2183 KOps/s | 28.9775 KOps/s | +4.28% |
| test_split_td | 0.1029ms | 36.6722μs | 27.2686 KOps/s | 25.9127 KOps/s | **+5.23%** |
| test_add_pytree | 60.7210μs | 36.7590μs | 27.2042 KOps/s | 25.1518 KOps/s | **+8.16%** |
| test_add_td | 78.1820μs | 52.1021μs | 19.1931 KOps/s | 18.9588 KOps/s | +1.24% |
| test_distributed | 0.2236ms | 69.2519μs | 14.4400 KOps/s | 14.1018 KOps/s | +2.40% |
| test_tdmodule | 39.2610μs | 13.9599μs | 71.6339 KOps/s | 71.2649 KOps/s | +0.52% |
| test_tdmodule_dispatch | 45.6210μs | 28.8091μs | 34.7113 KOps/s | 35.2358 KOps/s | -1.49% |
| test_tdseq | 30.5500μs | 15.2767μs | 65.4593 KOps/s | 66.6815 KOps/s | -1.83% |
| test_tdseq_dispatch | 51.2210μs | 31.4194μs | 31.8275 KOps/s | 32.4391 KOps/s | -1.89% |
| test_instantiation_functorch | 1.4492ms | 1.3675ms | 731.2542 Ops/s | 716.8012 Ops/s | +2.02% |
| test_instantiation_td | 92.4157ms | 1.0822ms | 924.0252 Ops/s | 903.4073 Ops/s | +2.28% |
| test_exec_functorch | 0.3331ms | 0.1484ms | 6.7391 KOps/s | 6.6578 KOps/s | +1.22% |
| test_exec_functional_call | 0.1736ms | 0.1351ms | 7.4041 KOps/s | 7.2421 KOps/s | +2.24% |
| test_exec_td | 0.1706ms | 0.1328ms | 7.5288 KOps/s | 7.3047 KOps/s | +3.07% |
| test_exec_td_decorator | 0.7928ms | 0.2059ms | 4.8573 KOps/s | 4.8266 KOps/s | +0.64% |
| test_vmap_mlp_speed[True-True] | 0.7819ms | 0.5758ms | 1.7367 KOps/s | 1.7273 KOps/s | +0.55% |
| test_vmap_mlp_speed[True-False] | 0.6284ms | 0.5754ms | 1.7379 KOps/s | 1.7344 KOps/s | +0.20% |
| test_vmap_mlp_speed[False-True] | 0.5540ms | 0.5074ms | 1.9707 KOps/s | 1.9519 KOps/s | +0.96% |
| test_vmap_mlp_speed[False-False] | 0.5701ms | 0.5090ms | 1.9645 KOps/s | 1.9532 KOps/s | +0.58% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1245ms | 0.6513ms | 1.5354 KOps/s | 1.5377 KOps/s | -0.15% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7982ms | 0.6492ms | 1.5402 KOps/s | 1.5363 KOps/s | +0.26% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7302ms | 0.5773ms | 1.7321 KOps/s | 1.7170 KOps/s | +0.88% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7017ms | 0.5671ms | 1.7633 KOps/s | 1.7363 KOps/s | +1.55% |
| test_vmap_transformer_speed[True-True] | 8.3295ms | 7.9524ms | 125.7480 Ops/s | 128.4583 Ops/s | -2.11% |
| test_vmap_transformer_speed[True-False] | 8.1217ms | 7.8071ms | 128.0886 Ops/s | 128.4898 Ops/s | -0.31% |
| test_vmap_transformer_speed[False-True] | 8.3509ms | 7.7586ms | 128.8897 Ops/s | 129.8980 Ops/s | -0.78% |
| test_vmap_transformer_speed[False-False] | 8.1513ms | 7.7487ms | 129.0541 Ops/s | 130.0662 Ops/s | -0.78% |
| test_vmap_transformer_speed_decorator[True-True] | 19.6389ms | 19.2826ms | 51.8602 Ops/s | 52.2955 Ops/s | -0.83% |
| test_vmap_transformer_speed_decorator[True-False] | 19.6471ms | 19.3008ms | 51.8114 Ops/s | 52.0380 Ops/s | -0.44% |
| test_vmap_transformer_speed_decorator[False-True] | 20.2323ms | 19.1647ms | 52.1792 Ops/s | 52.3089 Ops/s | -0.25% |
| test_vmap_transformer_speed_decorator[False-False] | 19.4441ms | 19.1563ms | 52.2021 Ops/s | 52.4109 Ops/s | -0.40% |
| test_to_module_speed[True] | 2.0821ms | 1.5277ms | 654.5918 Ops/s | 667.5826 Ops/s | -1.95% |
| test_to_module_speed[False] | 1.7309ms | 1.5008ms | 666.2897 Ops/s | 672.4107 Ops/s | -0.91% |
| test_tc_init | 0.1599ms | 34.5621μs | 28.9334 KOps/s | 30.5728 KOps/s | **-5.36%** |
| test_tc_init_nested | 0.2009ms | 70.1830μs | 14.2485 KOps/s | 14.7881 KOps/s | -3.65% |
| test_tc_first_layer_tensor | 0.1242ms | 3.5739μs | 279.8089 KOps/s | 284.1301 KOps/s | -1.52% |
| test_tc_first_layer_nontensor | 0.1172ms | 3.6005μs | 277.7412 KOps/s | 282.5440 KOps/s | -1.70% |
| test_tc_second_layer_tensor | 25.8184μs | 1.1332μs | 882.4828 KOps/s | 904.7765 KOps/s | -2.46% |
| test_tc_second_layer_nontensor | 0.1267ms | 4.0975μs | 244.0525 KOps/s | 247.2206 KOps/s | -1.28% |
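In both reports the Improved and Worsened counts equal the number of bolded rows, and every bolded change exceeds 5% in magnitude while every unbolded one stays below it (the largest unbolded magnitudes are -4.95% on CPU and +4.82% on GPU). A threshold of about 5% is therefore inferred here; it is an assumption, not a documented setting of the benchmark workflow. A minimal tally under that assumption, with the hypothetical helper name `tally`:

```python
from typing import Dict, Iterable, List, Tuple

# Assumed significance threshold in percent, inferred from the bolded rows;
# not read from the benchmark workflow's configuration.
THRESHOLD_PCT = 5.0


def tally(rows: Iterable[Tuple[str, float]], threshold: float = THRESHOLD_PCT) -> Dict[str, List[str]]:
    """Split (name, change_pct) rows into improved / worsened / within-threshold buckets."""
    buckets: Dict[str, List[str]] = {"improved": [], "worsened": [], "within": []}
    for name, change in rows:
        if change > threshold:
            buckets["improved"].append(name)
        elif change < -threshold:
            buckets["worsened"].append(name)
        else:
            buckets["within"].append(name)
    return buckets


# A handful of rows copied from the GPU table (Change column, in percent).
gpu_rows = [
    ("test_unlock_nested", -19.67),
    ("test_creation_empty", -5.54),
    ("test_creation_nested_2", -4.03),
    ("test_common_ops", 15.11),
    ("test_serialize_model", 4.82),
]
counts = {label: len(names) for label, names in tally(gpu_rows).items()}
print(counts)  # {'improved': 1, 'worsened': 2, 'within': 2}
```

Fed the full Change column of each report, this rule yields the 19/9 (CPU) and 21/5 (GPU) counts stated in the summary lines.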