pytorch / tensordict

TensorDict is a pytorch dedicated tensor container.
MIT License
803 stars, 65 forks

[Performance] Faster tensorclass set #880

Closed vmoens closed 1 month ago

github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large\⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: $\large\color{#35bf28}29$. Worsened: $\large\color{#d91a1a}3$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 36.5680μs | 16.4427μs | 60.8174 KOps/s | 57.0725 KOps/s | $\textbf{\color{#35bf28}+6.56\\%}$ |
| test_plain_set_stack_nested | 44.5230μs | 16.7501μs | 59.7013 KOps/s | 57.5169 KOps/s | $\color{#35bf28}+3.80\\%$ |
| test_plain_set_nested_inplace | 50.2040μs | 18.5411μs | 53.9344 KOps/s | 51.9336 KOps/s | $\color{#35bf28}+3.85\\%$ |
| test_plain_set_stack_nested_inplace | 52.3080μs | 18.5047μs | 54.0404 KOps/s | 51.8639 KOps/s | $\color{#35bf28}+4.20\\%$ |
| test_items | 22.5820μs | 2.6250μs | 380.9463 KOps/s | 375.6204 KOps/s | $\color{#35bf28}+1.42\\%$ |
| test_items_nested | 1.6499ms | 0.3669ms | 2.7258 KOps/s | 2.7369 KOps/s | $\color{#d91a1a}-0.40\\%$ |
| test_items_nested_locked | 0.8026ms | 0.3667ms | 2.7271 KOps/s | 2.7113 KOps/s | $\color{#35bf28}+0.58\\%$ |
| test_items_nested_leaf | 0.1482ms | 85.9318μs | 11.6371 KOps/s | 11.6459 KOps/s | $\color{#d91a1a}-0.08\\%$ |
| test_items_stack_nested | 0.6719ms | 0.3721ms | 2.6872 KOps/s | 2.7231 KOps/s | $\color{#d91a1a}-1.32\\%$ |
| test_items_stack_nested_leaf | 0.1562ms | 85.3369μs | 11.7183 KOps/s | 11.9082 KOps/s | $\color{#d91a1a}-1.60\\%$ |
| test_items_stack_nested_locked | 0.6933ms | 0.3710ms | 2.6957 KOps/s | 2.7353 KOps/s | $\color{#d91a1a}-1.45\\%$ |
| test_keys | 48.8110μs | 3.9720μs | 251.7620 KOps/s | 254.0530 KOps/s | $\color{#d91a1a}-0.90\\%$ |
| test_keys_nested | 0.3092ms | 0.1457ms | 6.8645 KOps/s | 6.8826 KOps/s | $\color{#d91a1a}-0.26\\%$ |
| test_keys_nested_locked | 2.1192ms | 0.1518ms | 6.5863 KOps/s | 6.6413 KOps/s | $\color{#d91a1a}-0.83\\%$ |
| test_keys_nested_leaf | 0.2328ms | 0.1241ms | 8.0595 KOps/s | 8.1027 KOps/s | $\color{#d91a1a}-0.53\\%$ |
| test_keys_stack_nested | 0.2506ms | 0.1450ms | 6.8958 KOps/s | 6.9860 KOps/s | $\color{#d91a1a}-1.29\\%$ |
| test_keys_stack_nested_leaf | 0.2374ms | 0.1238ms | 8.0745 KOps/s | 8.2219 KOps/s | $\color{#d91a1a}-1.79\\%$ |
| test_keys_stack_nested_locked | 0.2621ms | 0.1504ms | 6.6508 KOps/s | 6.7671 KOps/s | $\color{#d91a1a}-1.72\\%$ |
| test_values | 7.1985μs | 1.2333μs | 810.8359 KOps/s | 844.6142 KOps/s | $\color{#d91a1a}-4.00\\%$ |
| test_values_nested | 0.1124ms | 49.2816μs | 20.2916 KOps/s | 20.1597 KOps/s | $\color{#35bf28}+0.65\\%$ |
| test_values_nested_locked | 0.1472ms | 49.8826μs | 20.0471 KOps/s | 20.3494 KOps/s | $\color{#d91a1a}-1.49\\%$ |
| test_values_nested_leaf | 84.6480μs | 45.1046μs | 22.1707 KOps/s | 22.4476 KOps/s | $\color{#d91a1a}-1.23\\%$ |
| test_values_stack_nested | 0.1427ms | 49.5468μs | 20.1829 KOps/s | 19.5765 KOps/s | $\color{#35bf28}+3.10\\%$ |
| test_values_stack_nested_leaf | 0.1122ms | 44.1921μs | 22.6285 KOps/s | 22.7514 KOps/s | $\color{#d91a1a}-0.54\\%$ |
| test_values_stack_nested_locked | 90.2580μs | 49.5433μs | 20.1843 KOps/s | 19.7605 KOps/s | $\color{#35bf28}+2.14\\%$ |
| test_membership | 4.5256μs | 0.7367μs | 1.3573 MOps/s | 1.0897 MOps/s | $\textbf{\color{#35bf28}+24.56\\%}$ |
| test_membership_nested | 45.0640μs | 2.6547μs | 376.6916 KOps/s | 334.8930 KOps/s | $\textbf{\color{#35bf28}+12.48\\%}$ |
| test_membership_nested_leaf | 24.8170μs | 2.7081μs | 369.2583 KOps/s | 368.0026 KOps/s | $\color{#35bf28}+0.34\\%$ |
| test_membership_stacked_nested | 31.0880μs | 2.6676μs | 374.8625 KOps/s | 369.9455 KOps/s | $\color{#35bf28}+1.33\\%$ |
| test_membership_stacked_nested_leaf | 18.3040μs | 2.7147μs | 368.3654 KOps/s | 370.8934 KOps/s | $\color{#d91a1a}-0.68\\%$ |
| test_membership_nested_last | 36.4080μs | 4.0026μs | 249.8363 KOps/s | 250.2794 KOps/s | $\color{#d91a1a}-0.18\\%$ |
| test_membership_nested_leaf_last | 31.1180μs | 4.0094μs | 249.4111 KOps/s | 244.5662 KOps/s | $\color{#35bf28}+1.98\\%$ |
| test_membership_stacked_nested_last | 30.7080μs | 3.9566μs | 252.7404 KOps/s | 78.4059 KOps/s | $\textbf{\color{#35bf28}+222.35\\%}$ |
| test_membership_stacked_nested_leaf_last | 28.9640μs | 4.0196μs | 248.7803 KOps/s | 77.5393 KOps/s | $\textbf{\color{#35bf28}+220.84\\%}$ |
| test_nested_getleaf | 35.8570μs | 10.9676μs | 91.1779 KOps/s | 92.1690 KOps/s | $\color{#d91a1a}-1.08\\%$ |
| test_nested_get | 38.5820μs | 10.3522μs | 96.5981 KOps/s | 98.4252 KOps/s | $\color{#d91a1a}-1.86\\%$ |
| test_stacked_getleaf | 48.4200μs | 10.9302μs | 91.4900 KOps/s | 94.0127 KOps/s | $\color{#d91a1a}-2.68\\%$ |
| test_stacked_get | 36.7780μs | 10.3663μs | 96.4663 KOps/s | 98.3106 KOps/s | $\color{#d91a1a}-1.88\\%$ |
| test_nested_getitemleaf | 37.7510μs | 11.3697μs | 87.9532 KOps/s | 89.2116 KOps/s | $\color{#d91a1a}-1.41\\%$ |
| test_nested_getitem | 38.9130μs | 10.6048μs | 94.2967 KOps/s | 96.7293 KOps/s | $\color{#d91a1a}-2.51\\%$ |
| test_stacked_getitemleaf | 50.5960μs | 11.4027μs | 87.6988 KOps/s | 89.0341 KOps/s | $\color{#d91a1a}-1.50\\%$ |
| test_stacked_getitem | 32.4810μs | 10.5830μs | 94.4913 KOps/s | 98.2178 KOps/s | $\color{#d91a1a}-3.79\\%$ |
| test_lock_nested | 3.4835ms | 0.4419ms | 2.2630 KOps/s | 2.2845 KOps/s | $\color{#d91a1a}-0.94\\%$ |
| test_lock_stack_nested | 0.7341ms | 0.4101ms | 2.4383 KOps/s | 2.5299 KOps/s | $\color{#d91a1a}-3.62\\%$ |
| test_unlock_nested | 0.7243ms | 0.3593ms | 2.7829 KOps/s | 2.3652 KOps/s | $\textbf{\color{#35bf28}+17.66\\%}$ |
| test_unlock_stack_nested | 0.5388ms | 0.3268ms | 3.0603 KOps/s | 3.2324 KOps/s | $\textbf{\color{#d91a1a}-5.32\\%}$ |
| test_flatten_speed | 0.2373ms | 0.1050ms | 9.5252 KOps/s | 9.6057 KOps/s | $\color{#d91a1a}-0.84\\%$ |
| test_unflatten_speed | 0.7708ms | 0.4425ms | 2.2598 KOps/s | 2.2702 KOps/s | $\color{#d91a1a}-0.45\\%$ |
| test_common_ops | 5.2313ms | 0.7163ms | 1.3961 KOps/s | 1.2824 KOps/s | $\textbf{\color{#35bf28}+8.86\\%}$ |
| test_creation | 70.9820μs | 2.2993μs | 434.9102 KOps/s | 434.9955 KOps/s | $\color{#d91a1a}-0.02\\%$ |
| test_creation_empty | 51.1950μs | 9.3236μs | 107.2547 KOps/s | 87.1683 KOps/s | $\textbf{\color{#35bf28}+23.04\\%}$ |
| test_creation_nested_1 | 38.9520μs | 12.6059μs | 79.3280 KOps/s | 67.8378 KOps/s | $\textbf{\color{#35bf28}+16.94\\%}$ |
| test_creation_nested_2 | 44.0820μs | 16.0634μs | 62.2533 KOps/s | 55.4781 KOps/s | $\textbf{\color{#35bf28}+12.21\\%}$ |
| test_clone | 69.2090μs | 13.2030μs | 75.7401 KOps/s | 78.0975 KOps/s | $\color{#d91a1a}-3.02\\%$ |
| test_getitem[int] | 37.8610μs | 11.7856μs | 84.8492 KOps/s | 87.1455 KOps/s | $\color{#d91a1a}-2.64\\%$ |
| test_getitem[slice_int] | 76.0520μs | 23.9848μs | 41.6931 KOps/s | 43.4937 KOps/s | $\color{#d91a1a}-4.14\\%$ |
| test_getitem[range] | 0.1975ms | 44.3853μs | 22.5300 KOps/s | 21.5570 KOps/s | $\color{#35bf28}+4.51\\%$ |
| test_getitem[tuple] | 56.3750μs | 19.4061μs | 51.5303 KOps/s | 52.7721 KOps/s | $\color{#d91a1a}-2.35\\%$ |
| test_getitem[list] | 0.1564ms | 39.8530μs | 25.0922 KOps/s | 24.3392 KOps/s | $\color{#35bf28}+3.09\\%$ |
| test_setitem_dim[int] | 56.9660μs | 29.4446μs | 33.9621 KOps/s | 30.5968 KOps/s | $\textbf{\color{#35bf28}+11.00\\%}$ |
| test_setitem_dim[slice_int] | 0.1120ms | 56.7043μs | 17.6354 KOps/s | 16.0631 KOps/s | $\textbf{\color{#35bf28}+9.79\\%}$ |
| test_setitem_dim[range] | 0.1133ms | 76.5954μs | 13.0556 KOps/s | 12.0372 KOps/s | $\textbf{\color{#35bf28}+8.46\\%}$ |
| test_setitem_dim[tuple] | 69.1790μs | 45.4444μs | 22.0049 KOps/s | 20.0201 KOps/s | $\textbf{\color{#35bf28}+9.91\\%}$ |
| test_setitem | 82.3340μs | 18.8865μs | 52.9480 KOps/s | 50.8290 KOps/s | $\color{#35bf28}+4.17\\%$ |
| test_set | 91.2800μs | 18.3934μs | 54.3675 KOps/s | 51.0313 KOps/s | $\textbf{\color{#35bf28}+6.54\\%}$ |
| test_set_shared | 2.3313ms | 0.1669ms | 5.9934 KOps/s | 6.0280 KOps/s | $\color{#d91a1a}-0.57\\%$ |
| test_update | 0.1320ms | 20.2026μs | 49.4985 KOps/s | 44.3283 KOps/s | $\textbf{\color{#35bf28}+11.66\\%}$ |
| test_update_nested | 0.1454ms | 30.0125μs | 33.3195 KOps/s | 31.1137 KOps/s | $\textbf{\color{#35bf28}+7.09\\%}$ |
| test_update__nested | 83.0550μs | 25.0295μs | 39.9529 KOps/s | 40.4570 KOps/s | $\color{#d91a1a}-1.25\\%$ |
| test_set_nested | 0.1529ms | 19.9493μs | 50.1271 KOps/s | 46.5141 KOps/s | $\textbf{\color{#35bf28}+7.77\\%}$ |
| test_set_nested_new | 0.1316ms | 24.8988μs | 40.1626 KOps/s | 38.1614 KOps/s | $\textbf{\color{#35bf28}+5.24\\%}$ |
| test_select | 0.1658ms | 40.7220μs | 24.5568 KOps/s | 23.9314 KOps/s | $\color{#35bf28}+2.61\\%$ |
| test_select_nested | 0.1306ms | 61.9022μs | 16.1545 KOps/s | 17.0040 KOps/s | $\color{#d91a1a}-5.00\\%$ |
| test_exclude_nested | 0.1437ms | 81.7474μs | 12.2328 KOps/s | 12.8058 KOps/s | $\color{#d91a1a}-4.47\\%$ |
| test_empty[True] | 0.4070ms | 0.3412ms | 2.9310 KOps/s | 2.9905 KOps/s | $\color{#d91a1a}-1.99\\%$ |
| test_empty[False] | 11.6192μs | 1.3142μs | 760.9007 KOps/s | 775.2453 KOps/s | $\color{#d91a1a}-1.85\\%$ |
| test_unbind_speed | 0.3349ms | 0.2596ms | 3.8523 KOps/s | 3.8981 KOps/s | $\color{#d91a1a}-1.18\\%$ |
| test_unbind_speed_stack0 | 0.3600ms | 0.2582ms | 3.8727 KOps/s | 4.0297 KOps/s | $\color{#d91a1a}-3.89\\%$ |
| test_unbind_speed_stack1 | 80.8449ms | 0.7617ms | 1.3129 KOps/s | 1.5107 KOps/s | $\textbf{\color{#d91a1a}-13.10\\%}$ |
| test_split | 77.9265ms | 1.6480ms | 606.8031 Ops/s | 620.4178 Ops/s | $\color{#d91a1a}-2.19\\%$ |
| test_chunk | 75.7063ms | 1.6539ms | 604.6136 Ops/s | 617.7223 Ops/s | $\color{#d91a1a}-2.12\\%$ |
| test_creation[device0] | 4.3493ms | 95.3848μs | 10.4838 KOps/s | 10.6288 KOps/s | $\color{#d91a1a}-1.36\\%$ |
| test_creation_from_tensor | 0.2532ms | 96.7175μs | 10.3394 KOps/s | 10.4599 KOps/s | $\color{#d91a1a}-1.15\\%$ |
| test_add_one[memmap_tensor0] | 0.1986ms | 5.5040μs | 181.6852 KOps/s | 182.0623 KOps/s | $\color{#d91a1a}-0.21\\%$ |
| test_contiguous[memmap_tensor0] | 18.2040μs | 0.6353μs | 1.5741 MOps/s | 1.5394 MOps/s | $\color{#35bf28}+2.25\\%$ |
| test_stack[memmap_tensor0] | 43.3910μs | 3.6931μs | 270.7787 KOps/s | 269.6970 KOps/s | $\color{#35bf28}+0.40\\%$ |
| test_memmaptd_index | 0.9684ms | 0.2528ms | 3.9551 KOps/s | 3.8869 KOps/s | $\color{#35bf28}+1.75\\%$ |
| test_memmaptd_index_astensor | 0.8899ms | 0.3269ms | 3.0592 KOps/s | 3.0229 KOps/s | $\color{#35bf28}+1.20\\%$ |
| test_memmaptd_index_op | 1.3610ms | 0.5785ms | 1.7286 KOps/s | 1.6251 KOps/s | $\textbf{\color{#35bf28}+6.36\\%}$ |
| test_serialize_model | 0.1287s | 0.1217s | 8.2168 Ops/s | 7.1286 Ops/s | $\textbf{\color{#35bf28}+15.27\\%}$ |
| test_serialize_model_pickle | 0.4600s | 0.3912s | 2.5562 Ops/s | 2.5418 Ops/s | $\color{#35bf28}+0.57\\%$ |
| test_serialize_weights | 0.2079s | 0.1333s | 7.5027 Ops/s | 8.0350 Ops/s | $\textbf{\color{#d91a1a}-6.62\\%}$ |
| test_serialize_weights_returnearly | 0.1721s | 0.1610s | 6.2113 Ops/s | 6.0853 Ops/s | $\color{#35bf28}+2.07\\%$ |
| test_serialize_weights_pickle | 0.4617s | 0.4129s | 2.4217 Ops/s | 2.5344 Ops/s | $\color{#d91a1a}-4.45\\%$ |
| test_serialize_weights_filesystem | 0.1522s | 0.1420s | 7.0414 Ops/s | 6.9212 Ops/s | $\color{#35bf28}+1.74\\%$ |
| test_serialize_model_filesystem | 0.1609s | 0.1523s | 6.5649 Ops/s | 6.5889 Ops/s | $\color{#d91a1a}-0.36\\%$ |
| test_reshape_pytree | 70.6410μs | 25.8727μs | 38.6508 KOps/s | 38.9750 KOps/s | $\color{#d91a1a}-0.83\\%$ |
| test_reshape_td | 78.0560μs | 34.2945μs | 29.1592 KOps/s | 30.0694 KOps/s | $\color{#d91a1a}-3.03\\%$ |
| test_view_pytree | 92.3890μs | 25.6274μs | 39.0207 KOps/s | 38.5869 KOps/s | $\color{#35bf28}+1.12\\%$ |
| test_view_td | 0.1309ms | 40.3641μs | 24.7745 KOps/s | 26.0369 KOps/s | $\color{#d91a1a}-4.85\\%$ |
| test_unbind_pytree | 80.2100μs | 29.8295μs | 33.5238 KOps/s | 34.2462 KOps/s | $\color{#d91a1a}-2.11\\%$ |
| test_unbind_td | 0.3550ms | 38.5228μs | 25.9587 KOps/s | 26.4676 KOps/s | $\color{#d91a1a}-1.92\\%$ |
| test_split_pytree | 61.3350μs | 29.5111μs | 33.8855 KOps/s | 34.2265 KOps/s | $\color{#d91a1a}-1.00\\%$ |
| test_split_td | 0.4707ms | 39.9424μs | 25.0360 KOps/s | 25.3058 KOps/s | $\color{#d91a1a}-1.07\\%$ |
| test_add_pytree | 83.4350μs | 35.4789μs | 28.1858 KOps/s | 28.9087 KOps/s | $\color{#d91a1a}-2.50\\%$ |
| test_add_td | 0.1138ms | 51.9229μs | 19.2593 KOps/s | 17.8895 KOps/s | $\textbf{\color{#35bf28}+7.66\\%}$ |
| test_distributed | 0.2678ms | 0.1310ms | 7.6360 KOps/s | 7.5354 KOps/s | $\color{#35bf28}+1.34\\%$ |
| test_tdmodule | 28.5640μs | 15.2074μs | 65.7573 KOps/s | 58.8963 KOps/s | $\textbf{\color{#35bf28}+11.65\\%}$ |
| test_tdmodule_dispatch | 62.7480μs | 32.3615μs | 30.9009 KOps/s | 28.3202 KOps/s | $\textbf{\color{#35bf28}+9.11\\%}$ |
| test_tdseq | 36.6690μs | 16.9253μs | 59.0831 KOps/s | 52.2886 KOps/s | $\textbf{\color{#35bf28}+12.99\\%}$ |
| test_tdseq_dispatch | 71.3330μs | 35.7599μs | 27.9643 KOps/s | 24.9445 KOps/s | $\textbf{\color{#35bf28}+12.11\\%}$ |
| test_instantiation_functorch | 1.6107ms | 1.3212ms | 756.9161 Ops/s | 749.0238 Ops/s | $\color{#35bf28}+1.05\\%$ |
| test_instantiation_td | 1.6382ms | 1.0375ms | 963.8401 Ops/s | 890.2906 Ops/s | $\textbf{\color{#35bf28}+8.26\\%}$ |
| test_exec_functorch | 0.2864ms | 0.1674ms | 5.9735 KOps/s | 6.2156 KOps/s | $\color{#d91a1a}-3.89\\%$ |
| test_exec_functional_call | 0.2824ms | 0.1517ms | 6.5918 KOps/s | 6.6425 KOps/s | $\color{#d91a1a}-0.76\\%$ |
| test_exec_td | 0.2406ms | 0.1510ms | 6.6236 KOps/s | 6.7799 KOps/s | $\color{#d91a1a}-2.31\\%$ |
| test_exec_td_decorator | 0.5210ms | 0.2372ms | 4.2157 KOps/s | 4.3415 KOps/s | $\color{#d91a1a}-2.90\\%$ |
| test_vmap_mlp_speed[True-True] | 0.6865ms | 0.4800ms | 2.0832 KOps/s | 2.0493 KOps/s | $\color{#35bf28}+1.65\\%$ |
| test_vmap_mlp_speed[True-False] | 0.8128ms | 0.4787ms | 2.0888 KOps/s | 2.0600 KOps/s | $\color{#35bf28}+1.40\\%$ |
| test_vmap_mlp_speed[False-True] | 0.6767ms | 0.3984ms | 2.5099 KOps/s | 2.5217 KOps/s | $\color{#d91a1a}-0.47\\%$ |
| test_vmap_mlp_speed[False-False] | 0.5969ms | 0.3996ms | 2.5027 KOps/s | 2.5204 KOps/s | $\color{#d91a1a}-0.70\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1672ms | 0.5780ms | 1.7301 KOps/s | 1.7176 KOps/s | $\color{#35bf28}+0.73\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8237ms | 0.5807ms | 1.7220 KOps/s | 1.7394 KOps/s | $\color{#d91a1a}-1.00\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7912ms | 0.4751ms | 2.1047 KOps/s | 2.1320 KOps/s | $\color{#d91a1a}-1.28\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7863ms | 0.4816ms | 2.0763 KOps/s | 2.1226 KOps/s | $\color{#d91a1a}-2.18\\%$ |
| test_to_module_speed[True] | 2.4602ms | 1.8351ms | 544.9220 Ops/s | 560.9202 Ops/s | $\color{#d91a1a}-2.85\\%$ |
| test_to_module_speed[False] | 2.8385ms | 1.7848ms | 560.2867 Ops/s | 569.6100 Ops/s | $\color{#d91a1a}-1.64\\%$ |
| test_tc_init | 98.5840μs | 33.4518μs | 29.8937 KOps/s | 17.8970 KOps/s | $\textbf{\color{#35bf28}+67.03\\%}$ |
| test_tc_init_nested | 0.1275ms | 68.8372μs | 14.5270 KOps/s | 8.6041 KOps/s | $\textbf{\color{#35bf28}+68.84\\%}$ |
| test_tc_first_layer_tensor | 45.8960μs | 8.0951μs | 123.5322 KOps/s | 120.8341 KOps/s | $\color{#35bf28}+2.23\\%$ |
| test_tc_first_layer_nontensor | 35.8160μs | 8.0456μs | 124.2913 KOps/s | 120.8220 KOps/s | $\color{#35bf28}+2.87\\%$ |
| test_tc_second_layer_tensor | 40.3350μs | 2.5023μs | 399.6248 KOps/s | 412.2442 KOps/s | $\color{#d91a1a}-3.06\\%$ |
| test_tc_second_layer_nontensor | 37.3490μs | 9.0181μs | 110.8879 KOps/s | 109.6898 KOps/s | $\color{#35bf28}+1.09\\%$ |

github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large\⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}21$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 72.3483ms | 16.2061μs | 61.7052 KOps/s | 84.9381 KOps/s | $\textbf{\color{#d91a1a}-27.35\\%}$ |
| test_plain_set_stack_nested | 29.6700μs | 13.0026μs | 76.9077 KOps/s | 84.7436 KOps/s | $\textbf{\color{#d91a1a}-9.25\\%}$ |
| test_plain_set_nested_inplace | 29.8800μs | 13.5403μs | 73.8537 KOps/s | 77.6066 KOps/s | $\color{#d91a1a}-4.84\\%$ |
| test_plain_set_stack_nested_inplace | 36.5900μs | 13.5905μs | 73.5805 KOps/s | 77.8230 KOps/s | $\textbf{\color{#d91a1a}-5.45\\%}$ |
| test_items | 19.0500μs | 4.7778μs | 209.3008 KOps/s | 210.7509 KOps/s | $\color{#d91a1a}-0.69\\%$ |
| test_items_nested | 0.4634ms | 0.3935ms | 2.5416 KOps/s | 2.5398 KOps/s | $\color{#35bf28}+0.07\\%$ |
| test_items_nested_locked | 0.4772ms | 0.3949ms | 2.5323 KOps/s | 2.5103 KOps/s | $\color{#35bf28}+0.87\\%$ |
| test_items_nested_leaf | 0.1144ms | 87.0381μs | 11.4892 KOps/s | 11.6094 KOps/s | $\color{#d91a1a}-1.04\\%$ |
| test_items_stack_nested | 0.4924ms | 0.3996ms | 2.5026 KOps/s | 2.5250 KOps/s | $\color{#d91a1a}-0.89\\%$ |
| test_items_stack_nested_leaf | 0.1129ms | 85.9541μs | 11.6341 KOps/s | 11.5961 KOps/s | $\color{#35bf28}+0.33\\%$ |
| test_items_stack_nested_locked | 0.4824ms | 0.3976ms | 2.5154 KOps/s | 2.5289 KOps/s | $\color{#d91a1a}-0.53\\%$ |
| test_keys | 19.7800μs | 4.3852μs | 228.0384 KOps/s | 229.5498 KOps/s | $\color{#d91a1a}-0.66\\%$ |
| test_keys_nested | 93.6610μs | 68.5627μs | 14.5852 KOps/s | 14.7949 KOps/s | $\color{#d91a1a}-1.42\\%$ |
| test_keys_nested_locked | 2.0834ms | 75.5931μs | 13.2287 KOps/s | 13.5152 KOps/s | $\color{#d91a1a}-2.12\\%$ |
| test_keys_nested_leaf | 87.3510μs | 57.7691μs | 17.3103 KOps/s | 17.1791 KOps/s | $\color{#35bf28}+0.76\\%$ |
| test_keys_stack_nested | 86.9810μs | 67.3466μs | 14.8486 KOps/s | 14.6798 KOps/s | $\color{#35bf28}+1.15\\%$ |
| test_keys_stack_nested_leaf | 75.2210μs | 57.1981μs | 17.4831 KOps/s | 17.0538 KOps/s | $\color{#35bf28}+2.52\\%$ |
| test_keys_stack_nested_locked | 99.6620μs | 72.4353μs | 13.8054 KOps/s | 13.4693 KOps/s | $\color{#35bf28}+2.50\\%$ |
| test_values | 7.3633μs | 1.7595μs | 568.3308 KOps/s | 572.6121 KOps/s | $\color{#d91a1a}-0.75\\%$ |
| test_values_nested | 47.6410μs | 34.3500μs | 29.1120 KOps/s | 29.0164 KOps/s | $\color{#35bf28}+0.33\\%$ |
| test_values_nested_locked | 58.2110μs | 36.1995μs | 27.6247 KOps/s | 27.3766 KOps/s | $\color{#35bf28}+0.91\\%$ |
| test_values_nested_leaf | 0.5280ms | 30.5224μs | 32.7628 KOps/s | 32.5515 KOps/s | $\color{#35bf28}+0.65\\%$ |
| test_values_stack_nested | 54.3700μs | 35.0870μs | 28.5006 KOps/s | 28.3969 KOps/s | $\color{#35bf28}+0.37\\%$ |
| test_values_stack_nested_leaf | 52.5810μs | 31.1516μs | 32.1010 KOps/s | 31.7247 KOps/s | $\color{#35bf28}+1.19\\%$ |
| test_values_stack_nested_locked | 54.3710μs | 36.7451μs | 27.2145 KOps/s | 27.1170 KOps/s | $\color{#35bf28}+0.36\\%$ |
| test_membership | 1.2340μs | 0.5556μs | 1.7998 MOps/s | 1.8781 MOps/s | $\color{#d91a1a}-4.17\\%$ |
| test_membership_nested | 15.3300μs | 2.0887μs | 478.7612 KOps/s | 479.2769 KOps/s | $\color{#d91a1a}-0.11\\%$ |
| test_membership_nested_leaf | 10.3800μs | 2.0057μs | 498.5733 KOps/s | 494.4746 KOps/s | $\color{#35bf28}+0.83\\%$ |
| test_membership_stacked_nested | 15.5400μs | 2.0857μs | 479.4552 KOps/s | 480.4516 KOps/s | $\color{#d91a1a}-0.21\\%$ |
| test_membership_stacked_nested_leaf | 24.2100μs | 2.0744μs | 482.0559 KOps/s | 481.8150 KOps/s | $\color{#35bf28}+0.05\\%$ |
| test_membership_nested_last | 21.1800μs | 2.9743μs | 336.2175 KOps/s | 337.9324 KOps/s | $\color{#d91a1a}-0.51\\%$ |
| test_membership_nested_leaf_last | 16.9700μs | 3.0297μs | 330.0707 KOps/s | 329.3451 KOps/s | $\color{#35bf28}+0.22\\%$ |
| test_membership_stacked_nested_last | 28.2710μs | 9.1980μs | 108.7187 KOps/s | 334.0949 KOps/s | $\textbf{\color{#d91a1a}-67.46\\%}$ |
| test_membership_stacked_nested_leaf_last | 36.6910μs | 9.1629μs | 109.1356 KOps/s | 329.8273 KOps/s | $\textbf{\color{#d91a1a}-66.91\\%}$ |
| test_nested_getleaf | 26.5610μs | 8.0147μs | 124.7710 KOps/s | 124.4205 KOps/s | $\color{#35bf28}+0.28\\%$ |
| test_nested_get | 22.8600μs | 7.5312μs | 132.7817 KOps/s | 132.3651 KOps/s | $\color{#35bf28}+0.31\\%$ |
| test_stacked_getleaf | 29.6910μs | 8.0245μs | 124.6190 KOps/s | 124.4203 KOps/s | $\color{#35bf28}+0.16\\%$ |
| test_stacked_get | 0.1928ms | 7.5401μs | 132.6241 KOps/s | 132.7370 KOps/s | $\color{#d91a1a}-0.09\\%$ |
| test_nested_getitemleaf | 22.5800μs | 8.2482μs | 121.2382 KOps/s | 122.4789 KOps/s | $\color{#d91a1a}-1.01\\%$ |
| test_nested_getitem | 23.0800μs | 7.7018μs | 129.8400 KOps/s | 129.6281 KOps/s | $\color{#35bf28}+0.16\\%$ |
| test_stacked_getitemleaf | 31.8300μs | 8.1955μs | 122.0179 KOps/s | 122.4285 KOps/s | $\color{#d91a1a}-0.34\\%$ |
| test_stacked_getitem | 23.3000μs | 7.6928μs | 129.9921 KOps/s | 130.1241 KOps/s | $\color{#d91a1a}-0.10\\%$ |
| test_lock_nested | 4.0169ms | 0.4225ms | 2.3666 KOps/s | 2.4132 KOps/s | $\color{#d91a1a}-1.93\\%$ |
| test_lock_stack_nested | 0.4011ms | 0.3750ms | 2.6669 KOps/s | 2.6189 KOps/s | $\color{#35bf28}+1.83\\%$ |
| test_unlock_nested | 88.1824ms | 0.4248ms | 2.3543 KOps/s | 3.0033 KOps/s | $\textbf{\color{#d91a1a}-21.61\\%}$ |
| test_unlock_stack_nested | 0.3205ms | 0.2932ms | 3.4105 KOps/s | 3.3373 KOps/s | $\color{#35bf28}+2.19\\%$ |
| test_flatten_speed | 0.3496ms | 0.1066ms | 9.3836 KOps/s | 9.4314 KOps/s | $\color{#d91a1a}-0.51\\%$ |
| test_unflatten_speed | 0.3183ms | 0.2878ms | 3.4746 KOps/s | 3.4212 KOps/s | $\color{#35bf28}+1.56\\%$ |
| test_common_ops | 0.9630ms | 0.5616ms | 1.7807 KOps/s | 1.6346 KOps/s | $\textbf{\color{#35bf28}+8.94\\%}$ |
| test_creation | 15.5400μs | 1.8588μs | 537.9711 KOps/s | 533.0910 KOps/s | $\color{#35bf28}+0.92\\%$ |
| test_creation_empty | 23.5510μs | 8.8128μs | 113.4718 KOps/s | 133.1204 KOps/s | $\textbf{\color{#d91a1a}-14.76\\%}$ |
| test_creation_nested_1 | 29.6900μs | 10.6035μs | 94.3087 KOps/s | 109.2056 KOps/s | $\textbf{\color{#d91a1a}-13.64\\%}$ |
| test_creation_nested_2 | 28.9400μs | 12.9554μs | 77.1878 KOps/s | 86.0788 KOps/s | $\textbf{\color{#d91a1a}-10.33\\%}$ |
| test_clone | 57.0610μs | 10.9108μs | 91.6520 KOps/s | 91.6961 KOps/s | $\color{#d91a1a}-0.05\\%$ |
| test_getitem[int] | 28.6810μs | 10.0680μs | 99.3250 KOps/s | 97.3141 KOps/s | $\color{#35bf28}+2.07\\%$ |
| test_getitem[slice_int] | 43.9510μs | 19.5044μs | 51.2705 KOps/s | 51.2673 KOps/s | $+0.01\\%$ |
| test_getitem[range] | 0.1574ms | 38.0440μs | 26.2853 KOps/s | 26.8312 KOps/s | $\color{#d91a1a}-2.03\\%$ |
| test_getitem[tuple] | 33.8600μs | 17.4255μs | 57.3872 KOps/s | 56.9187 KOps/s | $\color{#35bf28}+0.82\\%$ |
| test_getitem[list] | 0.1619ms | 31.2780μs | 31.9714 KOps/s | 31.4708 KOps/s | $\color{#35bf28}+1.59\\%$ |
| test_setitem_dim[int] | 59.4410μs | 24.5652μs | 40.7080 KOps/s | 43.3845 KOps/s | $\textbf{\color{#d91a1a}-6.17\\%}$ |
| test_setitem_dim[slice_int] | 73.5510μs | 44.8001μs | 22.3214 KOps/s | 22.1633 KOps/s | $\color{#35bf28}+0.71\\%$ |
| test_setitem_dim[range] | 94.5110μs | 60.7692μs | 16.4557 KOps/s | 16.8510 KOps/s | $\color{#d91a1a}-2.35\\%$ |
| test_setitem_dim[tuple] | 65.3010μs | 38.9330μs | 25.6851 KOps/s | 26.0204 KOps/s | $\color{#d91a1a}-1.29\\%$ |
| test_setitem | 65.1310μs | 15.3387μs | 65.1948 KOps/s | 68.9567 KOps/s | $\textbf{\color{#d91a1a}-5.46\\%}$ |
| test_set | 67.0610μs | 14.8874μs | 67.1707 KOps/s | 70.8883 KOps/s | $\textbf{\color{#d91a1a}-5.24\\%}$ |
| test_set_shared | 2.7831ms | 95.4996μs | 10.4712 KOps/s | 10.1596 KOps/s | $\color{#35bf28}+3.07\\%$ |
| test_update | 84.7410μs | 17.5662μs | 56.9274 KOps/s | 61.8422 KOps/s | $\textbf{\color{#d91a1a}-7.95\\%}$ |
| test_update_nested | 92.5210μs | 23.1134μs | 43.2649 KOps/s | 47.0844 KOps/s | $\textbf{\color{#d91a1a}-8.11\\%}$ |
| test_update__nested | 79.2810μs | 21.3643μs | 46.8070 KOps/s | 47.5971 KOps/s | $\color{#d91a1a}-1.66\\%$ |
| test_set_nested | 78.3410μs | 15.8513μs | 63.0865 KOps/s | 66.9140 KOps/s | $\textbf{\color{#d91a1a}-5.72\\%}$ |
| test_set_nested_new | 0.1107ms | 19.1289μs | 52.2771 KOps/s | 54.3748 KOps/s | $\color{#d91a1a}-3.86\\%$ |
| test_select | 0.1064ms | 31.9616μs | 31.2876 KOps/s | 32.2158 KOps/s | $\color{#d91a1a}-2.88\\%$ |
| test_select_nested | 86.1310μs | 52.7404μs | 18.9608 KOps/s | 18.2226 KOps/s | $\color{#35bf28}+4.05\\%$ |
| test_exclude_nested | 95.4710μs | 71.0778μs | 14.0691 KOps/s | 13.8422 KOps/s | $\color{#35bf28}+1.64\\%$ |
| test_empty[True] | 0.3459ms | 0.2980ms | 3.3553 KOps/s | 3.3729 KOps/s | $\color{#d91a1a}-0.52\\%$ |
| test_empty[False] | 2.1990μs | 0.9153μs | 1.0925 MOps/s | 1.0268 MOps/s | $\textbf{\color{#35bf28}+6.40\\%}$ |
| test_to | 87.8310μs | 58.2296μs | 17.1734 KOps/s | 16.7980 KOps/s | $\color{#35bf28}+2.23\\%$ |
| test_to_nonblocking | 61.3110μs | 34.6040μs | 28.8984 KOps/s | 27.3899 KOps/s | $\textbf{\color{#35bf28}+5.51\\%}$ |
| test_unbind_speed | 0.2744ms | 0.2544ms | 3.9312 KOps/s | 3.9417 KOps/s | $\color{#d91a1a}-0.27\\%$ |
| test_unbind_speed_stack0 | 0.2960ms | 0.2481ms | 4.0305 KOps/s | 3.9251 KOps/s | $\color{#35bf28}+2.69\\%$ |
| test_unbind_speed_stack1 | 91.4297ms | 0.7697ms | 1.2991 KOps/s | 1.3646 KOps/s | $\color{#d91a1a}-4.80\\%$ |
| test_split | 89.4906ms | 1.5678ms | 637.8270 Ops/s | 624.6162 Ops/s | $\color{#35bf28}+2.12\\%$ |
| test_chunk | 1.4810ms | 1.4286ms | 700.0103 Ops/s | 686.6939 Ops/s | $\color{#35bf28}+1.94\\%$ |
| test_creation[device0] | 0.1272ms | 53.5222μs | 18.6838 KOps/s | 17.6267 KOps/s | $\textbf{\color{#35bf28}+6.00\\%}$ |
| test_creation_from_tensor | 0.1902ms | 53.2773μs | 18.7697 KOps/s | 18.1860 KOps/s | $\color{#35bf28}+3.21\\%$ |
| test_add_one[memmap_tensor0] | 76.2610μs | 6.4768μs | 154.3972 KOps/s | 157.7866 KOps/s | $\color{#d91a1a}-2.15\\%$ |
| test_contiguous[memmap_tensor0] | 23.9110μs | 0.5792μs | 1.7264 MOps/s | 1.7006 MOps/s | $\color{#35bf28}+1.52\\%$ |
| test_stack[memmap_tensor0] | 30.8800μs | 4.3154μs | 231.7257 KOps/s | 230.0374 KOps/s | $\color{#35bf28}+0.73\\%$ |
| test_memmaptd_index | 1.1580ms | 0.2514ms | 3.9780 KOps/s | 4.0027 KOps/s | $\color{#d91a1a}-0.62\\%$ |
| test_memmaptd_index_astensor | 0.6472ms | 0.3140ms | 3.1845 KOps/s | 3.1548 KOps/s | $\color{#35bf28}+0.94\\%$ |
| test_memmaptd_index_op | 0.8466ms | 0.5796ms | 1.7253 KOps/s | 1.6254 KOps/s | $\textbf{\color{#35bf28}+6.15\\%}$ |
| test_serialize_model | 0.1867s | 0.1014s | 9.8609 Ops/s | 10.5217 Ops/s | $\textbf{\color{#d91a1a}-6.28\\%}$ |
| test_serialize_model_pickle | 1.3507s | 1.2355s | 0.8094 Ops/s | 0.8078 Ops/s | $\color{#35bf28}+0.20\\%$ |
| test_serialize_weights | 92.4095ms | 88.1544ms | 11.3437 Ops/s | 9.6534 Ops/s | $\textbf{\color{#35bf28}+17.51\\%}$ |
| test_serialize_weights_returnearly | 0.1700s | 71.0804ms | 14.0686 Ops/s | 12.7420 Ops/s | $\textbf{\color{#35bf28}+10.41\\%}$ |
| test_serialize_weights_pickle | 1.3506s | 1.2434s | 0.8042 Ops/s | 0.8012 Ops/s | $\color{#35bf28}+0.38\\%$ |
| test_reshape_pytree | 57.3710μs | 24.9392μs | 40.0975 KOps/s | 40.1308 KOps/s | $\color{#d91a1a}-0.08\\%$ |
| test_reshape_td | 54.1910μs | 29.8241μs | 33.5299 KOps/s | 33.7264 KOps/s | $\color{#d91a1a}-0.58\\%$ |
| test_view_pytree | 54.7410μs | 24.6256μs | 40.6082 KOps/s | 39.0081 KOps/s | $\color{#35bf28}+4.10\\%$ |
| test_view_td | 61.7010μs | 36.6280μs | 27.3015 KOps/s | 27.2293 KOps/s | $\color{#35bf28}+0.27\\%$ |
| test_unbind_pytree | 0.1382ms | 30.4208μs | 32.8722 KOps/s | 33.2645 KOps/s | $\color{#d91a1a}-1.18\\%$ |
| test_unbind_td | 0.4929ms | 37.8550μs | 26.4166 KOps/s | 26.2325 KOps/s | $\color{#35bf28}+0.70\\%$ |
| test_split_pytree | 57.8910μs | 32.5895μs | 30.6847 KOps/s | 29.9882 KOps/s | $\color{#35bf28}+2.32\\%$ |
| test_split_td | 0.1728ms | 36.8942μs | 27.1045 KOps/s | 27.8412 KOps/s | $\color{#d91a1a}-2.65\\%$ |
| test_add_pytree | 0.1977ms | 38.1657μs | 26.2015 KOps/s | 27.5814 KOps/s | $\textbf{\color{#d91a1a}-5.00\\%}$ |
| test_add_td | 79.9310μs | 47.2661μs | 21.1568 KOps/s | 21.5317 KOps/s | $\color{#d91a1a}-1.74\\%$ |
| test_distributed | 1.8492ms | 71.7807μs | 13.9313 KOps/s | 12.6709 KOps/s | $\textbf{\color{#35bf28}+9.95\\%}$ |
| test_tdmodule | 0.1382ms | 14.2458μs | 70.1959 KOps/s | 76.8187 KOps/s | $\textbf{\color{#d91a1a}-8.62\\%}$ |
| test_tdmodule_dispatch | 45.8110μs | 28.3875μs | 35.2267 KOps/s | 37.5329 KOps/s | $\textbf{\color{#d91a1a}-6.14\\%}$ |
| test_tdseq | 30.8200μs | 15.1658μs | 65.9380 KOps/s | 69.2038 KOps/s | $\color{#d91a1a}-4.72\\%$ |
| test_tdseq_dispatch | 51.3200μs | 30.8989μs | 32.3637 KOps/s | 34.1273 KOps/s | $\textbf{\color{#d91a1a}-5.17\\%}$ |
| test_instantiation_functorch | 1.5365ms | 1.3667ms | 731.6872 Ops/s | 734.3950 Ops/s | $\color{#d91a1a}-0.37\\%$ |
| test_instantiation_td | 92.6883ms | 1.0848ms | 921.8122 Ops/s | 1.0397 KOps/s | $\textbf{\color{#d91a1a}-11.34\\%}$ |
| test_exec_functorch | 0.1805ms | 0.1421ms | 7.0350 KOps/s | 6.9509 KOps/s | $\color{#35bf28}+1.21\\%$ |
| test_exec_functional_call | 0.1693ms | 0.1339ms | 7.4692 KOps/s | 7.6727 KOps/s | $\color{#d91a1a}-2.65\\%$ |
| test_exec_td | 0.1704ms | 0.1334ms | 7.4973 KOps/s | 7.7864 KOps/s | $\color{#d91a1a}-3.71\\%$ |
| test_exec_td_decorator | 0.7454ms | 0.2122ms | 4.7117 KOps/s | 4.8847 KOps/s | $\color{#d91a1a}-3.54\\%$ |
| test_vmap_mlp_speed[True-True] | 0.7677ms | 0.5903ms | 1.6941 KOps/s | 1.7364 KOps/s | $\color{#d91a1a}-2.44\\%$ |
| test_vmap_mlp_speed[True-False] | 0.6456ms | 0.5896ms | 1.6960 KOps/s | 1.7454 KOps/s | $\color{#d91a1a}-2.83\\%$ |
| test_vmap_mlp_speed[False-True] | 0.6996ms | 0.5340ms | 1.8726 KOps/s | 1.9572 KOps/s | $\color{#d91a1a}-4.32\\%$ |
| test_vmap_mlp_speed[False-False] | 0.6971ms | 0.5317ms | 1.8807 KOps/s | 1.9433 KOps/s | $\color{#d91a1a}-3.22\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.2255ms | 0.6610ms | 1.5128 KOps/s | 1.5423 KOps/s | $\color{#d91a1a}-1.91\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9377ms | 0.6705ms | 1.4913 KOps/s | 1.5451 KOps/s | $\color{#d91a1a}-3.48\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.8116ms | 0.5758ms | 1.7368 KOps/s | 1.7439 KOps/s | $\color{#d91a1a}-0.41\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7733ms | 0.5751ms | 1.7389 KOps/s | 1.7506 KOps/s | $\color{#d91a1a}-0.67\\%$ |
| test_vmap_transformer_speed[True-True] | 7.7946ms | 7.6034ms | 131.5196 Ops/s | 126.4953 Ops/s | $\color{#35bf28}+3.97\\%$ |
| test_vmap_transformer_speed[True-False] | 7.9651ms | 7.5830ms | 131.8735 Ops/s | 128.8308 Ops/s | $\color{#35bf28}+2.36\\%$ |
| test_vmap_transformer_speed[False-True] | 8.9126ms | 7.5354ms | 132.7065 Ops/s | 129.4671 Ops/s | $\color{#35bf28}+2.50\\%$ |
| test_vmap_transformer_speed[False-False] | 8.4337ms | 7.8352ms | 127.6294 Ops/s | 131.0116 Ops/s | $\color{#d91a1a}-2.58\\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 19.4653ms | 19.1562ms | 52.2023 Ops/s | 52.5511 Ops/s | $\color{#d91a1a}-0.66\\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 19.6081ms | 19.1634ms | 52.1829 Ops/s | 52.7703 Ops/s | $\color{#d91a1a}-1.11\\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 19.6467ms | 19.0140ms | 52.5928 Ops/s | 52.9564 Ops/s | $\color{#d91a1a}-0.69\\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 20.0737ms | 19.1902ms | 52.1099 Ops/s | 52.8977 Ops/s | $\color{#d91a1a}-1.49\\%$ |
| test_to_module_speed[True] | 2.9206ms | 1.5976ms | 625.9245 Ops/s | 648.6298 Ops/s | $\color{#d91a1a}-3.50\\%$ |
| test_to_module_speed[False] | 2.0358ms | 1.5688ms | 637.4180 Ops/s | 651.8659 Ops/s | $\color{#d91a1a}-2.22\\%$ |
| test_tc_init | 0.1689ms | 34.2705μs | 29.1796 KOps/s | 19.8105 KOps/s | $\textbf{\color{#35bf28}+47.29\\%}$ |
| test_tc_init_nested | 0.3959ms | 70.1147μs | 14.2624 KOps/s | 10.0658 KOps/s | $\textbf{\color{#35bf28}+41.69\\%}$ |
| test_tc_first_layer_tensor | 0.1355ms | 3.5831μs | 279.0878 KOps/s | 285.2919 KOps/s | $\color{#d91a1a}-2.17\\%$ |
| test_tc_first_layer_nontensor | 0.1284ms | 3.5945μs | 278.2041 KOps/s | 281.5400 KOps/s | $\color{#d91a1a}-1.18\\%$ |
| test_tc_second_layer_tensor | 27.5964μs | 1.1356μs | 880.5617 KOps/s | 906.5597 KOps/s | $\color{#d91a1a}-2.87\\%$ |
| test_tc_second_layer_nontensor | 0.1168ms | 4.1234μs | 242.5176 KOps/s | 247.0890 KOps/s | $\color{#d91a1a}-1.85\\%$ |

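A note on reading the two tables: the reported figures appear consistent with `Ops` being the reciprocal of `Mean`, and `Change` being the relative difference between `Ops` and `Ops on Repo HEAD`. This is inferred from the numbers in the rows, not taken from the benchmark bot's source. A minimal sketch of that arithmetic follows; the helper names `ops_per_second` and `relative_change_percent` are hypothetical and not part of tensordict or the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed relation: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_percent(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to the HEAD throughput."""
    return (ops_pr - ops_head) / ops_head * 100.0


# CPU row test_plain_set_nested: Mean 16.4427 μs, 60.8174 vs 57.0725 KOps/s.
kops = ops_per_second(16.4427e-6) / 1e3
change = relative_change_percent(60.8174, 57.0725)
print(f"{kops:.2f} KOps/s, {change:+.2f}%")  # prints "60.82 KOps/s, +6.56%"
```

Both throughput values must be in the same unit before comparing; for example, the GPU row `test_instantiation_td` mixes `Ops/s` and `KOps/s`.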