pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile - tensorclass compatibility #878

Closed vmoens closed 1 month ago

vmoens commented 1 month ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 9. Worsened: 12.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 48.4110μs | 18.0836μs | 55.2987 KOps/s | 54.0285 KOps/s | +2.35% |
| test_plain_set_stack_nested | 55.1510μs | 18.3504μs | 54.4947 KOps/s | 53.9939 KOps/s | +0.93% |
| test_plain_set_nested_inplace | 49.2520μs | 20.0334μs | 49.9167 KOps/s | 48.7680 KOps/s | +2.36% |
| test_plain_set_stack_nested_inplace | 70.2620μs | 19.7957μs | 50.5160 KOps/s | 49.0013 KOps/s | +3.09% |
| test_items | 20.3080μs | 2.6509μs | 377.2349 KOps/s | 369.0890 KOps/s | +2.21% |
| test_items_nested | 0.6701ms | 0.3687ms | 2.7121 KOps/s | 2.7373 KOps/s | -0.92% |
| test_items_nested_locked | 0.7523ms | 0.3652ms | 2.7382 KOps/s | 2.7644 KOps/s | -0.95% |
| test_items_nested_leaf | 0.1703ms | 86.8071μs | 11.5198 KOps/s | 11.9258 KOps/s | -3.40% |
| test_items_stack_nested | 0.5213ms | 0.3669ms | 2.7255 KOps/s | 2.7487 KOps/s | -0.84% |
| test_items_stack_nested_leaf | 0.1693ms | 87.8826μs | 11.3788 KOps/s | 11.6490 KOps/s | -2.32% |
| test_items_stack_nested_locked | 0.7152ms | 0.3690ms | 2.7098 KOps/s | 2.7346 KOps/s | -0.91% |
| test_keys | 47.7310μs | 3.9989μs | 250.0711 KOps/s | 259.9747 KOps/s | -3.81% |
| test_keys_nested | 0.2445ms | 0.1437ms | 6.9612 KOps/s | 6.7692 KOps/s | +2.84% |
| test_keys_nested_locked | 2.1013ms | 0.1519ms | 6.5848 KOps/s | 6.6248 KOps/s | -0.60% |
| test_keys_nested_leaf | 0.7155ms | 0.1284ms | 7.7888 KOps/s | 8.0508 KOps/s | -3.25% |
| test_keys_stack_nested | 0.2678ms | 0.1452ms | 6.8876 KOps/s | 6.8981 KOps/s | -0.15% |
| test_keys_stack_nested_leaf | 0.2504ms | 0.1244ms | 8.0417 KOps/s | 8.0045 KOps/s | +0.46% |
| test_keys_stack_nested_locked | 0.2201ms | 0.1520ms | 6.5800 KOps/s | 6.6071 KOps/s | -0.41% |
| test_values | 7.6204μs | 1.1333μs | 882.3703 KOps/s | 846.5522 KOps/s | +4.23% |
| test_values_nested | 99.9760μs | 49.6384μs | 20.1457 KOps/s | 20.3920 KOps/s | -1.21% |
| test_values_nested_locked | 91.4200μs | 49.9313μs | 20.0275 KOps/s | 20.2804 KOps/s | -1.25% |
| test_values_nested_leaf | 0.1023ms | 44.7377μs | 22.3525 KOps/s | 22.8876 KOps/s | -2.34% |
| test_values_stack_nested | 97.2010μs | 49.6318μs | 20.1484 KOps/s | 20.0958 KOps/s | +0.26% |
| test_values_stack_nested_leaf | 0.3469ms | 44.5557μs | 22.4438 KOps/s | 20.8435 KOps/s | **+7.68%** |
| test_values_stack_nested_locked | 0.1064ms | 50.1483μs | 19.9409 KOps/s | 20.1464 KOps/s | -1.02% |
| test_membership | 4.0146μs | 0.7453μs | 1.3417 MOps/s | 1.4123 MOps/s | -5.00% |
| test_membership_nested | 31.8190μs | 2.7109μs | 368.8843 KOps/s | 369.9288 KOps/s | -0.28% |
| test_membership_nested_leaf | 30.0060μs | 2.7359μs | 365.5152 KOps/s | 371.2966 KOps/s | -1.56% |
| test_membership_stacked_nested | 36.9090μs | 2.6986μs | 370.5657 KOps/s | 374.1551 KOps/s | -0.96% |
| test_membership_stacked_nested_leaf | 22.7020μs | 2.7854μs | 359.0130 KOps/s | 372.3242 KOps/s | -3.58% |
| test_membership_nested_last | 30.6770μs | 4.0120μs | 249.2513 KOps/s | 249.9708 KOps/s | -0.29% |
| test_membership_nested_leaf_last | 24.2250μs | 4.0977μs | 244.0369 KOps/s | 250.2334 KOps/s | -2.48% |
| test_membership_stacked_nested_last | 37.3590μs | 4.0496μs | 246.9392 KOps/s | 195.5578 KOps/s | **+26.27%** |
| test_membership_stacked_nested_leaf_last | 22.9120μs | 4.0133μs | 249.1685 KOps/s | 192.4975 KOps/s | **+29.44%** |
| test_nested_getleaf | 38.0310μs | 11.2679μs | 88.7478 KOps/s | 90.3226 KOps/s | -1.74% |
| test_nested_get | 33.9840μs | 10.8558μs | 92.1165 KOps/s | 96.5338 KOps/s | -4.58% |
| test_stacked_getleaf | 45.0750μs | 11.1459μs | 89.7187 KOps/s | 91.8417 KOps/s | -2.31% |
| test_stacked_get | 38.4620μs | 10.5346μs | 94.9254 KOps/s | 97.3004 KOps/s | -2.44% |
| test_nested_getitemleaf | 45.6760μs | 11.7858μs | 84.8480 KOps/s | 87.1835 KOps/s | -2.68% |
| test_nested_getitem | 40.6360μs | 10.7418μs | 93.0947 KOps/s | 94.0493 KOps/s | -1.02% |
| test_stacked_getitemleaf | 40.0350μs | 11.7805μs | 84.8858 KOps/s | 87.7826 KOps/s | -3.30% |
| test_stacked_getitem | 37.4400μs | 10.9055μs | 91.6967 KOps/s | 94.7829 KOps/s | -3.26% |
| test_lock_nested | 4.6898ms | 0.4419ms | 2.2628 KOps/s | 2.3135 KOps/s | -2.19% |
| test_lock_stack_nested | 0.6595ms | 0.4064ms | 2.4605 KOps/s | 2.4714 KOps/s | -0.44% |
| test_unlock_nested | 0.7018ms | 0.3528ms | 2.8348 KOps/s | 2.3743 KOps/s | **+19.40%** |
| test_unlock_stack_nested | 0.4951ms | 0.3246ms | 3.0810 KOps/s | 3.0993 KOps/s | -0.59% |
| test_flatten_speed | 0.2006ms | 0.1056ms | 9.4691 KOps/s | 9.5262 KOps/s | -0.60% |
| test_unflatten_speed | 0.6223ms | 0.4427ms | 2.2588 KOps/s | 2.2706 KOps/s | -0.52% |
| test_common_ops | 1.6498ms | 0.7815ms | 1.2796 KOps/s | 1.2713 KOps/s | +0.65% |
| test_creation | 18.4940μs | 2.4225μs | 412.7895 KOps/s | 419.4848 KOps/s | -1.60% |
| test_creation_empty | 53.6400μs | 12.0768μs | 82.8037 KOps/s | 77.9749 KOps/s | **+6.19%** |
| test_creation_nested_1 | 37.8710μs | 15.1407μs | 66.0470 KOps/s | 63.1791 KOps/s | +4.54% |
| test_creation_nested_2 | 46.1960μs | 19.0126μs | 52.5968 KOps/s | 51.4167 KOps/s | +2.30% |
| test_clone | 0.1480ms | 14.0413μs | 71.2185 KOps/s | 75.8092 KOps/s | **-6.06%** |
| test_getitem[int] | 0.1487ms | 13.1534μs | 76.0261 KOps/s | 89.4415 KOps/s | **-15.00%** |
| test_getitem[slice_int] | 67.3260μs | 23.5920μs | 42.3872 KOps/s | 42.6660 KOps/s | -0.65% |
| test_getitem[range] | 1.6606ms | 46.8157μs | 21.3603 KOps/s | 22.3098 KOps/s | -4.26% |
| test_getitem[tuple] | 59.4710μs | 19.5428μs | 51.1698 KOps/s | 53.5598 KOps/s | -4.46% |
| test_getitem[list] | 0.1566ms | 41.0079μs | 24.3856 KOps/s | 25.5596 KOps/s | -4.59% |
| test_setitem_dim[int] | 69.7310μs | 34.2665μs | 29.1830 KOps/s | 28.9174 KOps/s | +0.92% |
| test_setitem_dim[slice_int] | 90.0280μs | 62.0630μs | 16.1127 KOps/s | 15.9441 KOps/s | +1.06% |
| test_setitem_dim[range] | 0.1372ms | 82.4067μs | 12.1349 KOps/s | 11.7763 KOps/s | +3.05% |
| test_setitem_dim[tuple] | 94.2060μs | 50.5421μs | 19.7855 KOps/s | 19.6860 KOps/s | +0.51% |
| test_setitem | 82.4040μs | 20.5559μs | 48.6477 KOps/s | 48.0112 KOps/s | +1.33% |
| test_set | 89.5760μs | 20.0676μs | 49.8316 KOps/s | 48.7656 KOps/s | +2.19% |
| test_set_shared | 2.3561ms | 0.1659ms | 6.0263 KOps/s | 5.9180 KOps/s | +1.83% |
| test_update | 0.1087ms | 23.7568μs | 42.0932 KOps/s | 41.0879 KOps/s | +2.45% |
| test_update_nested | 0.1120ms | 32.9419μs | 30.3565 KOps/s | 29.8295 KOps/s | +1.77% |
| test_update__nested | 0.1158ms | 25.6290μs | 39.0184 KOps/s | 40.8578 KOps/s | -4.50% |
| test_set_nested | 96.0590μs | 21.9829μs | 45.4900 KOps/s | 45.0120 KOps/s | +1.06% |
| test_set_nested_new | 84.2170μs | 27.0403μs | 36.9818 KOps/s | 37.3818 KOps/s | -1.07% |
| test_select | 0.1109ms | 41.9431μs | 23.8418 KOps/s | 23.3288 KOps/s | +2.20% |
| test_select_nested | 1.0195ms | 61.7210μs | 16.2019 KOps/s | 16.7428 KOps/s | -3.23% |
| test_exclude_nested | 0.1941ms | 83.5420μs | 11.9700 KOps/s | 12.4282 KOps/s | -3.69% |
| test_empty[True] | 0.9612ms | 0.3494ms | 2.8617 KOps/s | 2.9276 KOps/s | -2.25% |
| test_empty[False] | 6.5223μs | 1.2507μs | 799.5453 KOps/s | 802.3628 KOps/s | -0.35% |
| test_unbind_speed | 0.3355ms | 0.2573ms | 3.8867 KOps/s | 3.9379 KOps/s | -1.30% |
| test_unbind_speed_stack0 | 0.4284ms | 0.2548ms | 3.9241 KOps/s | 3.9760 KOps/s | -1.31% |
| test_unbind_speed_stack1 | 86.6522ms | 0.7513ms | 1.3310 KOps/s | 1.3647 KOps/s | -2.47% |
| test_split | 74.5949ms | 1.6343ms | 611.8703 Ops/s | 677.3343 Ops/s | **-9.66%** |
| test_chunk | 74.2997ms | 1.6434ms | 608.4871 Ops/s | 620.4393 Ops/s | -1.93% |
| test_creation[device0] | 0.2286ms | 95.1444μs | 10.5103 KOps/s | 10.2982 KOps/s | +2.06% |
| test_creation_from_tensor | 3.9038ms | 98.6881μs | 10.1329 KOps/s | 10.3743 KOps/s | -2.33% |
| test_add_one[memmap_tensor0] | 0.2188ms | 5.3445μs | 187.1074 KOps/s | 184.5278 KOps/s | +1.40% |
| test_contiguous[memmap_tensor0] | 19.1560μs | 0.6440μs | 1.5529 MOps/s | 1.5580 MOps/s | -0.33% |
| test_stack[memmap_tensor0] | 43.8320μs | 3.5597μs | 280.9238 KOps/s | 286.6019 KOps/s | -1.98% |
| test_memmaptd_index | 1.0833ms | 0.2585ms | 3.8679 KOps/s | 3.9265 KOps/s | -1.49% |
| test_memmaptd_index_astensor | 1.3178ms | 0.3447ms | 2.9010 KOps/s | 3.0492 KOps/s | -4.86% |
| test_memmaptd_index_op | 1.3205ms | 0.6244ms | 1.6015 KOps/s | 1.5632 KOps/s | +2.45% |
| test_serialize_model | 0.1386s | 0.1245s | 8.0321 Ops/s | 6.7829 Ops/s | **+18.42%** |
| test_serialize_model_pickle | 0.4412s | 0.3889s | 2.5716 Ops/s | 2.4987 Ops/s | +2.92% |
| test_serialize_weights | 0.2115s | 0.1364s | 7.3333 Ops/s | 8.1142 Ops/s | **-9.62%** |
| test_serialize_weights_returnearly | 0.1924s | 0.1656s | 6.0392 Ops/s | 5.6460 Ops/s | **+6.96%** |
| test_serialize_weights_pickle | 0.5146s | 0.4124s | 2.4248 Ops/s | 2.4860 Ops/s | -2.46% |
| test_serialize_weights_filesystem | 0.2191s | 0.1562s | 6.4003 Ops/s | 6.9497 Ops/s | **-7.91%** |
| test_serialize_model_filesystem | 0.1609s | 0.1529s | 6.5385 Ops/s | 6.7299 Ops/s | -2.84% |
| test_reshape_pytree | 72.0440μs | 26.7682μs | 37.3578 KOps/s | 39.3989 KOps/s | **-5.18%** |
| test_reshape_td | 74.4090μs | 34.0474μs | 29.3708 KOps/s | 29.6481 KOps/s | -0.94% |
| test_view_pytree | 72.1350μs | 26.1987μs | 38.1698 KOps/s | 39.1148 KOps/s | -2.42% |
| test_view_td | 0.1142ms | 39.5952μs | 25.2556 KOps/s | 25.9772 KOps/s | -2.78% |
| test_unbind_pytree | 66.2140μs | 29.4855μs | 33.9150 KOps/s | 34.0432 KOps/s | -0.38% |
| test_unbind_td | 0.4480ms | 37.7964μs | 26.4575 KOps/s | 26.4058 KOps/s | +0.20% |
| test_split_pytree | 67.0250μs | 29.9492μs | 33.3898 KOps/s | 34.1918 KOps/s | -2.35% |
| test_split_td | 0.1358ms | 40.9951μs | 24.3932 KOps/s | 24.9883 KOps/s | -2.38% |
| test_add_pytree | 90.6290μs | 35.5125μs | 28.1591 KOps/s | 28.4772 KOps/s | -1.12% |
| test_add_td | 0.3118ms | 58.5133μs | 17.0901 KOps/s | 16.2033 KOps/s | **+5.47%** |
| test_distributed | 0.3045ms | 0.1298ms | 7.7034 KOps/s | 7.5854 KOps/s | +1.56% |
| test_tdmodule | 51.2260μs | 17.1955μs | 58.1549 KOps/s | 57.1343 KOps/s | +1.79% |
| test_tdmodule_dispatch | 54.8730μs | 36.3680μs | 27.4967 KOps/s | 26.9978 KOps/s | +1.85% |
| test_tdseq | 47.3280μs | 19.6162μs | 50.9784 KOps/s | 49.5007 KOps/s | +2.99% |
| test_tdseq_dispatch | 71.5740μs | 40.7105μs | 24.5637 KOps/s | 23.7592 KOps/s | +3.39% |
| test_instantiation_functorch | 1.5742ms | 1.3327ms | 750.3315 Ops/s | 763.8910 Ops/s | -1.78% |
| test_instantiation_td | 1.6172ms | 1.0398ms | 961.6895 Ops/s | 978.3889 Ops/s | -1.71% |
| test_exec_functorch | 0.5010ms | 0.1602ms | 6.2423 KOps/s | 6.1803 KOps/s | +1.00% |
| test_exec_functional_call | 0.3105ms | 0.1466ms | 6.8200 KOps/s | 6.6490 KOps/s | +2.57% |
| test_exec_td | 0.3523ms | 0.1470ms | 6.8031 KOps/s | 6.7662 KOps/s | +0.55% |
| test_exec_td_decorator | 0.4949ms | 0.2331ms | 4.2904 KOps/s | 4.1227 KOps/s | +4.07% |
| test_vmap_mlp_speed[True-True] | 0.6931ms | 0.4877ms | 2.0504 KOps/s | 2.0229 KOps/s | +1.36% |
| test_vmap_mlp_speed[True-False] | 0.7753ms | 0.4901ms | 2.0404 KOps/s | 2.0180 KOps/s | +1.11% |
| test_vmap_mlp_speed[False-True] | 0.7125ms | 0.4017ms | 2.4893 KOps/s | 2.5050 KOps/s | -0.63% |
| test_vmap_mlp_speed[False-False] | 0.7181ms | 0.4003ms | 2.4981 KOps/s | 2.4863 KOps/s | +0.47% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9838ms | 0.5811ms | 1.7208 KOps/s | 1.7146 KOps/s | +0.36% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8116ms | 0.5802ms | 1.7235 KOps/s | 1.7169 KOps/s | +0.38% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7939ms | 0.4723ms | 2.1171 KOps/s | 2.0841 KOps/s | +1.58% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6573ms | 0.4712ms | 2.1221 KOps/s | 2.0918 KOps/s | +1.45% |
| test_to_module_speed[True] | 2.3718ms | 1.8134ms | 551.4643 Ops/s | 506.2161 Ops/s | **+8.94%** |
| test_to_module_speed[False] | 2.8168ms | 1.7654ms | 566.4507 Ops/s | 567.7126 Ops/s | -0.22% |
| test_tc_init | 0.1331ms | 66.4272μs | 15.0541 KOps/s | 16.7130 KOps/s | **-9.93%** |
| test_tc_init_nested | 0.2844ms | 0.1338ms | 7.4725 KOps/s | 8.5267 KOps/s | **-12.36%** |
| test_tc_first_layer_tensor | 50.5140μs | 9.1281μs | 109.5521 KOps/s | 121.6535 KOps/s | **-9.95%** |
| test_tc_first_layer_nontensor | 46.6570μs | 9.1793μs | 108.9411 KOps/s | 115.2188 KOps/s | **-5.45%** |
| test_tc_second_layer_tensor | 22.9830μs | 2.8590μs | 349.7699 KOps/s | 398.1200 KOps/s | **-12.14%** |
| test_tc_second_layer_nontensor | 0.1538ms | 10.5968μs | 94.3678 KOps/s | 107.9547 KOps/s | **-12.59%** |

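
For readers checking the CPU numbers by hand, the following is a minimal sketch of how the Change column appears to relate to the two Ops columns, and how Ops relates to Mean. The function names and the 5% cutoff are assumptions made for illustration: the cutoff is inferred from which rows the summary seems to count as improved or worsened, and is not stated anywhere in the thread or taken from the benchmark workflow.

```python
# Sketch only: helper names and THRESHOLD are hypothetical, not part of tensordict's CI.

THRESHOLD = 5.0  # percent; assumed cutoff for counting a row as improved/worsened


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to the repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


def ops_from_mean(mean_seconds: float) -> float:
    """Operations per second implied by a mean runtime given in seconds."""
    return 1.0 / mean_seconds


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label a row by comparing its percent change with the assumed threshold."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "within noise"


# (name, Ops on this PR, Ops on HEAD), in KOps/s, copied from the CPU table.
rows = [
    ("test_plain_set_nested", 55.2987, 54.0285),
    ("test_getitem[int]", 76.0261, 89.4415),
    ("test_membership_stacked_nested_leaf_last", 249.1685, 192.4975),
    ("test_tc_init", 15.0541, 16.7130),
]

for name, ops_pr, ops_head in rows:
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Run on these four rows, the sketch reproduces the reported +2.35%, -15.00%, +29.44%, and -9.93%. Likewise, a mean of 18.0836μs corresponds to roughly 55.3 KOps/s, matching the Ops column for `test_plain_set_nested`.
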
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 11. Worsened: 10.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 73.9510μs | 12.4659μs | 80.2188 KOps/s | 76.9928 KOps/s | +4.19% |
| test_plain_set_stack_nested | 38.7810μs | 12.4718μs | 80.1810 KOps/s | 76.8099 KOps/s | +4.39% |
| test_plain_set_nested_inplace | 52.3910μs | 13.5481μs | 73.8109 KOps/s | 71.4524 KOps/s | +3.30% |
| test_plain_set_stack_nested_inplace | 62.6010μs | 13.5861μs | 73.6044 KOps/s | 71.4541 KOps/s | +3.01% |
| test_items | 17.0900μs | 4.6483μs | 215.1334 KOps/s | 216.9480 KOps/s | -0.84% |
| test_items_nested | 0.4478ms | 0.3898ms | 2.5651 KOps/s | 2.5709 KOps/s | -0.22% |
| test_items_nested_locked | 0.4369ms | 0.3897ms | 2.5663 KOps/s | 2.5429 KOps/s | +0.92% |
| test_items_nested_leaf | 0.1080ms | 86.4065μs | 11.5732 KOps/s | 11.7752 KOps/s | -1.72% |
| test_items_stack_nested | 0.4506ms | 0.3910ms | 2.5578 KOps/s | 2.5493 KOps/s | +0.34% |
| test_items_stack_nested_leaf | 0.1581ms | 88.0017μs | 11.3634 KOps/s | 11.6758 KOps/s | -2.68% |
| test_items_stack_nested_locked | 0.4645ms | 0.3977ms | 2.5148 KOps/s | 2.5629 KOps/s | -1.88% |
| test_keys | 24.6400μs | 4.3455μs | 230.1226 KOps/s | 228.9622 KOps/s | +0.51% |
| test_keys_nested | 91.9110μs | 69.3326μs | 14.4232 KOps/s | 15.0732 KOps/s | -4.31% |
| test_keys_nested_locked | 2.1072ms | 75.4635μs | 13.2514 KOps/s | 13.4149 KOps/s | -1.22% |
| test_keys_nested_leaf | 87.0810μs | 58.9601μs | 16.9606 KOps/s | 17.4110 KOps/s | -2.59% |
| test_keys_stack_nested | 94.9330μs | 68.3296μs | 14.6349 KOps/s | 14.9929 KOps/s | -2.39% |
| test_keys_stack_nested_leaf | 87.5520μs | 59.5855μs | 16.7826 KOps/s | 17.2693 KOps/s | -2.82% |
| test_keys_stack_nested_locked | 0.1039ms | 74.2389μs | 13.4700 KOps/s | 13.6224 KOps/s | -1.12% |
| test_values | 8.1637μs | 1.7575μs | 568.9815 KOps/s | 570.2542 KOps/s | -0.22% |
| test_values_nested | 58.0210μs | 33.9379μs | 29.4656 KOps/s | 29.4635 KOps/s | +0.01% |
| test_values_nested_locked | 53.3420μs | 35.8775μs | 27.8726 KOps/s | 27.8927 KOps/s | -0.07% |
| test_values_nested_leaf | 53.5510μs | 30.1365μs | 33.1823 KOps/s | 33.4275 KOps/s | -0.73% |
| test_values_stack_nested | 63.6710μs | 35.1716μs | 28.4320 KOps/s | 28.8143 KOps/s | -1.33% |
| test_values_stack_nested_leaf | 54.3310μs | 31.1515μs | 32.1012 KOps/s | 32.4224 KOps/s | -0.99% |
| test_values_stack_nested_locked | 61.7610μs | 36.6383μs | 27.2938 KOps/s | 27.2523 KOps/s | +0.15% |
| test_membership | 1.6131μs | 0.5303μs | 1.8856 MOps/s | 1.8819 MOps/s | +0.19% |
| test_membership_nested | 14.0700μs | 2.0245μs | 493.9466 KOps/s | 476.5956 KOps/s | +3.64% |
| test_membership_nested_leaf | 17.5955μs | 1.9814μs | 504.6989 KOps/s | 492.4469 KOps/s | +2.49% |
| test_membership_stacked_nested | 31.7300μs | 2.0504μs | 487.7123 KOps/s | 478.6934 KOps/s | +1.88% |
| test_membership_stacked_nested_leaf | 33.4100μs | 2.0119μs | 497.0519 KOps/s | 485.8740 KOps/s | +2.30% |
| test_membership_nested_last | 21.7300μs | 2.9130μs | 343.2884 KOps/s | 339.6789 KOps/s | +1.06% |
| test_membership_nested_leaf_last | 15.2110μs | 2.9016μs | 344.6351 KOps/s | 338.6224 KOps/s | +1.78% |
| test_membership_stacked_nested_last | 21.8600μs | 2.9377μs | 340.4020 KOps/s | 333.4585 KOps/s | +2.08% |
| test_membership_stacked_nested_leaf_last | 21.0800μs | 2.9065μs | 344.0608 KOps/s | 336.2202 KOps/s | +2.33% |
| test_nested_getleaf | 36.9710μs | 7.9553μs | 125.7024 KOps/s | 125.4797 KOps/s | +0.18% |
| test_nested_get | 32.7910μs | 7.4896μs | 133.5185 KOps/s | 133.5382 KOps/s | -0.01% |
| test_stacked_getleaf | 24.7210μs | 7.9936μs | 125.1000 KOps/s | 125.3608 KOps/s | -0.21% |
| test_stacked_get | 37.4710μs | 7.5078μs | 133.1943 KOps/s | 133.8681 KOps/s | -0.50% |
| test_nested_getitemleaf | 24.0010μs | 8.1204μs | 123.1470 KOps/s | 123.4429 KOps/s | -0.24% |
| test_nested_getitem | 29.6610μs | 7.6616μs | 130.5215 KOps/s | 131.2849 KOps/s | -0.58% |
| test_stacked_getitemleaf | 33.6910μs | 8.1561μs | 122.6076 KOps/s | 122.9560 KOps/s | -0.28% |
| test_stacked_getitem | 33.3420μs | 7.6819μs | 130.1756 KOps/s | 130.7118 KOps/s | -0.41% |
| test_lock_nested | 4.7200ms | 0.4184ms | 2.3901 KOps/s | 2.3991 KOps/s | -0.38% |
| test_lock_stack_nested | 0.4076ms | 0.3807ms | 2.6266 KOps/s | 2.6408 KOps/s | -0.53% |
| test_unlock_nested | 89.1425ms | 0.4244ms | 2.3561 KOps/s | 3.0144 KOps/s | **-21.84%** |
| test_unlock_stack_nested | 0.3298ms | 0.3001ms | 3.3321 KOps/s | 3.3532 KOps/s | -0.63% |
| test_flatten_speed | 0.3785ms | 0.1059ms | 9.4412 KOps/s | 9.4477 KOps/s | -0.07% |
| test_unflatten_speed | 0.3504ms | 0.2913ms | 3.4334 KOps/s | 3.4547 KOps/s | -0.62% |
| test_common_ops | 0.9939ms | 0.5836ms | 1.7135 KOps/s | 1.4735 KOps/s | **+16.29%** |
| test_creation | 32.5000μs | 1.9298μs | 518.1781 KOps/s | 551.7396 KOps/s | **-6.08%** |
| test_creation_empty | 27.7310μs | 8.8769μs | 112.6514 KOps/s | 103.4108 KOps/s | **+8.94%** |
| test_creation_nested_1 | 25.1100μs | 10.7400μs | 93.1096 KOps/s | 87.9216 KOps/s | **+5.90%** |
| test_creation_nested_2 | 40.8110μs | 13.2651μs | 75.3861 KOps/s | 72.0200 KOps/s | +4.67% |
| test_clone | 63.1420μs | 11.4002μs | 87.7179 KOps/s | 86.6148 KOps/s | +1.27% |
| test_getitem[int] | 24.5700μs | 10.4465μs | 95.7255 KOps/s | 97.0312 KOps/s | -1.35% |
| test_getitem[slice_int] | 48.6800μs | 20.5263μs | 48.7181 KOps/s | 50.5767 KOps/s | -3.67% |
| test_getitem[range] | 0.1693ms | 38.5670μs | 25.9289 KOps/s | 25.7322 KOps/s | +0.76% |
| test_getitem[tuple] | 35.3400μs | 17.7234μs | 56.4225 KOps/s | 56.7694 KOps/s | -0.61% |
| test_getitem[list] | 0.1636ms | 32.9955μs | 30.3072 KOps/s | 29.9815 KOps/s | +1.09% |
| test_setitem_dim[int] | 41.2410μs | 26.0945μs | 38.3223 KOps/s | 35.5933 KOps/s | **+7.67%** |
| test_setitem_dim[slice_int] | 63.1120μs | 46.2015μs | 21.6443 KOps/s | 20.1958 KOps/s | **+7.17%** |
| test_setitem_dim[range] | 97.3610μs | 63.7987μs | 15.6743 KOps/s | 15.2698 KOps/s | +2.65% |
| test_setitem_dim[tuple] | 60.6710μs | 40.4404μs | 24.7278 KOps/s | 23.4816 KOps/s | **+5.31%** |
| test_setitem | 68.7120μs | 16.1938μs | 61.7519 KOps/s | 60.1744 KOps/s | +2.62% |
| test_set | 79.2210μs | 15.8344μs | 63.1537 KOps/s | 62.2872 KOps/s | +1.39% |
| test_set_shared | 2.7783ms | 98.6563μs | 10.1362 KOps/s | 10.4030 KOps/s | -2.56% |
| test_update | 0.1052ms | 18.6447μs | 53.6347 KOps/s | 50.3290 KOps/s | **+6.57%** |
| test_update_nested | 61.7410μs | 23.7499μs | 42.1054 KOps/s | 41.1678 KOps/s | +2.28% |
| test_update__nested | 67.6210μs | 21.7853μs | 45.9025 KOps/s | 45.5876 KOps/s | +0.69% |
| test_set_nested | 56.5910μs | 16.6364μs | 60.1092 KOps/s | 58.0277 KOps/s | +3.59% |
| test_set_nested_new | 59.9110μs | 20.2745μs | 49.3231 KOps/s | 49.9066 KOps/s | -1.17% |
| test_select | 86.9210μs | 32.7081μs | 30.5735 KOps/s | 31.2831 KOps/s | -2.27% |
| test_select_nested | 79.7210μs | 51.9558μs | 19.2471 KOps/s | 18.8360 KOps/s | +2.18% |
| test_exclude_nested | 97.3320μs | 70.9400μs | 14.0964 KOps/s | 13.9132 KOps/s | +1.32% |
| test_empty[True] | 0.3327ms | 0.2963ms | 3.3748 KOps/s | 3.4011 KOps/s | -0.78% |
| test_empty[False] | 2.2291μs | 0.9084μs | 1.1008 MOps/s | 1.0813 MOps/s | +1.80% |
| test_to | 91.5610μs | 59.4775μs | 16.8131 KOps/s | 16.5893 KOps/s | +1.35% |
| test_to_nonblocking | 67.5510μs | 35.7592μs | 27.9648 KOps/s | 28.5619 KOps/s | -2.09% |
| test_unbind_speed | 0.2989ms | 0.2567ms | 3.8962 KOps/s | 3.9000 KOps/s | -0.10% |
| test_unbind_speed_stack0 | 0.3054ms | 0.2530ms | 3.9533 KOps/s | 3.9548 KOps/s | -0.04% |
| test_unbind_speed_stack1 | 92.4863ms | 0.7900ms | 1.2658 KOps/s | 1.4027 KOps/s | **-9.76%** |
| test_split | 89.9433ms | 1.6152ms | 619.1153 Ops/s | 636.8446 Ops/s | -2.78% |
| test_chunk | 1.5246ms | 1.4702ms | 680.2024 Ops/s | 699.8477 Ops/s | -2.81% |
| test_creation[device0] | 0.1092ms | 55.4825μs | 18.0237 KOps/s | 17.8530 KOps/s | +0.96% |
| test_creation_from_tensor | 0.1228ms | 52.2444μs | 19.1408 KOps/s | 19.0150 KOps/s | +0.66% |
| test_add_one[memmap_tensor0] | 84.8720μs | 7.6474μs | 130.7635 KOps/s | 131.6583 KOps/s | -0.68% |
| test_contiguous[memmap_tensor0] | 9.2510μs | 0.6132μs | 1.6307 MOps/s | 1.6265 MOps/s | +0.26% |
| test_stack[memmap_tensor0] | 34.7410μs | 5.0302μs | 198.7992 KOps/s | 200.7747 KOps/s | -0.98% |
| test_memmaptd_index | 1.1475ms | 0.2604ms | 3.8401 KOps/s | 3.9614 KOps/s | -3.06% |
| test_memmaptd_index_astensor | 0.5811ms | 0.3229ms | 3.0966 KOps/s | 3.1842 KOps/s | -2.75% |
| test_memmaptd_index_op | 0.9070ms | 0.6300ms | 1.5873 KOps/s | 1.4411 KOps/s | **+10.14%** |
| test_serialize_model | 92.3402ms | 90.0529ms | 11.1046 Ops/s | 10.5097 Ops/s | **+5.66%** |
| test_serialize_model_pickle | 1.3807s | 1.2351s | 0.8096 Ops/s | 0.8068 Ops/s | +0.35% |
| test_serialize_weights | 91.6459ms | 87.5927ms | 11.4165 Ops/s | 10.7539 Ops/s | **+6.16%** |
| test_serialize_weights_returnearly | 0.2234s | 72.9229ms | 13.7131 Ops/s | 13.9672 Ops/s | -1.82% |
| test_serialize_weights_pickle | 1.4150s | 1.2549s | 0.7969 Ops/s | 0.8014 Ops/s | -0.56% |
| test_reshape_pytree | 0.2363ms | 25.7526μs | 38.8310 KOps/s | 38.9522 KOps/s | -0.31% |
| test_reshape_td | 47.3220μs | 30.2714μs | 33.0345 KOps/s | 33.2299 KOps/s | -0.59% |
| test_view_pytree | 0.2412ms | 25.1831μs | 39.7092 KOps/s | 39.2344 KOps/s | +1.21% |
| test_view_td | 67.2110μs | 35.3936μs | 28.2537 KOps/s | 27.2185 KOps/s | +3.80% |
| test_unbind_pytree | 56.4410μs | 31.0273μs | 32.2297 KOps/s | 32.8532 KOps/s | -1.90% |
| test_unbind_td | 0.4888ms | 39.8025μs | 25.1241 KOps/s | 25.8700 KOps/s | -2.88% |
| test_split_pytree | 0.1649ms | 34.8387μs | 28.7037 KOps/s | 29.3492 KOps/s | -2.20% |
| test_split_td | 0.3020ms | 37.6264μs | 26.5771 KOps/s | 26.6318 KOps/s | -0.21% |
| test_add_pytree | 57.1710μs | 39.2715μs | 25.4638 KOps/s | 25.2075 KOps/s | +1.02% |
| test_add_td | 0.2976ms | 51.9596μs | 19.2457 KOps/s | 17.8222 KOps/s | **+7.99%** |
| test_distributed | 2.3954ms | 80.4175μs | 12.4351 KOps/s | 13.8478 KOps/s | **-10.20%** |
| test_tdmodule | 28.7410μs | 14.0142μs | 71.3562 KOps/s | 70.0340 KOps/s | +1.89% |
| test_tdmodule_dispatch | 52.4720μs | 28.9793μs | 34.5074 KOps/s | 33.8203 KOps/s | +2.03% |
| test_tdseq | 29.4510μs | 15.1652μs | 65.9402 KOps/s | 63.2779 KOps/s | +4.21% |
| test_tdseq_dispatch | 54.1410μs | 31.0361μs | 32.2205 KOps/s | 31.0321 KOps/s | +3.83% |
| test_instantiation_functorch | 1.5167ms | 1.3838ms | 722.6697 Ops/s | 723.4334 Ops/s | -0.11% |
| test_instantiation_td | 93.1720ms | 1.0923ms | 915.5340 Ops/s | 917.0857 Ops/s | -0.17% |
| test_exec_functorch | 0.1911ms | 0.1526ms | 6.5511 KOps/s | 6.7213 KOps/s | -2.53% |
| test_exec_functional_call | 0.1881ms | 0.1392ms | 7.1817 KOps/s | 7.2056 KOps/s | -0.33% |
| test_exec_td | 0.1862ms | 0.1395ms | 7.1666 KOps/s | 7.2983 KOps/s | -1.80% |
| test_exec_td_decorator | 0.8698ms | 0.2180ms | 4.5878 KOps/s | 4.7422 KOps/s | -3.25% |
| test_vmap_mlp_speed[True-True] | 0.6715ms | 0.5786ms | 1.7282 KOps/s | 1.7496 KOps/s | -1.22% |
| test_vmap_mlp_speed[True-False] | 0.6273ms | 0.5711ms | 1.7512 KOps/s | 1.7606 KOps/s | -0.54% |
| test_vmap_mlp_speed[False-True] | 0.8075ms | 0.5090ms | 1.9647 KOps/s | 1.9896 KOps/s | -1.25% |
| test_vmap_mlp_speed[False-False] | 0.5690ms | 0.5250ms | 1.9048 KOps/s | 1.9934 KOps/s | -4.44% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1497ms | 0.6522ms | 1.5332 KOps/s | 1.5527 KOps/s | -1.25% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7948ms | 0.6505ms | 1.5372 KOps/s | 1.5630 KOps/s | -1.65% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7113ms | 0.5721ms | 1.7480 KOps/s | 1.7803 KOps/s | -1.81% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7259ms | 0.5742ms | 1.7416 KOps/s | 1.7783 KOps/s | -2.06% |
| test_vmap_transformer_speed[True-True] | 7.9880ms | 7.6457ms | 130.7921 Ops/s | 131.7235 Ops/s | -0.71% |
| test_vmap_transformer_speed[True-False] | 7.7911ms | 7.6193ms | 131.2449 Ops/s | 132.3836 Ops/s | -0.86% |
| test_vmap_transformer_speed[False-True] | 7.6966ms | 7.5725ms | 132.0577 Ops/s | 133.6274 Ops/s | -1.17% |
| test_vmap_transformer_speed[False-False] | 7.8333ms | 7.5586ms | 132.2989 Ops/s | 134.2927 Ops/s | -1.48% |
| test_vmap_transformer_speed_decorator[True-True] | 19.4763ms | 18.8542ms | 53.0384 Ops/s | 53.7536 Ops/s | -1.33% |
| test_vmap_transformer_speed_decorator[True-False] | 19.0183ms | 18.8596ms | 53.0234 Ops/s | 53.7697 Ops/s | -1.39% |
| test_vmap_transformer_speed_decorator[False-True] | 20.3232ms | 18.7365ms | 53.3719 Ops/s | 54.2799 Ops/s | -1.67% |
| test_vmap_transformer_speed_decorator[False-False] | 19.5609ms | 18.7892ms | 53.2220 Ops/s | 54.3067 Ops/s | -2.00% |
| test_to_module_speed[True] | 1.6526ms | 1.5589ms | 641.4686 Ops/s | 645.1901 Ops/s | -0.58% |
| test_to_module_speed[False] | 1.6374ms | 1.5386ms | 649.9312 Ops/s | 660.4961 Ops/s | -1.60% |
| test_tc_init | 90.7710μs | 57.4208μs | 17.4153 KOps/s | 18.7136 KOps/s | **-6.94%** |
| test_tc_init_nested | 0.1449ms | 0.1131ms | 8.8391 KOps/s | 9.6077 KOps/s | **-8.00%** |
| test_tc_first_layer_tensor | 18.5110μs | 3.9429μs | 253.6207 KOps/s | 285.8562 KOps/s | **-11.28%** |
| test_tc_first_layer_nontensor | 24.8800μs | 3.9837μs | 251.0216 KOps/s | 282.0045 KOps/s | **-10.99%** |
| test_tc_second_layer_tensor | 4.5102μs | 1.2937μs | 772.9542 KOps/s | 892.1695 KOps/s | **-13.36%** |
| test_tc_second_layer_nontensor | 20.9200μs | 4.5996μs | 217.4088 KOps/s | 247.7385 KOps/s | **-12.24%** |