pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile integration - tensorclass #867

Closed: vmoens closed this 1 month ago

vmoens commented 1 month ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 10. Worsened: 31.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 65.0820μs | 18.5916μs | 53.7877 KOps/s | 60.1039 KOps/s | **-10.51%** |
| test_plain_set_stack_nested | 42.8090μs | 18.7567μs | 53.3142 KOps/s | 59.3525 KOps/s | **-10.17%** |
| test_plain_set_nested_inplace | 65.8630μs | 20.6885μs | 48.3360 KOps/s | 53.9020 KOps/s | **-10.33%** |
| test_plain_set_stack_nested_inplace | 67.6360μs | 20.7786μs | 48.1265 KOps/s | 53.6500 KOps/s | **-10.30%** |
| test_items | 18.2740μs | 2.7611μs | 362.1717 KOps/s | 381.6470 KOps/s | **-5.10%** |
| test_items_nested | 0.7485ms | 0.3804ms | 2.6289 KOps/s | 2.6897 KOps/s | -2.26% |
| test_items_nested_locked | 0.6636ms | 0.3802ms | 2.6305 KOps/s | 2.7103 KOps/s | -2.94% |
| test_items_nested_leaf | 0.1549ms | 86.6749μs | 11.5374 KOps/s | 11.2895 KOps/s | +2.20% |
| test_items_stack_nested | 0.5615ms | 0.3795ms | 2.6353 KOps/s | 2.7054 KOps/s | -2.59% |
| test_items_stack_nested_leaf | 0.1481ms | 87.3709μs | 11.4455 KOps/s | 11.4928 KOps/s | -0.41% |
| test_items_stack_nested_locked | 0.5009ms | 0.3822ms | 2.6162 KOps/s | 2.7207 KOps/s | -3.84% |
| test_keys | 29.1040μs | 3.8382μs | 260.5408 KOps/s | 261.9368 KOps/s | -0.53% |
| test_keys_nested | 0.2478ms | 0.1445ms | 6.9209 KOps/s | 6.9795 KOps/s | -0.84% |
| test_keys_nested_locked | 2.2301ms | 0.1514ms | 6.6071 KOps/s | 6.6538 KOps/s | -0.70% |
| test_keys_nested_leaf | 0.2088ms | 0.1233ms | 8.1078 KOps/s | 8.1643 KOps/s | -0.69% |
| test_keys_stack_nested | 0.2522ms | 0.1454ms | 6.8768 KOps/s | 6.9070 KOps/s | -0.44% |
| test_keys_stack_nested_leaf | 0.2000ms | 0.1223ms | 8.1743 KOps/s | 8.1709 KOps/s | +0.04% |
| test_keys_stack_nested_locked | 0.2455ms | 0.1493ms | 6.6957 KOps/s | 6.6980 KOps/s | -0.03% |
| test_values | 11.0473μs | 1.2008μs | 832.7770 KOps/s | 831.1121 KOps/s | +0.20% |
| test_values_nested | 91.2100μs | 49.4032μs | 20.2416 KOps/s | 20.2398 KOps/s | +0.01% |
| test_values_nested_locked | 0.1117ms | 49.4476μs | 20.2234 KOps/s | 20.3180 KOps/s | -0.47% |
| test_values_nested_leaf | 0.1171ms | 44.4841μs | 22.4799 KOps/s | 22.7340 KOps/s | -1.12% |
| test_values_stack_nested | 91.2800μs | 49.5008μs | 20.2017 KOps/s | 19.6624 KOps/s | +2.74% |
| test_values_stack_nested_leaf | 96.0590μs | 44.5317μs | 22.4559 KOps/s | 22.5277 KOps/s | -0.32% |
| test_values_stack_nested_locked | 98.0720μs | 49.1610μs | 20.3413 KOps/s | 19.8434 KOps/s | +2.51% |
| test_membership | 2.6950μs | 0.7607μs | 1.3146 MOps/s | 1.0976 MOps/s | **+19.77%** |
| test_membership_nested | 37.6200μs | 2.7502μs | 363.6077 KOps/s | 373.5198 KOps/s | -2.65% |
| test_membership_nested_leaf | 32.2710μs | 2.7429μs | 364.5807 KOps/s | 372.8088 KOps/s | -2.21% |
| test_membership_stacked_nested | 37.3900μs | 3.0108μs | 332.1407 KOps/s | 370.7024 KOps/s | **-10.40%** |
| test_membership_stacked_nested_leaf | 19.1860μs | 2.7598μs | 362.3416 KOps/s | 356.0262 KOps/s | +1.77% |
| test_membership_nested_last | 18.4740μs | 4.0614μs | 246.2194 KOps/s | 254.1100 KOps/s | -3.11% |
| test_membership_nested_leaf_last | 29.0340μs | 4.0257μs | 248.4035 KOps/s | 253.2481 KOps/s | -1.91% |
| test_membership_stacked_nested_last | 22.1620μs | 4.0088μs | 249.4521 KOps/s | 218.2229 KOps/s | **+14.31%** |
| test_membership_stacked_nested_leaf_last | 22.4110μs | 4.0626μs | 246.1484 KOps/s | 216.9998 KOps/s | **+13.43%** |
| test_nested_getleaf | 32.6910μs | 10.8991μs | 91.7511 KOps/s | 92.5735 KOps/s | -0.89% |
| test_nested_get | 26.9300μs | 10.4313μs | 95.8657 KOps/s | 97.7397 KOps/s | -1.92% |
| test_stacked_getleaf | 32.9210μs | 10.8780μs | 91.9288 KOps/s | 93.5918 KOps/s | -1.78% |
| test_stacked_get | 40.9100μs | 10.2341μs | 97.7123 KOps/s | 97.3055 KOps/s | +0.42% |
| test_nested_getitemleaf | 34.0630μs | 11.4706μs | 87.1798 KOps/s | 87.8668 KOps/s | -0.78% |
| test_nested_getitem | 53.1500μs | 10.5747μs | 94.5649 KOps/s | 95.6950 KOps/s | -1.18% |
| test_stacked_getitemleaf | 33.1520μs | 11.3990μs | 87.7272 KOps/s | 88.9261 KOps/s | -1.35% |
| test_stacked_getitem | 36.9800μs | 10.4942μs | 95.2906 KOps/s | 95.9104 KOps/s | -0.65% |
| test_lock_nested | 3.4337ms | 0.3426ms | 2.9189 KOps/s | 2.3449 KOps/s | **+24.48%** |
| test_lock_stack_nested | 0.6520ms | 0.3173ms | 3.1514 KOps/s | 3.1935 KOps/s | -1.32% |
| test_unlock_nested | 0.7779ms | 0.3514ms | 2.8460 KOps/s | 2.8641 KOps/s | -0.63% |
| test_unlock_stack_nested | 0.5024ms | 0.3240ms | 3.0868 KOps/s | 3.1146 KOps/s | -0.89% |
| test_flatten_speed | 0.1900ms | 0.1061ms | 9.4294 KOps/s | 9.5097 KOps/s | -0.84% |
| test_unflatten_speed | 0.9207ms | 0.4362ms | 2.2923 KOps/s | 2.2979 KOps/s | -0.25% |
| test_common_ops | 1.4053ms | 0.7582ms | 1.3189 KOps/s | 1.3680 KOps/s | -3.59% |
| test_creation | 30.1960μs | 2.0287μs | 492.9289 KOps/s | 531.0669 KOps/s | **-7.18%** |
| test_creation_empty | 33.8530μs | 13.3057μs | 75.1555 KOps/s | 101.8889 KOps/s | **-26.24%** |
| test_creation_nested_1 | 84.0900μs | 15.7397μs | 63.5338 KOps/s | 80.4053 KOps/s | **-20.98%** |
| test_creation_nested_2 | 55.3230μs | 19.7678μs | 50.5873 KOps/s | 62.3078 KOps/s | **-18.81%** |
| test_clone | 56.9770μs | 13.0759μs | 76.4767 KOps/s | 74.8029 KOps/s | +2.24% |
| test_getitem[int] | 37.4600μs | 11.4006μs | 87.7143 KOps/s | 86.2469 KOps/s | +1.70% |
| test_getitem[slice_int] | 55.6940μs | 23.4482μs | 42.6472 KOps/s | 41.2517 KOps/s | +3.38% |
| test_getitem[range] | 0.1199ms | 45.7682μs | 21.8492 KOps/s | 21.5263 KOps/s | +1.50% |
| test_getitem[tuple] | 1.7024ms | 20.0699μs | 49.8258 KOps/s | 51.3352 KOps/s | -2.94% |
| test_getitem[list] | 0.1920ms | 39.9330μs | 25.0419 KOps/s | 23.7518 KOps/s | **+5.43%** |
| test_setitem_dim[int] | 68.1270μs | 34.6893μs | 28.8273 KOps/s | 32.2517 KOps/s | **-10.62%** |
| test_setitem_dim[slice_int] | 0.1120ms | 62.6524μs | 15.9611 KOps/s | 16.5479 KOps/s | -3.55% |
| test_setitem_dim[range] | 0.1363ms | 82.9760μs | 12.0517 KOps/s | 12.4441 KOps/s | -3.15% |
| test_setitem_dim[tuple] | 76.1720μs | 51.4030μs | 19.4541 KOps/s | 20.7551 KOps/s | **-6.27%** |
| test_setitem | 0.1121ms | 21.2361μs | 47.0896 KOps/s | 49.9630 KOps/s | **-5.75%** |
| test_set | 0.1208ms | 22.0268μs | 45.3993 KOps/s | 51.7174 KOps/s | **-12.22%** |
| test_set_shared | 2.0433ms | 0.1436ms | 6.9654 KOps/s | 6.8760 KOps/s | +1.30% |
| test_update | 80.8530μs | 25.0406μs | 39.9351 KOps/s | 45.4167 KOps/s | **-12.07%** |
| test_update_nested | 0.1266ms | 34.4847μs | 28.9983 KOps/s | 31.7628 KOps/s | **-8.70%** |
| test_update__nested | 75.4200μs | 24.6225μs | 40.6132 KOps/s | 39.3765 KOps/s | +3.14% |
| test_set_nested | 84.0660μs | 22.6521μs | 44.1460 KOps/s | 46.7794 KOps/s | **-5.63%** |
| test_set_nested_new | 0.1259ms | 27.0501μs | 36.9685 KOps/s | 38.8068 KOps/s | -4.74% |
| test_select | 0.1031ms | 43.4807μs | 22.9987 KOps/s | 23.8795 KOps/s | -3.69% |
| test_select_nested | 0.1627ms | 62.6161μs | 15.9703 KOps/s | 16.6801 KOps/s | -4.25% |
| test_exclude_nested | 0.1879ms | 81.3606μs | 12.2910 KOps/s | 12.4620 KOps/s | -1.37% |
| test_empty[True] | 1.1172ms | 0.3491ms | 2.8643 KOps/s | 2.8978 KOps/s | -1.16% |
| test_empty[False] | 15.0055μs | 1.1604μs | 861.7348 KOps/s | 875.5838 KOps/s | -1.58% |
| test_unbind_speed | 0.3427ms | 0.2541ms | 3.9349 KOps/s | 3.8126 KOps/s | +3.21% |
| test_unbind_speed_stack0 | 0.4614ms | 0.2546ms | 3.9275 KOps/s | 3.9303 KOps/s | -0.07% |
| test_unbind_speed_stack1 | 77.7239ms | 0.7453ms | 1.3418 KOps/s | 1.3696 KOps/s | -2.03% |
| test_split | 77.5947ms | 1.6599ms | 602.4357 Ops/s | 656.9572 Ops/s | **-8.30%** |
| test_chunk | 81.8193ms | 1.6689ms | 599.1989 Ops/s | 609.3530 Ops/s | -1.67% |
| test_creation[device0] | 3.7607ms | 86.2583μs | 11.5931 KOps/s | 11.6691 KOps/s | -0.65% |
| test_creation_from_tensor | 0.2298ms | 86.1900μs | 11.6023 KOps/s | 11.4605 KOps/s | +1.24% |
| test_add_one[memmap_tensor0] | 0.1649ms | 5.2997μs | 188.6891 KOps/s | 176.1619 KOps/s | **+7.11%** |
| test_contiguous[memmap_tensor0] | 18.6850μs | 0.6515μs | 1.5350 MOps/s | 1.5381 MOps/s | -0.20% |
| test_stack[memmap_tensor0] | 31.7000μs | 3.6320μs | 275.3335 KOps/s | 266.1275 KOps/s | +3.46% |
| test_memmaptd_index | 1.0668ms | 0.2582ms | 3.8726 KOps/s | 3.8305 KOps/s | +1.10% |
| test_memmaptd_index_astensor | 0.7063ms | 0.3343ms | 2.9916 KOps/s | 2.9774 KOps/s | +0.48% |
| test_memmaptd_index_op | 0.9683ms | 0.6482ms | 1.5426 KOps/s | 1.6280 KOps/s | **-5.24%** |
| test_serialize_model | 0.1823s | 0.1093s | 9.1516 Ops/s | 9.2007 Ops/s | -0.53% |
| test_serialize_model_pickle | 0.4539s | 0.3771s | 2.6522 Ops/s | 2.6333 Ops/s | +0.71% |
| test_serialize_weights | 98.3736ms | 94.3007ms | 10.6044 Ops/s | 10.1348 Ops/s | +4.63% |
| test_serialize_weights_returnearly | 0.1923s | 0.1333s | 7.5016 Ops/s | 8.3484 Ops/s | **-10.14%** |
| test_serialize_weights_pickle | 1.1069s | 0.5777s | 1.7309 Ops/s | 2.4896 Ops/s | **-30.48%** |
| test_serialize_weights_filesystem | 0.1012s | 91.6662ms | 10.9091 Ops/s | 10.7531 Ops/s | +1.45% |
| test_serialize_model_filesystem | 0.1735s | 0.1011s | 9.8923 Ops/s | 9.3624 Ops/s | **+5.66%** |
| test_reshape_pytree | 55.1330μs | 25.8898μs | 38.6253 KOps/s | 37.5723 KOps/s | +2.80% |
| test_reshape_td | 90.5190μs | 34.4857μs | 28.9975 KOps/s | 27.8897 KOps/s | +3.97% |
| test_view_pytree | 72.4250μs | 25.6151μs | 39.0394 KOps/s | 37.6731 KOps/s | +3.63% |
| test_view_td | 73.3870μs | 40.3232μs | 24.7996 KOps/s | 23.7863 KOps/s | +4.26% |
| test_unbind_pytree | 60.3130μs | 29.3090μs | 34.1192 KOps/s | 32.7850 KOps/s | +4.07% |
| test_unbind_td | 0.3723ms | 37.5560μs | 26.6269 KOps/s | 25.9841 KOps/s | +2.47% |
| test_split_pytree | 67.3060μs | 29.7791μs | 33.5806 KOps/s | 32.8955 KOps/s | +2.08% |
| test_split_td | 0.1246ms | 41.9532μs | 23.8361 KOps/s | 24.0933 KOps/s | -1.07% |
| test_add_pytree | 71.2230μs | 35.5309μs | 28.1445 KOps/s | 27.8887 KOps/s | +0.92% |
| test_add_td | 0.1340ms | 58.1294μs | 17.2030 KOps/s | 17.9286 KOps/s | -4.05% |
| test_distributed | 1.3686ms | 0.1016ms | 9.8454 KOps/s | 9.7397 KOps/s | +1.08% |
| test_tdmodule | 60.6640μs | 17.9968μs | 55.5655 KOps/s | 62.3366 KOps/s | **-10.86%** |
| test_tdmodule_dispatch | 72.4150μs | 37.4157μs | 26.7267 KOps/s | 29.8031 KOps/s | **-10.32%** |
| test_tdseq | 42.7700μs | 20.5315μs | 48.7056 KOps/s | 55.6603 KOps/s | **-12.49%** |
| test_tdseq_dispatch | 73.8980μs | 41.6080μs | 24.0339 KOps/s | 26.9129 KOps/s | **-10.70%** |
| test_instantiation_functorch | 5.4075ms | 1.3222ms | 756.3361 Ops/s | 757.6667 Ops/s | -0.18% |
| test_instantiation_td | 1.5048ms | 1.0164ms | 983.8595 Ops/s | 949.1232 Ops/s | +3.66% |
| test_exec_functorch | 0.3722ms | 0.1626ms | 6.1487 KOps/s | 6.0141 KOps/s | +2.24% |
| test_exec_functional_call | 0.2274ms | 0.1482ms | 6.7469 KOps/s | 6.5608 KOps/s | +2.84% |
| test_exec_td | 0.2297ms | 0.1435ms | 6.9693 KOps/s | 6.5924 KOps/s | **+5.72%** |
| test_exec_td_decorator | 0.8551ms | 0.2326ms | 4.2992 KOps/s | 4.0509 KOps/s | **+6.13%** |
| test_vmap_mlp_speed[True-True] | 0.7663ms | 0.4890ms | 2.0448 KOps/s | 2.0560 KOps/s | -0.54% |
| test_vmap_mlp_speed[True-False] | 0.6853ms | 0.4849ms | 2.0621 KOps/s | 2.0644 KOps/s | -0.11% |
| test_vmap_mlp_speed[False-True] | 0.7544ms | 0.4066ms | 2.4593 KOps/s | 2.4822 KOps/s | -0.92% |
| test_vmap_mlp_speed[False-False] | 0.7437ms | 0.3952ms | 2.5303 KOps/s | 2.3402 KOps/s | **+8.12%** |
| test_vmap_mlp_speed_decorator[True-True] | 1.1540ms | 0.5755ms | 1.7377 KOps/s | 1.6921 KOps/s | +2.69% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8993ms | 0.5731ms | 1.7449 KOps/s | 1.7347 KOps/s | +0.59% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7500ms | 0.4673ms | 2.1400 KOps/s | 2.1289 KOps/s | +0.52% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6316ms | 0.4664ms | 2.1443 KOps/s | 2.1158 KOps/s | +1.35% |
| test_to_module_speed[True] | 2.8093ms | 1.8435ms | 542.4405 Ops/s | 543.1217 Ops/s | -0.13% |
| test_to_module_speed[False] | 1.9117ms | 1.8051ms | 553.9768 Ops/s | 549.8348 Ops/s | +0.75% |
| test_tc_init | 0.1310ms | 70.2560μs | 14.2337 KOps/s | 18.6225 KOps/s | **-23.57%** |
| test_tc_init_nested | 0.2993ms | 0.1446ms | 6.9146 KOps/s | 9.1841 KOps/s | **-24.71%** |
| test_tc_first_layer_tensor | 28.8740μs | 9.3577μs | 106.8640 KOps/s | 121.5624 KOps/s | **-12.09%** |
| test_tc_first_layer_nontensor | 57.3570μs | 9.2716μs | 107.8558 KOps/s | 121.6275 KOps/s | **-11.32%** |
| test_tc_second_layer_tensor | 28.3430μs | 2.8760μs | 347.7003 KOps/s | 387.0628 KOps/s | **-10.17%** |
| test_tc_second_layer_nontensor | 55.3130μs | 10.5598μs | 94.6991 KOps/s | 109.9197 KOps/s | **-13.85%** |
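
For readers checking the numbers: the Change column above is consistent with the relative difference between Ops and Ops on Repo `HEAD`, and the emphasized rows (which match the Improved/Worsened counts) are consistent with a cutoff of about 5%. The sketch below reproduces that arithmetic. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark workflow itself.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus HEAD, in percent.

    Both arguments must use the same unit (e.g. both KOps/s); the table
    occasionally mixes Ops/s and KOps/s within one row.
    """
    return (ops / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an inferred assumption."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested (CPU): 53.7877 KOps/s vs 60.1039 KOps/s on HEAD,
# listed as -10.51% in the table.
cpu_set_nested = percent_change(53.7877, 60.1039)
print(f"{cpu_set_nested:+.2f}%", classify(cpu_set_nested))

# test_membership (CPU): 1.3146 MOps/s vs 1.0976 MOps/s on HEAD,
# listed as +19.77% in the table.
cpu_membership = percent_change(1.3146, 1.0976)
print(f"{cpu_membership:+.2f}%", classify(cpu_membership))
```

Because throughput is the quantity compared, a positive value means the PR branch is faster than `HEAD`.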
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 40. Worsened: 9.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1396ms | 11.4929μs | 87.0106 KOps/s | 71.6356 KOps/s | **+21.46%** |
| test_plain_set_stack_nested | 35.3310μs | 11.4349μs | 87.4516 KOps/s | 71.8978 KOps/s | **+21.63%** |
| test_plain_set_nested_inplace | 45.3400μs | 12.4320μs | 80.4374 KOps/s | 67.4402 KOps/s | **+19.27%** |
| test_plain_set_stack_nested_inplace | 33.3410μs | 12.4423μs | 80.3708 KOps/s | 67.0966 KOps/s | **+19.78%** |
| test_items | 32.2110μs | 4.6473μs | 215.1788 KOps/s | 215.1448 KOps/s | +0.02% |
| test_items_nested | 0.4458ms | 0.3970ms | 2.5189 KOps/s | 2.5564 KOps/s | -1.47% |
| test_items_nested_locked | 0.4557ms | 0.3954ms | 2.5291 KOps/s | 2.5224 KOps/s | +0.27% |
| test_items_nested_leaf | 0.1049ms | 87.1631μs | 11.4727 KOps/s | 11.5343 KOps/s | -0.53% |
| test_items_stack_nested | 0.4514ms | 0.3965ms | 2.5218 KOps/s | 2.5264 KOps/s | -0.18% |
| test_items_stack_nested_leaf | 0.1745ms | 86.8638μs | 11.5123 KOps/s | 11.2658 KOps/s | +2.19% |
| test_items_stack_nested_locked | 0.4792ms | 0.3929ms | 2.5455 KOps/s | 2.5416 KOps/s | +0.15% |
| test_keys | 36.4410μs | 4.3975μs | 227.4001 KOps/s | 229.1589 KOps/s | -0.77% |
| test_keys_nested | 87.1120μs | 67.9027μs | 14.7270 KOps/s | 14.6064 KOps/s | +0.83% |
| test_keys_nested_locked | 2.2041ms | 74.7010μs | 13.3867 KOps/s | 13.4199 KOps/s | -0.25% |
| test_keys_nested_leaf | 82.4310μs | 59.2676μs | 16.8726 KOps/s | 16.8773 KOps/s | -0.03% |
| test_keys_stack_nested | 89.4020μs | 68.0908μs | 14.6863 KOps/s | 14.6491 KOps/s | +0.25% |
| test_keys_stack_nested_leaf | 85.5120μs | 58.8825μs | 16.9830 KOps/s | 17.1220 KOps/s | -0.81% |
| test_keys_stack_nested_locked | 0.1482ms | 72.6267μs | 13.7690 KOps/s | 13.5323 KOps/s | +1.75% |
| test_values | 10.2837μs | 1.7490μs | 571.7637 KOps/s | 554.8148 KOps/s | +3.05% |
| test_values_nested | 0.1873ms | 34.3261μs | 29.1324 KOps/s | 28.8808 KOps/s | +0.87% |
| test_values_nested_locked | 0.1347ms | 36.4754μs | 27.4158 KOps/s | 27.4025 KOps/s | +0.05% |
| test_values_nested_leaf | 50.2110μs | 30.4107μs | 32.8832 KOps/s | 32.4970 KOps/s | +1.19% |
| test_values_stack_nested | 0.1925ms | 34.8627μs | 28.6840 KOps/s | 28.2413 KOps/s | +1.57% |
| test_values_stack_nested_leaf | 0.2048ms | 31.2730μs | 31.9764 KOps/s | 31.6733 KOps/s | +0.96% |
| test_values_stack_nested_locked | 0.2367ms | 36.8313μs | 27.1508 KOps/s | 26.8296 KOps/s | +1.20% |
| test_membership | 2.0416μs | 0.5367μs | 1.8633 MOps/s | 1.8778 MOps/s | -0.77% |
| test_membership_nested | 43.4510μs | 2.1094μs | 474.0615 KOps/s | 471.9688 KOps/s | +0.44% |
| test_membership_nested_leaf | 14.1255μs | 2.0584μs | 485.8163 KOps/s | 492.8074 KOps/s | -1.42% |
| test_membership_stacked_nested | 24.3500μs | 2.1421μs | 466.8267 KOps/s | 476.6440 KOps/s | -2.06% |
| test_membership_stacked_nested_leaf | 20.2010μs | 2.1212μs | 471.4375 KOps/s | 484.7750 KOps/s | -2.75% |
| test_membership_nested_last | 34.7900μs | 3.0277μs | 330.2891 KOps/s | 332.0043 KOps/s | -0.52% |
| test_membership_nested_leaf_last | 32.8320μs | 3.0499μs | 327.8809 KOps/s | 332.5329 KOps/s | -1.40% |
| test_membership_stacked_nested_last | 38.5200μs | 9.2106μs | 108.5708 KOps/s | 330.6513 KOps/s | **-67.16%** |
| test_membership_stacked_nested_leaf_last | 28.9710μs | 9.1754μs | 108.9876 KOps/s | 330.2339 KOps/s | **-67.00%** |
| test_nested_getleaf | 37.6910μs | 8.0547μs | 124.1510 KOps/s | 123.9900 KOps/s | +0.13% |
| test_nested_get | 0.1887ms | 7.5814μs | 131.9019 KOps/s | 132.2605 KOps/s | -0.27% |
| test_stacked_getleaf | 30.4220μs | 8.1081μs | 123.3338 KOps/s | 124.7882 KOps/s | -1.17% |
| test_stacked_get | 37.5710μs | 7.5860μs | 131.8211 KOps/s | 132.5042 KOps/s | -0.52% |
| test_nested_getitemleaf | 0.1900ms | 8.1987μs | 121.9705 KOps/s | 121.3973 KOps/s | +0.47% |
| test_nested_getitem | 0.1740ms | 7.7046μs | 129.7929 KOps/s | 129.3494 KOps/s | +0.34% |
| test_stacked_getitemleaf | 36.7610μs | 8.2103μs | 121.7978 KOps/s | 122.1267 KOps/s | -0.27% |
| test_stacked_getitem | 26.8110μs | 7.7533μs | 128.9767 KOps/s | 128.7514 KOps/s | +0.18% |
| test_lock_nested | 4.2430ms | 0.3266ms | 3.0615 KOps/s | 3.0218 KOps/s | +1.31% |
| test_lock_stack_nested | 0.3773ms | 0.2809ms | 3.5596 KOps/s | 3.4048 KOps/s | +4.55% |
| test_unlock_nested | 0.6917ms | 0.3260ms | 3.0678 KOps/s | 2.9892 KOps/s | +2.63% |
| test_unlock_stack_nested | 0.3654ms | 0.2894ms | 3.4548 KOps/s | 3.3002 KOps/s | +4.69% |
| test_flatten_speed | 0.3849ms | 0.1077ms | 9.2850 KOps/s | 9.3635 KOps/s | -0.84% |
| test_unflatten_speed | 0.3525ms | 0.2938ms | 3.4042 KOps/s | 3.3944 KOps/s | +0.29% |
| test_common_ops | 1.0609ms | 0.5236ms | 1.9099 KOps/s | 1.6245 KOps/s | **+17.57%** |
| test_creation | 34.8210μs | 1.6197μs | 617.4079 KOps/s | 661.8707 KOps/s | **-6.72%** |
| test_creation_empty | 36.4810μs | 6.2309μs | 160.4916 KOps/s | 91.5987 KOps/s | **+75.21%** |
| test_creation_nested_1 | 27.5810μs | 8.0447μs | 124.3050 KOps/s | 79.1728 KOps/s | **+57.00%** |
| test_creation_nested_2 | 35.8900μs | 10.2795μs | 97.2808 KOps/s | 66.8390 KOps/s | **+45.54%** |
| test_clone | 56.1010μs | 10.6438μs | 93.9512 KOps/s | 85.0308 KOps/s | **+10.49%** |
| test_getitem[int] | 24.9200μs | 10.0509μs | 99.4941 KOps/s | 94.4713 KOps/s | **+5.32%** |
| test_getitem[slice_int] | 49.4610μs | 19.8844μs | 50.2908 KOps/s | 48.7200 KOps/s | +3.22% |
| test_getitem[range] | 0.1662ms | 35.6903μs | 28.0188 KOps/s | 25.0839 KOps/s | **+11.70%** |
| test_getitem[tuple] | 48.9210μs | 17.5470μs | 56.9899 KOps/s | 56.0424 KOps/s | +1.69% |
| test_getitem[list] | 0.1813ms | 31.6961μs | 31.5496 KOps/s | 30.0977 KOps/s | +4.82% |
| test_setitem_dim[int] | 41.1000μs | 21.9518μs | 45.5544 KOps/s | 34.4444 KOps/s | **+32.25%** |
| test_setitem_dim[slice_int] | 0.1090ms | 43.3751μs | 23.0547 KOps/s | 20.0302 KOps/s | **+15.10%** |
| test_setitem_dim[range] | 0.1295ms | 58.3050μs | 17.1512 KOps/s | 15.3802 KOps/s | **+11.51%** |
| test_setitem_dim[tuple] | 0.1696ms | 36.8677μs | 27.1240 KOps/s | 23.1989 KOps/s | **+16.92%** |
| test_setitem | 65.3010μs | 14.0520μs | 71.1642 KOps/s | 55.5168 KOps/s | **+28.19%** |
| test_set | 0.1536ms | 13.5463μs | 73.8209 KOps/s | 57.1062 KOps/s | **+29.27%** |
| test_set_shared | 2.8544ms | 95.2899μs | 10.4943 KOps/s | 10.1878 KOps/s | +3.01% |
| test_update | 0.1364ms | 15.1567μs | 65.9776 KOps/s | 46.3604 KOps/s | **+42.31%** |
| test_update_nested | 0.1159ms | 20.3475μs | 49.1460 KOps/s | 37.2367 KOps/s | **+31.98%** |
| test_update__nested | 0.1585ms | 20.7936μs | 48.0918 KOps/s | 44.2953 KOps/s | **+8.57%** |
| test_set_nested | 0.1405ms | 14.7077μs | 67.9915 KOps/s | 54.1322 KOps/s | **+25.60%** |
| test_set_nested_new | 0.1191ms | 17.4165μs | 57.4167 KOps/s | 47.8076 KOps/s | **+20.10%** |
| test_select | 0.1967ms | 28.5889μs | 34.9786 KOps/s | 29.8094 KOps/s | **+17.34%** |
| test_select_nested | 0.1622ms | 53.0990μs | 18.8327 KOps/s | 18.8853 KOps/s | -0.28% |
| test_exclude_nested | 0.1337ms | 72.9543μs | 13.7072 KOps/s | 13.6251 KOps/s | +0.60% |
| test_empty[True] | 0.4992ms | 0.3051ms | 3.2779 KOps/s | 3.2631 KOps/s | +0.45% |
| test_empty[False] | 18.6164μs | 0.8592μs | 1.1639 MOps/s | 1.1528 MOps/s | +0.96% |
| test_to | 88.0320μs | 56.7246μs | 17.6290 KOps/s | 17.2861 KOps/s | +1.98% |
| test_to_nonblocking | 0.1945ms | 34.5491μs | 28.9443 KOps/s | 28.3627 KOps/s | +2.05% |
| test_unbind_speed | 0.3630ms | 0.2513ms | 3.9788 KOps/s | 3.9528 KOps/s | +0.66% |
| test_unbind_speed_stack0 | 0.3378ms | 0.2461ms | 4.0635 KOps/s | 3.9437 KOps/s | +3.04% |
| test_unbind_speed_stack1 | 91.6806ms | 0.7684ms | 1.3013 KOps/s | 1.3956 KOps/s | **-6.76%** |
| test_split | 89.6339ms | 1.5829ms | 631.7648 Ops/s | 629.9184 Ops/s | +0.29% |
| test_chunk | 1.5403ms | 1.4523ms | 688.5478 Ops/s | 687.2097 Ops/s | +0.19% |
| test_creation[device0] | 0.2082ms | 55.1446μs | 18.1341 KOps/s | 17.6883 KOps/s | +2.52% |
| test_creation_from_tensor | 0.2107ms | 51.7802μs | 19.3124 KOps/s | 17.0396 KOps/s | **+13.34%** |
| test_add_one[memmap_tensor0] | 84.5710μs | 6.6346μs | 150.7254 KOps/s | 138.9737 KOps/s | **+8.46%** |
| test_contiguous[memmap_tensor0] | 13.8000μs | 0.6010μs | 1.6640 MOps/s | 1.7054 MOps/s | -2.42% |
| test_stack[memmap_tensor0] | 29.3510μs | 4.5118μs | 221.6392 KOps/s | 205.9065 KOps/s | **+7.64%** |
| test_memmaptd_index | 0.9138ms | 0.2506ms | 3.9910 KOps/s | 3.3157 KOps/s | **+20.37%** |
| test_memmaptd_index_astensor | 0.6208ms | 0.3135ms | 3.1899 KOps/s | 3.0690 KOps/s | +3.94% |
| test_memmaptd_index_op | 0.8946ms | 0.5499ms | 1.8185 KOps/s | 1.4968 KOps/s | **+21.49%** |
| test_serialize_model | 93.9036ms | 90.6639ms | 11.0297 Ops/s | 10.3704 Ops/s | **+6.36%** |
| test_serialize_model_pickle | 1.3683s | 1.2382s | 0.8076 Ops/s | 0.8087 Ops/s | -0.13% |
| test_serialize_weights | 0.1820s | 99.0749ms | 10.0934 Ops/s | 9.5156 Ops/s | **+6.07%** |
| test_serialize_weights_returnearly | 0.3105s | 79.8463ms | 12.5241 Ops/s | 15.6341 Ops/s | **-19.89%** |
| test_serialize_weights_pickle | 1.3534s | 1.2482s | 0.8012 Ops/s | 0.7968 Ops/s | +0.55% |
| test_reshape_pytree | 0.2134ms | 25.1009μs | 39.8393 KOps/s | 38.3564 KOps/s | +3.87% |
| test_reshape_td | 0.1714ms | 30.3590μs | 32.9391 KOps/s | 31.7543 KOps/s | +3.73% |
| test_view_pytree | 0.1895ms | 25.0471μs | 39.9248 KOps/s | 38.9289 KOps/s | +2.56% |
| test_view_td | 0.1680ms | 37.1387μs | 26.9261 KOps/s | 26.7998 KOps/s | +0.47% |
| test_unbind_pytree | 0.2068ms | 30.4163μs | 32.8771 KOps/s | 32.2755 KOps/s | +1.86% |
| test_unbind_td | 0.4378ms | 39.2734μs | 25.4625 KOps/s | 25.3157 KOps/s | +0.58% |
| test_split_pytree | 0.1320ms | 33.4560μs | 29.8900 KOps/s | 29.1561 KOps/s | +2.52% |
| test_split_td | 0.1539ms | 37.0608μs | 26.9827 KOps/s | 26.4573 KOps/s | +1.99% |
| test_add_pytree | 0.1825ms | 36.4741μs | 27.4167 KOps/s | 25.7183 KOps/s | **+6.60%** |
| test_add_td | 99.2620μs | 43.8939μs | 22.7822 KOps/s | 17.9484 KOps/s | **+26.93%** |
| test_distributed | 0.2146ms | 68.1992μs | 14.6629 KOps/s | 13.5374 KOps/s | **+8.31%** |
| test_tdmodule | 0.1362ms | 12.5500μs | 79.6816 KOps/s | 63.1938 KOps/s | **+26.09%** |
| test_tdmodule_dispatch | 40.9510μs | 25.3872μs | 39.3900 KOps/s | 31.2468 KOps/s | **+26.06%** |
| test_tdseq | 66.3810μs | 13.7472μs | 72.7423 KOps/s | 59.2638 KOps/s | **+22.74%** |
| test_tdseq_dispatch | 43.8410μs | 27.6454μs | 36.1724 KOps/s | 29.2711 KOps/s | **+23.58%** |
| test_instantiation_functorch | 1.4835ms | 1.3943ms | 717.2142 Ops/s | 702.8950 Ops/s | +2.04% |
| test_instantiation_td | 1.4672ms | 0.9732ms | 1.0275 KOps/s | 999.6086 Ops/s | +2.79% |
| test_exec_functorch | 0.2726ms | 0.1431ms | 6.9872 KOps/s | 6.6192 KOps/s | **+5.56%** |
| test_exec_functional_call | 0.2301ms | 0.1343ms | 7.4466 KOps/s | 7.2529 KOps/s | +2.67% |
| test_exec_td | 0.2524ms | 0.1318ms | 7.5855 KOps/s | 7.2715 KOps/s | +4.32% |
| test_exec_td_decorator | 0.3368ms | 0.2110ms | 4.7404 KOps/s | 4.6455 KOps/s | +2.04% |
| test_vmap_mlp_speed[True-True] | 0.7482ms | 0.5730ms | 1.7453 KOps/s | 1.7216 KOps/s | +1.38% |
| test_vmap_mlp_speed[True-False] | 0.7357ms | 0.5728ms | 1.7458 KOps/s | 1.7268 KOps/s | +1.10% |
| test_vmap_mlp_speed[False-True] | 0.6770ms | 0.5117ms | 1.9544 KOps/s | 1.9661 KOps/s | -0.59% |
| test_vmap_mlp_speed[False-False] | 0.6718ms | 0.5133ms | 1.9483 KOps/s | 1.9409 KOps/s | +0.39% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0613ms | 0.6533ms | 1.5307 KOps/s | 1.5468 KOps/s | -1.04% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8080ms | 0.6366ms | 1.5708 KOps/s | 1.5502 KOps/s | +1.33% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7457ms | 0.5688ms | 1.7581 KOps/s | 1.7465 KOps/s | +0.66% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7352ms | 0.5696ms | 1.7556 KOps/s | 1.7108 KOps/s | +2.62% |
| test_vmap_transformer_speed[True-True] | 7.7651ms | 7.6005ms | 131.5709 Ops/s | 130.8909 Ops/s | +0.52% |
| test_vmap_transformer_speed[True-False] | 8.0624ms | 7.6156ms | 131.3100 Ops/s | 130.8339 Ops/s | +0.36% |
| test_vmap_transformer_speed[False-True] | 8.4686ms | 7.8585ms | 127.2503 Ops/s | 132.2207 Ops/s | -3.76% |
| test_vmap_transformer_speed[False-False] | 8.0216ms | 7.7175ms | 129.5755 Ops/s | 132.2325 Ops/s | -2.01% |
| test_vmap_transformer_speed_decorator[True-True] | 19.1691ms | 18.8658ms | 53.0060 Ops/s | 54.2440 Ops/s | -2.28% |
| test_vmap_transformer_speed_decorator[True-False] | 19.7256ms | 18.9278ms | 52.8322 Ops/s | 54.2074 Ops/s | -2.54% |
| test_vmap_transformer_speed_decorator[False-True] | 19.2277ms | 18.7272ms | 53.3984 Ops/s | 54.4142 Ops/s | -1.87% |
| test_vmap_transformer_speed_decorator[False-False] | 19.3755ms | 18.7629ms | 53.2968 Ops/s | 54.4801 Ops/s | -2.17% |
| test_to_module_speed[True] | 2.0892ms | 1.5789ms | 633.3585 Ops/s | 643.1202 Ops/s | -1.52% |
| test_to_module_speed[False] | 1.9562ms | 1.5550ms | 643.0749 Ops/s | 652.2678 Ops/s | -1.41% |
| test_tc_init | 0.4275ms | 51.0873μs | 19.5743 KOps/s | 16.9895 KOps/s | **+15.21%** |
| test_tc_init_nested | 0.2371ms | 0.1033ms | 9.6789 KOps/s | 8.3636 KOps/s | **+15.73%** |
| test_tc_first_layer_tensor | 0.1225ms | 3.9968μs | 250.2031 KOps/s | 282.7272 KOps/s | **-11.50%** |
| test_tc_first_layer_nontensor | 0.2453ms | 4.0307μs | 248.0934 KOps/s | 281.5379 KOps/s | **-11.88%** |
| test_tc_second_layer_tensor | 36.2282μs | 1.3214μs | 756.7736 KOps/s | 882.2355 KOps/s | **-14.22%** |
| test_tc_second_layer_nontensor | 0.1757ms | 4.6268μs | 216.1311 KOps/s | 245.7605 KOps/s | **-12.06%** |