pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile - tensorclass compatibility #882

Closed: vmoens closed this pull request 3 months ago.

vmoens commented 3 months ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 3 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 133. Improved: 4. Worsened: 14.

Detailed results (changes in bold are the ones counted in the Improved/Worsened totals above):

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 41.1170μs | 18.5890μs | 53.7952 KOps/s | 55.4261 KOps/s | -2.94% |
| test_plain_set_stack_nested | 47.0280μs | 18.9038μs | 52.8994 KOps/s | 50.4211 KOps/s | +4.92% |
| test_plain_set_nested_inplace | 67.2750μs | 20.6969μs | 48.3165 KOps/s | 49.1865 KOps/s | -1.77% |
| test_plain_set_stack_nested_inplace | 72.6140μs | 20.5495μs | 48.6629 KOps/s | 49.6198 KOps/s | -1.93% |
| test_items | 16.9610μs | 2.6396μs | 378.8460 KOps/s | 392.0118 KOps/s | -3.36% |
| test_items_nested | 2.2345ms | 0.3730ms | 2.6809 KOps/s | 2.6540 KOps/s | +1.02% |
| test_items_nested_locked | 0.5305ms | 0.3706ms | 2.6981 KOps/s | 2.7264 KOps/s | -1.04% |
| test_items_nested_leaf | 0.1603ms | 85.7794μs | 11.6578 KOps/s | 11.5462 KOps/s | +0.97% |
| test_items_stack_nested | 0.5487ms | 0.3754ms | 2.6640 KOps/s | 2.7268 KOps/s | -2.30% |
| test_items_stack_nested_leaf | 0.1605ms | 87.0092μs | 11.4930 KOps/s | 11.6208 KOps/s | -1.10% |
| test_items_stack_nested_locked | 0.8169ms | 0.3790ms | 2.6385 KOps/s | 2.7395 KOps/s | -3.68% |
| test_keys | 0.1105ms | 4.2346μs | 236.1494 KOps/s | 217.2154 KOps/s | **+8.72%** |
| test_keys_nested | 0.2506ms | 0.1457ms | 6.8618 KOps/s | 6.9305 KOps/s | -0.99% |
| test_keys_nested_locked | 0.6740ms | 0.1511ms | 6.6198 KOps/s | 6.5846 KOps/s | +0.53% |
| test_keys_nested_leaf | 0.2213ms | 0.1233ms | 8.1131 KOps/s | 8.1110 KOps/s | +0.03% |
| test_keys_stack_nested | 0.2339ms | 0.1448ms | 6.9062 KOps/s | 6.8977 KOps/s | +0.12% |
| test_keys_stack_nested_leaf | 0.2229ms | 0.1232ms | 8.1146 KOps/s | 8.1304 KOps/s | -0.19% |
| test_keys_stack_nested_locked | 0.4618ms | 0.1503ms | 6.6517 KOps/s | 6.6377 KOps/s | +0.21% |
| test_values | 10.6675μs | 1.1817μs | 846.2389 KOps/s | 888.8857 KOps/s | -4.80% |
| test_values_nested | 0.1070ms | 49.7063μs | 20.1182 KOps/s | 20.2426 KOps/s | -0.61% |
| test_values_nested_locked | 0.1154ms | 49.2208μs | 20.3166 KOps/s | 19.8622 KOps/s | +2.29% |
| test_values_nested_leaf | 0.1155ms | 44.7747μs | 22.3341 KOps/s | 22.4624 KOps/s | -0.57% |
| test_values_stack_nested | 0.1022ms | 51.2425μs | 19.5151 KOps/s | 20.2883 KOps/s | -3.81% |
| test_values_stack_nested_leaf | 85.9000μs | 44.3935μs | 22.5258 KOps/s | 22.4965 KOps/s | +0.13% |
| test_values_stack_nested_locked | 98.8240μs | 51.4953μs | 19.4193 KOps/s | 20.3545 KOps/s | -4.59% |
| test_membership | 26.9700μs | 0.8916μs | 1.1215 MOps/s | 1.3998 MOps/s | **-19.88%** |
| test_membership_nested | 30.8380μs | 2.6648μs | 375.2613 KOps/s | 363.4403 KOps/s | +3.25% |
| test_membership_nested_leaf | 21.0890μs | 2.7166μs | 368.1136 KOps/s | 366.9289 KOps/s | +0.32% |
| test_membership_stacked_nested | 23.7640μs | 2.6530μs | 376.9310 KOps/s | 363.5982 KOps/s | +3.67% |
| test_membership_stacked_nested_leaf | 18.0840μs | 2.6939μs | 371.2123 KOps/s | 368.7125 KOps/s | +0.68% |
| test_membership_nested_last | 36.2770μs | 3.9581μs | 252.6476 KOps/s | 249.5506 KOps/s | +1.24% |
| test_membership_nested_leaf_last | 19.7160μs | 3.9881μs | 250.7467 KOps/s | 249.7668 KOps/s | +0.39% |
| test_membership_stacked_nested_last | 31.2680μs | 4.5466μs | 219.9431 KOps/s | 251.7478 KOps/s | **-12.63%** |
| test_membership_stacked_nested_leaf_last | 52.2970μs | 4.5843μs | 218.1373 KOps/s | 249.1957 KOps/s | **-12.46%** |
| test_nested_getleaf | 42.0590μs | 10.9642μs | 91.2061 KOps/s | 92.3531 KOps/s | -1.24% |
| test_nested_get | 50.1330μs | 10.4249μs | 95.9239 KOps/s | 98.5080 KOps/s | -2.62% |
| test_stacked_getleaf | 36.5580μs | 10.8997μs | 91.7457 KOps/s | 93.9805 KOps/s | -2.38% |
| test_stacked_get | 35.0760μs | 10.2582μs | 97.4831 KOps/s | 98.9593 KOps/s | -1.49% |
| test_nested_getitemleaf | 30.4870μs | 11.3080μs | 88.4330 KOps/s | 88.7168 KOps/s | -0.32% |
| test_nested_getitem | 0.1193ms | 10.5966μs | 94.3696 KOps/s | 96.8241 KOps/s | -2.54% |
| test_stacked_getitemleaf | 32.7000μs | 11.3130μs | 88.3941 KOps/s | 88.8381 KOps/s | -0.50% |
| test_stacked_getitem | 40.2950μs | 10.3639μs | 96.4889 KOps/s | 96.7847 KOps/s | -0.31% |
| test_lock_nested | 0.8743ms | 0.4633ms | 2.1584 KOps/s | 2.1353 KOps/s | +1.08% |
| test_lock_stack_nested | 0.9225ms | 0.4314ms | 2.3183 KOps/s | 2.3052 KOps/s | +0.57% |
| test_unlock_nested | 0.8646ms | 0.3837ms | 2.6061 KOps/s | 2.2068 KOps/s | **+18.09%** |
| test_unlock_stack_nested | 0.6887ms | 0.3438ms | 2.9087 KOps/s | 2.8587 KOps/s | +1.75% |
| test_flatten_speed | 0.6324ms | 0.1051ms | 9.5174 KOps/s | 9.5568 KOps/s | -0.41% |
| test_unflatten_speed | 1.0108ms | 0.4433ms | 2.2556 KOps/s | 2.2891 KOps/s | -1.46% |
| test_common_ops | 4.8669ms | 0.8314ms | 1.2028 KOps/s | 1.2428 KOps/s | -3.22% |
| test_creation | 0.1032ms | 2.4176μs | 413.6366 KOps/s | 432.9047 KOps/s | -4.45% |
| test_creation_empty | 45.8260μs | 13.2240μs | 75.6202 KOps/s | 78.9336 KOps/s | -4.20% |
| test_creation_nested_1 | 58.1780μs | 16.4242μs | 60.8857 KOps/s | 62.5851 KOps/s | -2.72% |
| test_creation_nested_2 | 67.8560μs | 20.2307μs | 49.4298 KOps/s | 51.1970 KOps/s | -3.45% |
| test_clone | 0.1071ms | 13.0327μs | 76.7302 KOps/s | 74.7032 KOps/s | +2.71% |
| test_getitem[int] | 42.6290μs | 11.7089μs | 85.4050 KOps/s | 85.6745 KOps/s | -0.31% |
| test_getitem[slice_int] | 59.6210μs | 24.1382μs | 41.4280 KOps/s | 42.4525 KOps/s | -2.41% |
| test_getitem[range] | 0.2802ms | 47.2438μs | 21.1668 KOps/s | 22.3256 KOps/s | **-5.19%** |
| test_getitem[tuple] | 50.0130μs | 19.8379μs | 50.4086 KOps/s | 51.9597 KOps/s | -2.99% |
| test_getitem[list] | 0.3610ms | 41.1971μs | 24.2735 KOps/s | 24.8605 KOps/s | -2.36% |
| test_setitem_dim[int] | 86.5710μs | 36.6883μs | 27.2567 KOps/s | 28.6899 KOps/s | -5.00% |
| test_setitem_dim[slice_int] | 0.1262ms | 64.0246μs | 15.6190 KOps/s | 15.8952 KOps/s | -1.74% |
| test_setitem_dim[range] | 0.1300ms | 84.1439μs | 11.8844 KOps/s | 12.0544 KOps/s | -1.41% |
| test_setitem_dim[tuple] | 87.9340μs | 52.1819μs | 19.1637 KOps/s | 19.3380 KOps/s | -0.90% |
| test_setitem | 0.1175ms | 21.0265μs | 47.5591 KOps/s | 46.9194 KOps/s | +1.36% |
| test_set | 0.1265ms | 20.6264μs | 48.4815 KOps/s | 48.4159 KOps/s | +0.14% |
| test_set_shared | 1.6311ms | 0.1673ms | 5.9786 KOps/s | 5.9238 KOps/s | +0.93% |
| test_update | 0.1879ms | 24.4710μs | 40.8648 KOps/s | 41.2466 KOps/s | -0.93% |
| test_update_nested | 0.1455ms | 33.3237μs | 30.0087 KOps/s | 29.8121 KOps/s | +0.66% |
| test_update__nested | 0.1106ms | 25.1612μs | 39.7437 KOps/s | 39.5094 KOps/s | +0.59% |
| test_set_nested | 0.1531ms | 22.7857μs | 43.8873 KOps/s | 43.8483 KOps/s | +0.09% |
| test_set_nested_new | 0.1343ms | 27.5675μs | 36.2747 KOps/s | 36.6207 KOps/s | -0.95% |
| test_select | 0.1674ms | 43.4993μs | 22.9889 KOps/s | 23.4945 KOps/s | -2.15% |
| test_select_nested | 0.1213ms | 60.8845μs | 16.4246 KOps/s | 16.4609 KOps/s | -0.22% |
| test_exclude_nested | 0.1812ms | 80.6000μs | 12.4070 KOps/s | 12.4216 KOps/s | -0.12% |
| test_empty[True] | 0.4670ms | 0.3469ms | 2.8827 KOps/s | 2.9137 KOps/s | -1.06% |
| test_empty[False] | 7.4538μs | 1.2636μs | 791.4072 KOps/s | 797.4495 KOps/s | -0.76% |
| test_unbind_speed | 0.5084ms | 0.2823ms | 3.5425 KOps/s | 3.5615 KOps/s | -0.53% |
| test_unbind_speed_stack0 | 0.4246ms | 0.2717ms | 3.6807 KOps/s | 3.5543 KOps/s | +3.56% |
| test_unbind_speed_stack1 | 79.2189ms | 0.7640ms | 1.3090 KOps/s | 1.2839 KOps/s | +1.95% |
| test_split | 76.2580ms | 1.6504ms | 605.9219 Ops/s | 670.2860 Ops/s | **-9.60%** |
| test_chunk | 77.6195ms | 1.6599ms | 602.4580 Ops/s | 619.5921 Ops/s | -2.77% |
| test_creation[device0] | 0.2082ms | 92.7146μs | 10.7858 KOps/s | 10.4681 KOps/s | +3.03% |
| test_creation_from_tensor | 4.0686ms | 95.6449μs | 10.4553 KOps/s | 10.2115 KOps/s | +2.39% |
| test_add_one[memmap_tensor0] | 0.1795ms | 5.5595μs | 179.8716 KOps/s | 182.8270 KOps/s | -1.62% |
| test_contiguous[memmap_tensor0] | 21.7510μs | 0.6325μs | 1.5810 MOps/s | 1.5928 MOps/s | -0.75% |
| test_stack[memmap_tensor0] | 44.0820μs | 3.6928μs | 270.8004 KOps/s | 276.1779 KOps/s | -1.95% |
| test_memmaptd_index | 1.0625ms | 0.2645ms | 3.7806 KOps/s | 3.8825 KOps/s | -2.62% |
| test_memmaptd_index_astensor | 0.5856ms | 0.3350ms | 2.9847 KOps/s | 3.0108 KOps/s | -0.87% |
| test_memmaptd_index_op | 0.9146ms | 0.6499ms | 1.5386 KOps/s | 1.5548 KOps/s | -1.04% |
| test_serialize_model | 0.1275s | 0.1223s | 8.1759 Ops/s | 7.1484 Ops/s | **+14.37%** |
| test_serialize_model_pickle | 0.4453s | 0.3901s | 2.5633 Ops/s | 2.4871 Ops/s | +3.06% |
| test_serialize_weights | 0.1967s | 0.1341s | 7.4597 Ops/s | 8.0480 Ops/s | **-7.31%** |
| test_serialize_weights_returnearly | 0.1862s | 0.1699s | 5.8855 Ops/s | 5.6251 Ops/s | +4.63% |
| test_serialize_weights_pickle | 0.4779s | 0.4128s | 2.4225 Ops/s | 2.3751 Ops/s | +2.00% |
| test_serialize_weights_filesystem | 0.1475s | 0.1435s | 6.9699 Ops/s | 7.0671 Ops/s | -1.38% |
| test_serialize_model_filesystem | 0.1537s | 0.1507s | 6.6370 Ops/s | 6.5676 Ops/s | +1.06% |
| test_reshape_pytree | 95.1870μs | 25.4823μs | 39.2429 KOps/s | 38.5837 KOps/s | +1.71% |
| test_reshape_td | 0.1244ms | 34.3210μs | 29.1366 KOps/s | 28.7214 KOps/s | +1.45% |
| test_view_pytree | 76.5020μs | 25.6813μs | 38.9388 KOps/s | 38.3590 KOps/s | +1.51% |
| test_view_td | 91.1700μs | 39.1584μs | 25.5373 KOps/s | 24.6672 KOps/s | +3.53% |
| test_unbind_pytree | 0.1011ms | 29.6094μs | 33.7730 KOps/s | 33.8963 KOps/s | -0.36% |
| test_unbind_td | 0.3591ms | 41.2604μs | 24.2363 KOps/s | 24.1274 KOps/s | +0.45% |
| test_split_pytree | 80.4100μs | 29.7283μs | 33.6380 KOps/s | 33.8553 KOps/s | -0.64% |
| test_split_td | 0.5098ms | 42.1016μs | 23.7521 KOps/s | 24.1775 KOps/s | -1.76% |
| test_add_pytree | 78.5760μs | 35.3420μs | 28.2950 KOps/s | 28.4040 KOps/s | -0.38% |
| test_add_td | 0.1335ms | 60.2769μs | 16.5901 KOps/s | 16.7226 KOps/s | -0.79% |
| test_distributed | 0.2660ms | 0.1295ms | 7.7208 KOps/s | 7.4305 KOps/s | +3.91% |
| test_tdmodule | 85.8800μs | 17.8089μs | 56.1516 KOps/s | 57.2739 KOps/s | -1.96% |
| test_tdmodule_dispatch | 69.1790μs | 38.2876μs | 26.1181 KOps/s | 27.8117 KOps/s | **-6.09%** |
| test_tdseq | 49.0720μs | 19.9053μs | 50.2378 KOps/s | 51.2352 KOps/s | -1.95% |
| test_tdseq_dispatch | 71.5230μs | 42.3573μs | 23.6087 KOps/s | 24.6007 KOps/s | -4.03% |
| test_instantiation_functorch | 2.0638ms | 1.3264ms | 753.9160 Ops/s | 737.8022 Ops/s | +2.18% |
| test_instantiation_td | 2.4848ms | 1.0271ms | 973.6399 Ops/s | 887.9638 Ops/s | **+9.65%** |
| test_exec_functorch | 0.3898ms | 0.1718ms | 5.8203 KOps/s | 6.1583 KOps/s | **-5.49%** |
| test_exec_functional_call | 0.2894ms | 0.1479ms | 6.7600 KOps/s | 6.4803 KOps/s | +4.32% |
| test_exec_td | 0.2791ms | 0.1531ms | 6.5320 KOps/s | 6.7685 KOps/s | -3.49% |
| test_exec_td_decorator | 0.6850ms | 0.2356ms | 4.2454 KOps/s | 4.2881 KOps/s | -1.00% |
| test_vmap_mlp_speed[True-True] | 0.7499ms | 0.5006ms | 1.9977 KOps/s | 2.0141 KOps/s | -0.81% |
| test_vmap_mlp_speed[True-False] | 0.7471ms | 0.4962ms | 2.0155 KOps/s | 2.0369 KOps/s | -1.05% |
| test_vmap_mlp_speed[False-True] | 0.6883ms | 0.3987ms | 2.5082 KOps/s | 2.4655 KOps/s | +1.73% |
| test_vmap_mlp_speed[False-False] | 0.6996ms | 0.4007ms | 2.4956 KOps/s | 2.4697 KOps/s | +1.05% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2531ms | 0.5851ms | 1.7092 KOps/s | 1.7126 KOps/s | -0.20% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7661ms | 0.5817ms | 1.7191 KOps/s | 1.7084 KOps/s | +0.62% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7909ms | 0.4775ms | 2.0941 KOps/s | 2.0883 KOps/s | +0.28% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7379ms | 0.4742ms | 2.1090 KOps/s | 2.0743 KOps/s | +1.68% |
| test_to_module_speed[True] | 1.9603ms | 1.8133ms | 551.4772 Ops/s | 546.3937 Ops/s | +0.93% |
| test_to_module_speed[False] | 2.3258ms | 1.7822ms | 561.1191 Ops/s | 550.6449 Ops/s | +1.90% |
| test_tc_init | 0.1134ms | 45.7457μs | 21.8600 KOps/s | 25.8977 KOps/s | **-15.59%** |
| test_tc_init_nested | 0.1699ms | 91.1193μs | 10.9746 KOps/s | 12.8109 KOps/s | **-14.33%** |
| test_tc_first_layer_tensor | 58.7290μs | 9.2043μs | 108.6445 KOps/s | 120.8621 KOps/s | **-10.11%** |
| test_tc_first_layer_nontensor | 32.8110μs | 9.1637μs | 109.1260 KOps/s | 120.4470 KOps/s | **-9.40%** |
| test_tc_second_layer_tensor | 41.1960μs | 2.8368μs | 352.5154 KOps/s | 391.6604 KOps/s | **-9.99%** |
| test_tc_second_layer_nontensor | 51.9470μs | 10.2739μs | 97.3342 KOps/s | 106.2061 KOps/s | **-8.35%** |
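The Change column can be reproduced from the two throughput columns. The sketch below assumes Change = (Ops / Ops on `HEAD` - 1) × 100 and that a row counts as improved or worsened only when |Change| exceeds 5%; both rules are inferred from the table (they reproduce the bolded rows and the 4/14 totals), not taken from the benchmark tooling. The names `parse_ops`, `percent_change`, and `classify` are hypothetical helpers. Note that units can differ within a row (e.g. KOps/s vs. Ops/s), so values must be normalized before dividing.

```python
"""Recompute the "Change" column of the benchmark tables (hypothetical helpers).

Assumptions, inferred from the tables rather than from the benchmark tooling:
- Change = (Ops / Ops on HEAD - 1) * 100, shown with two decimals.
- A row counts as improved/worsened when |Change| > 5 %.
"""

# Multipliers for the throughput units that appear in the tables.
_OPS_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '1.0268 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * _OPS_UNITS[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of the PR versus HEAD, in percent."""
    return (parse_ops(ops) / parse_ops(ops_head) - 1.0) * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Assumed rule: only changes strictly larger than `threshold` percent count."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    rows = [
        ("test_membership", "1.1215 MOps/s", "1.3998 MOps/s"),
        ("test_unlock_nested", "2.6061 KOps/s", "2.2068 KOps/s"),
        ("test_setitem_dim[int]", "27.2567 KOps/s", "28.6899 KOps/s"),
        ("test_getitem[range]", "21.1668 KOps/s", "22.3256 KOps/s"),
    ]
    for name, ops, head in rows:
        change = percent_change(ops, head)
        print(f"{name}: {change:+.2f}% -> {classify(change)}")
```

With this rule, `test_setitem_dim[int]` (-5.00% after rounding, slightly above -5% before it) stays unbolded while `test_getitem[range]` (-5.19%) is counted as worsened, which matches the table.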
github-actions[bot] commented 3 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 141. Improved: 13. Worsened: 8.

Detailed results (changes in bold are the ones counted in the Improved/Worsened totals above):

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 78.4920μs | 12.5118μs | 79.9245 KOps/s | 75.7568 KOps/s | **+5.50%** |
| test_plain_set_stack_nested | 0.1035ms | 12.6947μs | 78.7730 KOps/s | 76.0204 KOps/s | +3.62% |
| test_plain_set_nested_inplace | 42.1010μs | 13.6621μs | 73.1953 KOps/s | 70.3745 KOps/s | +4.01% |
| test_plain_set_stack_nested_inplace | 31.1900μs | 13.6561μs | 73.2273 KOps/s | 70.5577 KOps/s | +3.78% |
| test_items | 21.7810μs | 4.7603μs | 210.0729 KOps/s | 209.3041 KOps/s | +0.37% |
| test_items_nested | 0.5054ms | 0.4024ms | 2.4848 KOps/s | 2.5246 KOps/s | -1.57% |
| test_items_nested_locked | 0.5935ms | 0.4034ms | 2.4787 KOps/s | 2.4946 KOps/s | -0.64% |
| test_items_nested_leaf | 0.2345ms | 85.8378μs | 11.6499 KOps/s | 11.5173 KOps/s | +1.15% |
| test_items_stack_nested | 0.4519ms | 0.4013ms | 2.4917 KOps/s | 2.5135 KOps/s | -0.86% |
| test_items_stack_nested_leaf | 0.2714ms | 86.2013μs | 11.6008 KOps/s | 11.4087 KOps/s | +1.68% |
| test_items_stack_nested_locked | 0.4715ms | 0.4030ms | 2.4813 KOps/s | 2.4902 KOps/s | -0.36% |
| test_keys | 17.4410μs | 4.3696μs | 228.8539 KOps/s | 227.8169 KOps/s | +0.46% |
| test_keys_nested | 0.1047ms | 67.3765μs | 14.8420 KOps/s | 14.4676 KOps/s | +2.59% |
| test_keys_nested_locked | 0.7625ms | 73.1832μs | 13.6643 KOps/s | 13.4555 KOps/s | +1.55% |
| test_keys_nested_leaf | 87.1720μs | 57.5700μs | 17.3702 KOps/s | 17.3422 KOps/s | +0.16% |
| test_keys_stack_nested | 0.1009ms | 67.8143μs | 14.7462 KOps/s | 14.8907 KOps/s | -0.97% |
| test_keys_stack_nested_leaf | 95.5110μs | 56.5338μs | 17.6885 KOps/s | 17.1243 KOps/s | +3.29% |
| test_keys_stack_nested_locked | 0.1269ms | 71.6087μs | 13.9648 KOps/s | 13.5747 KOps/s | +2.87% |
| test_values | 7.4733μs | 1.7767μs | 562.8457 KOps/s | 560.1742 KOps/s | +0.48% |
| test_values_nested | 54.3910μs | 33.8827μs | 29.5136 KOps/s | 28.7828 KOps/s | +2.54% |
| test_values_nested_locked | 70.5320μs | 35.7615μs | 27.9630 KOps/s | 27.1477 KOps/s | +3.00% |
| test_values_nested_leaf | 0.1370ms | 30.0789μs | 33.2459 KOps/s | 32.1363 KOps/s | +3.45% |
| test_values_stack_nested | 0.1301ms | 34.0329μs | 29.3833 KOps/s | 28.1075 KOps/s | +4.54% |
| test_values_stack_nested_leaf | 51.9510μs | 30.4465μs | 32.8445 KOps/s | 31.6257 KOps/s | +3.85% |
| test_values_stack_nested_locked | 0.1626ms | 36.0327μs | 27.7526 KOps/s | 26.8645 KOps/s | +3.31% |
| test_membership | 1.8350μs | 0.5373μs | 1.8611 MOps/s | 1.8816 MOps/s | -1.09% |
| test_membership_nested | 17.0200μs | 2.0761μs | 481.6732 KOps/s | 484.0788 KOps/s | -0.50% |
| test_membership_nested_leaf | 9.5950μs | 2.0149μs | 496.3148 KOps/s | 498.0365 KOps/s | -0.35% |
| test_membership_stacked_nested | 17.9500μs | 2.0907μs | 478.2996 KOps/s | 477.3000 KOps/s | +0.21% |
| test_membership_stacked_nested_leaf | 16.0300μs | 2.0880μs | 478.9353 KOps/s | 483.8790 KOps/s | -1.02% |
| test_membership_nested_last | 17.1000μs | 3.0155μs | 331.6151 KOps/s | 334.0297 KOps/s | -0.72% |
| test_membership_nested_leaf_last | 0.2009ms | 2.9681μs | 336.9135 KOps/s | 331.8967 KOps/s | +1.51% |
| test_membership_stacked_nested_last | 72.0410μs | 3.4183μs | 292.5401 KOps/s | 291.0557 KOps/s | +0.51% |
| test_membership_stacked_nested_leaf_last | 98.9220μs | 3.4236μs | 292.0919 KOps/s | 292.5499 KOps/s | -0.16% |
| test_nested_getleaf | 33.2410μs | 8.0239μs | 124.6283 KOps/s | 123.9154 KOps/s | +0.58% |
| test_nested_get | 19.5210μs | 7.5773μs | 131.9738 KOps/s | 131.9285 KOps/s | +0.03% |
| test_stacked_getleaf | 33.8000μs | 8.0396μs | 124.3845 KOps/s | 123.8000 KOps/s | +0.47% |
| test_stacked_get | 22.1310μs | 7.5415μs | 132.5992 KOps/s | 132.5021 KOps/s | +0.07% |
| test_nested_getitemleaf | 22.8100μs | 8.1945μs | 122.0336 KOps/s | 122.1271 KOps/s | -0.08% |
| test_nested_getitem | 21.6100μs | 7.6999μs | 129.8716 KOps/s | 130.0097 KOps/s | -0.11% |
| test_stacked_getitemleaf | 32.6010μs | 8.2287μs | 121.5251 KOps/s | 122.3144 KOps/s | -0.65% |
| test_stacked_getitem | 23.0610μs | 7.6868μs | 130.0930 KOps/s | 130.3581 KOps/s | -0.20% |
| test_lock_nested | 9.6803ms | 0.4293ms | 2.3294 KOps/s | 2.3519 KOps/s | -0.96% |
| test_lock_stack_nested | 0.4621ms | 0.3890ms | 2.5707 KOps/s | 2.5536 KOps/s | +0.67% |
| test_unlock_nested | 0.8300ms | 0.3428ms | 2.9173 KOps/s | 2.8960 KOps/s | +0.73% |
| test_unlock_stack_nested | 0.3727ms | 0.3105ms | 3.2208 KOps/s | 3.2075 KOps/s | +0.42% |
| test_flatten_speed | 0.3933ms | 0.1057ms | 9.4618 KOps/s | 9.4876 KOps/s | -0.27% |
| test_unflatten_speed | 0.5006ms | 0.2921ms | 3.4229 KOps/s | 3.4340 KOps/s | -0.32% |
| test_common_ops | 0.9497ms | 0.5658ms | 1.7674 KOps/s | 1.7029 KOps/s | +3.79% |
| test_creation | 35.6610μs | 1.9506μs | 512.6646 KOps/s | 538.7516 KOps/s | -4.84% |
| test_creation_empty | 29.2600μs | 8.7541μs | 114.2317 KOps/s | 102.0831 KOps/s | **+11.90%** |
| test_creation_nested_1 | 30.9100μs | 10.6353μs | 94.0269 KOps/s | 84.8531 KOps/s | **+10.81%** |
| test_creation_nested_2 | 28.4600μs | 13.0172μs | 76.8217 KOps/s | 70.9614 KOps/s | **+8.26%** |
| test_clone | 91.4620μs | 11.0417μs | 90.5658 KOps/s | 88.6636 KOps/s | +2.15% |
| test_getitem[int] | 25.5300μs | 10.2578μs | 97.4864 KOps/s | 97.3175 KOps/s | +0.17% |
| test_getitem[slice_int] | 0.1213ms | 20.2695μs | 49.3351 KOps/s | 49.4720 KOps/s | -0.28% |
| test_getitem[range] | 0.2435ms | 37.8344μs | 26.4310 KOps/s | 26.5985 KOps/s | -0.63% |
| test_getitem[tuple] | 36.4500μs | 17.7440μs | 56.3571 KOps/s | 57.0150 KOps/s | -1.15% |
| test_getitem[list] | 0.2655ms | 32.3529μs | 30.9091 KOps/s | 30.7658 KOps/s | +0.47% |
| test_setitem_dim[int] | 42.1310μs | 23.6833μs | 42.2239 KOps/s | 38.2694 KOps/s | **+10.33%** |
| test_setitem_dim[slice_int] | 73.6410μs | 45.1344μs | 22.1560 KOps/s | 20.8670 KOps/s | **+6.18%** |
| test_setitem_dim[range] | 0.1061ms | 62.0848μs | 16.1070 KOps/s | 15.5197 KOps/s | +3.78% |
| test_setitem_dim[tuple] | 0.1467ms | 39.2671μs | 25.4666 KOps/s | 24.5486 KOps/s | +3.74% |
| test_setitem | 90.1010μs | 15.7266μs | 63.5866 KOps/s | 61.2210 KOps/s | +3.86% |
| test_set | 99.3320μs | 15.1360μs | 66.0676 KOps/s | 63.8574 KOps/s | +3.46% |
| test_set_shared | 3.0625ms | 0.1014ms | 9.8581 KOps/s | 10.2343 KOps/s | -3.68% |
| test_update | 98.2620μs | 17.4419μs | 57.3331 KOps/s | 52.0450 KOps/s | **+10.16%** |
| test_update_nested | 0.1090ms | 23.1195μs | 43.2535 KOps/s | 41.7386 KOps/s | +3.63% |
| test_update__nested | 97.7520μs | 20.4971μs | 48.7874 KOps/s | 46.7755 KOps/s | +4.30% |
| test_set_nested | 0.1086ms | 15.9571μs | 62.6682 KOps/s | 59.8245 KOps/s | +4.75% |
| test_set_nested_new | 99.1820μs | 18.8219μs | 53.1297 KOps/s | 51.7153 KOps/s | +2.73% |
| test_select | 0.1159ms | 31.4979μs | 31.7481 KOps/s | 30.1584 KOps/s | **+5.27%** |
| test_select_nested | 92.2120μs | 52.7568μs | 18.9549 KOps/s | 19.0010 KOps/s | -0.24% |
| test_exclude_nested | 0.1429ms | 72.4181μs | 13.8087 KOps/s | 13.7605 KOps/s | +0.35% |
| test_empty[True] | 0.4063ms | 0.2985ms | 3.3505 KOps/s | 3.3351 KOps/s | +0.46% |
| test_empty[False] | 16.9113μs | 0.9355μs | 1.0690 MOps/s | 1.0822 MOps/s | -1.23% |
| test_to | 89.1230μs | 59.1288μs | 16.9122 KOps/s | 17.0539 KOps/s | -0.83% |
| test_to_nonblocking | 0.1840ms | 36.8469μs | 27.1393 KOps/s | 26.4599 KOps/s | +2.57% |
| test_unbind_speed | 0.3283ms | 0.2638ms | 3.7906 KOps/s | 3.7153 KOps/s | +2.03% |
| test_unbind_speed_stack0 | 0.3610ms | 0.2634ms | 3.7966 KOps/s | 3.7670 KOps/s | +0.79% |
| test_unbind_speed_stack1 | 94.3900ms | 0.8021ms | 1.2467 KOps/s | 1.2368 KOps/s | +0.80% |
| test_split | 93.1856ms | 1.5563ms | 642.5634 Ops/s | 627.1538 Ops/s | +2.46% |
| test_chunk | 1.5399ms | 1.4148ms | 706.8164 Ops/s | 691.2008 Ops/s | +2.26% |
| test_creation[device0] | 0.1896ms | 55.4836μs | 18.0233 KOps/s | 17.2059 KOps/s | +4.75% |
| test_creation_from_tensor | 0.1948ms | 55.2455μs | 18.1010 KOps/s | 18.2142 KOps/s | -0.62% |
| test_add_one[memmap_tensor0] | 97.6820μs | 6.8050μs | 146.9501 KOps/s | 145.9794 KOps/s | +0.66% |
| test_contiguous[memmap_tensor0] | 11.7800μs | 0.6052μs | 1.6524 MOps/s | 1.7039 MOps/s | -3.02% |
| test_stack[memmap_tensor0] | 32.2910μs | 4.3144μs | 231.7793 KOps/s | 233.3979 KOps/s | -0.69% |
| test_memmaptd_index | 1.0943ms | 0.2548ms | 3.9250 KOps/s | 3.7821 KOps/s | +3.78% |
| test_memmaptd_index_astensor | 0.6243ms | 0.3188ms | 3.1366 KOps/s | 3.0705 KOps/s | +2.15% |
| test_memmaptd_index_op | 94.1474ms | 0.6546ms | 1.5277 KOps/s | 1.6012 KOps/s | -4.59% |
| test_serialize_model | 94.2990ms | 90.2210ms | 11.0839 Ops/s | 10.3109 Ops/s | **+7.50%** |
| test_serialize_model_pickle | 1.3480s | 1.2351s | 0.8096 Ops/s | 0.7186 Ops/s | **+12.66%** |
| test_serialize_weights | 0.1864s | 99.6104ms | 10.0391 Ops/s | 10.7579 Ops/s | **-6.68%** |
| test_serialize_weights_returnearly | 0.2958s | 79.5005ms | 12.5785 Ops/s | 13.8962 Ops/s | **-9.48%** |
| test_serialize_weights_pickle | 1.3523s | 1.2487s | 0.8008 Ops/s | 0.8010 Ops/s | -0.02% |
| test_reshape_pytree | 0.2389ms | 25.3708μs | 39.4154 KOps/s | 39.2822 KOps/s | +0.34% |
| test_reshape_td | 99.1620μs | 30.7905μs | 32.4775 KOps/s | 32.8166 KOps/s | -1.03% |
| test_view_pytree | 0.1437ms | 25.0124μs | 39.9802 KOps/s | 39.3006 KOps/s | +1.73% |
| test_view_td | 0.2455ms | 37.9981μs | 26.3171 KOps/s | 27.3142 KOps/s | -3.65% |
| test_unbind_pytree | 0.1770ms | 32.2356μs | 31.0216 KOps/s | 32.3281 KOps/s | -4.04% |
| test_unbind_td | 0.6119ms | 41.9748μs | 23.8238 KOps/s | 25.1636 KOps/s | **-5.32%** |
| test_split_pytree | 63.5910μs | 35.4142μs | 28.2373 KOps/s | 29.3353 KOps/s | -3.74% |
| test_split_td | 0.2516ms | 39.8755μs | 25.0781 KOps/s | 26.6280 KOps/s | **-5.82%** |
| test_add_pytree | 0.1660ms | 37.1620μs | 26.9092 KOps/s | 26.4577 KOps/s | +1.71% |
| test_add_td | 0.2472ms | 46.9774μs | 21.2868 KOps/s | 20.1175 KOps/s | **+5.81%** |
| test_distributed | 0.2681ms | 74.0026μs | 13.5130 KOps/s | 14.6156 KOps/s | **-7.54%** |
| test_tdmodule | 59.6210μs | 14.2998μs | 69.9310 KOps/s | 66.7691 KOps/s | +4.74% |
| test_tdmodule_dispatch | 45.5110μs | 29.6399μs | 33.7383 KOps/s | 35.3126 KOps/s | -4.46% |
| test_tdseq | 0.1008ms | 15.3914μs | 64.9715 KOps/s | 65.2193 KOps/s | -0.38% |
| test_tdseq_dispatch | 56.8810μs | 32.2853μs | 30.9739 KOps/s | 31.6974 KOps/s | -2.28% |
| test_instantiation_functorch | 1.6262ms | 1.3770ms | 726.2266 Ops/s | 721.3382 Ops/s | +0.68% |
| test_instantiation_td | 1.4452ms | 0.9739ms | 1.0268 KOps/s | 914.9997 Ops/s | **+12.22%** |
| test_exec_functorch | 0.2546ms | 0.1444ms | 6.9269 KOps/s | 6.7972 KOps/s | +1.91% |
| test_exec_functional_call | 0.3243ms | 0.1291ms | 7.7443 KOps/s | 7.5294 KOps/s | +2.85% |
| test_exec_td | 0.1567ms | 0.1258ms | 7.9484 KOps/s | 7.5727 KOps/s | +4.96% |
| test_exec_td_decorator | 0.6016ms | 0.1965ms | 5.0883 KOps/s | 4.9476 KOps/s | +2.84% |
| test_vmap_mlp_speed[True-True] | 0.7708ms | 0.5672ms | 1.7631 KOps/s | 1.7371 KOps/s | +1.50% |
| test_vmap_mlp_speed[True-False] | 0.8013ms | 0.5661ms | 1.7664 KOps/s | 1.7653 KOps/s | +0.06% |
| test_vmap_mlp_speed[False-True] | 0.7103ms | 0.5149ms | 1.9422 KOps/s | 1.9887 KOps/s | -2.34% |
| test_vmap_mlp_speed[False-False] | 0.7012ms | 0.4990ms | 2.0039 KOps/s | 1.9940 KOps/s | +0.50% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1047ms | 0.6408ms | 1.5605 KOps/s | 1.5450 KOps/s | +1.00% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8552ms | 0.6410ms | 1.5602 KOps/s | 1.5576 KOps/s | +0.16% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7536ms | 0.5583ms | 1.7912 KOps/s | 1.6886 KOps/s | **+6.08%** |
| test_vmap_mlp_speed_decorator[False-False] | 0.7910ms | 0.5615ms | 1.7808 KOps/s | 1.7233 KOps/s | +3.34% |
| test_vmap_transformer_speed[True-True] | 8.0670ms | 7.5948ms | 131.6694 Ops/s | 129.3257 Ops/s | +1.81% |
| test_vmap_transformer_speed[True-False] | 7.7453ms | 7.4968ms | 133.3897 Ops/s | 129.5591 Ops/s | +2.96% |
| test_vmap_transformer_speed[False-True] | 7.6446ms | 7.4405ms | 134.4004 Ops/s | 130.3973 Ops/s | +3.07% |
| test_vmap_transformer_speed[False-False] | 8.0195ms | 7.5994ms | 131.5900 Ops/s | 130.5478 Ops/s | +0.80% |
| test_vmap_transformer_speed_decorator[True-True] | 18.9022ms | 18.5444ms | 53.9245 Ops/s | 52.3234 Ops/s | +3.06% |
| test_vmap_transformer_speed_decorator[True-False] | 19.0490ms | 18.6123ms | 53.7280 Ops/s | 52.4073 Ops/s | +2.52% |
| test_vmap_transformer_speed_decorator[False-True] | 18.6452ms | 18.3531ms | 54.4867 Ops/s | 52.9137 Ops/s | +2.97% |
| test_vmap_transformer_speed_decorator[False-False] | 19.0367ms | 18.3893ms | 54.3796 Ops/s | 52.9993 Ops/s | +2.60% |
| test_to_module_speed[True] | 1.6022ms | 1.4842ms | 673.7843 Ops/s | 655.7122 Ops/s | +2.76% |
| test_to_module_speed[False] | 1.5901ms | 1.4612ms | 684.3782 Ops/s | 662.0518 Ops/s | +3.37% |
| test_tc_init | 54.0410μs | 35.9928μs | 27.7833 KOps/s | 28.1650 KOps/s | -1.36% |
| test_tc_init_nested | 0.1028ms | 71.4880μs | 13.9884 KOps/s | 14.2762 KOps/s | -2.02% |
| test_tc_first_layer_tensor | 18.8800μs | 3.9748μs | 251.5840 KOps/s | 277.3163 KOps/s | **-9.28%** |
| test_tc_first_layer_nontensor | 0.1664ms | 4.0099μs | 249.3857 KOps/s | 273.6621 KOps/s | **-8.87%** |
| test_tc_second_layer_tensor | 45.2413μs | 1.2895μs | 775.5228 KOps/s | 811.1156 KOps/s | -4.39% |
| test_tc_second_layer_nontensor | 21.0410μs | 4.5790μs | 218.3893 KOps/s | 240.6797 KOps/s | **-9.26%** |
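In both tables the Ops column appears to be the reciprocal of the Mean column; for example, `test_membership` on GPU has a mean of 0.5373μs, and 1 / 0.5373μs ≈ 1.8611 MOps/s. This relationship is inferred from the rows, not documented here. The sketch below checks it under that assumption; `parse_time`, `parse_ops`, and `ops_from_mean` are hypothetical helpers, and small mismatches in the last digit are expected because Mean itself is rounded.

```python
"""Check that Ops is (approximately) 1 / Mean for rows of the benchmark tables.

Assumption: the relationship is inferred from the table values, not from the
benchmark tooling's documentation.
"""

# Seconds per unit for the timing strings used in the tables. Both the Greek
# small letter mu (U+03BC) and the micro sign (U+00B5) are accepted for "us".
_TIME_UNITS = {"ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "us": 1e-6, "s": 1.0}
_OPS_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_time(text: str) -> float:
    """Convert '0.5373μs', '0.9739ms' or '1.2351s' into seconds."""
    # Try longer suffixes first so that 'ms' is not mistaken for 's'.
    for unit in sorted(_TIME_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * _TIME_UNITS[unit]
    raise ValueError(f"unknown time unit in {text!r}")


def parse_ops(text: str) -> float:
    """Convert a string such as '1.0268 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * _OPS_UNITS[unit]


def ops_from_mean(mean: str) -> float:
    """Throughput implied by one call every `mean` seconds."""
    return 1.0 / parse_time(mean)


if __name__ == "__main__":
    rows = [
        ("test_membership", "0.5373\u03bcs", "1.8611 MOps/s"),
        ("test_instantiation_td", "0.9739ms", "1.0268 KOps/s"),
        ("test_serialize_model_pickle", "1.2351s", "0.8096 Ops/s"),
    ]
    for name, mean, ops in rows:
        implied = ops_from_mean(mean)
        rel_err = abs(implied - parse_ops(ops)) / parse_ops(ops)
        print(f"{name}: implied {implied:.4g} ops/s, reported {ops}, rel. error {rel_err:.1e}")
```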