pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile integration - tensorclass #875

Closed: vmoens closed this 3 months ago

vmoens commented 3 months ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 3 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 34. Worsened: 9.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.4140μs | 16.4675μs | 60.7256 KOps/s | 54.3959 KOps/s | **+11.64%** |
| test_plain_set_stack_nested | 42.1690μs | 16.5791μs | 60.3170 KOps/s | 51.3634 KOps/s | **+17.43%** |
| test_plain_set_nested_inplace | 52.4680μs | 18.6186μs | 53.7097 KOps/s | 49.6582 KOps/s | **+8.16%** |
| test_plain_set_stack_nested_inplace | 49.1020μs | 18.2944μs | 54.6615 KOps/s | 49.7451 KOps/s | **+9.88%** |
| test_items | 17.9530μs | 2.6572μs | 376.3369 KOps/s | 338.4623 KOps/s | **+11.19%** |
| test_items_nested | 2.1908ms | 0.3741ms | 2.6731 KOps/s | 2.7370 KOps/s | -2.33% |
| test_items_nested_locked | 0.6586ms | 0.3733ms | 2.6787 KOps/s | 2.7114 KOps/s | -1.21% |
| test_items_nested_leaf | 0.1480ms | 84.4477μs | 11.8416 KOps/s | 11.9579 KOps/s | -0.97% |
| test_items_stack_nested | 0.6153ms | 0.3781ms | 2.6450 KOps/s | 2.7040 KOps/s | -2.18% |
| test_items_stack_nested_leaf | 0.1865ms | 85.4769μs | 11.6991 KOps/s | 11.5912 KOps/s | +0.93% |
| test_items_stack_nested_locked | 0.4757ms | 0.3770ms | 2.6527 KOps/s | 2.6405 KOps/s | +0.46% |
| test_keys | 27.2000μs | 3.8753μs | 258.0451 KOps/s | 244.1896 KOps/s | **+5.67%** |
| test_keys_nested | 0.3039ms | 0.1399ms | 7.1469 KOps/s | 6.6410 KOps/s | **+7.62%** |
| test_keys_nested_locked | 1.9427ms | 0.1477ms | 6.7698 KOps/s | 6.4451 KOps/s | **+5.04%** |
| test_keys_nested_leaf | 0.2139ms | 0.1212ms | 8.2535 KOps/s | 8.0951 KOps/s | +1.96% |
| test_keys_stack_nested | 0.2194ms | 0.1399ms | 7.1460 KOps/s | 6.9202 KOps/s | +3.26% |
| test_keys_stack_nested_leaf | 0.2422ms | 0.1200ms | 8.3349 KOps/s | 8.0944 KOps/s | +2.97% |
| test_keys_stack_nested_locked | 0.8863ms | 0.1493ms | 6.6968 KOps/s | 6.7067 KOps/s | -0.15% |
| test_values | 8.9218μs | 1.1642μs | 858.9820 KOps/s | 873.2900 KOps/s | -1.64% |
| test_values_nested | 0.1245ms | 48.4571μs | 20.6368 KOps/s | 20.1631 KOps/s | +2.35% |
| test_values_nested_locked | 85.9310μs | 48.6512μs | 20.5545 KOps/s | 20.3924 KOps/s | +0.79% |
| test_values_nested_leaf | 0.4502ms | 43.7510μs | 22.8566 KOps/s | 22.4894 KOps/s | +1.63% |
| test_values_stack_nested | 0.1026ms | 49.8221μs | 20.0714 KOps/s | 19.8942 KOps/s | +0.89% |
| test_values_stack_nested_leaf | 84.9180μs | 43.7644μs | 22.8496 KOps/s | 22.3921 KOps/s | +2.04% |
| test_values_stack_nested_locked | 92.8630μs | 49.7495μs | 20.1007 KOps/s | 19.8898 KOps/s | +1.06% |
| test_membership | 3.2741μs | 0.7261μs | 1.3773 MOps/s | 1.3336 MOps/s | +3.28% |
| test_membership_nested | 31.3890μs | 2.6521μs | 377.0581 KOps/s | 363.4941 KOps/s | +3.73% |
| test_membership_nested_leaf | 28.5530μs | 2.6749μs | 373.8443 KOps/s | 365.7053 KOps/s | +2.23% |
| test_membership_stacked_nested | 20.7590μs | 2.6469μs | 377.7968 KOps/s | 367.4779 KOps/s | +2.81% |
| test_membership_stacked_nested_leaf | 30.3270μs | 2.6586μs | 376.1341 KOps/s | 365.5630 KOps/s | +2.89% |
| test_membership_nested_last | 29.5750μs | 3.9438μs | 253.5655 KOps/s | 244.8806 KOps/s | +3.55% |
| test_membership_nested_leaf_last | 31.6690μs | 3.9385μs | 253.9068 KOps/s | 243.7689 KOps/s | +4.16% |
| test_membership_stacked_nested_last | 36.3070μs | 6.9251μs | 144.4023 KOps/s | 194.1298 KOps/s | **-25.62%** |
| test_membership_stacked_nested_leaf_last | 26.1790μs | 6.9358μs | 144.1798 KOps/s | 192.7246 KOps/s | **-25.19%** |
| test_nested_getleaf | 40.9660μs | 10.7092μs | 93.3778 KOps/s | 92.5310 KOps/s | +0.92% |
| test_nested_get | 98.7240μs | 10.4316μs | 95.8627 KOps/s | 97.6467 KOps/s | -1.83% |
| test_stacked_getleaf | 57.8980μs | 10.7478μs | 93.0426 KOps/s | 92.0996 KOps/s | +1.02% |
| test_stacked_get | 34.0440μs | 10.1431μs | 98.5888 KOps/s | 97.7334 KOps/s | +0.88% |
| test_nested_getitemleaf | 37.0490μs | 11.1630μs | 89.5817 KOps/s | 88.2453 KOps/s | +1.51% |
| test_nested_getitem | 42.0290μs | 10.3500μs | 96.6179 KOps/s | 95.9187 KOps/s | +0.73% |
| test_stacked_getitemleaf | 39.7140μs | 11.1574μs | 89.6267 KOps/s | 87.6762 KOps/s | +2.22% |
| test_stacked_getitem | 37.0090μs | 10.1611μs | 98.4144 KOps/s | 96.2214 KOps/s | +2.28% |
| test_lock_nested | 4.8729ms | 0.4569ms | 2.1885 KOps/s | 2.2508 KOps/s | -2.77% |
| test_lock_stack_nested | 0.7441ms | 0.4059ms | 2.4638 KOps/s | 2.4198 KOps/s | +1.82% |
| test_unlock_nested | 0.9858ms | 0.3575ms | 2.7972 KOps/s | 2.3420 KOps/s | **+19.44%** |
| test_unlock_stack_nested | 0.4812ms | 0.3190ms | 3.1348 KOps/s | 3.0756 KOps/s | +1.93% |
| test_flatten_speed | 0.2066ms | 0.1039ms | 9.6241 KOps/s | 9.5839 KOps/s | +0.42% |
| test_unflatten_speed | 0.9735ms | 0.4361ms | 2.2929 KOps/s | 2.3088 KOps/s | -0.69% |
| test_common_ops | 3.7232ms | 0.7308ms | 1.3683 KOps/s | 1.2380 KOps/s | **+10.53%** |
| test_creation | 75.4510μs | 2.4538μs | 407.5387 KOps/s | 419.6484 KOps/s | -2.89% |
| test_creation_empty | 35.1160μs | 9.8901μs | 101.1113 KOps/s | 77.6672 KOps/s | **+30.19%** |
| test_creation_nested_1 | 42.6190μs | 12.8651μs | 77.7298 KOps/s | 62.6453 KOps/s | **+24.08%** |
| test_creation_nested_2 | 47.9590μs | 16.6960μs | 59.8946 KOps/s | 49.6993 KOps/s | **+20.51%** |
| test_clone | 60.6530μs | 13.1202μs | 76.2183 KOps/s | 76.3115 KOps/s | -0.12% |
| test_getitem[int] | 31.2580μs | 11.6203μs | 86.0564 KOps/s | 79.2391 KOps/s | **+8.60%** |
| test_getitem[slice_int] | 57.2260μs | 24.2278μs | 41.2749 KOps/s | 42.8965 KOps/s | -3.78% |
| test_getitem[range] | 0.1718ms | 46.2485μs | 21.6223 KOps/s | 21.9731 KOps/s | -1.60% |
| test_getitem[tuple] | 97.8430μs | 19.5687μs | 51.1020 KOps/s | 52.0950 KOps/s | -1.91% |
| test_getitem[list] | 0.2269ms | 41.7929μs | 23.9275 KOps/s | 24.7115 KOps/s | -3.17% |
| test_setitem_dim[int] | 55.0230μs | 30.4950μs | 32.7923 KOps/s | 28.5901 KOps/s | **+14.70%** |
| test_setitem_dim[slice_int] | 0.1183ms | 60.0659μs | 16.6484 KOps/s | 15.6814 KOps/s | **+6.17%** |
| test_setitem_dim[range] | 0.1085ms | 78.7577μs | 12.6972 KOps/s | 11.8729 KOps/s | **+6.94%** |
| test_setitem_dim[tuple] | 81.1610μs | 47.1335μs | 21.2163 KOps/s | 18.8615 KOps/s | **+12.48%** |
| test_setitem | 0.1058ms | 19.1703μs | 52.1641 KOps/s | 47.2467 KOps/s | **+10.41%** |
| test_set | 68.1170μs | 18.5267μs | 53.9761 KOps/s | 47.6846 KOps/s | **+13.19%** |
| test_set_shared | 7.0444ms | 0.1667ms | 5.9975 KOps/s | 6.0233 KOps/s | -0.43% |
| test_update | 0.1276ms | 20.8326μs | 48.0017 KOps/s | 40.6608 KOps/s | **+18.05%** |
| test_update_nested | 0.1050ms | 29.9733μs | 33.3630 KOps/s | 29.8437 KOps/s | **+11.79%** |
| test_update__nested | 75.2800μs | 24.9176μs | 40.1324 KOps/s | 40.3055 KOps/s | -0.43% |
| test_set_nested | 74.0280μs | 20.3152μs | 49.2243 KOps/s | 44.0691 KOps/s | **+11.70%** |
| test_set_nested_new | 0.1049ms | 24.6184μs | 40.6200 KOps/s | 36.4519 KOps/s | **+11.43%** |
| test_select | 95.7880μs | 41.1570μs | 24.2972 KOps/s | 23.0992 KOps/s | **+5.19%** |
| test_select_nested | 0.1235ms | 60.9795μs | 16.3989 KOps/s | 16.3599 KOps/s | +0.24% |
| test_exclude_nested | 0.1908ms | 80.9686μs | 12.3505 KOps/s | 12.3349 KOps/s | +0.13% |
| test_empty[True] | 0.5544ms | 0.3438ms | 2.9082 KOps/s | 2.9177 KOps/s | -0.33% |
| test_empty[False] | 5.9437μs | 1.2693μs | 787.8599 KOps/s | 773.7220 KOps/s | +1.83% |
| test_unbind_speed | 0.4412ms | 0.2548ms | 3.9252 KOps/s | 3.8675 KOps/s | +1.49% |
| test_unbind_speed_stack0 | 0.4648ms | 0.2544ms | 3.9311 KOps/s | 3.9212 KOps/s | +0.25% |
| test_unbind_speed_stack1 | 76.0783ms | 0.7331ms | 1.3640 KOps/s | 1.3567 KOps/s | +0.54% |
| test_split | 75.9069ms | 1.6332ms | 612.2780 Ops/s | 656.5086 Ops/s | **-6.74%** |
| test_chunk | 74.0673ms | 1.6334ms | 612.2239 Ops/s | 608.6777 Ops/s | +0.58% |
| test_creation[device0] | 4.1376ms | 94.9705μs | 10.5296 KOps/s | 10.7713 KOps/s | -2.24% |
| test_creation_from_tensor | 0.2884ms | 93.7640μs | 10.6651 KOps/s | 10.3958 KOps/s | +2.59% |
| test_add_one[memmap_tensor0] | 0.1624ms | 5.2126μs | 191.8428 KOps/s | 182.2776 KOps/s | **+5.25%** |
| test_contiguous[memmap_tensor0] | 14.2370μs | 0.6228μs | 1.6057 MOps/s | 1.5365 MOps/s | +4.51% |
| test_stack[memmap_tensor0] | 42.8000μs | 3.6522μs | 273.8076 KOps/s | 281.3519 KOps/s | -2.68% |
| test_memmaptd_index | 0.9974ms | 0.2552ms | 3.9179 KOps/s | 3.8752 KOps/s | +1.10% |
| test_memmaptd_index_astensor | 0.7448ms | 0.3296ms | 3.0339 KOps/s | 3.0050 KOps/s | +0.96% |
| test_memmaptd_index_op | 0.9139ms | 0.5786ms | 1.7283 KOps/s | 1.5781 KOps/s | **+9.52%** |
| test_serialize_model | 0.1286s | 0.1227s | 8.1478 Ops/s | 7.3120 Ops/s | **+11.43%** |
| test_serialize_model_pickle | 0.4477s | 0.3888s | 2.5721 Ops/s | 2.5245 Ops/s | +1.88% |
| test_serialize_weights | 0.2028s | 0.1313s | 7.6172 Ops/s | 7.8775 Ops/s | -3.30% |
| test_serialize_weights_returnearly | 0.1786s | 0.1632s | 6.1268 Ops/s | 6.2497 Ops/s | -1.97% |
| test_serialize_weights_pickle | 0.4295s | 0.3916s | 2.5534 Ops/s | 2.5352 Ops/s | +0.72% |
| test_serialize_weights_filesystem | 0.2128s | 0.1519s | 6.5813 Ops/s | 7.0344 Ops/s | **-6.44%** |
| test_serialize_model_filesystem | 0.1685s | 0.1550s | 6.4535 Ops/s | 6.5609 Ops/s | -1.64% |
| test_reshape_pytree | 59.8310μs | 26.3917μs | 37.8906 KOps/s | 38.0700 KOps/s | -0.47% |
| test_reshape_td | 81.2910μs | 34.4267μs | 29.0473 KOps/s | 28.9656 KOps/s | +0.28% |
| test_view_pytree | 77.4540μs | 26.4394μs | 37.8223 KOps/s | 39.3375 KOps/s | -3.85% |
| test_view_td | 83.6460μs | 40.2862μs | 24.8224 KOps/s | 25.5403 KOps/s | -2.81% |
| test_unbind_pytree | 68.6380μs | 29.6763μs | 33.6969 KOps/s | 33.9664 KOps/s | -0.79% |
| test_unbind_td | 0.3913ms | 38.3272μs | 26.0911 KOps/s | 26.0463 KOps/s | +0.17% |
| test_split_pytree | 73.4570μs | 29.7918μs | 33.5662 KOps/s | 33.9039 KOps/s | -1.00% |
| test_split_td | 0.1226ms | 40.6437μs | 24.6040 KOps/s | 24.5988 KOps/s | +0.02% |
| test_add_pytree | 77.6140μs | 35.0117μs | 28.5619 KOps/s | 28.9135 KOps/s | -1.22% |
| test_add_td | 0.1138ms | 52.5716μs | 19.0217 KOps/s | 16.7657 KOps/s | **+13.46%** |
| test_distributed | 0.2700ms | 0.1295ms | 7.7235 KOps/s | 7.7057 KOps/s | +0.23% |
| test_tdmodule | 44.5530μs | 15.8468μs | 63.1041 KOps/s | 55.8044 KOps/s | **+13.08%** |
| test_tdmodule_dispatch | 56.8170μs | 33.7565μs | 29.6239 KOps/s | 26.8762 KOps/s | **+10.22%** |
| test_tdseq | 36.6180μs | 17.8414μs | 56.0494 KOps/s | 49.9455 KOps/s | **+12.22%** |
| test_tdseq_dispatch | 58.8500μs | 37.9759μs | 26.3325 KOps/s | 23.8244 KOps/s | **+10.53%** |
| test_instantiation_functorch | 2.7347ms | 1.3333ms | 750.0026 Ops/s | 753.4207 Ops/s | -0.45% |
| test_instantiation_td | 1.4984ms | 1.0288ms | 971.9822 Ops/s | 970.4868 Ops/s | +0.15% |
| test_exec_functorch | 0.2818ms | 0.1732ms | 5.7735 KOps/s | 6.1469 KOps/s | **-6.08%** |
| test_exec_functional_call | 0.3395ms | 0.1537ms | 6.5077 KOps/s | 6.8368 KOps/s | -4.81% |
| test_exec_td | 0.3545ms | 0.1468ms | 6.8118 KOps/s | 6.8109 KOps/s | +0.01% |
| test_exec_td_decorator | 0.3283ms | 0.2330ms | 4.2925 KOps/s | 4.2590 KOps/s | +0.79% |
| test_vmap_mlp_speed[True-True] | 0.7862ms | 0.4725ms | 2.1165 KOps/s | 2.0319 KOps/s | +4.16% |
| test_vmap_mlp_speed[True-False] | 0.5829ms | 0.4699ms | 2.1282 KOps/s | 1.9754 KOps/s | **+7.73%** |
| test_vmap_mlp_speed[False-True] | 0.7064ms | 0.3896ms | 2.5668 KOps/s | 2.5162 KOps/s | +2.01% |
| test_vmap_mlp_speed[False-False] | 0.5969ms | 0.3900ms | 2.5640 KOps/s | 2.4674 KOps/s | +3.92% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1133ms | 0.5615ms | 1.7810 KOps/s | 1.7233 KOps/s | +3.35% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8685ms | 0.5839ms | 1.7127 KOps/s | 1.7127 KOps/s | -0.00% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7446ms | 0.4651ms | 2.1501 KOps/s | 2.1143 KOps/s | +1.69% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8304ms | 0.4666ms | 2.1434 KOps/s | 2.1119 KOps/s | +1.49% |
| test_to_module_speed[True] | 6.3871ms | 1.8027ms | 554.7157 Ops/s | 558.5276 Ops/s | -0.68% |
| test_to_module_speed[False] | 2.7958ms | 1.7675ms | 565.7557 Ops/s | 563.8900 Ops/s | +0.33% |
| test_tc_init | 0.1419ms | 59.4660μs | 16.8163 KOps/s | 16.3882 KOps/s | +2.61% |
| test_tc_init_nested | 0.2202ms | 0.1193ms | 8.3788 KOps/s | 8.3665 KOps/s | +0.15% |
| test_tc_first_layer_tensor | 36.1880μs | 9.0506μs | 110.4899 KOps/s | 122.4821 KOps/s | **-9.79%** |
| test_tc_first_layer_nontensor | 38.4120μs | 9.0390μs | 110.6316 KOps/s | 121.9371 KOps/s | **-9.27%** |
| test_tc_second_layer_tensor | 28.1620μs | 2.8081μs | 356.1105 KOps/s | 398.6720 KOps/s | **-10.68%** |
| test_tc_second_layer_nontensor | 33.7940μs | 10.2340μs | 97.7137 KOps/s | 108.6570 KOps/s | **-10.07%** |

github-actions[bot] commented 3 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 21. Worsened: 14.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 74.4820μs | 12.5984μs | 79.3750 KOps/s | 73.7210 KOps/s | **+7.67%** |
| test_plain_set_stack_nested | 28.6010μs | 12.6404μs | 79.1116 KOps/s | 74.2282 KOps/s | **+6.58%** |
| test_plain_set_nested_inplace | 37.1110μs | 13.6385μs | 73.3218 KOps/s | 69.5090 KOps/s | **+5.49%** |
| test_plain_set_stack_nested_inplace | 39.8110μs | 13.6146μs | 73.4504 KOps/s | 68.7554 KOps/s | **+6.83%** |
| test_items | 16.7300μs | 4.5855μs | 218.0773 KOps/s | 214.8578 KOps/s | +1.50% |
| test_items_nested | 0.4323ms | 0.3923ms | 2.5489 KOps/s | 2.5377 KOps/s | +0.44% |
| test_items_nested_locked | 0.4334ms | 0.3910ms | 2.5573 KOps/s | 2.5136 KOps/s | +1.74% |
| test_items_nested_leaf | 0.1098ms | 88.1207μs | 11.3481 KOps/s | 11.5912 KOps/s | -2.10% |
| test_items_stack_nested | 0.4441ms | 0.3946ms | 2.5343 KOps/s | 2.5042 KOps/s | +1.20% |
| test_items_stack_nested_leaf | 0.1168ms | 88.2095μs | 11.3366 KOps/s | 11.5388 KOps/s | -1.75% |
| test_items_stack_nested_locked | 0.4296ms | 0.3961ms | 2.5248 KOps/s | 2.5223 KOps/s | +0.10% |
| test_keys | 27.1000μs | 4.3320μs | 230.8429 KOps/s | 229.3233 KOps/s | +0.66% |
| test_keys_nested | 98.7020μs | 70.3751μs | 14.2096 KOps/s | 14.6762 KOps/s | -3.18% |
| test_keys_nested_locked | 2.2525ms | 77.1639μs | 12.9594 KOps/s | 13.4496 KOps/s | -3.64% |
| test_keys_nested_leaf | 82.2710μs | 60.8866μs | 16.4240 KOps/s | 17.4444 KOps/s | **-5.85%** |
| test_keys_stack_nested | 95.1630μs | 69.8204μs | 14.3225 KOps/s | 14.8903 KOps/s | -3.81% |
| test_keys_stack_nested_leaf | 84.3620μs | 60.8396μs | 16.4366 KOps/s | 16.8549 KOps/s | -2.48% |
| test_keys_stack_nested_locked | 98.1520μs | 76.3905μs | 13.0906 KOps/s | 13.4817 KOps/s | -2.90% |
| test_values | 16.4470μs | 1.7678μs | 565.6904 KOps/s | 568.7644 KOps/s | -0.54% |
| test_values_nested | 57.3610μs | 33.9766μs | 29.4320 KOps/s | 29.2340 KOps/s | +0.68% |
| test_values_nested_locked | 58.5710μs | 36.1609μs | 27.6542 KOps/s | 27.7711 KOps/s | -0.42% |
| test_values_nested_leaf | 54.2510μs | 30.3086μs | 32.9940 KOps/s | 33.0714 KOps/s | -0.23% |
| test_values_stack_nested | 54.8310μs | 35.1531μs | 28.4470 KOps/s | 28.4671 KOps/s | -0.07% |
| test_values_stack_nested_leaf | 62.6810μs | 31.1737μs | 32.0783 KOps/s | 32.0883 KOps/s | -0.03% |
| test_values_stack_nested_locked | 57.1310μs | 36.9674μs | 27.0509 KOps/s | 26.9216 KOps/s | +0.48% |
| test_membership | 1.6440μs | 0.5360μs | 1.8656 MOps/s | 1.8599 MOps/s | +0.31% |
| test_membership_nested | 17.8900μs | 2.0852μs | 479.5598 KOps/s | 483.5996 KOps/s | -0.84% |
| test_membership_nested_leaf | 16.6600μs | 2.0010μs | 499.7587 KOps/s | 504.3146 KOps/s | -0.90% |
| test_membership_stacked_nested | 32.0210μs | 2.0984μs | 476.5454 KOps/s | 481.3177 KOps/s | -0.99% |
| test_membership_stacked_nested_leaf | 16.6910μs | 2.1025μs | 475.6164 KOps/s | 484.7531 KOps/s | -1.88% |
| test_membership_nested_last | 17.7510μs | 2.9675μs | 336.9820 KOps/s | 340.7612 KOps/s | -1.11% |
| test_membership_nested_leaf_last | 32.1010μs | 2.9458μs | 339.4658 KOps/s | 338.2991 KOps/s | +0.34% |
| test_membership_stacked_nested_last | 27.9600μs | 3.6929μs | 270.7915 KOps/s | 293.5660 KOps/s | **-7.76%** |
| test_membership_stacked_nested_leaf_last | 33.6710μs | 3.7244μs | 268.5018 KOps/s | 298.7595 KOps/s | **-10.13%** |
| test_nested_getleaf | 24.5910μs | 8.0314μs | 124.5109 KOps/s | 124.8360 KOps/s | -0.26% |
| test_nested_get | 24.5900μs | 7.5547μs | 132.3675 KOps/s | 133.2645 KOps/s | -0.67% |
| test_stacked_getleaf | 26.0510μs | 8.0461μs | 124.2836 KOps/s | 124.1100 KOps/s | +0.14% |
| test_stacked_get | 26.6710μs | 7.5118μs | 133.1236 KOps/s | 132.6638 KOps/s | +0.35% |
| test_nested_getitemleaf | 38.2210μs | 8.2175μs | 121.6915 KOps/s | 122.2513 KOps/s | -0.46% |
| test_nested_getitem | 29.1800μs | 7.7358μs | 129.2685 KOps/s | 129.6621 KOps/s | -0.30% |
| test_stacked_getitemleaf | 37.3010μs | 8.2118μs | 121.7754 KOps/s | 121.8070 KOps/s | -0.03% |
| test_stacked_getitem | 37.9910μs | 7.7280μs | 129.3992 KOps/s | 129.6450 KOps/s | -0.19% |
| test_lock_nested | 4.6920ms | 0.4155ms | 2.4066 KOps/s | 2.4294 KOps/s | -0.94% |
| test_lock_stack_nested | 0.4314ms | 0.3810ms | 2.6250 KOps/s | 2.6330 KOps/s | -0.31% |
| test_unlock_nested | 89.3351ms | 0.4205ms | 2.3782 KOps/s | 3.0109 KOps/s | **-21.01%** |
| test_unlock_stack_nested | 0.3540ms | 0.2962ms | 3.3760 KOps/s | 3.3760 KOps/s | +0.00% |
| test_flatten_speed | 0.3629ms | 0.1093ms | 9.1511 KOps/s | 9.2122 KOps/s | -0.66% |
| test_unflatten_speed | 0.3387ms | 0.2961ms | 3.3776 KOps/s | 3.3873 KOps/s | -0.29% |
| test_common_ops | 1.0306ms | 0.5861ms | 1.7062 KOps/s | 1.6154 KOps/s | **+5.62%** |
| test_creation | 15.9510μs | 1.9010μs | 526.0489 KOps/s | 529.2819 KOps/s | -0.61% |
| test_creation_empty | 28.0600μs | 9.1232μs | 109.6104 KOps/s | 94.6674 KOps/s | **+15.78%** |
| test_creation_nested_1 | 40.9010μs | 10.8102μs | 92.5051 KOps/s | 81.5611 KOps/s | **+13.42%** |
| test_creation_nested_2 | 27.9800μs | 13.3469μs | 74.9237 KOps/s | 67.1603 KOps/s | **+11.56%** |
| test_clone | 78.5020μs | 10.9830μs | 91.0497 KOps/s | 91.6684 KOps/s | -0.67% |
| test_getitem[int] | 33.2510μs | 10.0677μs | 99.3279 KOps/s | 99.8477 KOps/s | -0.52% |
| test_getitem[slice_int] | 40.5910μs | 19.5475μs | 51.1574 KOps/s | 52.1048 KOps/s | -1.82% |
| test_getitem[range] | 0.1791ms | 36.1640μs | 27.6518 KOps/s | 25.0391 KOps/s | **+10.43%** |
| test_getitem[tuple] | 1.8676ms | 17.4701μs | 57.2408 KOps/s | 57.7333 KOps/s | -0.85% |
| test_getitem[list] | 0.1590ms | 33.4109μs | 29.9304 KOps/s | 31.6215 KOps/s | **-5.35%** |
| test_setitem_dim[int] | 42.6810μs | 25.0366μs | 39.9416 KOps/s | 37.0336 KOps/s | **+7.85%** |
| test_setitem_dim[slice_int] | 79.2420μs | 46.1348μs | 21.6756 KOps/s | 20.7687 KOps/s | +4.37% |
| test_setitem_dim[range] | 83.6620μs | 61.9183μs | 16.1503 KOps/s | 14.6062 KOps/s | **+10.57%** |
| test_setitem_dim[tuple] | 55.8610μs | 40.3312μs | 24.7947 KOps/s | 24.0932 KOps/s | +2.91% |
| test_setitem | 59.5910μs | 15.9705μs | 62.6155 KOps/s | 62.2322 KOps/s | +0.62% |
| test_set | 68.9520μs | 15.0249μs | 66.5564 KOps/s | 61.7203 KOps/s | **+7.84%** |
| test_set_shared | 2.7366ms | 95.5700μs | 10.4635 KOps/s | 10.4514 KOps/s | +0.12% |
| test_update | 0.1063ms | 18.1598μs | 55.0668 KOps/s | 48.8038 KOps/s | **+12.83%** |
| test_update_nested | 89.4810μs | 22.5683μs | 44.3098 KOps/s | 40.4001 KOps/s | **+9.68%** |
| test_update__nested | 94.1620μs | 21.1115μs | 47.3675 KOps/s | 46.6766 KOps/s | +1.48% |
| test_set_nested | 81.0220μs | 16.1916μs | 61.7606 KOps/s | 58.5873 KOps/s | **+5.42%** |
| test_set_nested_new | 90.6320μs | 19.3069μs | 51.7949 KOps/s | 51.0726 KOps/s | +1.41% |
| test_select | 82.6120μs | 32.4420μs | 30.8243 KOps/s | 30.4033 KOps/s | +1.38% |
| test_select_nested | 75.3520μs | 52.6874μs | 18.9799 KOps/s | 18.7947 KOps/s | +0.99% |
| test_exclude_nested | 95.2920μs | 71.5061μs | 13.9848 KOps/s | 13.6771 KOps/s | +2.25% |
| test_empty[True] | 0.3550ms | 0.3009ms | 3.3229 KOps/s | 3.3728 KOps/s | -1.48% |
| test_empty[False] | 3.0090μs | 0.9326μs | 1.0722 MOps/s | 1.1136 MOps/s | -3.71% |
| test_to | 87.5020μs | 59.5488μs | 16.7930 KOps/s | 17.0915 KOps/s | -1.75% |
| test_to_nonblocking | 63.8710μs | 35.0471μs | 28.5330 KOps/s | 26.2995 KOps/s | **+8.49%** |
| test_unbind_speed | 0.2655ms | 0.2477ms | 4.0369 KOps/s | 4.0407 KOps/s | -0.09% |
| test_unbind_speed_stack0 | 0.2837ms | 0.2484ms | 4.0261 KOps/s | 3.9899 KOps/s | +0.91% |
| test_unbind_speed_stack1 | 92.7662ms | 0.8566ms | 1.1675 KOps/s | 1.4040 KOps/s | **-16.85%** |
| test_split | 90.8728ms | 1.5712ms | 636.4545 Ops/s | 642.1592 Ops/s | -0.89% |
| test_chunk | 1.4969ms | 1.4346ms | 697.0595 Ops/s | 704.3125 Ops/s | -1.03% |
| test_creation[device0] | 0.1278ms | 54.7625μs | 18.2607 KOps/s | 18.7428 KOps/s | -2.57% |
| test_creation_from_tensor | 0.1321ms | 51.8780μs | 19.2760 KOps/s | 17.7647 KOps/s | **+8.51%** |
| test_add_one[memmap_tensor0] | 0.1077ms | 6.7908μs | 147.2573 KOps/s | 147.9224 KOps/s | -0.45% |
| test_contiguous[memmap_tensor0] | 23.3810μs | 0.6065μs | 1.6488 MOps/s | 1.6307 MOps/s | +1.11% |
| test_stack[memmap_tensor0] | 31.8510μs | 4.5465μs | 219.9490 KOps/s | 220.4937 KOps/s | -0.25% |
| test_memmaptd_index | 1.1039ms | 0.2555ms | 3.9132 KOps/s | 4.0099 KOps/s | -2.41% |
| test_memmaptd_index_astensor | 0.5867ms | 0.3199ms | 3.1262 KOps/s | 3.1639 KOps/s | -1.19% |
| test_memmaptd_index_op | 0.8546ms | 0.5994ms | 1.6683 KOps/s | 1.6078 KOps/s | +3.76% |
| test_serialize_model | 0.1875s | 0.1010s | 9.8999 Ops/s | 10.4354 Ops/s | **-5.13%** |
| test_serialize_model_pickle | 1.3672s | 1.2379s | 0.8078 Ops/s | 0.8088 Ops/s | -0.11% |
| test_serialize_weights | 0.1814s | 98.2829ms | 10.1747 Ops/s | 10.7707 Ops/s | **-5.53%** |
| test_serialize_weights_returnearly | 73.5301ms | 64.8370ms | 15.4233 Ops/s | 15.5434 Ops/s | -0.77% |
| test_serialize_weights_pickle | 1.4604s | 1.2611s | 0.7930 Ops/s | 0.8012 Ops/s | -1.02% |
| test_reshape_pytree | 0.2634ms | 24.8794μs | 40.1940 KOps/s | 40.6798 KOps/s | -1.19% |
| test_reshape_td | 52.5910μs | 29.9256μs | 33.4162 KOps/s | 33.1813 KOps/s | +0.71% |
| test_view_pytree | 50.3110μs | 24.6542μs | 40.5611 KOps/s | 40.5705 KOps/s | -0.02% |
| test_view_td | 0.2594ms | 35.9623μs | 27.8069 KOps/s | 26.9417 KOps/s | +3.21% |
| test_unbind_pytree | 50.7210μs | 30.2011μs | 33.1114 KOps/s | 33.1899 KOps/s | -0.24% |
| test_unbind_td | 0.4432ms | 38.1285μs | 26.2271 KOps/s | 26.6465 KOps/s | -1.57% |
| test_split_pytree | 0.1705ms | 33.6893μs | 29.6830 KOps/s | 30.4113 KOps/s | -2.39% |
| test_split_td | 92.9831ms | 44.6554μs | 22.3937 KOps/s | 27.8826 KOps/s | **-19.69%** |
| test_add_pytree | 76.8610μs | 36.4327μs | 27.4479 KOps/s | 27.1033 KOps/s | +1.27% |
| test_add_td | 0.2608ms | 49.5496μs | 20.1818 KOps/s | 17.8575 KOps/s | **+13.02%** |
| test_distributed | 3.9420ms | 86.4146μs | 11.5721 KOps/s | 14.0893 KOps/s | **-17.87%** |
| test_tdmodule | 29.5010μs | 14.4363μs | 69.2697 KOps/s | 67.2857 KOps/s | +2.95% |
| test_tdmodule_dispatch | 45.4910μs | 28.3829μs | 35.2325 KOps/s | 33.1208 KOps/s | **+6.38%** |
| test_tdseq | 30.8410μs | 15.2621μs | 65.5220 KOps/s | 62.1846 KOps/s | **+5.37%** |
| test_tdseq_dispatch | 50.9910μs | 31.5380μs | 31.7078 KOps/s | 29.9642 KOps/s | **+5.82%** |
| test_instantiation_functorch | 1.5700ms | 1.3709ms | 729.4500 Ops/s | 733.1496 Ops/s | -0.50% |
| test_instantiation_td | 1.4614ms | 0.9958ms | 1.0042 KOps/s | 1.0149 KOps/s | -1.05% |
| test_exec_functorch | 0.1873ms | 0.1451ms | 6.8934 KOps/s | 6.8541 KOps/s | +0.57% |
| test_exec_functional_call | 0.1558ms | 0.1350ms | 7.4059 KOps/s | 7.3587 KOps/s | +0.64% |
| test_exec_td | 0.1691ms | 0.1347ms | 7.4241 KOps/s | 7.4472 KOps/s | -0.31% |
| test_exec_td_decorator | 0.3145ms | 0.2104ms | 4.7522 KOps/s | 4.8443 KOps/s | -1.90% |
| test_vmap_mlp_speed[True-True] | 0.6893ms | 0.5833ms | 1.7143 KOps/s | 1.7060 KOps/s | +0.49% |
| test_vmap_mlp_speed[True-False] | 0.7373ms | 0.5813ms | 1.7203 KOps/s | 1.7112 KOps/s | +0.53% |
| test_vmap_mlp_speed[False-True] | 0.6440ms | 0.5128ms | 1.9500 KOps/s | 1.9373 KOps/s | +0.65% |
| test_vmap_mlp_speed[False-False] | 0.5951ms | 0.5123ms | 1.9519 KOps/s | 1.9411 KOps/s | +0.56% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2816ms | 0.6595ms | 1.5162 KOps/s | 1.5131 KOps/s | +0.21% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8287ms | 0.6584ms | 1.5187 KOps/s | 1.5193 KOps/s | -0.03% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7266ms | 0.5765ms | 1.7346 KOps/s | 1.7284 KOps/s | +0.36% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6720ms | 0.5778ms | 1.7306 KOps/s | 1.7296 KOps/s | +0.06% |
| test_vmap_transformer_speed[True-True] | 7.9990ms | 7.7125ms | 129.6598 Ops/s | 128.4310 Ops/s | +0.96% |
| test_vmap_transformer_speed[True-False] | 7.9269ms | 7.6968ms | 129.9246 Ops/s | 129.5687 Ops/s | +0.27% |
| test_vmap_transformer_speed[False-True] | 7.9030ms | 7.6454ms | 130.7979 Ops/s | 130.3464 Ops/s | +0.35% |
| test_vmap_transformer_speed[False-False] | 8.0197ms | 7.6382ms | 130.9209 Ops/s | 130.2008 Ops/s | +0.55% |
| test_vmap_transformer_speed_decorator[True-True] | 19.3936ms | 19.0981ms | 52.3613 Ops/s | 52.2442 Ops/s | +0.22% |
| test_vmap_transformer_speed_decorator[True-False] | 19.2001ms | 19.0763ms | 52.4209 Ops/s | 52.2215 Ops/s | +0.38% |
| test_vmap_transformer_speed_decorator[False-True] | 19.5586ms | 18.9562ms | 52.7532 Ops/s | 52.7864 Ops/s | -0.06% |
| test_vmap_transformer_speed_decorator[False-False] | 19.3133ms | 18.9471ms | 52.7785 Ops/s | 52.8875 Ops/s | -0.21% |
| test_to_module_speed[True] | 1.6976ms | 1.5580ms | 641.8566 Ops/s | 648.5742 Ops/s | -1.04% |
| test_to_module_speed[False] | 1.6754ms | 1.5600ms | 641.0359 Ops/s | 651.7286 Ops/s | -1.64% |
| test_tc_init | 79.7810μs | 54.8136μs | 18.2436 KOps/s | 18.1856 KOps/s | +0.32% |
| test_tc_init_nested | 0.1469ms | 0.1099ms | 9.0975 KOps/s | 9.3950 KOps/s | -3.17% |
| test_tc_first_layer_tensor | 15.9000μs | 3.9797μs | 251.2732 KOps/s | 282.4882 KOps/s | **-11.05%** |
| test_tc_first_layer_nontensor | 18.3910μs | 4.0094μs | 249.4161 KOps/s | 280.5995 KOps/s | **-11.11%** |
| test_tc_second_layer_tensor | 6.8027μs | 1.2929μs | 773.4569 KOps/s | 904.3410 KOps/s | **-14.47%** |
| test_tc_second_layer_nontensor | 20.9400μs | 4.6011μs | 217.3372 KOps/s | 246.2485 KOps/s | **-11.74%** |

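
For readers checking the numbers: the Change column in both tables is consistent with the relative difference between Ops and Ops on Repo `HEAD`, and the Improved and Worsened totals match the number of emphasized rows, all of which move by at least about 5% in either direction. The sketch below reproduces that arithmetic for three CPU rows. The function names and the exact 5% cutoff are assumptions inferred from the reported values, not taken from the benchmark workflow itself.

```python
# Minimal sketch: recompute the "Change" column and the improved/worsened label.
# ASSUMPTION: the 5% threshold is inferred from the tables, not from the CI code.

def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change as improved, worsened, or unchanged (within noise)."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


def summarize(rows, threshold: float = 5.0):
    """Return (name, change rounded to 2 decimals, label) for each row."""
    out = []
    for name, ops_pr, ops_head in rows:
        change = relative_change(ops_pr, ops_head)
        out.append((name, round(change, 2), classify(change, threshold)))
    return out


# (name, Ops, Ops on HEAD) in KOps/s, copied from the CPU results.
samples = [
    ("test_plain_set_nested", 60.7256, 54.3959),
    ("test_values", 858.9820, 873.2900),
    ("test_membership_stacked_nested_last", 144.4023, 194.1298),
]

for name, change, label in summarize(samples):
    print(f"{name}: {change:+.2f}% ({label})")
```

Because both throughput values share a unit within a row, the ratio is unit-free, so the same helper applies to rows reported in Ops/s, KOps/s, or MOps/s.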