pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile integration - basics #873

Closed vmoens closed 1 month ago

vmoens commented 1 month ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large ⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}34$.
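
For readers cross-checking the summary line against the detailed results: the Change column appears to be the relative difference between the Ops and Ops on Repo `HEAD` columns, and the Improved/Worsened counts line up with the rows whose change is at least 5% in magnitude (the bolded ones). The sketch below reproduces a few CPU rows under that reading. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark workflow itself.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus the HEAD baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU results.
rows = {
    "test_plain_set_nested": (57.2177, 61.2515),
    "test_items": (382.8330, 376.4148),
    "test_exclude_nested": (12.4375, 8.6169),
    "test_unbind_speed": (3.9257, 4.1297),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, a row such as `test_unbind_speed` (-4.94%) is reported in the table but does not count towards the Worsened total.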

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 32.8720μs | 17.4771μs | 57.2177 KOps/s | 61.2515 KOps/s | $\textbf{\color{#d91a1a}-6.59\\%}$ |
| test_plain_set_stack_nested | 45.8550μs | 17.7098μs | 56.4658 KOps/s | 60.8682 KOps/s | $\textbf{\color{#d91a1a}-7.23\\%}$ |
| test_plain_set_nested_inplace | 53.5310μs | 19.0008μs | 52.6294 KOps/s | 53.2167 KOps/s | $\color{#d91a1a}-1.10\\%$ |
| test_plain_set_stack_nested_inplace | 66.8660μs | 19.3887μs | 51.5763 KOps/s | 53.6166 KOps/s | $\color{#d91a1a}-3.81\\%$ |
| test_items | 28.7840μs | 2.6121μs | 382.8330 KOps/s | 376.4148 KOps/s | $\color{#35bf28}+1.71\\%$ |
| test_items_nested | 1.4008ms | 0.3700ms | 2.7030 KOps/s | 3.6909 KOps/s | $\textbf{\color{#d91a1a}-26.76\\%}$ |
| test_items_nested_locked | 0.7319ms | 0.3633ms | 2.7523 KOps/s | 3.6402 KOps/s | $\textbf{\color{#d91a1a}-24.39\\%}$ |
| test_items_nested_leaf | 0.1640ms | 86.7041μs | 11.5335 KOps/s | 12.7101 KOps/s | $\textbf{\color{#d91a1a}-9.26\\%}$ |
| test_items_stack_nested | 0.5215ms | 0.3646ms | 2.7426 KOps/s | 3.6162 KOps/s | $\textbf{\color{#d91a1a}-24.16\\%}$ |
| test_items_stack_nested_leaf | 0.1636ms | 89.2254μs | 11.2076 KOps/s | 12.3519 KOps/s | $\textbf{\color{#d91a1a}-9.26\\%}$ |
| test_items_stack_nested_locked | 0.5178ms | 0.3588ms | 2.7868 KOps/s | 3.5474 KOps/s | $\textbf{\color{#d91a1a}-21.44\\%}$ |
| test_keys | 32.8310μs | 3.9775μs | 251.4161 KOps/s | 253.0892 KOps/s | $\color{#d91a1a}-0.66\\%$ |
| test_keys_nested | 0.3124ms | 0.1451ms | 6.8941 KOps/s | 7.2917 KOps/s | $\textbf{\color{#d91a1a}-5.45\\%}$ |
| test_keys_nested_locked | 0.8174ms | 0.1506ms | 6.6418 KOps/s | 7.0151 KOps/s | $\textbf{\color{#d91a1a}-5.32\\%}$ |
| test_keys_nested_leaf | 0.4746ms | 0.1268ms | 7.8874 KOps/s | 8.5663 KOps/s | $\textbf{\color{#d91a1a}-7.92\\%}$ |
| test_keys_stack_nested | 0.2724ms | 0.1462ms | 6.8393 KOps/s | 7.2285 KOps/s | $\textbf{\color{#d91a1a}-5.38\\%}$ |
| test_keys_stack_nested_leaf | 0.2043ms | 0.1249ms | 8.0070 KOps/s | 8.5452 KOps/s | $\textbf{\color{#d91a1a}-6.30\\%}$ |
| test_keys_stack_nested_locked | 0.2862ms | 0.1509ms | 6.6267 KOps/s | 7.0682 KOps/s | $\textbf{\color{#d91a1a}-6.25\\%}$ |
| test_values | 20.5946μs | 1.1486μs | 870.6495 KOps/s | 848.5147 KOps/s | $\color{#35bf28}+2.61\\%$ |
| test_values_nested | 0.1446ms | 49.9872μs | 20.0051 KOps/s | 19.7542 KOps/s | $\color{#35bf28}+1.27\\%$ |
| test_values_nested_locked | 0.1039ms | 50.1024μs | 19.9591 KOps/s | 19.7321 KOps/s | $\color{#35bf28}+1.15\\%$ |
| test_values_nested_leaf | 0.1142ms | 45.7962μs | 21.8359 KOps/s | 21.8871 KOps/s | $\color{#d91a1a}-0.23\\%$ |
| test_values_stack_nested | 0.1101ms | 50.4643μs | 19.8160 KOps/s | 19.4860 KOps/s | $\color{#35bf28}+1.69\\%$ |
| test_values_stack_nested_leaf | 83.8570μs | 45.0547μs | 22.1953 KOps/s | 21.8376 KOps/s | $\color{#35bf28}+1.64\\%$ |
| test_values_stack_nested_locked | 0.1053ms | 50.8842μs | 19.6525 KOps/s | 19.4347 KOps/s | $\color{#35bf28}+1.12\\%$ |
| test_membership | 14.8780μs | 0.9195μs | 1.0876 MOps/s | 717.4081 KOps/s | $\textbf{\color{#35bf28}+51.60\\%}$ |
| test_membership_nested | 67.1760μs | 2.8021μs | 356.8765 KOps/s | 279.8171 KOps/s | $\textbf{\color{#35bf28}+27.54\\%}$ |
| test_membership_nested_leaf | 35.2460μs | 2.8077μs | 356.1680 KOps/s | 268.0623 KOps/s | $\textbf{\color{#35bf28}+32.87\\%}$ |
| test_membership_stacked_nested | 25.5180μs | 2.8331μs | 352.9747 KOps/s | 294.9379 KOps/s | $\textbf{\color{#35bf28}+19.68\\%}$ |
| test_membership_stacked_nested_leaf | 23.0230μs | 2.7983μs | 357.3594 KOps/s | 290.3922 KOps/s | $\textbf{\color{#35bf28}+23.06\\%}$ |
| test_membership_nested_last | 68.1170μs | 4.2544μs | 235.0524 KOps/s | 240.9788 KOps/s | $\color{#d91a1a}-2.46\\%$ |
| test_membership_nested_leaf_last | 34.1440μs | 4.0892μs | 244.5448 KOps/s | 240.2106 KOps/s | $\color{#35bf28}+1.80\\%$ |
| test_membership_stacked_nested_last | 29.1250μs | 4.1147μs | 243.0285 KOps/s | 210.9622 KOps/s | $\textbf{\color{#35bf28}+15.20\\%}$ |
| test_membership_stacked_nested_leaf_last | 27.0310μs | 4.1219μs | 242.6064 KOps/s | 206.1544 KOps/s | $\textbf{\color{#35bf28}+17.68\\%}$ |
| test_nested_getleaf | 0.1127ms | 10.8346μs | 92.2967 KOps/s | 94.8189 KOps/s | $\color{#d91a1a}-2.66\\%$ |
| test_nested_get | 66.4140μs | 10.4863μs | 95.3628 KOps/s | 101.0504 KOps/s | $\textbf{\color{#d91a1a}-5.63\\%}$ |
| test_stacked_getleaf | 52.0680μs | 10.9420μs | 91.3912 KOps/s | 95.5650 KOps/s | $\color{#d91a1a}-4.37\\%$ |
| test_stacked_get | 74.2690μs | 10.3931μs | 96.2177 KOps/s | 102.5190 KOps/s | $\textbf{\color{#d91a1a}-6.15\\%}$ |
| test_nested_getitemleaf | 42.3500μs | 11.5532μs | 86.5560 KOps/s | 90.6239 KOps/s | $\color{#d91a1a}-4.49\\%$ |
| test_nested_getitem | 83.5970μs | 10.6192μs | 94.1686 KOps/s | 94.4883 KOps/s | $\color{#d91a1a}-0.34\\%$ |
| test_stacked_getitemleaf | 40.8670μs | 11.5477μs | 86.5971 KOps/s | 91.1734 KOps/s | $\textbf{\color{#d91a1a}-5.02\\%}$ |
| test_stacked_getitem | 38.8130μs | 10.5105μs | 95.1430 KOps/s | 99.0854 KOps/s | $\color{#d91a1a}-3.98\\%$ |
| test_lock_nested | 7.9600ms | 0.4499ms | 2.2228 KOps/s | 2.9693 KOps/s | $\textbf{\color{#d91a1a}-25.14\\%}$ |
| test_lock_stack_nested | 0.7689ms | 0.4135ms | 2.4184 KOps/s | 3.2918 KOps/s | $\textbf{\color{#d91a1a}-26.53\\%}$ |
| test_unlock_nested | 0.9453ms | 0.3680ms | 2.7175 KOps/s | 2.8754 KOps/s | $\textbf{\color{#d91a1a}-5.49\\%}$ |
| test_unlock_stack_nested | 0.5885ms | 0.3258ms | 3.0696 KOps/s | 3.2148 KOps/s | $\color{#d91a1a}-4.52\\%$ |
| test_flatten_speed | 0.6177ms | 0.1065ms | 9.3903 KOps/s | 10.0906 KOps/s | $\textbf{\color{#d91a1a}-6.94\\%}$ |
| test_unflatten_speed | 0.5469ms | 0.4412ms | 2.2665 KOps/s | 2.4549 KOps/s | $\textbf{\color{#d91a1a}-7.67\\%}$ |
| test_common_ops | 4.6099ms | 0.7772ms | 1.2867 KOps/s | 1.3144 KOps/s | $\color{#d91a1a}-2.10\\%$ |
| test_creation | 38.7530μs | 2.3107μs | 432.7706 KOps/s | 510.9189 KOps/s | $\textbf{\color{#d91a1a}-15.30\\%}$ |
| test_creation_empty | 0.1155ms | 10.7336μs | 93.1653 KOps/s | 99.3621 KOps/s | $\textbf{\color{#d91a1a}-6.24\\%}$ |
| test_creation_nested_1 | 65.9730μs | 13.4958μs | 74.0974 KOps/s | 79.9115 KOps/s | $\textbf{\color{#d91a1a}-7.28\\%}$ |
| test_creation_nested_2 | 48.7410μs | 17.5886μs | 56.8550 KOps/s | 62.1635 KOps/s | $\textbf{\color{#d91a1a}-8.54\\%}$ |
| test_clone | 71.6240μs | 13.0892μs | 76.3989 KOps/s | 74.3647 KOps/s | $\color{#35bf28}+2.74\\%$ |
| test_getitem[int] | 72.1890μs | 11.5463μs | 86.6076 KOps/s | 89.6487 KOps/s | $\color{#d91a1a}-3.39\\%$ |
| test_getitem[slice_int] | 91.1710μs | 22.5293μs | 44.3866 KOps/s | 42.1818 KOps/s | $\textbf{\color{#35bf28}+5.23\\%}$ |
| test_getitem[range] | 0.2860ms | 45.9112μs | 21.7812 KOps/s | 16.5187 KOps/s | $\textbf{\color{#35bf28}+31.86\\%}$ |
| test_getitem[tuple] | 68.6780μs | 18.8697μs | 52.9950 KOps/s | 53.7373 KOps/s | $\color{#d91a1a}-1.38\\%$ |
| test_getitem[list] | 0.2403ms | 40.8145μs | 24.5011 KOps/s | 23.8236 KOps/s | $\color{#35bf28}+2.84\\%$ |
| test_setitem_dim[int] | 56.5260μs | 31.8421μs | 31.4049 KOps/s | 29.7642 KOps/s | $\textbf{\color{#35bf28}+5.51\\%}$ |
| test_setitem_dim[slice_int] | 0.1049ms | 59.5285μs | 16.7987 KOps/s | 16.3443 KOps/s | $\color{#35bf28}+2.78\\%$ |
| test_setitem_dim[range] | 0.1438ms | 79.1333μs | 12.6369 KOps/s | 12.1233 KOps/s | $\color{#35bf28}+4.24\\%$ |
| test_setitem_dim[tuple] | 98.7950μs | 48.2419μs | 20.7289 KOps/s | 20.0569 KOps/s | $\color{#35bf28}+3.35\\%$ |
| test_setitem | 0.1470ms | 19.8099μs | 50.4799 KOps/s | 51.0801 KOps/s | $\color{#d91a1a}-1.18\\%$ |
| test_set | 0.1103ms | 19.1651μs | 52.1783 KOps/s | 52.1816 KOps/s | $-0.01\\%$ |
| test_set_shared | 4.0453ms | 0.1728ms | 5.7862 KOps/s | 5.8300 KOps/s | $\color{#d91a1a}-0.75\\%$ |
| test_update | 0.1614ms | 21.7723μs | 45.9299 KOps/s | 46.0581 KOps/s | $\color{#d91a1a}-0.28\\%$ |
| test_update_nested | 0.1800ms | 30.6729μs | 32.6021 KOps/s | 32.5926 KOps/s | $\color{#35bf28}+0.03\\%$ |
| test_update__nested | 0.1336ms | 25.2615μs | 39.5859 KOps/s | 39.3320 KOps/s | $\color{#35bf28}+0.65\\%$ |
| test_set_nested | 0.1732ms | 22.0589μs | 45.3333 KOps/s | 47.5716 KOps/s | $\color{#d91a1a}-4.71\\%$ |
| test_set_nested_new | 0.1688ms | 26.2546μs | 38.0885 KOps/s | 40.6054 KOps/s | $\textbf{\color{#d91a1a}-6.20\\%}$ |
| test_select | 0.2261ms | 41.9554μs | 23.8349 KOps/s | 24.8514 KOps/s | $\color{#d91a1a}-4.09\\%$ |
| test_select_nested | 0.1147ms | 60.6015μs | 16.5012 KOps/s | 17.7955 KOps/s | $\textbf{\color{#d91a1a}-7.27\\%}$ |
| test_exclude_nested | 0.1553ms | 80.4019μs | 12.4375 KOps/s | 8.6169 KOps/s | $\textbf{\color{#35bf28}+44.34\\%}$ |
| test_empty[True] | 0.7437ms | 0.3418ms | 2.9256 KOps/s | 2.5821 KOps/s | $\textbf{\color{#35bf28}+13.31\\%}$ |
| test_empty[False] | 9.8283μs | 1.2759μs | 783.7536 KOps/s | 944.5989 KOps/s | $\textbf{\color{#d91a1a}-17.03\\%}$ |
| test_unbind_speed | 0.4509ms | 0.2547ms | 3.9257 KOps/s | 4.1297 KOps/s | $\color{#d91a1a}-4.94\\%$ |
| test_unbind_speed_stack0 | 0.5516ms | 0.2562ms | 3.9032 KOps/s | 4.1550 KOps/s | $\textbf{\color{#d91a1a}-6.06\\%}$ |
| test_unbind_speed_stack1 | 86.6201ms | 0.7648ms | 1.3076 KOps/s | 1.3877 KOps/s | $\textbf{\color{#d91a1a}-5.77\\%}$ |
| test_split | 89.3062ms | 1.6445ms | 608.0692 Ops/s | 619.8400 Ops/s | $\color{#d91a1a}-1.90\\%$ |
| test_chunk | 90.4619ms | 1.6542ms | 604.5170 Ops/s | 619.4825 Ops/s | $\color{#d91a1a}-2.42\\%$ |
| test_creation[device0] | 0.6804ms | 0.1034ms | 9.6744 KOps/s | 10.3747 KOps/s | $\textbf{\color{#d91a1a}-6.75\\%}$ |
| test_creation_from_tensor | 4.5599ms | 99.3451μs | 10.0659 KOps/s | 10.1016 KOps/s | $\color{#d91a1a}-0.35\\%$ |
| test_add_one[memmap_tensor0] | 0.1902ms | 5.5351μs | 180.6663 KOps/s | 178.7428 KOps/s | $\color{#35bf28}+1.08\\%$ |
| test_contiguous[memmap_tensor0] | 10.3900μs | 0.6370μs | 1.5700 MOps/s | 1.5875 MOps/s | $\color{#d91a1a}-1.10\\%$ |
| test_stack[memmap_tensor0] | 45.2250μs | 3.5691μs | 280.1864 KOps/s | 273.3184 KOps/s | $\color{#35bf28}+2.51\\%$ |
| test_memmaptd_index | 1.1657ms | 0.2545ms | 3.9286 KOps/s | 3.9158 KOps/s | $\color{#35bf28}+0.33\\%$ |
| test_memmaptd_index_astensor | 0.7863ms | 0.3343ms | 2.9911 KOps/s | 3.0342 KOps/s | $\color{#d91a1a}-1.42\\%$ |
| test_memmaptd_index_op | 0.8878ms | 0.6024ms | 1.6601 KOps/s | 1.6277 KOps/s | $\color{#35bf28}+1.99\\%$ |
| test_serialize_model | 0.1392s | 0.1265s | 7.9046 Ops/s | 7.0243 Ops/s | $\textbf{\color{#35bf28}+12.53\\%}$ |
| test_serialize_model_pickle | 0.4491s | 0.3943s | 2.5359 Ops/s | 2.5095 Ops/s | $\color{#35bf28}+1.05\\%$ |
| test_serialize_weights | 0.2113s | 0.1378s | 7.2570 Ops/s | 7.8372 Ops/s | $\textbf{\color{#d91a1a}-7.40\\%}$ |
| test_serialize_weights_returnearly | 0.1817s | 0.1693s | 5.9073 Ops/s | 5.6454 Ops/s | $\color{#35bf28}+4.64\\%$ |
| test_serialize_weights_pickle | 0.5365s | 0.4162s | 2.4026 Ops/s | 2.5414 Ops/s | $\textbf{\color{#d91a1a}-5.46\\%}$ |
| test_serialize_weights_filesystem | 0.1570s | 0.1465s | 6.8273 Ops/s | 6.9303 Ops/s | $\color{#d91a1a}-1.49\\%$ |
| test_serialize_model_filesystem | 0.1683s | 0.1559s | 6.4133 Ops/s | 6.4086 Ops/s | $\color{#35bf28}+0.07\\%$ |
| test_reshape_pytree | 61.1740μs | 25.8398μs | 38.7000 KOps/s | 38.1033 KOps/s | $\color{#35bf28}+1.57\\%$ |
| test_reshape_td | 87.6540μs | 34.9548μs | 28.6084 KOps/s | 29.5524 KOps/s | $\color{#d91a1a}-3.19\\%$ |
| test_view_pytree | 62.8380μs | 25.5096μs | 39.2009 KOps/s | 38.8442 KOps/s | $\color{#35bf28}+0.92\\%$ |
| test_view_td | 0.1069ms | 39.4633μs | 25.3400 KOps/s | 24.9719 KOps/s | $\color{#35bf28}+1.47\\%$ |
| test_unbind_pytree | 85.4000μs | 29.8489μs | 33.5021 KOps/s | 33.2138 KOps/s | $\color{#35bf28}+0.87\\%$ |
| test_unbind_td | 0.3644ms | 37.8348μs | 26.4307 KOps/s | 27.7253 KOps/s | $\color{#d91a1a}-4.67\\%$ |
| test_split_pytree | 73.6780μs | 29.0023μs | 34.4801 KOps/s | 33.7285 KOps/s | $\color{#35bf28}+2.23\\%$ |
| test_split_td | 0.5810ms | 41.1597μs | 24.2956 KOps/s | 25.1112 KOps/s | $\color{#d91a1a}-3.25\\%$ |
| test_add_pytree | 83.7770μs | 35.3233μs | 28.3099 KOps/s | 28.2959 KOps/s | $\color{#35bf28}+0.05\\%$ |
| test_add_td | 0.1926ms | 56.4045μs | 17.7291 KOps/s | 18.0751 KOps/s | $\color{#d91a1a}-1.91\\%$ |
| test_distributed | 0.7352ms | 0.1368ms | 7.3110 KOps/s | 7.3987 KOps/s | $\color{#d91a1a}-1.19\\%$ |
| test_tdmodule | 0.1122ms | 18.5959μs | 53.7753 KOps/s | 47.1125 KOps/s | $\textbf{\color{#35bf28}+14.14\\%}$ |
| test_tdmodule_dispatch | 73.8790μs | 36.5460μs | 27.3628 KOps/s | 27.3882 KOps/s | $\color{#d91a1a}-0.09\\%$ |
| test_tdseq | 43.9630μs | 21.3638μs | 46.8081 KOps/s | 49.0479 KOps/s | $\color{#d91a1a}-4.57\\%$ |
| test_tdseq_dispatch | 67.1860μs | 40.8499μs | 24.4799 KOps/s | 24.9388 KOps/s | $\color{#d91a1a}-1.84\\%$ |
| test_instantiation_functorch | 2.1759ms | 1.3396ms | 746.5043 Ops/s | 735.0840 Ops/s | $\color{#35bf28}+1.55\\%$ |
| test_instantiation_td | 1.7092ms | 1.0418ms | 959.8845 Ops/s | 872.7689 Ops/s | $\textbf{\color{#35bf28}+9.98\\%}$ |
| test_exec_functorch | 0.3346ms | 0.1729ms | 5.7831 KOps/s | 6.0738 KOps/s | $\color{#d91a1a}-4.79\\%$ |
| test_exec_functional_call | 0.2885ms | 0.1491ms | 6.7062 KOps/s | 6.5358 KOps/s | $\color{#35bf28}+2.61\\%$ |
| test_exec_td | 0.2730ms | 0.1485ms | 6.7342 KOps/s | 6.8460 KOps/s | $\color{#d91a1a}-1.63\\%$ |
| test_exec_td_decorator | 0.8978ms | 0.2242ms | 4.4596 KOps/s | 4.4326 KOps/s | $\color{#35bf28}+0.61\\%$ |
| test_vmap_mlp_speed[True-True] | 0.7841ms | 0.5085ms | 1.9665 KOps/s | 2.0108 KOps/s | $\color{#d91a1a}-2.20\\%$ |
| test_vmap_mlp_speed[True-False] | 0.7741ms | 0.4972ms | 2.0112 KOps/s | 2.0145 KOps/s | $\color{#d91a1a}-0.16\\%$ |
| test_vmap_mlp_speed[False-True] | 0.7230ms | 0.4111ms | 2.4324 KOps/s | 2.4603 KOps/s | $\color{#d91a1a}-1.13\\%$ |
| test_vmap_mlp_speed[False-False] | 0.6608ms | 0.4096ms | 2.4414 KOps/s | 2.4564 KOps/s | $\color{#d91a1a}-0.61\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1190ms | 0.5877ms | 1.7014 KOps/s | 1.7389 KOps/s | $\color{#d91a1a}-2.16\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8447ms | 0.5870ms | 1.7036 KOps/s | 1.7512 KOps/s | $\color{#d91a1a}-2.71\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6940ms | 0.4796ms | 2.0849 KOps/s | 2.1339 KOps/s | $\color{#d91a1a}-2.30\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7427ms | 0.4800ms | 2.0833 KOps/s | 2.1394 KOps/s | $\color{#d91a1a}-2.62\\%$ |
| test_to_module_speed[True] | 2.7318ms | 1.7247ms | 579.7987 Ops/s | 597.8706 Ops/s | $\color{#d91a1a}-3.02\\%$ |
| test_to_module_speed[False] | 1.8223ms | 1.6863ms | 593.0070 Ops/s | 603.4485 Ops/s | $\color{#d91a1a}-1.73\\%$ |
| test_tc_init | 0.1153ms | 51.9076μs | 19.2650 KOps/s | 17.3719 KOps/s | $\textbf{\color{#35bf28}+10.90\\%}$ |
| test_tc_init_nested | 0.2286ms | 0.1039ms | 9.6202 KOps/s | 8.9263 KOps/s | $\textbf{\color{#35bf28}+7.77\\%}$ |
| test_tc_first_layer_tensor | 52.4880μs | 8.3622μs | 119.5859 KOps/s | 123.2444 KOps/s | $\color{#d91a1a}-2.97\\%$ |
| test_tc_first_layer_nontensor | 43.1410μs | 8.2010μs | 121.9370 KOps/s | 123.8217 KOps/s | $\color{#d91a1a}-1.52\\%$ |
| test_tc_second_layer_tensor | 32.4210μs | 2.5383μs | 393.9710 KOps/s | 401.7927 KOps/s | $\color{#d91a1a}-1.95\\%$ |
| test_tc_second_layer_nontensor | 44.2420μs | 9.3762μs | 106.6530 KOps/s | 111.2849 KOps/s | $\color{#d91a1a}-4.16\\%$ |

github-actions[bot] commented 1 month ago

$\color{#D29922}\textsf{\Large ⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: $\large\color{#35bf28}63$. Worsened: $\large\color{#d91a1a}8$.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 38.0600μs | 12.0434μs | 83.0328 KOps/s | 77.2823 KOps/s | $\textbf{\color{#35bf28}+7.44\\%}$ |
| test_plain_set_stack_nested | 28.0700μs | 11.9725μs | 83.5248 KOps/s | 77.6199 KOps/s | $\textbf{\color{#35bf28}+7.61\\%}$ |
| test_plain_set_nested_inplace | 38.4310μs | 12.9292μs | 77.3440 KOps/s | 69.7522 KOps/s | $\textbf{\color{#35bf28}+10.88\\%}$ |
| test_plain_set_stack_nested_inplace | 32.0310μs | 12.9034μs | 77.4991 KOps/s | 70.4287 KOps/s | $\textbf{\color{#35bf28}+10.04\\%}$ |
| test_items | 21.0400μs | 4.5810μs | 218.2938 KOps/s | 212.3579 KOps/s | $\color{#35bf28}+2.80\\%$ |
| test_items_nested | 0.4278ms | 0.3810ms | 2.6247 KOps/s | 2.9695 KOps/s | $\textbf{\color{#d91a1a}-11.61\\%}$ |
| test_items_nested_locked | 0.4350ms | 0.3878ms | 2.5784 KOps/s | 2.9638 KOps/s | $\textbf{\color{#d91a1a}-13.00\\%}$ |
| test_items_nested_leaf | 0.1013ms | 85.4393μs | 11.7042 KOps/s | 12.2073 KOps/s | $\color{#d91a1a}-4.12\\%$ |
| test_items_stack_nested | 0.4381ms | 0.3817ms | 2.6196 KOps/s | 2.9302 KOps/s | $\textbf{\color{#d91a1a}-10.60\\%}$ |
| test_items_stack_nested_leaf | 0.1067ms | 85.8325μs | 11.6506 KOps/s | 11.9414 KOps/s | $\color{#d91a1a}-2.44\\%$ |
| test_items_stack_nested_locked | 0.4528ms | 0.3895ms | 2.5677 KOps/s | 2.9714 KOps/s | $\textbf{\color{#d91a1a}-13.59\\%}$ |
| test_keys | 22.7300μs | 4.3564μs | 229.5473 KOps/s | 232.0780 KOps/s | $\color{#d91a1a}-1.09\\%$ |
| test_keys_nested | 85.6510μs | 66.8811μs | 14.9519 KOps/s | 14.5408 KOps/s | $\color{#35bf28}+2.83\\%$ |
| test_keys_nested_locked | 0.6924ms | 73.2901μs | 13.6444 KOps/s | 13.5186 KOps/s | $\color{#35bf28}+0.93\\%$ |
| test_keys_nested_leaf | 75.5010μs | 57.2388μs | 17.4707 KOps/s | 16.7451 KOps/s | $\color{#35bf28}+4.33\\%$ |
| test_keys_stack_nested | 89.9220μs | 67.0219μs | 14.9205 KOps/s | 14.5973 KOps/s | $\color{#35bf28}+2.21\\%$ |
| test_keys_stack_nested_leaf | 81.2820μs | 57.3498μs | 17.4369 KOps/s | 16.6827 KOps/s | $\color{#35bf28}+4.52\\%$ |
| test_keys_stack_nested_locked | 0.1040ms | 73.2528μs | 13.6513 KOps/s | 13.7224 KOps/s | $\color{#d91a1a}-0.52\\%$ |
| test_values | 7.2200μs | 1.7483μs | 571.9785 KOps/s | 552.6381 KOps/s | $\color{#35bf28}+3.50\\%$ |
| test_values_nested | 48.9710μs | 33.9206μs | 29.4806 KOps/s | 28.1469 KOps/s | $\color{#35bf28}+4.74\\%$ |
| test_values_nested_locked | 57.6500μs | 35.9970μs | 27.7801 KOps/s | 26.5642 KOps/s | $\color{#35bf28}+4.58\\%$ |
| test_values_nested_leaf | 44.0000μs | 30.2264μs | 33.0837 KOps/s | 31.6917 KOps/s | $\color{#35bf28}+4.39\\%$ |
| test_values_stack_nested | 53.3310μs | 34.1346μs | 29.2958 KOps/s | 27.1932 KOps/s | $\textbf{\color{#35bf28}+7.73\\%}$ |
| test_values_stack_nested_leaf | 56.0810μs | 30.4342μs | 32.8577 KOps/s | 30.4713 KOps/s | $\textbf{\color{#35bf28}+7.83\\%}$ |
| test_values_stack_nested_locked | 55.2310μs | 35.9452μs | 27.8202 KOps/s | 25.9461 KOps/s | $\textbf{\color{#35bf28}+7.22\\%}$ |
| test_membership | 1.5685μs | 0.5358μs | 1.8663 MOps/s | 1.3936 MOps/s | $\textbf{\color{#35bf28}+33.92\\%}$ |
| test_membership_nested | 16.6610μs | 1.9966μs | 500.8417 KOps/s | 406.9975 KOps/s | $\textbf{\color{#35bf28}+23.06\\%}$ |
| test_membership_nested_leaf | 11.8150μs | 1.9350μs | 516.7912 KOps/s | 405.7514 KOps/s | $\textbf{\color{#35bf28}+27.37\\%}$ |
| test_membership_stacked_nested | 19.6200μs | 1.9779μs | 505.5829 KOps/s | 404.0225 KOps/s | $\textbf{\color{#35bf28}+25.14\\%}$ |
| test_membership_stacked_nested_leaf | 18.3800μs | 2.0022μs | 499.4457 KOps/s | 408.9073 KOps/s | $\textbf{\color{#35bf28}+22.14\\%}$ |
| test_membership_nested_last | 16.8310μs | 2.8908μs | 345.9255 KOps/s | 334.8212 KOps/s | $\color{#35bf28}+3.32\\%$ |
| test_membership_nested_leaf_last | 20.8400μs | 2.8759μs | 347.7206 KOps/s | 336.0709 KOps/s | $\color{#35bf28}+3.47\\%$ |
| test_membership_stacked_nested_last | 27.7100μs | 2.8859μs | 346.5176 KOps/s | 291.6034 KOps/s | $\textbf{\color{#35bf28}+18.83\\%}$ |
| test_membership_stacked_nested_leaf_last | 25.0910μs | 2.8714μs | 348.2675 KOps/s | 292.1165 KOps/s | $\textbf{\color{#35bf28}+19.22\\%}$ |
| test_nested_getleaf | 29.3200μs | 7.9258μs | 126.1699 KOps/s | 120.5780 KOps/s | $\color{#35bf28}+4.64\\%$ |
| test_nested_get | 28.2900μs | 7.4523μs | 134.1859 KOps/s | 128.4757 KOps/s | $\color{#35bf28}+4.44\\%$ |
| test_stacked_getleaf | 23.2700μs | 7.9360μs | 126.0075 KOps/s | 121.2897 KOps/s | $\color{#35bf28}+3.89\\%$ |
| test_stacked_get | 21.2310μs | 7.4597μs | 134.0541 KOps/s | 128.0174 KOps/s | $\color{#35bf28}+4.72\\%$ |
| test_nested_getitemleaf | 25.8900μs | 8.1058μs | 123.3683 KOps/s | 118.3807 KOps/s | $\color{#35bf28}+4.21\\%$ |
| test_nested_getitem | 29.1910μs | 7.6129μs | 131.3562 KOps/s | 126.0315 KOps/s | $\color{#35bf28}+4.22\\%$ |
| test_stacked_getitemleaf | 23.4910μs | 8.1148μs | 123.2324 KOps/s | 118.3903 KOps/s | $\color{#35bf28}+4.09\\%$ |
| test_stacked_getitem | 23.4200μs | 7.6288μs | 131.0826 KOps/s | 125.6186 KOps/s | $\color{#35bf28}+4.35\\%$ |
| test_lock_nested | 9.6160ms | 0.4148ms | 2.4109 KOps/s | 2.4911 KOps/s | $\color{#d91a1a}-3.22\\%$ |
| test_lock_stack_nested | 0.4092ms | 0.3716ms | 2.6913 KOps/s | 3.4448 KOps/s | $\textbf{\color{#d91a1a}-21.87\\%}$ |
| test_unlock_nested | 0.7578ms | 0.3237ms | 3.0890 KOps/s | 2.4906 KOps/s | $\textbf{\color{#35bf28}+24.03\\%}$ |
| test_unlock_stack_nested | 0.3348ms | 0.2900ms | 3.4484 KOps/s | 3.3192 KOps/s | $\color{#35bf28}+3.89\\%$ |
| test_flatten_speed | 0.4066ms | 0.1047ms | 9.5541 KOps/s | 9.7568 KOps/s | $\color{#d91a1a}-2.08\\%$ |
| test_unflatten_speed | 0.3427ms | 0.2862ms | 3.4942 KOps/s | 3.4755 KOps/s | $\color{#35bf28}+0.54\\%$ |
| test_common_ops | 0.9385ms | 0.5388ms | 1.8559 KOps/s | 1.7168 KOps/s | $\textbf{\color{#35bf28}+8.10\\%}$ |
| test_creation | 31.6400μs | 1.7823μs | 561.0833 KOps/s | 634.0592 KOps/s | $\textbf{\color{#d91a1a}-11.51\\%}$ |
| test_creation_empty | 24.9700μs | 7.5710μs | 132.0823 KOps/s | 116.0610 KOps/s | $\textbf{\color{#35bf28}+13.80\\%}$ |
| test_creation_nested_1 | 26.6700μs | 9.3591μs | 106.8481 KOps/s | 97.1585 KOps/s | $\textbf{\color{#35bf28}+9.97\\%}$ |
| test_creation_nested_2 | 30.5600μs | 11.7584μs | 85.0456 KOps/s | 78.9109 KOps/s | $\textbf{\color{#35bf28}+7.77\\%}$ |
| test_clone | 61.1310μs | 10.6533μs | 93.8680 KOps/s | 87.6932 KOps/s | $\textbf{\color{#35bf28}+7.04\\%}$ |
| test_getitem[int] | 24.1810μs | 9.9473μs | 100.5302 KOps/s | 96.0134 KOps/s | $\color{#35bf28}+4.70\\%$ |
| test_getitem[slice_int] | 42.7910μs | 18.8472μs | 53.0583 KOps/s | 51.5822 KOps/s | $\color{#35bf28}+2.86\\%$ |
| test_getitem[range] | 0.2061ms | 34.5155μs | 28.9725 KOps/s | 21.5310 KOps/s | $\textbf{\color{#35bf28}+34.56\\%}$ |
| test_getitem[tuple] | 36.1610μs | 16.8882μs | 59.2128 KOps/s | 56.1880 KOps/s | $\textbf{\color{#35bf28}+5.38\\%}$ |
| test_getitem[list] | 0.2097ms | 31.1502μs | 32.1025 KOps/s | 29.9360 KOps/s | $\textbf{\color{#35bf28}+7.24\\%}$ |
| test_setitem_dim[int] | 43.0400μs | 23.9558μs | 41.7436 KOps/s | 36.4308 KOps/s | $\textbf{\color{#35bf28}+14.58\\%}$ |
| test_setitem_dim[slice_int] | 62.6110μs | 44.4149μs | 22.5150 KOps/s | 20.3974 KOps/s | $\textbf{\color{#35bf28}+10.38\\%}$ |
| test_setitem_dim[range] | 79.2410μs | 60.3612μs | 16.5669 KOps/s | 15.2593 KOps/s | $\textbf{\color{#35bf28}+8.57\\%}$ |
| test_setitem_dim[tuple] | 58.5310μs | 39.2512μs | 25.4769 KOps/s | 23.8010 KOps/s | $\textbf{\color{#35bf28}+7.04\\%}$ |
| test_setitem | 92.5010μs | 14.4447μs | 69.2293 KOps/s | 61.4884 KOps/s | $\textbf{\color{#35bf28}+12.59\\%}$ |
| test_set | 69.0310μs | 14.1216μs | 70.8133 KOps/s | 62.9203 KOps/s | $\textbf{\color{#35bf28}+12.54\\%}$ |
| test_set_shared | 2.8854ms | 95.1492μs | 10.5098 KOps/s | 10.0762 KOps/s | $\color{#35bf28}+4.30\\%$ |
| test_update | 89.5220μs | 16.2322μs | 61.6059 KOps/s | 53.7057 KOps/s | $\textbf{\color{#35bf28}+14.71\\%}$ |
| test_update_nested | 86.2010μs | 21.1902μs | 47.1915 KOps/s | 43.2339 KOps/s | $\textbf{\color{#35bf28}+9.15\\%}$ |
| test_update__nested | 92.6020μs | 20.5392μs | 48.6873 KOps/s | 45.8051 KOps/s | $\textbf{\color{#35bf28}+6.29\\%}$ |
| test_set_nested | 86.9410μs | 15.1017μs | 66.2178 KOps/s | 60.3015 KOps/s | $\textbf{\color{#35bf28}+9.81\\%}$ |
| test_set_nested_new | 93.8810μs | 17.8548μs | 56.0075 KOps/s | 52.4291 KOps/s | $\textbf{\color{#35bf28}+6.83\\%}$ |
| test_select | 0.1177ms | 30.5637μs | 32.7186 KOps/s | 32.0709 KOps/s | $\color{#35bf28}+2.02\\%$ |
| test_select_nested | 67.0320μs | 51.9869μs | 19.2356 KOps/s | 19.0804 KOps/s | $\color{#35bf28}+0.81\\%$ |
| test_exclude_nested | 89.0110μs | 70.3999μs | 14.2046 KOps/s | 9.3712 KOps/s | $\textbf{\color{#35bf28}+51.58\\%}$ |
| test_empty[True] | 0.3344ms | 0.2954ms | 3.3850 KOps/s | 2.9557 KOps/s | $\textbf{\color{#35bf28}+14.52\\%}$ |
| test_empty[False] | 2.2360μs | 0.9136μs | 1.0946 MOps/s | 1.2599 MOps/s | $\textbf{\color{#d91a1a}-13.12\\%}$ |
| test_to | 89.1320μs | 58.3272μs | 17.1446 KOps/s | 17.1343 KOps/s | $\color{#35bf28}+0.06\\%$ |
| test_to_nonblocking | 63.3810μs | 33.3119μs | 30.0193 KOps/s | 28.4114 KOps/s | $\textbf{\color{#35bf28}+5.66\\%}$ |
| test_unbind_speed | 0.2829ms | 0.2443ms | 4.0928 KOps/s | 3.9485 KOps/s | $\color{#35bf28}+3.66\\%$ |
| test_unbind_speed_stack0 | 0.2990ms | 0.2439ms | 4.1002 KOps/s | 3.9249 KOps/s | $\color{#35bf28}+4.46\\%$ |
| test_unbind_speed_stack1 | 94.7690ms | 0.7742ms | 1.2917 KOps/s | 1.2962 KOps/s | $\color{#d91a1a}-0.34\\%$ |
| test_split | 92.9166ms | 1.5445ms | 647.4766 Ops/s | 615.3895 Ops/s | $\textbf{\color{#35bf28}+5.21\\%}$ |
| test_chunk | 92.5447ms | 1.5473ms | 646.3030 Ops/s | 616.6188 Ops/s | $\color{#35bf28}+4.81\\%$ |
| test_creation[device0] | 0.1275ms | 53.5494μs | 18.6743 KOps/s | 17.5519 KOps/s | $\textbf{\color{#35bf28}+6.39\\%}$ |
| test_creation_from_tensor | 0.1587ms | 50.2313μs | 19.9079 KOps/s | 17.7843 KOps/s | $\textbf{\color{#35bf28}+11.94\\%}$ |
| test_add_one[memmap_tensor0] | 91.7920μs | 6.7952μs | 147.1629 KOps/s | 142.0964 KOps/s | $\color{#35bf28}+3.57\\%$ |
| test_contiguous[memmap_tensor0] | 10.8500μs | 0.5702μs | 1.7537 MOps/s | 1.5439 MOps/s | $\textbf{\color{#35bf28}+13.59\\%}$ |
| test_stack[memmap_tensor0] | 32.4210μs | 4.4439μs | 225.0285 KOps/s | 202.6916 KOps/s | $\textbf{\color{#35bf28}+11.02\\%}$ |
| test_memmaptd_index | 1.0955ms | 0.2480ms | 4.0328 KOps/s | 3.7906 KOps/s | $\textbf{\color{#35bf28}+6.39\\%}$ |
| test_memmaptd_index_astensor | 0.5836ms | 0.3068ms | 3.2596 KOps/s | 3.0631 KOps/s | $\textbf{\color{#35bf28}+6.41\\%}$ |
| test_memmaptd_index_op | 0.8795ms | 0.5730ms | 1.7451 KOps/s | 1.6134 KOps/s | $\textbf{\color{#35bf28}+8.17\\%}$ |
| test_serialize_model | 93.4373ms | 89.4819ms | 11.1754 Ops/s | 10.4962 Ops/s | $\textbf{\color{#35bf28}+6.47\\%}$ |
| test_serialize_model_pickle | 1.3481s | 1.2358s | 0.8092 Ops/s | 0.8077 Ops/s | $\color{#35bf28}+0.18\\%$ |
| test_serialize_weights | 91.6922ms | 88.7361ms | 11.2694 Ops/s | 9.6004 Ops/s | $\textbf{\color{#35bf28}+17.38\\%}$ |
| test_serialize_weights_returnearly | 0.2931s | 76.9442ms | 12.9964 Ops/s | 13.5092 Ops/s | $\color{#d91a1a}-3.80\\%$ |
| test_serialize_weights_pickle | 1.3517s | 1.2371s | 0.8084 Ops/s | 0.8010 Ops/s | $\color{#35bf28}+0.92\\%$ |
| test_reshape_pytree | 53.8310μs | 24.7612μs | 40.3858 KOps/s | 39.5560 KOps/s | $\color{#35bf28}+2.10\\%$ |
| test_reshape_td | 53.8620μs | 28.5219μs | 35.0608 KOps/s | 32.1831 KOps/s | $\textbf{\color{#35bf28}+8.94\\%}$ |
| test_view_pytree | 0.2230ms | 24.6153μs | 40.6251 KOps/s | 39.5828 KOps/s | $\color{#35bf28}+2.63\\%$ |
| test_view_td | 69.4810μs | 33.2772μs | 30.0506 KOps/s | 26.6071 KOps/s | $\textbf{\color{#35bf28}+12.94\\%}$ |
| test_unbind_pytree | 0.2436ms | 29.6668μs | 33.7077 KOps/s | 31.9727 KOps/s | $\textbf{\color{#35bf28}+5.43\\%}$ |
| test_unbind_td | 0.4445ms | 36.4633μs | 27.4249 KOps/s | 25.6046 KOps/s | $\textbf{\color{#35bf28}+7.11\\%}$ |
| test_split_pytree | 0.2404ms | 32.5455μs | 30.7262 KOps/s | 28.6380 KOps/s | $\textbf{\color{#35bf28}+7.29\\%}$ |
| test_split_td | 0.1499ms | 35.7107μs | 28.0028 KOps/s | 25.6154 KOps/s | $\textbf{\color{#35bf28}+9.32\\%}$ |
| test_add_pytree | 0.2653ms | 36.8321μs | 27.1502 KOps/s | 26.5611 KOps/s | $\color{#35bf28}+2.22\\%$ |
| test_add_td | 0.1769ms | 49.0662μs | 20.3806 KOps/s | 20.3345 KOps/s | $\color{#35bf28}+0.23\\%$ |
| test_distributed | 3.4125ms | 71.6099μs | 13.9645 KOps/s | 14.6877 KOps/s | $\color{#d91a1a}-4.92\\%$ |
| test_tdmodule | 28.5400μs | 13.4676μs | 74.2523 KOps/s | 62.1297 KOps/s | $\textbf{\color{#35bf28}+19.51\\%}$ |
| test_tdmodule_dispatch | 51.2110μs | 27.3897μs | 36.5101 KOps/s | 32.2109 KOps/s | $\textbf{\color{#35bf28}+13.35\\%}$ |
| test_tdseq | 35.8110μs | 15.2979μs | 65.3682 KOps/s | 58.0881 KOps/s | $\textbf{\color{#35bf28}+12.53\\%}$ |
| test_tdseq_dispatch | 46.6210μs | 29.9784μs | 33.3573 KOps/s | 29.9451 KOps/s | $\textbf{\color{#35bf28}+11.39\\%}$ |
| test_instantiation_functorch | 1.4650ms | 1.3242ms | 755.1851 Ops/s | 714.0396 Ops/s | $\textbf{\color{#35bf28}+5.76\\%}$ |
| test_instantiation_td | 1.4511ms | 0.9603ms | 1.0413 KOps/s | 936.8048 Ops/s | $\textbf{\color{#35bf28}+11.16\\%}$ |
| test_exec_functorch | 0.2081ms | 0.1419ms | 7.0495 KOps/s | 6.7594 KOps/s | $\color{#35bf28}+4.29\\%$ |
| test_exec_functional_call | 0.2013ms | 0.1339ms | 7.4705 KOps/s | 7.1977 KOps/s | $\color{#35bf28}+3.79\\%$ |
| test_exec_td | 0.1738ms | 0.1315ms | 7.6036 KOps/s | 7.3439 KOps/s | $\color{#35bf28}+3.54\\%$ |
| test_exec_td_decorator | 0.4516ms | 0.1998ms | 5.0054 KOps/s | 4.7714 KOps/s | $\color{#35bf28}+4.90\\%$ |
| test_vmap_mlp_speed[True-True] | 0.7512ms | 0.5742ms | 1.7415 KOps/s | 1.7459 KOps/s | $\color{#d91a1a}-0.25\\%$ |
| test_vmap_mlp_speed[True-False] | 0.6438ms | 0.5717ms | 1.7491 KOps/s | 1.7403 KOps/s | $\color{#35bf28}+0.50\\%$ |
| test_vmap_mlp_speed[False-True] | 0.5724ms | 0.5036ms | 1.9855 KOps/s | 1.9697 KOps/s | $\color{#35bf28}+0.80\\%$ |
| test_vmap_mlp_speed[False-False] | 0.5682ms | 0.5052ms | 1.9795 KOps/s | 1.9752 KOps/s | $\color{#35bf28}+0.22\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1327ms | 0.6395ms | 1.5638 KOps/s | 1.5607 KOps/s | $\color{#35bf28}+0.20\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8036ms | 0.6388ms | 1.5654 KOps/s | 1.5000 KOps/s | $\color{#35bf28}+4.36\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6865ms | 0.5608ms | 1.7830 KOps/s | 1.7212 KOps/s | $\color{#35bf28}+3.59\\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7342ms | 0.5641ms | 1.7727 KOps/s | 1.7187 KOps/s | $\color{#35bf28}+3.14\\%$ |
| test_vmap_transformer_speed[True-True] | 7.8951ms | 7.6364ms | 130.9510 Ops/s | 128.0006 Ops/s | $\color{#35bf28}+2.31\\%$ |
| test_vmap_transformer_speed[True-False] | 7.6414ms | 7.5684ms | 132.1290 Ops/s | 127.3266 Ops/s | $\color{#35bf28}+3.77\\%$ |
| test_vmap_transformer_speed[False-True] | 7.5894ms | 7.5122ms | 133.1171 Ops/s | 128.6821 Ops/s | $\color{#35bf28}+3.45\\%$ |
| test_vmap_transformer_speed[False-False] | 7.6241ms | 7.4952ms | 133.4183 Ops/s | 128.1891 Ops/s | $\color{#35bf28}+4.08\\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 18.8672ms | 18.6822ms | 53.5268 Ops/s | 52.8255 Ops/s | $\color{#35bf28}+1.33\\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 18.7597ms | 18.6760ms | 53.5445 Ops/s | 52.5811 Ops/s | $\color{#35bf28}+1.83\\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 18.6680ms | 18.4947ms | 54.0696 Ops/s | 53.0682 Ops/s | $\color{#35bf28}+1.89\\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 18.6386ms | 18.5153ms | 54.0093 Ops/s | 52.9421 Ops/s | $\color{#35bf28}+2.02\\%$ |
| test_to_module_speed[True] | 1.5798ms | 1.4669ms | 681.6887 Ops/s | 666.8149 Ops/s | $\color{#35bf28}+2.23\\%$ |
| test_to_module_speed[False] | 0.1018s | 1.6092ms | 621.4236 Ops/s | 671.8598 Ops/s | $\textbf{\color{#d91a1a}-7.51\\%}$ |
| test_tc_init | 89.3110μs | 47.8214μs | 20.9111 KOps/s | 18.3226 KOps/s | $\textbf{\color{#35bf28}+14.13\\%}$ |
| test_tc_init_nested | 0.1514ms | 90.0251μs | 11.1080 KOps/s | 9.0241 KOps/s | $\textbf{\color{#35bf28}+23.09\\%}$ |
| test_tc_first_layer_tensor | 19.3400μs | 3.5760μs | 279.6449 KOps/s | 267.8810 KOps/s | $\color{#35bf28}+4.39\\%$ |
| test_tc_first_layer_nontensor | 20.4200μs | 3.6053μs | 277.3691 KOps/s | 263.4041 KOps/s | $\textbf{\color{#35bf28}+5.30\\%}$ |
| test_tc_second_layer_tensor | 4.1280μs | 1.1265μs | 887.7190 KOps/s | 764.0714 KOps/s | $\textbf{\color{#35bf28}+16.18\\%}$ |
| test_tc_second_layer_nontensor | 26.1600μs | 4.0972μs | 244.0708 KOps/s | 235.4450 KOps/s | $\color{#35bf28}+3.66\\%$ |