pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] Compile - nn compatibility #877

Closed vmoens closed 1 month ago

vmoens commented 1 month ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 33. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.2650μs | 16.3788μs | 61.0545 KOps/s | 55.9024 KOps/s | **+9.22%** |
| test_plain_set_stack_nested | 45.8160μs | 16.9767μs | 58.9044 KOps/s | 53.9996 KOps/s | **+9.08%** |
| test_plain_set_nested_inplace | 51.3460μs | 18.6085μs | 53.7388 KOps/s | 50.3729 KOps/s | **+6.68%** |
| test_plain_set_stack_nested_inplace | 46.9790μs | 18.4938μs | 54.0722 KOps/s | 50.0834 KOps/s | **+7.96%** |
| test_items | 18.2040μs | 2.6014μs | 384.4022 KOps/s | 382.5681 KOps/s | +0.48% |
| test_items_nested | 0.6503ms | 0.3639ms | 2.7482 KOps/s | 2.7152 KOps/s | +1.22% |
| test_items_nested_locked | 0.7413ms | 0.3636ms | 2.7505 KOps/s | 2.7510 KOps/s | -0.02% |
| test_items_nested_leaf | 0.1696ms | 86.1541μs | 11.6071 KOps/s | 11.8925 KOps/s | -2.40% |
| test_items_stack_nested | 0.5064ms | 0.3660ms | 2.7320 KOps/s | 2.7064 KOps/s | +0.94% |
| test_items_stack_nested_leaf | 0.3871ms | 90.3940μs | 11.0627 KOps/s | 11.7884 KOps/s | **-6.16%** |
| test_items_stack_nested_locked | 0.6722ms | 0.3696ms | 2.7057 KOps/s | 2.7214 KOps/s | -0.57% |
| test_keys | 17.8640μs | 3.8450μs | 260.0757 KOps/s | 247.0550 KOps/s | **+5.27%** |
| test_keys_nested | 0.2465ms | 0.1444ms | 6.9258 KOps/s | 7.0157 KOps/s | -1.28% |
| test_keys_nested_locked | 0.7857ms | 0.1497ms | 6.6815 KOps/s | 6.7375 KOps/s | -0.83% |
| test_keys_nested_leaf | 0.2359ms | 0.1237ms | 8.0826 KOps/s | 8.2261 KOps/s | -1.75% |
| test_keys_stack_nested | 0.2385ms | 0.1456ms | 6.8687 KOps/s | 7.0496 KOps/s | -2.57% |
| test_keys_stack_nested_leaf | 0.2378ms | 0.1240ms | 8.0634 KOps/s | 8.2225 KOps/s | -1.94% |
| test_keys_stack_nested_locked | 0.2603ms | 0.1493ms | 6.6957 KOps/s | 6.8379 KOps/s | -2.08% |
| test_values | 12.2255μs | 1.1594μs | 862.5275 KOps/s | 852.5427 KOps/s | +1.17% |
| test_values_nested | 0.1006ms | 49.1400μs | 20.3500 KOps/s | 20.6565 KOps/s | -1.48% |
| test_values_nested_locked | 0.1411ms | 49.1243μs | 20.3565 KOps/s | 20.6123 KOps/s | -1.24% |
| test_values_nested_leaf | 90.7800μs | 43.5195μs | 22.9782 KOps/s | 22.6063 KOps/s | +1.65% |
| test_values_stack_nested | 91.0510μs | 49.4152μs | 20.2367 KOps/s | 19.9915 KOps/s | +1.23% |
| test_values_stack_nested_leaf | 91.3010μs | 44.0593μs | 22.6967 KOps/s | 22.0019 KOps/s | +3.16% |
| test_values_stack_nested_locked | 0.1002ms | 49.9488μs | 20.0205 KOps/s | 19.8632 KOps/s | +0.79% |
| test_membership | 2.7402μs | 0.7160μs | 1.3966 MOps/s | 1.1221 MOps/s | **+24.46%** |
| test_membership_nested | 33.1710μs | 2.8225μs | 354.2963 KOps/s | 357.9236 KOps/s | -1.01% |
| test_membership_nested_leaf | 24.3350μs | 2.8225μs | 354.2962 KOps/s | 355.5171 KOps/s | -0.34% |
| test_membership_stacked_nested | 38.6720μs | 2.7538μs | 363.1356 KOps/s | 355.0826 KOps/s | +2.27% |
| test_membership_stacked_nested_leaf | 21.3300μs | 2.8133μs | 355.4562 KOps/s | 357.2353 KOps/s | -0.50% |
| test_membership_nested_last | 28.0720μs | 4.1129μs | 243.1365 KOps/s | 253.0814 KOps/s | -3.93% |
| test_membership_nested_leaf_last | 28.6940μs | 4.0873μs | 244.6611 KOps/s | 252.5808 KOps/s | -3.14% |
| test_membership_stacked_nested_last | 21.8610μs | 4.0905μs | 244.4678 KOps/s | 141.2954 KOps/s | **+73.02%** |
| test_membership_stacked_nested_leaf_last | 27.0310μs | 4.1387μs | 241.6195 KOps/s | 144.9187 KOps/s | **+66.73%** |
| test_nested_getleaf | 41.7180μs | 10.9178μs | 91.5938 KOps/s | 91.4444 KOps/s | +0.16% |
| test_nested_get | 40.2250μs | 10.2795μs | 97.2809 KOps/s | 96.8942 KOps/s | +0.40% |
| test_stacked_getleaf | 35.0160μs | 10.9406μs | 91.4030 KOps/s | 93.4546 KOps/s | -2.20% |
| test_stacked_get | 39.4450μs | 10.2706μs | 97.3652 KOps/s | 98.2606 KOps/s | -0.91% |
| test_nested_getitemleaf | 37.0490μs | 11.3371μs | 88.2060 KOps/s | 87.5686 KOps/s | +0.73% |
| test_nested_getitem | 36.6690μs | 10.5295μs | 94.9716 KOps/s | 94.7624 KOps/s | +0.22% |
| test_stacked_getitemleaf | 36.9300μs | 11.4250μs | 87.5277 KOps/s | 87.7580 KOps/s | -0.26% |
| test_stacked_getitem | 33.7830μs | 10.5755μs | 94.5580 KOps/s | 95.7801 KOps/s | -1.28% |
| test_lock_nested | 7.4099ms | 0.4356ms | 2.2957 KOps/s | 2.3121 KOps/s | -0.71% |
| test_lock_stack_nested | 0.6851ms | 0.4090ms | 2.4452 KOps/s | 2.5102 KOps/s | -2.59% |
| test_unlock_nested | 0.7262ms | 0.3505ms | 2.8532 KOps/s | 2.3462 KOps/s | **+21.61%** |
| test_unlock_stack_nested | 0.6383ms | 0.3246ms | 3.0805 KOps/s | 3.1738 KOps/s | -2.94% |
| test_flatten_speed | 0.4716ms | 0.1053ms | 9.5005 KOps/s | 9.6858 KOps/s | -1.91% |
| test_unflatten_speed | 0.6339ms | 0.4416ms | 2.2647 KOps/s | 2.2957 KOps/s | -1.35% |
| test_common_ops | 1.4436ms | 0.7495ms | 1.3342 KOps/s | 1.2622 KOps/s | **+5.70%** |
| test_creation | 20.7990μs | 2.3511μs | 425.3409 KOps/s | 430.1419 KOps/s | -1.12% |
| test_creation_empty | 38.8220μs | 10.1582μs | 98.4423 KOps/s | 77.3587 KOps/s | **+27.25%** |
| test_creation_nested_1 | 51.4170μs | 13.1692μs | 75.9346 KOps/s | 63.1263 KOps/s | **+20.29%** |
| test_creation_nested_2 | 46.6070μs | 16.8880μs | 59.2135 KOps/s | 50.3650 KOps/s | **+17.57%** |
| test_clone | 64.4910μs | 12.7417μs | 78.4822 KOps/s | 77.3346 KOps/s | +1.48% |
| test_getitem[int] | 41.2370μs | 11.7684μs | 84.9733 KOps/s | 85.9046 KOps/s | -1.08% |
| test_getitem[slice_int] | 51.2160μs | 23.5795μs | 42.4098 KOps/s | 37.3862 KOps/s | **+13.44%** |
| test_getitem[range] | 0.1656ms | 44.6629μs | 22.3899 KOps/s | 20.9826 KOps/s | **+6.71%** |
| test_getitem[tuple] | 57.3080μs | 19.3095μs | 51.7880 KOps/s | 50.4293 KOps/s | +2.69% |
| test_getitem[list] | 1.5470ms | 40.3961μs | 24.7549 KOps/s | 24.2095 KOps/s | +2.25% |
| test_setitem_dim[int] | 67.9970μs | 31.3213μs | 31.9272 KOps/s | 28.7690 KOps/s | **+10.98%** |
| test_setitem_dim[slice_int] | 0.1246ms | 62.5498μs | 15.9873 KOps/s | 15.7628 KOps/s | +1.42% |
| test_setitem_dim[range] | 0.1422ms | 80.2381μs | 12.4629 KOps/s | 11.8955 KOps/s | +4.77% |
| test_setitem_dim[tuple] | 94.0860μs | 47.3111μs | 21.1367 KOps/s | 19.3154 KOps/s | **+9.43%** |
| test_setitem | 73.5580μs | 19.3055μs | 51.7987 KOps/s | 47.9783 KOps/s | **+7.96%** |
| test_set | 71.1230μs | 18.4668μs | 54.1511 KOps/s | 48.7924 KOps/s | **+10.98%** |
| test_set_shared | 3.8326ms | 0.1718ms | 5.8203 KOps/s | 5.8753 KOps/s | -0.94% |
| test_update | 0.1302ms | 20.9191μs | 47.8032 KOps/s | 41.2754 KOps/s | **+15.82%** |
| test_update_nested | 94.1360μs | 29.8512μs | 33.4995 KOps/s | 30.9521 KOps/s | **+8.23%** |
| test_update__nested | 79.4290μs | 25.5519μs | 39.1361 KOps/s | 40.6276 KOps/s | -3.67% |
| test_set_nested | 77.4650μs | 20.5444μs | 48.6751 KOps/s | 45.1066 KOps/s | **+7.91%** |
| test_set_nested_new | 80.9810μs | 24.9606μs | 40.0631 KOps/s | 36.3697 KOps/s | **+10.16%** |
| test_select | 0.1006ms | 41.8965μs | 23.8683 KOps/s | 22.9771 KOps/s | +3.88% |
| test_select_nested | 0.1210ms | 60.3907μs | 16.5588 KOps/s | 16.7110 KOps/s | -0.91% |
| test_exclude_nested | 0.1702ms | 80.4816μs | 12.4252 KOps/s | 12.6143 KOps/s | -1.50% |
| test_empty[True] | 0.4254ms | 0.3420ms | 2.9240 KOps/s | 2.9614 KOps/s | -1.27% |
| test_empty[False] | 9.7030μs | 1.2479μs | 801.3282 KOps/s | 815.5576 KOps/s | -1.74% |
| test_unbind_speed | 0.3374ms | 0.2551ms | 3.9200 KOps/s | 3.9341 KOps/s | -0.36% |
| test_unbind_speed_stack0 | 0.3999ms | 0.2528ms | 3.9560 KOps/s | 4.0594 KOps/s | -2.55% |
| test_unbind_speed_stack1 | 82.9327ms | 0.7513ms | 1.3310 KOps/s | 1.5014 KOps/s | **-11.35%** |
| test_split | 77.0279ms | 1.6027ms | 623.9285 Ops/s | 608.4517 Ops/s | +2.54% |
| test_chunk | 76.4437ms | 1.6067ms | 622.3818 Ops/s | 605.5857 Ops/s | +2.77% |
| test_creation[device0] | 0.2207ms | 96.5213μs | 10.3604 KOps/s | 10.6707 KOps/s | -2.91% |
| test_creation_from_tensor | 4.2667ms | 98.8494μs | 10.1164 KOps/s | 10.1818 KOps/s | -0.64% |
| test_add_one[memmap_tensor0] | 0.1493ms | 5.2675μs | 189.8431 KOps/s | 180.9026 KOps/s | +4.94% |
| test_contiguous[memmap_tensor0] | 25.4980μs | 0.6404μs | 1.5614 MOps/s | 1.5421 MOps/s | +1.25% |
| test_stack[memmap_tensor0] | 45.3050μs | 3.4590μs | 289.1003 KOps/s | 282.0819 KOps/s | +2.49% |
| test_memmaptd_index | 1.0242ms | 0.2516ms | 3.9739 KOps/s | 3.8745 KOps/s | +2.56% |
| test_memmaptd_index_astensor | 0.7245ms | 0.3275ms | 3.0536 KOps/s | 3.0168 KOps/s | +1.22% |
| test_memmaptd_index_op | 0.8302ms | 0.5858ms | 1.7071 KOps/s | 1.5358 KOps/s | **+11.15%** |
| test_serialize_model | 0.1313s | 0.1229s | 8.1393 Ops/s | 7.3204 Ops/s | **+11.19%** |
| test_serialize_model_pickle | 0.4470s | 0.3862s | 2.5896 Ops/s | 2.5265 Ops/s | +2.50% |
| test_serialize_weights | 0.1336s | 0.1225s | 8.1644 Ops/s | 7.9378 Ops/s | +2.86% |
| test_serialize_weights_returnearly | 0.1851s | 0.1640s | 6.0988 Ops/s | 6.0263 Ops/s | +1.20% |
| test_serialize_weights_pickle | 0.4466s | 0.3991s | 2.5055 Ops/s | 1.0795 Ops/s | **+132.10%** |
| test_serialize_weights_filesystem | 0.2192s | 0.1540s | 6.4926 Ops/s | 6.8973 Ops/s | **-5.87%** |
| test_serialize_model_filesystem | 0.1551s | 0.1517s | 6.5936 Ops/s | 6.7926 Ops/s | -2.93% |
| test_reshape_pytree | 58.3700μs | 25.7706μs | 38.8038 KOps/s | 39.3029 KOps/s | -1.27% |
| test_reshape_td | 82.2440μs | 33.9768μs | 29.4318 KOps/s | 29.8627 KOps/s | -1.44% |
| test_view_pytree | 65.0820μs | 25.2868μs | 39.5464 KOps/s | 39.2775 KOps/s | +0.68% |
| test_view_td | 87.2540μs | 39.5094μs | 25.3104 KOps/s | 25.3406 KOps/s | -0.12% |
| test_unbind_pytree | 91.7680μs | 29.0499μs | 34.4235 KOps/s | 34.1031 KOps/s | +0.94% |
| test_unbind_td | 0.3741ms | 37.5396μs | 26.6385 KOps/s | 25.8847 KOps/s | +2.91% |
| test_split_pytree | 67.0160μs | 29.0526μs | 34.4203 KOps/s | 34.2049 KOps/s | +0.63% |
| test_split_td | 0.1233ms | 40.3864μs | 24.7608 KOps/s | 24.4113 KOps/s | +1.43% |
| test_add_pytree | 94.4370μs | 34.6674μs | 28.8456 KOps/s | 28.2289 KOps/s | +2.18% |
| test_add_td | 0.1188ms | 54.9149μs | 18.2100 KOps/s | 16.5658 KOps/s | **+9.93%** |
| test_distributed | 0.2430ms | 0.1287ms | 7.7675 KOps/s | 7.5644 KOps/s | +2.68% |
| test_tdmodule | 34.0430μs | 15.5136μs | 64.4594 KOps/s | 48.5786 KOps/s | **+32.69%** |
| test_tdmodule_dispatch | 99.5260μs | 34.3620μs | 29.1019 KOps/s | 26.0719 KOps/s | **+11.62%** |
| test_tdseq | 52.5280μs | 18.0995μs | 55.2502 KOps/s | 46.5438 KOps/s | **+18.71%** |
| test_tdseq_dispatch | 71.6140μs | 38.7714μs | 25.7922 KOps/s | 22.8952 KOps/s | **+12.65%** |
| test_instantiation_functorch | 1.9314ms | 1.3089ms | 763.9775 Ops/s | 774.3936 Ops/s | -1.35% |
| test_instantiation_td | 1.5079ms | 1.0159ms | 984.3537 Ops/s | 983.4505 Ops/s | +0.09% |
| test_exec_functorch | 0.4834ms | 0.1630ms | 6.1361 KOps/s | 6.1153 KOps/s | +0.34% |
| test_exec_functional_call | 0.2996ms | 0.1478ms | 6.7637 KOps/s | 6.6626 KOps/s | +1.52% |
| test_exec_td | 0.2751ms | 0.1435ms | 6.9678 KOps/s | 6.6437 KOps/s | +4.88% |
| test_exec_td_decorator | 0.8965ms | 0.2293ms | 4.3609 KOps/s | 4.5051 KOps/s | -3.20% |
| test_vmap_mlp_speed[True-True] | 0.7807ms | 0.4976ms | 2.0098 KOps/s | 2.0017 KOps/s | +0.40% |
| test_vmap_mlp_speed[True-False] | 0.6892ms | 0.4800ms | 2.0835 KOps/s | 2.0202 KOps/s | +3.13% |
| test_vmap_mlp_speed[False-True] | 0.7363ms | 0.3984ms | 2.5100 KOps/s | 2.4921 KOps/s | +0.72% |
| test_vmap_mlp_speed[False-False] | 0.6993ms | 0.3987ms | 2.5083 KOps/s | 2.4928 KOps/s | +0.62% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9394ms | 0.5781ms | 1.7297 KOps/s | 1.7400 KOps/s | -0.59% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9236ms | 0.5803ms | 1.7232 KOps/s | 1.7229 KOps/s | +0.02% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7872ms | 0.4781ms | 2.0916 KOps/s | 2.1314 KOps/s | -1.87% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9125ms | 0.4743ms | 2.1084 KOps/s | 2.1294 KOps/s | -0.99% |
| test_to_module_speed[True] | 2.9246ms | 1.8188ms | 549.8239 Ops/s | 584.6336 Ops/s | **-5.95%** |
| test_to_module_speed[False] | 84.6603ms | 1.9490ms | 513.0783 Ops/s | 595.4755 Ops/s | **-13.84%** |
| test_tc_init | 93.2950μs | 53.0977μs | 18.8332 KOps/s | 16.9325 KOps/s | **+11.23%** |
| test_tc_init_nested | 0.1823ms | 0.1014ms | 9.8571 KOps/s | 8.7745 KOps/s | **+12.34%** |
| test_tc_first_layer_tensor | 33.0020μs | 8.2919μs | 120.6000 KOps/s | 123.6733 KOps/s | -2.49% |
| test_tc_first_layer_nontensor | 38.1310μs | 8.3019μs | 120.4542 KOps/s | 123.0259 KOps/s | -2.09% |
| test_tc_second_layer_tensor | 22.0520μs | 2.4451μs | 408.9799 KOps/s | 404.6055 KOps/s | +1.08% |
| test_tc_second_layer_nontensor | 31.9400μs | 9.1156μs | 109.7018 KOps/s | 112.3867 KOps/s | -2.39% |

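
For readers checking the numbers: the Change column above appears to be the relative difference between the PR's Ops and the Ops on repo `HEAD`. The emphasized rows look like the ones counted as Improved or Worsened, which is consistent with a cutoff of about 5%. The sketch below reproduces that arithmetic on three rows copied from the CPU table. The function names and the 5% threshold are assumptions inferred from the reported values, not taken from the benchmark workflow itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Relative difference of the PR throughput versus the HEAD throughput."""
    return (ops_pr - ops_head) / ops_head


def classify(change: float, threshold: float = 0.05) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on repo HEAD) in KOps/s, copied from the CPU table above.
rows = {
    "test_plain_set_nested": (61.0545, 55.9024),
    "test_items_stack_nested_leaf": (11.0627, 11.7884),
    "test_items": (384.4022, 382.5681),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2%} ({classify(change)})")
```

Both values in a row must share a unit (here KOps/s) before they are compared.
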
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 34. Worsened: 1.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 29.5800μs | 12.1663μs | 82.1942 KOps/s | 73.5916 KOps/s | **+11.69%** |
| test_plain_set_stack_nested | 28.0410μs | 12.2184μs | 81.8441 KOps/s | 73.9217 KOps/s | **+10.72%** |
| test_plain_set_nested_inplace | 42.9910μs | 13.2539μs | 75.4492 KOps/s | 68.9629 KOps/s | **+9.41%** |
| test_plain_set_stack_nested_inplace | 38.1810μs | 13.2599μs | 75.4155 KOps/s | 68.7408 KOps/s | **+9.71%** |
| test_items | 19.5700μs | 4.7538μs | 210.3576 KOps/s | 214.9126 KOps/s | -2.12% |
| test_items_nested | 0.4338ms | 0.3829ms | 2.6117 KOps/s | 2.5506 KOps/s | +2.40% |
| test_items_nested_locked | 0.4491ms | 0.3916ms | 2.5537 KOps/s | 2.5298 KOps/s | +0.94% |
| test_items_nested_leaf | 0.1245ms | 86.6272μs | 11.5437 KOps/s | 11.6320 KOps/s | -0.76% |
| test_items_stack_nested | 0.4956ms | 0.3886ms | 2.5736 KOps/s | 2.5338 KOps/s | +1.57% |
| test_items_stack_nested_leaf | 0.1249ms | 87.1943μs | 11.4686 KOps/s | 11.4432 KOps/s | +0.22% |
| test_items_stack_nested_locked | 0.4488ms | 0.3924ms | 2.5482 KOps/s | 2.5175 KOps/s | +1.22% |
| test_keys | 36.4500μs | 4.3613μs | 229.2882 KOps/s | 229.9456 KOps/s | -0.29% |
| test_keys_nested | 0.1052ms | 68.3637μs | 14.6276 KOps/s | 14.6908 KOps/s | -0.43% |
| test_keys_nested_locked | 0.7447ms | 74.2342μs | 13.4709 KOps/s | 13.3108 KOps/s | +1.20% |
| test_keys_nested_leaf | 84.7420μs | 59.3577μs | 16.8470 KOps/s | 17.1920 KOps/s | -2.01% |
| test_keys_stack_nested | 96.8520μs | 68.5613μs | 14.5855 KOps/s | 14.6483 KOps/s | -0.43% |
| test_keys_stack_nested_leaf | 0.1285ms | 57.9857μs | 17.2456 KOps/s | 16.9632 KOps/s | +1.66% |
| test_keys_stack_nested_locked | 0.1087ms | 74.8572μs | 13.3588 KOps/s | 13.5013 KOps/s | -1.06% |
| test_values | 16.1840μs | 1.7606μs | 567.9980 KOps/s | 564.3510 KOps/s | +0.65% |
| test_values_nested | 59.2310μs | 34.7182μs | 28.8033 KOps/s | 29.1932 KOps/s | -1.34% |
| test_values_nested_locked | 56.7400μs | 36.6502μs | 27.2850 KOps/s | 27.5396 KOps/s | -0.92% |
| test_values_nested_leaf | 41.7510μs | 30.7482μs | 32.5222 KOps/s | 32.8366 KOps/s | -0.96% |
| test_values_stack_nested | 73.0820μs | 35.2180μs | 28.3945 KOps/s | 28.4459 KOps/s | -0.18% |
| test_values_stack_nested_leaf | 58.5310μs | 31.4301μs | 31.8166 KOps/s | 31.8535 KOps/s | -0.12% |
| test_values_stack_nested_locked | 80.8010μs | 37.1266μs | 26.9349 KOps/s | 27.1096 KOps/s | -0.64% |
| test_membership | 1.7830μs | 0.5429μs | 1.8419 MOps/s | 1.8678 MOps/s | -1.39% |
| test_membership_nested | 16.7010μs | 2.0141μs | 496.5028 KOps/s | 494.3232 KOps/s | +0.44% |
| test_membership_nested_leaf | 16.5255μs | 1.9771μs | 505.7987 KOps/s | 515.3269 KOps/s | -1.85% |
| test_membership_stacked_nested | 23.1200μs | 2.0431μs | 489.4448 KOps/s | 499.5393 KOps/s | -2.02% |
| test_membership_stacked_nested_leaf | 14.6290μs | 2.0246μs | 493.9267 KOps/s | 500.3257 KOps/s | -1.28% |
| test_membership_nested_last | 17.9200μs | 2.9145μs | 343.1066 KOps/s | 344.0721 KOps/s | -0.28% |
| test_membership_nested_leaf_last | 33.7300μs | 2.9116μs | 343.4547 KOps/s | 345.4305 KOps/s | -0.57% |
| test_membership_stacked_nested_last | 19.4610μs | 2.9411μs | 340.0073 KOps/s | 339.1933 KOps/s | +0.24% |
| test_membership_stacked_nested_leaf_last | 19.8400μs | 2.9310μs | 341.1835 KOps/s | 344.8459 KOps/s | -1.06% |
| test_nested_getleaf | 36.3700μs | 8.1875μs | 122.1378 KOps/s | 125.8448 KOps/s | -2.95% |
| test_nested_get | 33.5510μs | 7.6589μs | 130.5670 KOps/s | 133.3094 KOps/s | -2.06% |
| test_stacked_getleaf | 24.2010μs | 8.2090μs | 121.8178 KOps/s | 124.9382 KOps/s | -2.50% |
| test_stacked_get | 35.7920μs | 7.6866μs | 130.0957 KOps/s | 133.2846 KOps/s | -2.39% |
| test_nested_getitemleaf | 37.0810μs | 8.3869μs | 119.2339 KOps/s | 122.6717 KOps/s | -2.80% |
| test_nested_getitem | 23.5190μs | 7.8909μs | 126.7284 KOps/s | 130.3616 KOps/s | -2.79% |
| test_stacked_getitemleaf | 37.2600μs | 8.3555μs | 119.6816 KOps/s | 123.2958 KOps/s | -2.93% |
| test_stacked_getitem | 24.2710μs | 7.8745μs | 126.9928 KOps/s | 129.5909 KOps/s | -2.00% |
| test_lock_nested | 10.0575ms | 0.4203ms | 2.3793 KOps/s | 2.4066 KOps/s | -1.14% |
| test_lock_stack_nested | 0.4477ms | 0.3759ms | 2.6606 KOps/s | 2.6171 KOps/s | +1.66% |
| test_unlock_nested | 0.7320ms | 0.3330ms | 3.0030 KOps/s | 3.0319 KOps/s | -0.95% |
| test_unlock_stack_nested | 0.3556ms | 0.2982ms | 3.3530 KOps/s | 3.3747 KOps/s | -0.64% |
| test_flatten_speed | 0.4049ms | 0.1065ms | 9.3934 KOps/s | 9.2859 KOps/s | +1.16% |
| test_unflatten_speed | 0.3585ms | 0.2966ms | 3.3714 KOps/s | 3.4252 KOps/s | -1.57% |
| test_common_ops | 0.9802ms | 0.5486ms | 1.8229 KOps/s | 1.4437 KOps/s | **+26.27%** |
| test_creation | 39.7510μs | 1.8757μs | 533.1467 KOps/s | 543.6420 KOps/s | -1.93% |
| test_creation_empty | 23.5910μs | 7.7328μs | 129.3185 KOps/s | 92.8259 KOps/s | **+39.31%** |
| test_creation_nested_1 | 39.4100μs | 9.5359μs | 104.8671 KOps/s | 79.8560 KOps/s | **+31.32%** |
| test_creation_nested_2 | 29.3400μs | 11.9780μs | 83.4864 KOps/s | 67.2825 KOps/s | **+24.08%** |
| test_clone | 56.9920μs | 10.5256μs | 95.0067 KOps/s | 90.2051 KOps/s | **+5.32%** |
| test_getitem[int] | 24.5010μs | 9.9297μs | 100.7081 KOps/s | 99.7747 KOps/s | +0.94% |
| test_getitem[slice_int] | 48.3310μs | 18.9751μs | 52.7006 KOps/s | 52.2198 KOps/s | +0.92% |
| test_getitem[range] | 0.1800ms | 35.5540μs | 28.1263 KOps/s | 28.2920 KOps/s | -0.59% |
| test_getitem[tuple] | 0.1134ms | 17.1118μs | 58.4391 KOps/s | 59.1047 KOps/s | -1.13% |
| test_getitem[list] | 0.1649ms | 30.8164μs | 32.4502 KOps/s | 32.0221 KOps/s | +1.34% |
| test_setitem_dim[int] | 43.4400μs | 22.8714μs | 43.7228 KOps/s | 37.8902 KOps/s | **+15.39%** |
| test_setitem_dim[slice_int] | 74.8210μs | 43.5215μs | 22.9772 KOps/s | 21.2013 KOps/s | **+8.38%** |
| test_setitem_dim[range] | 82.7520μs | 59.2401μs | 16.8805 KOps/s | 16.1102 KOps/s | +4.78% |
| test_setitem_dim[tuple] | 57.4910μs | 37.8418μs | 26.4258 KOps/s | 24.1335 KOps/s | **+9.50%** |
| test_setitem | 54.6510μs | 14.6748μs | 68.1439 KOps/s | 60.6580 KOps/s | **+12.34%** |
| test_set | 55.8010μs | 14.0212μs | 71.3208 KOps/s | 62.9262 KOps/s | **+13.34%** |
| test_set_shared | 2.8569ms | 95.0105μs | 10.5252 KOps/s | 10.1978 KOps/s | +3.21% |
| test_update | 88.9020μs | 16.4635μs | 60.7406 KOps/s | 49.3025 KOps/s | **+23.20%** |
| test_update_nested | 62.4010μs | 21.1322μs | 47.3212 KOps/s | 39.3540 KOps/s | **+20.24%** |
| test_update__nested | 65.5020μs | 20.5354μs | 48.6964 KOps/s | 46.8452 KOps/s | +3.95% |
| test_set_nested | 74.1510μs | 14.8294μs | 67.4334 KOps/s | 56.8308 KOps/s | **+18.66%** |
| test_set_nested_new | 60.3020μs | 17.5996μs | 56.8195 KOps/s | 49.4252 KOps/s | **+14.96%** |
| test_select | 70.6510μs | 29.9132μs | 33.4300 KOps/s | 30.6920 KOps/s | **+8.92%** |
| test_select_nested | 83.0020μs | 51.9295μs | 19.2569 KOps/s | 18.7930 KOps/s | +2.47% |
| test_exclude_nested | 94.2630μs | 70.7003μs | 14.1442 KOps/s | 14.2141 KOps/s | -0.49% |
| test_empty[True] | 0.3489ms | 0.2965ms | 3.3722 KOps/s | 3.3974 KOps/s | -0.74% |
| test_empty[False] | 2.9631μs | 0.9330μs | 1.0718 MOps/s | 1.0902 MOps/s | -1.68% |
| test_to | 85.7610μs | 56.8549μs | 17.5886 KOps/s | 16.7091 KOps/s | **+5.26%** |
| test_to_nonblocking | 61.4210μs | 33.8454μs | 29.5462 KOps/s | 26.1264 KOps/s | **+13.09%** |
| test_unbind_speed | 0.2939ms | 0.2512ms | 3.9811 KOps/s | 3.9761 KOps/s | +0.13% |
| test_unbind_speed_stack0 | 0.3166ms | 0.2521ms | 3.9666 KOps/s | 3.9712 KOps/s | -0.12% |
| test_unbind_speed_stack1 | 93.9179ms | 0.7892ms | 1.2672 KOps/s | 1.3921 KOps/s | **-8.98%** |
| test_split | 91.4785ms | 1.5663ms | 638.4663 Ops/s | 635.6309 Ops/s | +0.45% |
| test_chunk | 91.1198ms | 1.5702ms | 636.8544 Ops/s | 637.3445 Ops/s | -0.08% |
| test_creation[device0] | 0.1336ms | 53.7370μs | 18.6091 KOps/s | 17.4431 KOps/s | **+6.68%** |
| test_creation_from_tensor | 0.1277ms | 50.4249μs | 19.8315 KOps/s | 18.0915 KOps/s | **+9.62%** |
| test_add_one[memmap_tensor0] | 82.5720μs | 6.5102μs | 153.6042 KOps/s | 147.5549 KOps/s | +4.10% |
| test_contiguous[memmap_tensor0] | 26.7710μs | 0.5724μs | 1.7470 MOps/s | 1.7306 MOps/s | +0.95% |
| test_stack[memmap_tensor0] | 20.2000μs | 4.4497μs | 224.7334 KOps/s | 219.7850 KOps/s | +2.25% |
| test_memmaptd_index | 1.0466ms | 0.2476ms | 4.0388 KOps/s | 4.0058 KOps/s | +0.82% |
| test_memmaptd_index_astensor | 0.5899ms | 0.3115ms | 3.2104 KOps/s | 3.2082 KOps/s | +0.07% |
| test_memmaptd_index_op | 1.0717ms | 0.5680ms | 1.7606 KOps/s | 1.5962 KOps/s | **+10.30%** |
| test_serialize_model | 95.1928ms | 89.9353ms | 11.1191 Ops/s | 10.4796 Ops/s | **+6.10%** |
| test_serialize_model_pickle | 1.3526s | 1.2360s | 0.8091 Ops/s | 0.7321 Ops/s | **+10.52%** |
| test_serialize_weights | 0.1847s | 97.9815ms | 10.2060 Ops/s | 9.5229 Ops/s | **+7.17%** |
| test_serialize_weights_returnearly | 0.2870s | 77.5784ms | 12.8902 Ops/s | 12.8582 Ops/s | +0.25% |
| test_serialize_weights_pickle | 1.3505s | 1.2480s | 0.8013 Ops/s | 0.8013 Ops/s | -0.00% |
| test_reshape_pytree | 41.7600μs | 24.4521μs | 40.8962 KOps/s | 40.6365 KOps/s | +0.64% |
| test_reshape_td | 51.0110μs | 29.5352μs | 33.8579 KOps/s | 33.7432 KOps/s | +0.34% |
| test_view_pytree | 46.5800μs | 24.4502μs | 40.8994 KOps/s | 40.4742 KOps/s | +1.05% |
| test_view_td | 88.8210μs | 35.3419μs | 28.2951 KOps/s | 27.2389 KOps/s | +3.88% |
| test_unbind_pytree | 48.7110μs | 30.2278μs | 33.0822 KOps/s | 33.3271 KOps/s | -0.73% |
| test_unbind_td | 0.4636ms | 39.6946μs | 25.1923 KOps/s | 26.3966 KOps/s | -4.56% |
| test_split_pytree | 51.1110μs | 32.7752μs | 30.5109 KOps/s | 30.5814 KOps/s | -0.23% |
| test_split_td | 0.5334ms | 35.6938μs | 28.0161 KOps/s | 28.1080 KOps/s | -0.33% |
| test_add_pytree | 76.5120μs | 36.3018μs | 27.5468 KOps/s | 26.7176 KOps/s | +3.10% |
| test_add_td | 84.1420μs | 45.2773μs | 22.0861 KOps/s | 18.9584 KOps/s | **+16.50%** |
| test_distributed | 1.8846ms | 71.5783μs | 13.9707 KOps/s | 14.6340 KOps/s | -4.53% |
| test_tdmodule | 30.0210μs | 13.5364μs | 73.8748 KOps/s | 61.9980 KOps/s | **+19.16%** |
| test_tdmodule_dispatch | 42.0410μs | 26.8801μs | 37.2023 KOps/s | 31.3723 KOps/s | **+18.58%** |
| test_tdseq | 29.4810μs | 14.5045μs | 68.9442 KOps/s | 57.7328 KOps/s | **+19.42%** |
| test_tdseq_dispatch | 45.8210μs | 29.7215μs | 33.6456 KOps/s | 28.9861 KOps/s | **+16.08%** |
| test_instantiation_functorch | 1.4801ms | 1.3466ms | 742.6030 Ops/s | 736.5731 Ops/s | +0.82% |
| test_instantiation_td | 1.4726ms | 0.9703ms | 1.0306 KOps/s | 1.0239 KOps/s | +0.65% |
| test_exec_functorch | 0.1795ms | 0.1454ms | 6.8763 KOps/s | 6.9844 KOps/s | -1.55% |
| test_exec_functional_call | 0.1559ms | 0.1326ms | 7.5393 KOps/s | 7.5298 KOps/s | +0.13% |
| test_exec_td | 0.1707ms | 0.1325ms | 7.5471 KOps/s | 7.3249 KOps/s | +3.03% |
| test_exec_td_decorator | 0.6778ms | 0.2107ms | 4.7450 KOps/s | 4.7596 KOps/s | -0.31% |
| test_vmap_mlp_speed[True-True] | 0.6342ms | 0.5728ms | 1.7457 KOps/s | 1.7081 KOps/s | +2.20% |
| test_vmap_mlp_speed[True-False] | 0.6391ms | 0.5718ms | 1.7490 KOps/s | 1.7092 KOps/s | +2.32% |
| test_vmap_mlp_speed[False-True] | 0.5435ms | 0.5094ms | 1.9629 KOps/s | 1.9570 KOps/s | +0.30% |
| test_vmap_mlp_speed[False-False] | 0.5522ms | 0.5099ms | 1.9610 KOps/s | 1.9539 KOps/s | +0.37% |
| test_vmap_mlp_speed_decorator[True-True] | 0.7759ms | 0.6623ms | 1.5099 KOps/s | 1.5226 KOps/s | -0.84% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9684ms | 0.6768ms | 1.4775 KOps/s | 1.5212 KOps/s | -2.88% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7257ms | 0.5785ms | 1.7286 KOps/s | 1.6474 KOps/s | +4.93% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7208ms | 0.5785ms | 1.7285 KOps/s | 1.7591 KOps/s | -1.74% |
| test_vmap_transformer_speed[True-True] | 7.8521ms | 7.5875ms | 131.7957 Ops/s | 130.4145 Ops/s | +1.06% |
| test_vmap_transformer_speed[True-False] | 7.6248ms | 7.5459ms | 132.5225 Ops/s | 128.8243 Ops/s | +2.87% |
| test_vmap_transformer_speed[False-True] | 7.9490ms | 7.5817ms | 131.8968 Ops/s | 132.1799 Ops/s | -0.21% |
| test_vmap_transformer_speed[False-False] | 7.6702ms | 7.5365ms | 132.6869 Ops/s | 132.5313 Ops/s | +0.12% |
| test_vmap_transformer_speed_decorator[True-True] | 19.1378ms | 18.8524ms | 53.0435 Ops/s | 52.9394 Ops/s | +0.20% |
| test_vmap_transformer_speed_decorator[True-False] | 19.5916ms | 18.8590ms | 53.0250 Ops/s | 53.0331 Ops/s | -0.02% |
| test_vmap_transformer_speed_decorator[False-True] | 18.8561ms | 18.6455ms | 53.6322 Ops/s | 53.0754 Ops/s | +1.05% |
| test_vmap_transformer_speed_decorator[False-False] | 19.5446ms | 18.7355ms | 53.3745 Ops/s | 53.1085 Ops/s | +0.50% |
| test_to_module_speed[True] | 2.0921ms | 1.5459ms | 646.8753 Ops/s | 673.8501 Ops/s | -4.00% |
| test_to_module_speed[False] | 1.6199ms | 1.5256ms | 655.5009 Ops/s | 682.0032 Ops/s | -3.89% |
| test_tc_init | 72.9310μs | 49.2714μs | 20.2958 KOps/s | 18.2225 KOps/s | **+11.38%** |
| test_tc_init_nested | 0.1288ms | 96.6659μs | 10.3449 KOps/s | 9.5533 KOps/s | **+8.29%** |
| test_tc_first_layer_tensor | 22.3110μs | 3.5236μs | 283.7986 KOps/s | 284.7482 KOps/s | -0.33% |
| test_tc_first_layer_nontensor | 21.8000μs | 3.5410μs | 282.4047 KOps/s | 282.0316 KOps/s | +0.13% |
| test_tc_second_layer_tensor | 4.5580μs | 1.1279μs | 886.5801 KOps/s | 901.6739 KOps/s | -1.67% |
| test_tc_second_layer_nontensor | 18.7590μs | 4.0266μs | 248.3492 KOps/s | 239.7648 KOps/s | +3.58% |