pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] Fix (keys, values) in sub #907

Closed vmoens closed 1 month ago

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 28. Worsened: 6.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 41.3970μs | 22.7223μs | 44.0096 KOps/s | 42.9488 KOps/s | +2.47% |
| test_plain_set_stack_nested | 63.8190μs | 23.1330μs | 43.2283 KOps/s | 42.6912 KOps/s | +1.26% |
| test_plain_set_nested_inplace | 65.0420μs | 25.0312μs | 39.9501 KOps/s | 38.7644 KOps/s | +3.06% |
| test_plain_set_stack_nested_inplace | 55.6240μs | 24.7946μs | 40.3313 KOps/s | 38.7870 KOps/s | +3.98% |
| test_items | 28.8130μs | 2.6502μs | 377.3367 KOps/s | 372.5310 KOps/s | +1.29% |
| test_items_nested | 0.5623ms | 0.3695ms | 2.7063 KOps/s | 2.7213 KOps/s | -0.55% |
| test_items_nested_locked | 0.5869ms | 0.3751ms | 2.6658 KOps/s | 2.7018 KOps/s | -1.33% |
| test_items_nested_leaf | 0.1605ms | 87.9632μs | 11.3684 KOps/s | 11.2806 KOps/s | +0.78% |
| test_items_stack_nested | 0.7602ms | 0.3716ms | 2.6908 KOps/s | 2.7426 KOps/s | -1.89% |
| test_items_stack_nested_leaf | 0.1704ms | 89.4505μs | 11.1794 KOps/s | 11.3340 KOps/s | -1.36% |
| test_items_stack_nested_locked | 0.6034ms | 0.3696ms | 2.7060 KOps/s | 2.7500 KOps/s | -1.60% |
| test_keys | 28.2130μs | 3.8927μs | 256.8899 KOps/s | 257.5916 KOps/s | -0.27% |
| test_keys_nested | 0.2977ms | 0.1453ms | 6.8811 KOps/s | 7.0211 KOps/s | -1.99% |
| test_keys_nested_locked | 0.7078ms | 0.1521ms | 6.5747 KOps/s | 6.6802 KOps/s | -1.58% |
| test_keys_nested_leaf | 0.2384ms | 0.1241ms | 8.0581 KOps/s | 8.1164 KOps/s | -0.72% |
| test_keys_stack_nested | 0.2618ms | 0.1457ms | 6.8616 KOps/s | 6.9307 KOps/s | -1.00% |
| test_keys_stack_nested_leaf | 0.2296ms | 0.1228ms | 8.1436 KOps/s | 8.1600 KOps/s | -0.20% |
| test_keys_stack_nested_locked | 0.2869ms | 0.1508ms | 6.6327 KOps/s | 6.6890 KOps/s | -0.84% |
| test_values | 5.0454μs | 1.1799μs | 847.5023 KOps/s | 881.5315 KOps/s | -3.86% |
| test_values_nested | 0.1114ms | 50.1914μs | 19.9237 KOps/s | 19.9377 KOps/s | -0.07% |
| test_values_nested_locked | 99.0950μs | 49.8736μs | 20.0507 KOps/s | 20.1141 KOps/s | -0.32% |
| test_values_nested_leaf | 0.1002ms | 44.9677μs | 22.2382 KOps/s | 22.2438 KOps/s | -0.03% |
| test_values_stack_nested | 0.1342ms | 50.5068μs | 19.7993 KOps/s | 20.0596 KOps/s | -1.30% |
| test_values_stack_nested_leaf | 86.7020μs | 44.7836μs | 22.3296 KOps/s | 22.1042 KOps/s | +1.02% |
| test_values_stack_nested_locked | 93.3240μs | 50.1876μs | 19.9252 KOps/s | 19.6618 KOps/s | +1.34% |
| test_membership | 4.0519μs | 0.7547μs | 1.3251 MOps/s | 1.0378 MOps/s | **+27.68%** |
| test_membership_nested | 25.4480μs | 2.6621μs | 375.6388 KOps/s | 352.2627 KOps/s | **+6.64%** |
| test_membership_nested_leaf | 29.9150μs | 2.6633μs | 375.4752 KOps/s | 355.0363 KOps/s | **+5.76%** |
| test_membership_stacked_nested | 23.1430μs | 2.6516μs | 377.1337 KOps/s | 367.1076 KOps/s | +2.73% |
| test_membership_stacked_nested_leaf | 20.6290μs | 2.6405μs | 378.7155 KOps/s | 366.1049 KOps/s | +3.44% |
| test_membership_nested_last | 0.1291ms | 4.2525μs | 235.1575 KOps/s | 244.1595 KOps/s | -3.69% |
| test_membership_nested_leaf_last | 43.8120μs | 4.1427μs | 241.3895 KOps/s | 242.6957 KOps/s | -0.54% |
| test_membership_stacked_nested_last | 27.2500μs | 4.1068μs | 243.5007 KOps/s | 254.3055 KOps/s | -4.25% |
| test_membership_stacked_nested_leaf_last | 19.4760μs | 4.1184μs | 242.8111 KOps/s | 248.3047 KOps/s | -2.21% |
| test_nested_getleaf | 39.0020μs | 11.0709μs | 90.3269 KOps/s | 93.5041 KOps/s | -3.40% |
| test_nested_get | 54.8830μs | 10.3804μs | 96.3351 KOps/s | 96.8802 KOps/s | -0.56% |
| test_stacked_getleaf | 31.7990μs | 10.9362μs | 91.4391 KOps/s | 93.4414 KOps/s | -2.14% |
| test_stacked_get | 0.2606ms | 10.5305μs | 94.9625 KOps/s | 96.5816 KOps/s | -1.68% |
| test_nested_getitemleaf | 39.3540μs | 11.4960μs | 86.9867 KOps/s | 88.9138 KOps/s | -2.17% |
| test_nested_getitem | 33.9630μs | 10.5635μs | 94.6654 KOps/s | 97.3102 KOps/s | -2.72% |
| test_stacked_getitemleaf | 33.5930μs | 11.4031μs | 87.6951 KOps/s | 89.2123 KOps/s | -1.70% |
| test_stacked_getitem | 33.6630μs | 10.5168μs | 95.0859 KOps/s | 96.8387 KOps/s | -1.81% |
| test_lock_nested | 1.0027ms | 0.5149ms | 1.9423 KOps/s | 1.6350 KOps/s | **+18.80%** |
| test_lock_stack_nested | 0.8449ms | 0.4890ms | 2.0450 KOps/s | 2.0399 KOps/s | +0.25% |
| test_unlock_nested | 0.8382ms | 0.4376ms | 2.2851 KOps/s | 2.2946 KOps/s | -0.41% |
| test_unlock_stack_nested | 0.6994ms | 0.4029ms | 2.4820 KOps/s | 2.4852 KOps/s | -0.13% |
| test_flatten_speed | 0.1967ms | 0.1060ms | 9.4354 KOps/s | 9.2749 KOps/s | +1.73% |
| test_unflatten_speed | 0.6569ms | 0.4482ms | 2.2313 KOps/s | 2.2290 KOps/s | +0.10% |
| test_common_ops | 1.9171ms | 1.1469ms | 871.8908 Ops/s | 827.1231 Ops/s | **+5.41%** |
| test_creation | 26.0090μs | 2.5031μs | 399.5028 KOps/s | 400.6651 KOps/s | -0.29% |
| test_creation_empty | 55.5840μs | 20.0006μs | 49.9985 KOps/s | 44.4805 KOps/s | **+12.41%** |
| test_creation_nested_1 | 53.6900μs | 23.6549μs | 42.2746 KOps/s | 38.9234 KOps/s | **+8.61%** |
| test_creation_nested_2 | 1.3242ms | 27.4953μs | 36.3699 KOps/s | 34.2073 KOps/s | **+6.32%** |
| test_clone | 67.9070μs | 17.1473μs | 58.3184 KOps/s | 55.3606 KOps/s | **+5.34%** |
| test_getitem[int] | 0.8654ms | 13.1401μs | 76.1030 KOps/s | 74.2859 KOps/s | +2.45% |
| test_getitem[slice_int] | 0.1324ms | 33.6768μs | 29.6940 KOps/s | 28.5413 KOps/s | +4.04% |
| test_getitem[range] | 0.1593ms | 58.3569μs | 17.1359 KOps/s | 16.5401 KOps/s | +3.60% |
| test_getitem[tuple] | 0.1256ms | 27.1581μs | 36.8215 KOps/s | 36.0892 KOps/s | +2.03% |
| test_getitem[list] | 0.1522ms | 53.1790μs | 18.8044 KOps/s | 18.2029 KOps/s | +3.30% |
| test_setitem_dim[int] | 61.2740μs | 37.4377μs | 26.7110 KOps/s | 26.2411 KOps/s | +1.79% |
| test_setitem_dim[slice_int] | 0.1349ms | 75.2224μs | 13.2939 KOps/s | 12.8331 KOps/s | +3.59% |
| test_setitem_dim[range] | 0.1613ms | 97.5180μs | 10.2545 KOps/s | 10.1237 KOps/s | +1.29% |
| test_setitem_dim[tuple] | 0.1021ms | 64.8460μs | 15.4212 KOps/s | 15.8374 KOps/s | -2.63% |
| test_setitem | 88.7650μs | 30.9344μs | 32.3265 KOps/s | 30.3524 KOps/s | **+6.50%** |
| test_set | 87.6430μs | 30.2976μs | 33.0059 KOps/s | 30.6419 KOps/s | **+7.72%** |
| test_set_shared | 3.4067ms | 0.2170ms | 4.6074 KOps/s | 4.5915 KOps/s | +0.35% |
| test_update | 0.8476ms | 41.2699μs | 24.2308 KOps/s | 24.1848 KOps/s | +0.19% |
| test_update_nested | 0.1128ms | 49.0020μs | 20.4073 KOps/s | 19.3734 KOps/s | **+5.34%** |
| test_update__nested | 0.1121ms | 34.5393μs | 28.9525 KOps/s | 27.3815 KOps/s | **+5.74%** |
| test_set_nested | 93.1530μs | 33.0043μs | 30.2991 KOps/s | 28.4198 KOps/s | **+6.61%** |
| test_set_nested_new | 86.3710μs | 38.1884μs | 26.1860 KOps/s | 25.0105 KOps/s | +4.70% |
| test_select | 0.1453ms | 55.9143μs | 17.8845 KOps/s | 17.5315 KOps/s | +2.01% |
| test_select_nested | 0.1162ms | 61.6962μs | 16.2085 KOps/s | 16.8244 KOps/s | -3.66% |
| test_exclude_nested | 0.1547ms | 82.5450μs | 12.1146 KOps/s | 12.2687 KOps/s | -1.26% |
| test_empty[True] | 0.5343ms | 0.3493ms | 2.8629 KOps/s | 2.9206 KOps/s | -1.97% |
| test_empty[False] | 7.2913μs | 1.2757μs | 783.8547 KOps/s | 767.2170 KOps/s | +2.17% |
| test_unbind_speed | 0.4695ms | 0.3232ms | 3.0938 KOps/s | 3.0006 KOps/s | +3.11% |
| test_unbind_speed_stack0 | 0.6772ms | 0.3227ms | 3.0991 KOps/s | 3.0645 KOps/s | +1.13% |
| test_unbind_speed_stack1 | 77.2287ms | 0.8258ms | 1.2109 KOps/s | 1.2703 KOps/s | -4.68% |
| test_split | 74.9783ms | 2.2543ms | 443.5996 Ops/s | 394.6534 Ops/s | **+12.40%** |
| test_chunk | 75.8780ms | 2.2670ms | 441.1028 Ops/s | 454.7622 Ops/s | -3.00% |
| test_creation[device0] | 0.2238ms | 0.1217ms | 8.2189 KOps/s | 7.9083 KOps/s | +3.93% |
| test_creation_from_tensor | 3.6174ms | 0.1226ms | 8.1533 KOps/s | 8.2892 KOps/s | -1.64% |
| test_add_one[memmap_tensor0] | 0.1557ms | 7.7420μs | 129.1661 KOps/s | 121.5098 KOps/s | **+6.30%** |
| test_contiguous[memmap_tensor0] | 17.4930μs | 2.1958μs | 455.4133 KOps/s | 439.9498 KOps/s | +3.51% |
| test_stack[memmap_tensor0] | 45.2050μs | 5.7251μs | 174.6685 KOps/s | 159.9693 KOps/s | **+9.19%** |
| test_memmaptd_index | 1.0570ms | 0.4420ms | 2.2624 KOps/s | 2.2521 KOps/s | +0.46% |
| test_memmaptd_index_astensor | 0.7569ms | 0.5219ms | 1.9159 KOps/s | 1.9262 KOps/s | -0.53% |
| test_memmaptd_index_op | 1.8859ms | 1.0994ms | 909.5905 Ops/s | 786.3875 Ops/s | **+15.67%** |
| test_serialize_model | 0.2012s | 0.1412s | 7.0825 Ops/s | 7.6632 Ops/s | **-7.58%** |
| test_serialize_model_pickle | 0.4476s | 0.3950s | 2.5316 Ops/s | 2.5328 Ops/s | -0.05% |
| test_serialize_weights | 0.1311s | 0.1244s | 8.0365 Ops/s | 7.1286 Ops/s | **+12.74%** |
| test_serialize_weights_returnearly | 0.2449s | 0.1823s | 5.4867 Ops/s | 6.0668 Ops/s | **-9.56%** |
| test_serialize_weights_pickle | 0.4946s | 0.4199s | 2.3814 Ops/s | 2.5223 Ops/s | **-5.59%** |
| test_serialize_weights_filesystem | 0.1460s | 0.1430s | 6.9911 Ops/s | 7.0024 Ops/s | -0.16% |
| test_serialize_model_filesystem | 0.1575s | 0.1518s | 6.5863 Ops/s | 6.5629 Ops/s | +0.36% |
| test_reshape_pytree | 84.6380μs | 39.5313μs | 25.2964 KOps/s | 24.9644 KOps/s | +1.33% |
| test_reshape_td | 0.1182ms | 49.1289μs | 20.3546 KOps/s | 19.3407 KOps/s | **+5.24%** |
| test_view_pytree | 93.9550μs | 39.5774μs | 25.2669 KOps/s | 25.3909 KOps/s | -0.49% |
| test_view_td | 0.1500ms | 56.8299μs | 17.5964 KOps/s | 17.5768 KOps/s | +0.11% |
| test_unbind_pytree | 71.7140μs | 35.7072μs | 28.0055 KOps/s | 27.6903 KOps/s | +1.14% |
| test_unbind_td | 0.3267ms | 47.9816μs | 20.8413 KOps/s | 20.5346 KOps/s | +1.49% |
| test_split_pytree | 82.6240μs | 38.8807μs | 25.7197 KOps/s | 25.0254 KOps/s | +2.77% |
| test_split_td | 75.4821ms | 71.5753μs | 13.9713 KOps/s | 15.4184 KOps/s | **-9.39%** |
| test_add_pytree | 0.1321ms | 43.5371μs | 22.9689 KOps/s | 21.6018 KOps/s | **+6.33%** |
| test_add_td | 0.1878ms | 87.2304μs | 11.4639 KOps/s | 10.5285 KOps/s | **+8.88%** |
| test_distributed | 0.2670ms | 0.1329ms | 7.5245 KOps/s | 7.6189 KOps/s | -1.24% |
| test_tdmodule | 33.2420μs | 16.9692μs | 58.9304 KOps/s | 51.7545 KOps/s | **+13.87%** |
| test_tdmodule_dispatch | 59.9920μs | 36.3369μs | 27.5202 KOps/s | 25.0349 KOps/s | **+9.93%** |
| test_tdseq | 43.0200μs | 19.8579μs | 50.3579 KOps/s | 47.5730 KOps/s | **+5.85%** |
| test_tdseq_dispatch | 62.7470μs | 40.6067μs | 24.6265 KOps/s | 23.0984 KOps/s | **+6.62%** |
| test_instantiation_functorch | 1.7539ms | 1.5976ms | 625.9477 Ops/s | 626.3290 Ops/s | -0.06% |
| test_instantiation_td | 2.2004ms | 1.1651ms | 858.3297 Ops/s | 863.5623 Ops/s | -0.61% |
| test_exec_functorch | 0.3418ms | 0.1806ms | 5.5362 KOps/s | 5.5202 KOps/s | +0.29% |
| test_exec_functional_call | 5.3986ms | 0.1736ms | 5.7592 KOps/s | 5.7652 KOps/s | -0.10% |
| test_exec_td | 0.3373ms | 0.1748ms | 5.7198 KOps/s | 5.7485 KOps/s | -0.50% |
| test_exec_td_decorator | 0.4673ms | 0.2617ms | 3.8207 KOps/s | 3.8998 KOps/s | -2.03% |
| test_vmap_mlp_speed[True-True] | 0.8755ms | 0.6111ms | 1.6363 KOps/s | 1.6015 KOps/s | +2.17% |
| test_vmap_mlp_speed[True-False] | 0.8763ms | 0.6156ms | 1.6244 KOps/s | 1.6107 KOps/s | +0.85% |
| test_vmap_mlp_speed[False-True] | 0.8169ms | 0.5013ms | 1.9947 KOps/s | 1.9531 KOps/s | +2.13% |
| test_vmap_mlp_speed[False-False] | 1.1274ms | 0.4995ms | 2.0019 KOps/s | 1.9537 KOps/s | +2.47% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0838ms | 0.7077ms | 1.4131 KOps/s | 1.4084 KOps/s | +0.33% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1696ms | 0.7070ms | 1.4144 KOps/s | 1.3557 KOps/s | +4.33% |
| test_vmap_mlp_speed_decorator[False-True] | 1.1555ms | 0.6042ms | 1.6552 KOps/s | 1.7191 KOps/s | -3.72% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9156ms | 0.5779ms | 1.7303 KOps/s | 1.7153 KOps/s | +0.88% |
| test_to_module_speed[True] | 2.3631ms | 1.8233ms | 548.4482 Ops/s | 556.4121 Ops/s | -1.43% |
| test_to_module_speed[False] | 2.2468ms | 1.7745ms | 563.5460 Ops/s | 562.0112 Ops/s | +0.27% |
| test_tc_init | 96.4300μs | 45.5573μs | 21.9504 KOps/s | 20.4369 KOps/s | **+7.41%** |
| test_tc_init_nested | 0.1679ms | 94.6353μs | 10.5669 KOps/s | 10.1004 KOps/s | +4.62% |
| test_tc_first_layer_tensor | 54.1100μs | 9.0989μs | 109.9030 KOps/s | 108.7524 KOps/s | +1.06% |
| test_tc_first_layer_nontensor | 53.6600μs | 9.1465μs | 109.3312 KOps/s | 109.0340 KOps/s | +0.27% |
| test_tc_second_layer_tensor | 19.6860μs | 2.8978μs | 345.0876 KOps/s | 351.9284 KOps/s | -1.94% |
| test_tc_second_layer_nontensor | 55.8040μs | 10.2803μs | 97.2738 KOps/s | 97.3509 KOps/s | -0.08% |
| test_unbind | 97.5956ms | 13.7305ms | 72.8308 Ops/s | 75.1765 Ops/s | -3.12% |
| test_full_like | 8.9573ms | 7.2619ms | 137.7050 Ops/s | 139.4639 Ops/s | -1.26% |
| test_zeros_like | 13.0549ms | 6.4885ms | 154.1185 Ops/s | 159.6210 Ops/s | -3.45% |
| test_ones_like | 14.5941ms | 7.6714ms | 130.3535 Ops/s | 140.7343 Ops/s | **-7.38%** |
| test_clone | 14.1756ms | 9.3179ms | 107.3198 Ops/s | 113.6487 Ops/s | **-5.57%** |
| test_squeeze | 64.1000μs | 13.9890μs | 71.4846 KOps/s | 69.6551 KOps/s | +2.63% |
| test_unsqueeze | 0.1957ms | 97.6287μs | 10.2429 KOps/s | 9.2213 KOps/s | **+11.08%** |
| test_split | 0.4519ms | 0.2085ms | 4.7951 KOps/s | 4.6939 KOps/s | +2.16% |
| test_permute | 0.4665ms | 0.2265ms | 4.4147 KOps/s | 4.3355 KOps/s | +1.83% |
| test_stack | 31.6002ms | 25.2329ms | 39.6308 Ops/s | 41.3034 Ops/s | -4.05% |
| test_cat | 29.5932ms | 25.0251ms | 39.9599 Ops/s | 41.5027 Ops/s | -3.72% |

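
The Change column above appears to be the relative difference between the Ops and Ops on Repo `HEAD` columns, and the entries shown in bold match the Improved/Worsened counts in the summary (28 and 6 for the CPU run). The following is a minimal sketch of that arithmetic, not the benchmark bot's actual code: the helper names (`parse_ops`, `relative_change`, `classify`) are hypothetical, and the 5% threshold is an assumption inferred from the CPU table, where the smallest bold entry is +5.24% and the largest non-bold one is +4.70%.

```python
# Multipliers for the throughput units that appear in the Ops columns.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '44.0096 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percent change in throughput of the PR relative to HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "within noise"


# Three rows copied from the CPU table above: (Ops, Ops on Repo HEAD).
rows = {
    "test_plain_set_nested": ("44.0096 KOps/s", "42.9488 KOps/s"),
    "test_membership": ("1.3251 MOps/s", "1.0378 MOps/s"),
    "test_serialize_model": ("7.0825 Ops/s", "7.6632 Ops/s"),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table values are already rounded, recomputed percentages can differ from the reported ones in the last digit. Normalizing units before dividing matters for rows whose two Ops cells use different prefixes.
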
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 18. Worsened: 10.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 26.9410μs | 15.4074μs | 64.9040 KOps/s | 59.5570 KOps/s | **+8.98%** |
| test_plain_set_stack_nested | 37.0600μs | 15.3124μs | 65.3066 KOps/s | 59.6148 KOps/s | **+9.55%** |
| test_plain_set_nested_inplace | 45.7110μs | 16.4164μs | 60.9147 KOps/s | 55.9982 KOps/s | **+8.78%** |
| test_plain_set_stack_nested_inplace | 43.2710μs | 16.4360μs | 60.8421 KOps/s | 56.4249 KOps/s | **+7.83%** |
| test_items | 16.3400μs | 4.6081μs | 217.0075 KOps/s | 214.4173 KOps/s | +1.21% |
| test_items_nested | 0.4405ms | 0.3875ms | 2.5808 KOps/s | 2.5366 KOps/s | +1.74% |
| test_items_nested_locked | 0.4290ms | 0.3909ms | 2.5583 KOps/s | 2.5504 KOps/s | +0.31% |
| test_items_nested_leaf | 0.1064ms | 85.5378μs | 11.6907 KOps/s | 11.5313 KOps/s | +1.38% |
| test_items_stack_nested | 0.4498ms | 0.3952ms | 2.5303 KOps/s | 2.5347 KOps/s | -0.17% |
| test_items_stack_nested_leaf | 0.1073ms | 86.1002μs | 11.6144 KOps/s | 11.5491 KOps/s | +0.56% |
| test_items_stack_nested_locked | 0.4273ms | 0.3988ms | 2.5073 KOps/s | 2.5264 KOps/s | -0.75% |
| test_keys | 21.9510μs | 4.3641μs | 229.1415 KOps/s | 226.0935 KOps/s | +1.35% |
| test_keys_nested | 91.7630μs | 67.1304μs | 14.8964 KOps/s | 15.1861 KOps/s | -1.91% |
| test_keys_nested_locked | 1.8699ms | 71.0104μs | 14.0824 KOps/s | 13.7021 KOps/s | +2.78% |
| test_keys_nested_leaf | 79.6810μs | 56.0910μs | 17.8282 KOps/s | 17.2879 KOps/s | +3.13% |
| test_keys_stack_nested | 86.1620μs | 65.3446μs | 15.3035 KOps/s | 15.3780 KOps/s | -0.48% |
| test_keys_stack_nested_leaf | 80.6520μs | 56.7201μs | 17.6304 KOps/s | 17.3493 KOps/s | +1.62% |
| test_keys_stack_nested_locked | 0.1014ms | 71.9450μs | 13.8995 KOps/s | 13.7432 KOps/s | +1.14% |
| test_values | 8.6567μs | 1.7559μs | 569.5081 KOps/s | 567.3254 KOps/s | +0.38% |
| test_values_nested | 54.9710μs | 33.8518μs | 29.5406 KOps/s | 29.5624 KOps/s | -0.07% |
| test_values_nested_locked | 58.6110μs | 35.8708μs | 27.8778 KOps/s | 28.0283 KOps/s | -0.54% |
| test_values_nested_leaf | 44.6310μs | 30.0903μs | 33.2333 KOps/s | 33.3791 KOps/s | -0.44% |
| test_values_stack_nested | 60.0510μs | 34.6169μs | 28.8876 KOps/s | 29.4781 KOps/s | -2.00% |
| test_values_stack_nested_leaf | 50.7010μs | 30.9576μs | 32.3022 KOps/s | 33.1711 KOps/s | -2.62% |
| test_values_stack_nested_locked | 54.4920μs | 36.7370μs | 27.2205 KOps/s | 28.0019 KOps/s | -2.79% |
| test_membership | 1.4145μs | 0.5450μs | 1.8349 MOps/s | 1.8462 MOps/s | -0.61% |
| test_membership_nested | 18.1200μs | 2.0881μs | 478.9077 KOps/s | 471.6394 KOps/s | +1.54% |
| test_membership_nested_leaf | 16.0755μs | 2.0520μs | 487.3194 KOps/s | 488.4312 KOps/s | -0.23% |
| test_membership_stacked_nested | 24.7510μs | 2.0837μs | 479.9114 KOps/s | 472.1140 KOps/s | +1.65% |
| test_membership_stacked_nested_leaf | 31.7910μs | 2.0463μs | 488.6778 KOps/s | 479.9480 KOps/s | +1.82% |
| test_membership_nested_last | 20.6810μs | 3.0028μs | 333.0175 KOps/s | 325.2360 KOps/s | +2.39% |
| test_membership_nested_leaf_last | 31.0510μs | 2.9933μs | 334.0821 KOps/s | 331.4320 KOps/s | +0.80% |
| test_membership_stacked_nested_last | 24.7100μs | 4.3758μs | 228.5318 KOps/s | 333.7789 KOps/s | **-31.53%** |
| test_membership_stacked_nested_leaf_last | 22.5610μs | 4.3195μs | 231.5083 KOps/s | 328.2295 KOps/s | **-29.47%** |
| test_nested_getleaf | 37.7910μs | 7.9930μs | 125.1099 KOps/s | 124.3339 KOps/s | +0.62% |
| test_nested_get | 30.4000μs | 7.5451μs | 132.5367 KOps/s | 131.2504 KOps/s | +0.98% |
| test_stacked_getleaf | 29.4310μs | 8.0503μs | 124.2196 KOps/s | 124.3413 KOps/s | -0.10% |
| test_stacked_get | 32.9410μs | 7.5112μs | 133.1353 KOps/s | 131.9112 KOps/s | +0.93% |
| test_nested_getitemleaf | 23.5910μs | 8.1293μs | 123.0121 KOps/s | 122.4599 KOps/s | +0.45% |
| test_nested_getitem | 27.6510μs | 7.6819μs | 130.1763 KOps/s | 128.8586 KOps/s | +1.02% |
| test_stacked_getitemleaf | 22.6410μs | 8.1915μs | 122.0781 KOps/s | 121.6771 KOps/s | +0.33% |
| test_stacked_getitem | 31.8810μs | 7.7037μs | 129.8077 KOps/s | 129.3669 KOps/s | +0.34% |
| test_lock_nested | 4.3976ms | 0.4848ms | 2.0626 KOps/s | 2.0714 KOps/s | -0.42% |
| test_lock_stack_nested | 0.4736ms | 0.4341ms | 2.3036 KOps/s | 2.2499 KOps/s | +2.39% |
| test_unlock_nested | 0.8472ms | 0.4007ms | 2.4958 KOps/s | 2.4614 KOps/s | +1.40% |
| test_unlock_stack_nested | 0.4092ms | 0.3547ms | 2.8195 KOps/s | 2.7658 KOps/s | +1.94% |
| test_flatten_speed | 0.1992ms | 0.1069ms | 9.3531 KOps/s | 9.5155 KOps/s | -1.71% |
| test_unflatten_speed | 0.3455ms | 0.2966ms | 3.3710 KOps/s | 3.3776 KOps/s | -0.19% |
| test_common_ops | 1.6971ms | 1.2787ms | 782.0374 Ops/s | 748.5977 Ops/s | +4.47% |
| test_creation | 17.0500μs | 1.9916μs | 502.1201 KOps/s | 504.0542 KOps/s | -0.38% |
| test_creation_empty | 36.9310μs | 14.3929μs | 69.4789 KOps/s | 59.4282 KOps/s | **+16.91%** |
| test_creation_nested_1 | 41.9510μs | 16.3887μs | 61.0175 KOps/s | 52.5891 KOps/s | **+16.03%** |
| test_creation_nested_2 | 43.8600μs | 18.8916μs | 52.9336 KOps/s | 45.4877 KOps/s | **+16.37%** |
| test_clone | 57.3610μs | 31.3326μs | 31.9157 KOps/s | 32.0896 KOps/s | -0.54% |
| test_getitem[int] | 1.1849ms | 17.7399μs | 56.3701 KOps/s | 57.3979 KOps/s | -1.79% |
| test_getitem[slice_int] | 0.1538ms | 29.8664μs | 33.4825 KOps/s | 34.2937 KOps/s | -2.37% |
| test_getitem[range] | 0.3110ms | 0.1195ms | 8.3690 KOps/s | 8.3887 KOps/s | -0.23% |
| test_getitem[tuple] | 0.1502ms | 26.2479μs | 38.0983 KOps/s | 38.5674 KOps/s | -1.22% |
| test_getitem[list] | 0.2773ms | 0.1089ms | 9.1793 KOps/s | 9.3232 KOps/s | -1.54% |
| test_setitem_dim[int] | 74.0720μs | 51.7468μs | 19.3249 KOps/s | 18.1243 KOps/s | **+6.62%** |
| test_setitem_dim[slice_int] | 0.1005ms | 77.0893μs | 12.9720 KOps/s | 12.6242 KOps/s | +2.76% |
| test_setitem_dim[range] | 0.1731ms | 0.1412ms | 7.0816 KOps/s | 7.0082 KOps/s | +1.05% |
| test_setitem_dim[tuple] | 0.1619ms | 70.1824μs | 14.2486 KOps/s | 13.8029 KOps/s | +3.23% |
| test_setitem | 78.0820μs | 43.3361μs | 23.0754 KOps/s | 22.6521 KOps/s | +1.87% |
| test_set | 69.6410μs | 41.7419μs | 23.9567 KOps/s | 22.8678 KOps/s | +4.76% |
| test_set_shared | 0.3903ms | 55.0147μs | 18.1770 KOps/s | 18.1636 KOps/s | +0.07% |
| test_update | 87.5620μs | 51.6162μs | 19.3738 KOps/s | 18.7866 KOps/s | +3.13% |
| test_update_nested | 85.8320μs | 59.8592μs | 16.7059 KOps/s | 16.5913 KOps/s | +0.69% |
| test_update__nested | 0.1336ms | 68.3630μs | 14.6278 KOps/s | 15.9756 KOps/s | **-8.44%** |
| test_set_nested | 77.2620μs | 47.9349μs | 20.8616 KOps/s | 22.0050 KOps/s | **-5.20%** |
| test_set_nested_new | 74.5720μs | 51.7860μs | 19.3103 KOps/s | 20.0040 KOps/s | -3.47% |
| test_select | 0.1003ms | 67.3722μs | 14.8429 KOps/s | 15.3017 KOps/s | -3.00% |
| test_select_nested | 0.4868ms | 54.3975μs | 18.3832 KOps/s | 18.8719 KOps/s | -2.59% |
| test_exclude_nested | 99.2020μs | 75.0972μs | 13.3161 KOps/s | 13.7418 KOps/s | -3.10% |
| test_empty[True] | 0.3401ms | 0.3003ms | 3.3295 KOps/s | 3.3730 KOps/s | -1.29% |
| test_empty[False] | 2.7380μs | 0.9330μs | 1.0718 MOps/s | 1.0867 MOps/s | -1.37% |
| test_to | 57.8710μs | 38.0436μs | 26.2856 KOps/s | 27.1533 KOps/s | -3.20% |
| test_to_nonblocking | 53.4410μs | 23.7716μs | 42.0669 KOps/s | 42.1143 KOps/s | -0.11% |
| test_unbind_speed | 0.3418ms | 0.3070ms | 3.2573 KOps/s | 3.1425 KOps/s | +3.65% |
| test_unbind_speed_stack0 | 0.3513ms | 0.3014ms | 3.3174 KOps/s | 3.2050 KOps/s | +3.51% |
| test_unbind_speed_stack1 | 88.9872ms | 0.7859ms | 1.2725 KOps/s | 1.2597 KOps/s | +1.01% |
| test_split | 91.7578ms | 2.3881ms | 418.7425 Ops/s | 420.8472 Ops/s | -0.50% |
| test_chunk | 91.2964ms | 2.3755ms | 420.9599 Ops/s | 419.4262 Ops/s | +0.37% |
| test_creation[device0] | 0.1581ms | 0.1044ms | 9.5781 KOps/s | 9.5118 KOps/s | +0.70% |
| test_creation_from_tensor | 0.1564ms | 0.1025ms | 9.7561 KOps/s | 9.7913 KOps/s | -0.36% |
| test_add_one[memmap_tensor0] | 26.0510μs | 10.1685μs | 98.3431 KOps/s | 93.5751 KOps/s | **+5.10%** |
| test_contiguous[memmap_tensor0] | 21.5500μs | 2.1845μs | 457.7724 KOps/s | 448.4145 KOps/s | +2.09% |
| test_stack[memmap_tensor0] | 52.5110μs | 7.1864μs | 139.1509 KOps/s | 139.6515 KOps/s | -0.36% |
| test_memmaptd_index | 1.1103ms | 0.4469ms | 2.2375 KOps/s | 2.2420 KOps/s | -0.20% |
| test_memmaptd_index_astensor | 0.7734ms | 0.5098ms | 1.9614 KOps/s | 1.8550 KOps/s | **+5.74%** |
| test_memmaptd_index_op | 1.4590ms | 1.0311ms | 969.8087 Ops/s | 909.8104 Ops/s | **+6.59%** |
| test_serialize_model | 99.3850ms | 94.9508ms | 10.5318 Ops/s | 10.1877 Ops/s | +3.38% |
| test_serialize_model_pickle | 1.3694s | 1.2395s | 0.8068 Ops/s | 0.8064 Ops/s | +0.05% |
| test_serialize_weights | 0.1847s | 0.1017s | 9.8307 Ops/s | 9.4492 Ops/s | +4.04% |
| test_serialize_weights_returnearly | 0.3139s | 88.4810ms | 11.3019 Ops/s | 11.6390 Ops/s | -2.90% |
| test_serialize_weights_pickle | 1.3511s | 1.2362s | 0.8089 Ops/s | 0.8086 Ops/s | +0.04% |
| test_reshape_pytree | 68.9310μs | 38.8653μs | 25.7299 KOps/s | 25.4244 KOps/s | +1.20% |
| test_reshape_td | 68.6820μs | 48.1644μs | 20.7622 KOps/s | 21.9670 KOps/s | **-5.48%** |
| test_view_pytree | 75.1420μs | 39.2586μs | 25.4721 KOps/s | 24.5007 KOps/s | +3.96% |
| test_view_td | 0.2260ms | 58.2442μs | 17.1691 KOps/s | 19.5174 KOps/s | **-12.03%** |
| test_unbind_pytree | 0.1614ms | 37.8559μs | 26.4160 KOps/s | 26.0910 KOps/s | +1.25% |
| test_unbind_td | 0.4196ms | 46.3014μs | 21.5976 KOps/s | 21.0134 KOps/s | +2.78% |
| test_split_pytree | 80.6210μs | 51.4238μs | 19.4462 KOps/s | 19.4300 KOps/s | +0.08% |
| test_split_td | 0.4968ms | 61.2298μs | 16.3319 KOps/s | 16.1170 KOps/s | +1.33% |
| test_add_pytree | 0.1003ms | 60.7351μs | 16.4649 KOps/s | 16.4037 KOps/s | +0.37% |
| test_add_td | 0.1355ms | 91.8794μs | 10.8838 KOps/s | 10.1946 KOps/s | **+6.76%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4089ms | 0.2113ms | 4.7316 KOps/s | 4.7349 KOps/s | -0.07% |
| test_compile_add_one_nested[tensordict-eager] | 0.2598ms | 0.1738ms | 5.7535 KOps/s | 5.7726 KOps/s | -0.33% |
| test_compile_add_one_nested[pytree-compile] | 0.1806ms | 0.1465ms | 6.8260 KOps/s | 6.7872 KOps/s | +0.57% |
| test_compile_add_one_nested[pytree-eager] | 0.2483ms | 0.1956ms | 5.1132 KOps/s | 5.0622 KOps/s | +1.01% |
| test_compile_copy_nested[tensordict-compile] | 45.0810μs | 21.6034μs | 46.2889 KOps/s | 44.6837 KOps/s | +3.59% |
| test_compile_copy_nested[tensordict-eager] | 76.8810μs | 48.8530μs | 20.4696 KOps/s | 20.5212 KOps/s | -0.25% |
| test_compile_copy_nested[pytree-compile] | 99.2830μs | 72.1268μs | 13.8645 KOps/s | 13.6492 KOps/s | +1.58% |
| test_compile_copy_nested[pytree-eager] | 83.0610μs | 59.7963μs | 16.7234 KOps/s | 16.7125 KOps/s | +0.07% |
| test_compile_add_one_flat[tensordict-compile] | 0.4068ms | 0.3283ms | 3.0462 KOps/s | 3.0349 KOps/s | +0.37% |
| test_compile_add_one_flat[tensordict-eager] | 0.2561ms | 0.2222ms | 4.5010 KOps/s | 4.4932 KOps/s | +0.17% |
| test_compile_add_one_flat[tensorclass-compile] | 0.1704ms | 0.1311ms | 7.6299 KOps/s | 7.6081 KOps/s | +0.29% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1171ms | 62.8806μs | 15.9032 KOps/s | 15.8431 KOps/s | +0.38% |
| test_compile_add_one_flat[pytree-compile] | 0.3707ms | 0.3278ms | 3.0502 KOps/s | 3.0715 KOps/s | -0.69% |
| test_compile_add_one_flat[pytree-eager] | 0.7052ms | 0.6426ms | 1.5562 KOps/s | 1.5523 KOps/s | +0.25% |
| test_compile_add_self_flat[tensordict-eager] | 0.3214ms | 0.2723ms | 3.6722 KOps/s | 3.6849 KOps/s | -0.34% |
| test_compile_add_self_flat[tensordict-compile] | 0.4068ms | 0.3298ms | 3.0324 KOps/s | 3.0189 KOps/s | +0.45% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1556ms | 75.0415μs | 13.3260 KOps/s | 13.2308 KOps/s | +0.72% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2839ms | 0.1323ms | 7.5601 KOps/s | 7.4552 KOps/s | +1.41% |
| test_compile_add_self_flat[pytree-eager] | 0.6151ms | 0.5427ms | 1.8426 KOps/s | 1.8241 KOps/s | +1.02% |
| test_compile_add_self_flat[pytree-compile] | 0.3603ms | 0.3273ms | 3.0552 KOps/s | 3.0600 KOps/s | -0.16% |
| test_compile_copy_flat[tensordict-compile] | 40.6810μs | 18.6343μs | 53.6645 KOps/s | 52.9300 KOps/s | +1.39% |
| test_compile_copy_flat[tensordict-eager] | 48.0710μs | 31.6253μs | 31.6202 KOps/s | 31.1130 KOps/s | +1.63% |
| test_compile_copy_flat[pytree-compile] | 0.1093ms | 75.4315μs | 13.2571 KOps/s | 13.2171 KOps/s | +0.30% |
| test_compile_copy_flat[pytree-eager] | 91.8220μs | 60.3925μs | 16.5583 KOps/s | 16.4098 KOps/s | +0.91% |
| test_compile_assign_and_add[tensordict-compile] | 2.5066ms | 0.9271ms | 1.0787 KOps/s | 1.0595 KOps/s | +1.81% |
| test_compile_assign_and_add[tensordict-eager] | 3.4825ms | 3.3472ms | 298.7546 Ops/s | 290.0832 Ops/s | +2.99% |
| test_compile_assign_and_add[pytree-compile] | 2.4925ms | 0.9134ms | 1.0948 KOps/s | 1.0840 KOps/s | +0.99% |
| test_compile_assign_and_add[pytree-eager] | 4.5653ms | 3.3909ms | 294.9053 Ops/s | 291.7828 Ops/s | +1.07% |
| test_compile_indexing[tensor-tensordict-compile] | 0.1592ms | 0.1139ms | 8.7772 KOps/s | 8.8538 KOps/s | -0.87% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2493ms | 67.8281μs | 14.7432 KOps/s | 14.8670 KOps/s | -0.83% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1358ms | 0.1041ms | 9.6027 KOps/s | 9.5039 KOps/s | +1.04% |
| test_compile_indexing[tensor-tensorclass-eager] | 96.4820μs | 46.9714μs | 21.2896 KOps/s | 21.0671 KOps/s | +1.06% |
| test_compile_indexing[tensor-pytree-compile] | 0.1394ms | 0.1065ms | 9.3916 KOps/s | 9.3252 KOps/s | +0.71% |
| test_compile_indexing[tensor-pytree-eager] | 80.3120μs | 48.2253μs | 20.7360 KOps/s | 21.1023 KOps/s | -1.74% |
| test_compile_indexing[slice-tensordict-compile] | 0.1721ms | 0.1401ms | 7.1364 KOps/s | 7.0571 KOps/s | +1.12% |
| test_compile_indexing[slice-tensordict-eager] | 0.1887ms | 26.9879μs | 37.0536 KOps/s | 36.5376 KOps/s | +1.41% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1746ms | 0.1312ms | 7.6214 KOps/s | 7.5215 KOps/s | +1.33% |
| test_compile_indexing[slice-tensorclass-eager] | 51.1410μs | 22.9556μs | 43.5623 KOps/s | 42.8208 KOps/s | +1.73% |
| test_compile_indexing[slice-pytree-compile] | 0.1642ms | 0.1317ms | 7.5954 KOps/s | 7.4833 KOps/s | +1.50% |
| test_compile_indexing[slice-pytree-eager] | 47.5510μs | 23.0950μs | 43.2994 KOps/s | 43.1073 KOps/s | +0.45% |
| test_compile_indexing[int-tensordict-compile] | 0.1664ms | 0.1398ms | 7.1545 KOps/s | 7.1064 KOps/s | +0.68% |
| test_compile_indexing[int-tensordict-eager] | 0.4971ms | 27.2378μs | 36.7137 KOps/s | 37.1769 KOps/s | -1.25% |
| test_compile_indexing[int-tensorclass-compile] | 0.1614ms | 0.1316ms | 7.5998 KOps/s | 7.5092 KOps/s | +1.21% |
| test_compile_indexing[int-tensorclass-eager] | 59.1110μs | 23.0454μs | 43.3926 KOps/s | 43.1061 KOps/s | +0.66% |
| test_compile_indexing[int-pytree-compile] | 0.1643ms | 0.1315ms | 7.6057 KOps/s | 7.5312 KOps/s | +0.99% |
| test_compile_indexing[int-pytree-eager] | 46.3910μs | 22.8789μs | 43.7084 KOps/s | 43.1523 KOps/s | +1.29% |
| test_mod_add[eager] | 70.6310μs | 37.9329μs | 26.3623 KOps/s | 26.3564 KOps/s | +0.02% |
| test_mod_add[compile] | 0.1729ms | 68.0717μs | 14.6904 KOps/s | 14.4239 KOps/s | +1.85% |
| test_mod_add[compile-overhead] | 0.2628ms | 0.1485ms | 6.7328 KOps/s | 6.6398 KOps/s | +1.40% |
| test_mod_wrap[eager] | 0.4046ms | 0.2565ms | 3.8985 KOps/s | 3.8738 KOps/s | +0.64% |
| test_mod_wrap[compile] | 1.2240ms | 0.2976ms | 3.3599 KOps/s | 3.3256 KOps/s | +1.03% |
| test_mod_wrap[compile-overhead] | 8.2351ms | 4.2980ms | 232.6657 Ops/s | 233.0126 Ops/s | -0.15% |
| test_mod_wrap_and_backward[eager] | 1.5965ms | 1.4536ms | 687.9593 Ops/s | 685.5892 Ops/s | +0.35% |
| test_mod_wrap_and_backward[compile] | 1.6111ms | 1.4737ms | 678.5525 Ops/s | 726.7149 Ops/s | **-6.63%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.7846ms | 1.0622ms | 941.4182 Ops/s | 1.1076 KOps/s | **-15.00%** |
| test_seq_add[eager] | 0.1524ms | 0.1114ms | 8.9741 KOps/s | 8.5038 KOps/s | **+5.53%** |
| test_seq_add[compile] | 0.1503ms | 89.3284μs | 11.1946 KOps/s | 11.4003 KOps/s | -1.80% |
| test_seq_add[compile-overhead] | 0.1595ms | 0.1246ms | 8.0264 KOps/s | 8.0867 KOps/s | -0.75% |
| test_seq_wrap[eager] | 0.4987ms | 0.4368ms | 2.2893 KOps/s | 2.2156 KOps/s | +3.33% |
| test_seq_wrap[compile] | 1.4878ms | 0.3445ms | 2.9024 KOps/s | 2.8442 KOps/s | +2.05% |
| test_seq_wrap[compile-overhead] | 0.3045s | 0.1456s | 6.8703 Ops/s | 6.8077 Ops/s
| $\color{#35bf28}+0.92\\%$ | | test_func_call_runtime[False-eager] | 1.1009ms | 0.7953ms | 1.2574 KOps/s | 1.3137 KOps/s | $\color{#d91a1a}-4.29\\%$ | | test_func_call_runtime[False-compile] | 0.9050ms | 0.8293ms | 1.2058 KOps/s | 1.2065 KOps/s | $\color{#d91a1a}-0.06\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4349ms | 0.3629ms | 2.7554 KOps/s | 2.7363 KOps/s | $\color{#35bf28}+0.70\\%$ | | test_func_call_runtime[True-eager] | 1.3599ms | 1.0122ms | 987.8991 Ops/s | 985.7730 Ops/s | $\color{#35bf28}+0.22\\%$ | | test_func_call_runtime[True-compile] | 0.9513ms | 0.8633ms | 1.1583 KOps/s | 1.1538 KOps/s | $\color{#35bf28}+0.39\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4617ms | 0.4049ms | 2.4697 KOps/s | 2.4655 KOps/s | $\color{#35bf28}+0.17\\%$ | | test_distributed | 0.2726ms | 68.0825μs | 14.6881 KOps/s | 11.1452 KOps/s | $\textbf{\color{#35bf28}+31.79\\%}$ | | test_tdmodule | 29.1110μs | 14.0585μs | 71.1312 KOps/s | 62.3466 KOps/s | $\textbf{\color{#35bf28}+14.09\\%}$ | | test_tdmodule_dispatch | 50.8310μs | 29.0860μs | 34.3808 KOps/s | 29.4473 KOps/s | $\textbf{\color{#35bf28}+16.75\\%}$ | | test_tdseq | 31.8110μs | 15.2549μs | 65.5528 KOps/s | 57.3123 KOps/s | $\textbf{\color{#35bf28}+14.38\\%}$ | | test_tdseq_dispatch | 60.0410μs | 32.0455μs | 31.2056 KOps/s | 27.6620 KOps/s | $\textbf{\color{#35bf28}+12.81\\%}$ | | test_instantiation_functorch | 2.1616ms | 2.0276ms | 493.1979 Ops/s | 496.6940 Ops/s | $\color{#d91a1a}-0.70\\%$ | | test_instantiation_td | 2.0158ms | 1.3146ms | 760.6660 Ops/s | 759.9221 Ops/s | $\color{#35bf28}+0.10\\%$ | | test_exec_functorch | 0.2871ms | 0.2312ms | 4.3253 KOps/s | 4.3571 KOps/s | $\color{#d91a1a}-0.73\\%$ | | test_exec_functional_call | 0.3109ms | 0.2263ms | 4.4191 KOps/s | 4.4353 KOps/s | $\color{#d91a1a}-0.37\\%$ | | test_exec_td | 0.3293ms | 0.2244ms | 4.4560 KOps/s | 4.4546 KOps/s | $\color{#35bf28}+0.03\\%$ | | test_exec_td_decorator | 1.0205ms | 0.3125ms | 3.2002 KOps/s | 3.3469 KOps/s | 
$\color{#d91a1a}-4.38\\%$ | | test_vmap_mlp_speed[True-True] | 0.8154ms | 0.6747ms | 1.4821 KOps/s | 1.4756 KOps/s | $\color{#35bf28}+0.44\\%$ | | test_vmap_mlp_speed[True-False] | 0.7866ms | 0.7014ms | 1.4257 KOps/s | 1.4824 KOps/s | $\color{#d91a1a}-3.83\\%$ | | test_vmap_mlp_speed[False-True] | 0.6872ms | 0.6080ms | 1.6448 KOps/s | 1.6974 KOps/s | $\color{#d91a1a}-3.10\\%$ | | test_vmap_mlp_speed[False-False] | 0.6813ms | 0.6225ms | 1.6063 KOps/s | 1.6978 KOps/s | $\textbf{\color{#d91a1a}-5.39\\%}$ | | test_vmap_mlp_speed_decorator[True-True] | 0.8493ms | 0.7516ms | 1.3306 KOps/s | 1.3236 KOps/s | $\color{#35bf28}+0.52\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 1.1379ms | 0.7527ms | 1.3285 KOps/s | 1.3005 KOps/s | $\color{#35bf28}+2.15\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8044ms | 0.6617ms | 1.5113 KOps/s | 1.5207 KOps/s | $\color{#d91a1a}-0.61\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8255ms | 0.6616ms | 1.5115 KOps/s | 1.5195 KOps/s | $\color{#d91a1a}-0.52\\%$ | | test_vmap_transformer_speed[True-True] | 9.0935ms | 8.9162ms | 112.1553 Ops/s | 112.8696 Ops/s | $\color{#d91a1a}-0.63\\%$ | | test_vmap_transformer_speed[True-False] | 9.0660ms | 8.8972ms | 112.3943 Ops/s | 112.9501 Ops/s | $\color{#d91a1a}-0.49\\%$ | | test_vmap_transformer_speed[False-True] | 10.5280ms | 8.8407ms | 113.1128 Ops/s | 114.1490 Ops/s | $\color{#d91a1a}-0.91\\%$ | | test_vmap_transformer_speed[False-False] | 8.8836ms | 8.8008ms | 113.6256 Ops/s | 114.2045 Ops/s | $\color{#d91a1a}-0.51\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 22.0313ms | 21.2545ms | 47.0490 Ops/s | 47.4176 Ops/s | $\color{#d91a1a}-0.78\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.9396ms | 21.2154ms | 47.1355 Ops/s | 47.6472 Ops/s | $\color{#d91a1a}-1.07\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 21.7730ms | 21.0526ms | 47.5002 Ops/s | 48.0240 Ops/s | $\color{#d91a1a}-1.09\\%$ | | test_vmap_transformer_speed_decorator[False-False] 
| 21.7879ms | 21.0123ms | 47.5911 Ops/s | 48.0876 Ops/s | $\color{#d91a1a}-1.03\\%$ | | test_to_module_speed[True] | 2.7672ms | 1.4838ms | 673.9417 Ops/s | 666.2330 Ops/s | $\color{#35bf28}+1.16\\%$ | | test_to_module_speed[False] | 1.9976ms | 1.4845ms | 673.6394 Ops/s | 666.1134 Ops/s | $\color{#35bf28}+1.13\\%$ | | test_tc_init | 63.4020μs | 33.9831μs | 29.4264 KOps/s | 28.0399 KOps/s | $\color{#35bf28}+4.94\\%$ | | test_tc_init_nested | 0.1601ms | 68.2841μs | 14.6447 KOps/s | 14.1729 KOps/s | $\color{#35bf28}+3.33\\%$ | | test_tc_first_layer_tensor | 18.6710μs | 3.9873μs | 250.7974 KOps/s | 249.5989 KOps/s | $\color{#35bf28}+0.48\\%$ | | test_tc_first_layer_nontensor | 26.2600μs | 4.0249μs | 248.4535 KOps/s | 248.1238 KOps/s | $\color{#35bf28}+0.13\\%$ | | test_tc_second_layer_tensor | 4.7703μs | 1.2904μs | 774.9263 KOps/s | 777.2183 KOps/s | $\color{#d91a1a}-0.29\\%$ | | test_tc_second_layer_nontensor | 26.5710μs | 4.6097μs | 216.9349 KOps/s | 215.5012 KOps/s | $\color{#35bf28}+0.67\\%$ | | test_unbind | 0.3151s | 12.9163ms | 77.4214 Ops/s | 76.3961 Ops/s | $\color{#35bf28}+1.34\\%$ | | test_full_like | 0.6593ms | 0.5786ms | 1.7283 KOps/s | 1.7284 KOps/s | $-0.00\\%$ | | test_zeros_like | 0.2617ms | 0.1977ms | 5.0584 KOps/s | 5.0582 KOps/s | $+0.00\\%$ | | test_ones_like | 0.2255ms | 0.1975ms | 5.0625 KOps/s | 5.0632 KOps/s | $\color{#d91a1a}-0.01\\%$ | | test_clone | 0.4481ms | 0.4146ms | 2.4118 KOps/s | 2.4116 KOps/s | $+0.01\\%$ | | test_squeeze | 38.5010μs | 12.5660μs | 79.5800 KOps/s | 83.8658 KOps/s | $\textbf{\color{#d91a1a}-5.11\\%}$ | | test_unsqueeze | 0.2624ms | 89.5282μs | 11.1697 KOps/s | 11.5915 KOps/s | $\color{#d91a1a}-3.64\\%$ | | test_split | 0.4613ms | 0.1843ms | 5.4264 KOps/s | 5.4105 KOps/s | $\color{#35bf28}+0.30\\%$ | | test_permute | 0.3120ms | 0.2013ms | 4.9689 KOps/s | 5.0381 KOps/s | $\color{#d91a1a}-1.37\\%$ | | test_stack | 1.2567ms | 0.9049ms | 1.1051 KOps/s | 1.0827 KOps/s | $\color{#35bf28}+2.07\\%$ | | test_cat | 1.2646ms | 
1.2318ms | 811.8360 Ops/s | 811.8973 Ops/s | $-0.01\\%$ |