pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] add_custom_mapping and NPE refactors #910

Closed: vmoens closed this 1 month ago

vmoens commented 1 month ago

Description

github-actions[bot] commented 1 month ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 15. Worsened: 9.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 47.3890μs | 22.4666μs | 44.5105 KOps/s | 43.9410 KOps/s | +1.30% |
| test_plain_set_stack_nested | 63.2990μs | 22.1853μs | 45.0749 KOps/s | 44.2853 KOps/s | +1.78% |
| test_plain_set_nested_inplace | 64.5710μs | 24.4162μs | 40.9564 KOps/s | 40.4515 KOps/s | +1.25% |
| test_plain_set_stack_nested_inplace | 53.1600μs | 24.3109μs | 41.1338 KOps/s | 40.6412 KOps/s | +1.21% |
| test_items | 43.4910μs | 2.5903μs | 386.0623 KOps/s | 371.0531 KOps/s | +4.05% |
| test_items_nested | 0.4448ms | 0.3647ms | 2.7422 KOps/s | 2.7554 KOps/s | -0.48% |
| test_items_nested_locked | 0.4259ms | 0.3657ms | 2.7348 KOps/s | 2.7468 KOps/s | -0.44% |
| test_items_nested_leaf | 0.1241ms | 87.4485μs | 11.4353 KOps/s | 11.3720 KOps/s | +0.56% |
| test_items_stack_nested | 0.4188ms | 0.3638ms | 2.7490 KOps/s | 2.7493 KOps/s | -0.01% |
| test_items_stack_nested_leaf | 0.1881ms | 87.9099μs | 11.3753 KOps/s | 11.6969 KOps/s | -2.75% |
| test_items_stack_nested_locked | 0.5762ms | 0.3648ms | 2.7410 KOps/s | 2.7249 KOps/s | +0.59% |
| test_keys | 45.4660μs | 3.9887μs | 250.7068 KOps/s | 259.1482 KOps/s | -3.26% |
| test_keys_nested | 0.2424ms | 0.1433ms | 6.9778 KOps/s | 6.8615 KOps/s | +1.69% |
| test_keys_nested_locked | 0.7846ms | 0.1507ms | 6.6359 KOps/s | 6.5652 KOps/s | +1.08% |
| test_keys_nested_leaf | 0.2119ms | 0.1229ms | 8.1382 KOps/s | 8.0958 KOps/s | +0.52% |
| test_keys_stack_nested | 0.2390ms | 0.1448ms | 6.9078 KOps/s | 6.7632 KOps/s | +2.14% |
| test_keys_stack_nested_leaf | 0.2642ms | 0.1225ms | 8.1655 KOps/s | 8.0694 KOps/s | +1.19% |
| test_keys_stack_nested_locked | 0.2858ms | 0.1511ms | 6.6167 KOps/s | 6.6620 KOps/s | -0.68% |
| test_values | 11.3212μs | 1.1595μs | 862.4403 KOps/s | 867.2639 KOps/s | -0.56% |
| test_values_nested | 0.1256ms | 51.9042μs | 19.2663 KOps/s | 19.1341 KOps/s | +0.69% |
| test_values_nested_locked | 0.1036ms | 51.2623μs | 19.5075 KOps/s | 19.1678 KOps/s | +1.77% |
| test_values_nested_leaf | 0.1067ms | 46.6245μs | 21.4480 KOps/s | 21.3734 KOps/s | +0.35% |
| test_values_stack_nested | 0.1081ms | 52.5133μs | 19.0428 KOps/s | 18.6105 KOps/s | +2.32% |
| test_values_stack_nested_leaf | 92.7740μs | 46.8801μs | 21.3310 KOps/s | 21.5282 KOps/s | -0.92% |
| test_values_stack_nested_locked | 0.1100ms | 51.6840μs | 19.3483 KOps/s | 18.3372 KOps/s | **+5.51%** |
| test_membership | 2.8764μs | 0.7307μs | 1.3686 MOps/s | 1.0927 MOps/s | **+25.25%** |
| test_membership_nested | 30.1560μs | 2.6386μs | 378.9928 KOps/s | 378.3170 KOps/s | +0.18% |
| test_membership_nested_leaf | 43.0810μs | 2.6729μs | 374.1319 KOps/s | 374.1716 KOps/s | -0.01% |
| test_membership_stacked_nested | 39.4740μs | 2.6197μs | 381.7285 KOps/s | 373.7238 KOps/s | +2.14% |
| test_membership_stacked_nested_leaf | 36.7090μs | 2.6344μs | 379.5908 KOps/s | 376.1450 KOps/s | +0.92% |
| test_membership_nested_last | 27.3020μs | 4.0136μs | 249.1511 KOps/s | 251.4329 KOps/s | -0.91% |
| test_membership_nested_leaf_last | 20.6890μs | 4.0224μs | 248.6096 KOps/s | 250.3478 KOps/s | -0.69% |
| test_membership_stacked_nested_last | 48.8220μs | 4.6373μs | 215.6450 KOps/s | 75.3592 KOps/s | **+186.16%** |
| test_membership_stacked_nested_leaf_last | 39.7940μs | 4.6222μs | 216.3477 KOps/s | 76.8065 KOps/s | **+181.68%** |
| test_nested_getleaf | 54.0010μs | 10.9708μs | 91.1507 KOps/s | 91.1414 KOps/s | +0.01% |
| test_nested_get | 62.8480μs | 10.5030μs | 95.2110 KOps/s | 94.7726 KOps/s | +0.46% |
| test_stacked_getleaf | 34.7850μs | 10.9376μs | 91.4279 KOps/s | 90.4620 KOps/s | +1.07% |
| test_stacked_get | 49.7940μs | 10.4482μs | 95.7102 KOps/s | 96.9911 KOps/s | -1.32% |
| test_nested_getitemleaf | 51.7970μs | 11.3979μs | 87.7353 KOps/s | 86.5658 KOps/s | +1.35% |
| test_nested_getitem | 55.6340μs | 10.5406μs | 94.8710 KOps/s | 93.4494 KOps/s | +1.52% |
| test_stacked_getitemleaf | 39.7040μs | 11.3705μs | 87.9470 KOps/s | 87.1831 KOps/s | +0.88% |
| test_stacked_getitem | 49.7030μs | 10.5802μs | 94.5165 KOps/s | 94.0009 KOps/s | +0.55% |
| test_lock_nested | 1.3688ms | 0.5228ms | 1.9126 KOps/s | 1.6878 KOps/s | **+13.32%** |
| test_lock_stack_nested | 0.7191ms | 0.4910ms | 2.0368 KOps/s | 2.1692 KOps/s | **-6.11%** |
| test_unlock_nested | 0.8066ms | 0.4314ms | 2.3181 KOps/s | 2.3542 KOps/s | -1.53% |
| test_unlock_stack_nested | 0.6347ms | 0.4019ms | 2.4885 KOps/s | 2.6624 KOps/s | **-6.53%** |
| test_flatten_speed | 0.6522ms | 0.1080ms | 9.2604 KOps/s | 8.9780 KOps/s | +3.15% |
| test_unflatten_speed | 4.0111ms | 0.4477ms | 2.2339 KOps/s | 2.2156 KOps/s | +0.83% |
| test_common_ops | 4.9344ms | 1.1595ms | 862.4178 Ops/s | 871.7229 Ops/s | -1.07% |
| test_creation | 0.1081ms | 2.4668μs | 405.3900 KOps/s | 391.4879 KOps/s | +3.55% |
| test_creation_empty | 65.0120μs | 19.4693μs | 51.3629 KOps/s | 50.8409 KOps/s | +1.03% |
| test_creation_nested_1 | 58.7400μs | 23.1631μs | 43.1721 KOps/s | 43.4476 KOps/s | -0.63% |
| test_creation_nested_2 | 72.1750μs | 27.3025μs | 36.6267 KOps/s | 37.2181 KOps/s | -1.59% |
| test_clone | 0.1670ms | 17.6446μs | 56.6744 KOps/s | 53.8508 KOps/s | **+5.24%** |
| test_getitem[int] | 0.8016ms | 12.5733μs | 79.5335 KOps/s | 69.2917 KOps/s | **+14.78%** |
| test_getitem[slice_int] | 0.1795ms | 32.2193μs | 31.0373 KOps/s | 29.0890 KOps/s | **+6.70%** |
| test_getitem[range] | 0.3814ms | 56.9924μs | 17.5462 KOps/s | 16.9085 KOps/s | +3.77% |
| test_getitem[tuple] | 0.1346ms | 26.4712μs | 37.7769 KOps/s | 35.0566 KOps/s | **+7.76%** |
| test_getitem[list] | 0.3594ms | 52.7044μs | 18.9738 KOps/s | 18.1752 KOps/s | +4.39% |
| test_setitem_dim[int] | 58.5500μs | 34.8237μs | 28.7161 KOps/s | 29.0623 KOps/s | -1.19% |
| test_setitem_dim[slice_int] | 0.1215ms | 75.1120μs | 13.3134 KOps/s | 13.6408 KOps/s | -2.40% |
| test_setitem_dim[range] | 0.1784ms | 96.6304μs | 10.3487 KOps/s | 10.6997 KOps/s | -3.28% |
| test_setitem_dim[tuple] | 0.1051ms | 61.5215μs | 16.2545 KOps/s | 16.9308 KOps/s | -3.99% |
| test_setitem | 0.2393ms | 30.6625μs | 32.6131 KOps/s | 32.5977 KOps/s | +0.05% |
| test_set | 0.2092ms | 29.6476μs | 33.7296 KOps/s | 33.4678 KOps/s | +0.78% |
| test_set_shared | 2.2320ms | 0.2252ms | 4.4397 KOps/s | 4.5697 KOps/s | -2.84% |
| test_update | 0.2479ms | 37.7757μs | 26.4720 KOps/s | 26.5773 KOps/s | -0.40% |
| test_update_nested | 0.2411ms | 47.1886μs | 21.1916 KOps/s | 21.0185 KOps/s | +0.82% |
| test_update__nested | 0.1866ms | 35.9917μs | 27.7842 KOps/s | 27.6733 KOps/s | +0.40% |
| test_set_nested | 0.1899ms | 32.6762μs | 30.6033 KOps/s | 30.5772 KOps/s | +0.09% |
| test_set_nested_new | 0.1826ms | 37.5867μs | 26.6052 KOps/s | 26.4928 KOps/s | +0.42% |
| test_select | 1.1540ms | 55.2067μs | 18.1137 KOps/s | 18.2337 KOps/s | -0.66% |
| test_select_nested | 0.1151ms | 61.4412μs | 16.2757 KOps/s | 16.4115 KOps/s | -0.83% |
| test_exclude_nested | 0.1530ms | 81.2629μs | 12.3057 KOps/s | 12.3548 KOps/s | -0.40% |
| test_empty[True] | 0.6181ms | 0.3448ms | 2.9004 KOps/s | 2.6396 KOps/s | **+9.88%** |
| test_empty[False] | 13.0295μs | 1.2229μs | 817.7365 KOps/s | 765.6828 KOps/s | **+6.80%** |
| test_unbind_speed | 0.4721ms | 0.3228ms | 3.0976 KOps/s | 3.0484 KOps/s | +1.61% |
| test_unbind_speed_stack0 | 0.6762ms | 0.3208ms | 3.1175 KOps/s | 3.2300 KOps/s | -3.48% |
| test_unbind_speed_stack1 | 89.6761ms | 0.8429ms | 1.1863 KOps/s | 1.3280 KOps/s | **-10.67%** |
| test_split | 87.0424ms | 2.2558ms | 443.3041 Ops/s | 403.4063 Ops/s | **+9.89%** |
| test_chunk | 89.8301ms | 2.2700ms | 440.5331 Ops/s | 469.4761 Ops/s | **-6.16%** |
| test_creation[device0] | 4.1365ms | 0.1258ms | 7.9512 KOps/s | 8.1977 KOps/s | -3.01% |
| test_creation_from_tensor | 0.2821ms | 0.1216ms | 8.2254 KOps/s | 8.1120 KOps/s | +1.40% |
| test_add_one[memmap_tensor0] | 0.3029ms | 8.2978μs | 120.5136 KOps/s | 125.0686 KOps/s | -3.64% |
| test_contiguous[memmap_tensor0] | 37.4100μs | 2.2145μs | 451.5680 KOps/s | 441.6203 KOps/s | +2.25% |
| test_stack[memmap_tensor0] | 77.8370μs | 6.2146μs | 160.9124 KOps/s | 162.6747 KOps/s | -1.08% |
| test_memmaptd_index | 1.2642ms | 0.4318ms | 2.3160 KOps/s | 2.2492 KOps/s | +2.97% |
| test_memmaptd_index_astensor | 0.8090ms | 0.5097ms | 1.9618 KOps/s | 1.6500 KOps/s | **+18.90%** |
| test_memmaptd_index_op | 2.2353ms | 1.1047ms | 905.2529 Ops/s | 905.0582 Ops/s | +0.02% |
| test_serialize_model | 0.2116s | 0.1450s | 6.8979 Ops/s | 7.8825 Ops/s | **-12.49%** |
| test_serialize_model_pickle | 0.4667s | 0.3969s | 2.5196 Ops/s | 2.5111 Ops/s | +0.34% |
| test_serialize_weights | 0.1330s | 0.1294s | 7.7274 Ops/s | 6.9966 Ops/s | **+10.45%** |
| test_serialize_weights_returnearly | 0.2751s | 0.1834s | 5.4532 Ops/s | 6.1934 Ops/s | **-11.95%** |
| test_serialize_weights_pickle | 1.2594s | 0.7248s | 1.3797 Ops/s | 2.5002 Ops/s | **-44.82%** |
| test_serialize_weights_filesystem | 0.1532s | 0.1473s | 6.7877 Ops/s | 6.8726 Ops/s | -1.24% |
| test_serialize_model_filesystem | 0.1538s | 0.1486s | 6.7301 Ops/s | 5.8919 Ops/s | **+14.23%** |
| test_reshape_pytree | 92.5650μs | 39.2437μs | 25.4818 KOps/s | 25.1109 KOps/s | +1.48% |
| test_reshape_td | 97.6230μs | 48.2068μs | 20.7440 KOps/s | 19.7657 KOps/s | +4.95% |
| test_view_pytree | 0.1605ms | 38.9113μs | 25.6995 KOps/s | 25.6843 KOps/s | +0.06% |
| test_view_td | 0.1110ms | 54.4112μs | 18.3786 KOps/s | 17.5240 KOps/s | +4.88% |
| test_unbind_pytree | 0.1068ms | 36.0112μs | 27.7692 KOps/s | 27.8619 KOps/s | -0.33% |
| test_unbind_td | 0.4312ms | 47.4296μs | 21.0839 KOps/s | 20.7087 KOps/s | +1.81% |
| test_split_pytree | 0.1049ms | 39.0064μs | 25.6368 KOps/s | 25.7977 KOps/s | -0.62% |
| test_split_td | 0.2190ms | 60.5458μs | 16.5164 KOps/s | 16.0254 KOps/s | +3.06% |
| test_add_pytree | 0.1196ms | 45.3060μs | 22.0721 KOps/s | 22.5411 KOps/s | -2.08% |
| test_add_td | 0.1939ms | 89.9598μs | 11.1161 KOps/s | 11.5159 KOps/s | -3.47% |
| test_distributed | 0.3043ms | 0.1340ms | 7.4607 KOps/s | 7.5991 KOps/s | -1.82% |
| test_tdmodule | 0.1358ms | 18.2222μs | 54.8781 KOps/s | 56.4564 KOps/s | -2.80% |
| test_tdmodule_dispatch | 70.3620μs | 36.7107μs | 27.2400 KOps/s | 26.7950 KOps/s | +1.66% |
| test_tdseq | 56.4370μs | 19.5540μs | 51.1404 KOps/s | 51.2329 KOps/s | -0.18% |
| test_tdseq_dispatch | 66.2350μs | 40.8660μs | 24.4702 KOps/s | 24.3197 KOps/s | +0.62% |
| test_instantiation_functorch | 2.7077ms | 1.6461ms | 607.4872 Ops/s | 623.1942 Ops/s | -2.52% |
| test_instantiation_td | 1.8577ms | 1.1787ms | 848.4101 Ops/s | 848.8219 Ops/s | -0.05% |
| test_exec_functorch | 0.3420ms | 0.1873ms | 5.3401 KOps/s | 5.3934 KOps/s | -0.99% |
| test_exec_functional_call | 0.3153ms | 0.1786ms | 5.6004 KOps/s | 5.5277 KOps/s | +1.31% |
| test_exec_td | 0.3586ms | 0.1835ms | 5.4503 KOps/s | 5.4259 KOps/s | +0.45% |
| test_exec_td_decorator | 0.8917ms | 0.2668ms | 3.7485 KOps/s | 3.7740 KOps/s | -0.67% |
| test_vmap_mlp_speed[True-True] | 0.9505ms | 0.6333ms | 1.5791 KOps/s | 1.6151 KOps/s | -2.23% |
| test_vmap_mlp_speed[True-False] | 0.8685ms | 0.6099ms | 1.6396 KOps/s | 1.6222 KOps/s | +1.07% |
| test_vmap_mlp_speed[False-True] | 0.7158ms | 0.5061ms | 1.9759 KOps/s | 1.9713 KOps/s | +0.23% |
| test_vmap_mlp_speed[False-False] | 0.8086ms | 0.5114ms | 1.9555 KOps/s | 1.9708 KOps/s | -0.78% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9813ms | 0.7000ms | 1.4286 KOps/s | 1.4191 KOps/s | +0.67% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1096ms | 0.6960ms | 1.4367 KOps/s | 1.4150 KOps/s | +1.54% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7825ms | 0.5762ms | 1.7356 KOps/s | 1.7035 KOps/s | +1.89% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7503ms | 0.5749ms | 1.7395 KOps/s | 1.7029 KOps/s | +2.15% |
| test_to_module_speed[True] | 2.9406ms | 1.8185ms | 549.8891 Ops/s | 557.1073 Ops/s | -1.30% |
| test_to_module_speed[False] | 2.3311ms | 1.7685ms | 565.4617 Ops/s | 569.7961 Ops/s | -0.76% |
| test_tc_init | 83.4570μs | 45.6724μs | 21.8951 KOps/s | 22.4488 KOps/s | -2.47% |
| test_tc_init_nested | 0.1951ms | 90.6586μs | 11.0304 KOps/s | 11.1056 KOps/s | -0.68% |
| test_tc_first_layer_tensor | 31.8700μs | 9.0474μs | 110.5289 KOps/s | 109.5269 KOps/s | +0.91% |
| test_tc_first_layer_nontensor | 59.8820μs | 9.0533μs | 110.4568 KOps/s | 110.3392 KOps/s | +0.11% |
| test_tc_second_layer_tensor | 44.2830μs | 2.8496μs | 350.9297 KOps/s | 354.0855 KOps/s | -0.89% |
| test_tc_second_layer_nontensor | 33.2530μs | 10.2028μs | 98.0122 KOps/s | 97.7017 KOps/s | +0.32% |
| test_unbind | 0.1082s | 14.9165ms | 67.0399 Ops/s | 70.1069 Ops/s | -4.37% |
| test_full_like | 20.8370ms | 13.8066ms | 72.4294 Ops/s | 127.1794 Ops/s | **-43.05%** |
| test_zeros_like | 13.9955ms | 7.9949ms | 125.0796 Ops/s | 140.2603 Ops/s | **-10.82%** |
| test_ones_like | 12.7224ms | 7.6286ms | 131.0848 Ops/s | 126.3117 Ops/s | +3.78% |
| test_clone | 16.2579ms | 9.3800ms | 106.6094 Ops/s | 104.3603 Ops/s | +2.16% |
| test_squeeze | 65.7430μs | 14.6913μs | 68.0673 KOps/s | 68.2987 KOps/s | -0.34% |
| test_unsqueeze | 0.3055ms | 97.1156μs | 10.2970 KOps/s | 10.1800 KOps/s | +1.15% |
| test_split | 0.4467ms | 0.2077ms | 4.8156 KOps/s | 4.7243 KOps/s | +1.93% |
| test_permute | 0.4509ms | 0.2323ms | 4.3041 KOps/s | 4.4282 KOps/s | -2.80% |
| test_stack | 33.3410ms | 26.3065ms | 38.0135 Ops/s | 37.9456 Ops/s | +0.18% |
| test_cat | 32.8345ms | 25.9465ms | 38.5409 Ops/s | 38.2969 Ops/s | +0.64% |

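The Change column in the CPU results is consistent with a plain relative difference between the Ops value measured on this PR and the Ops value on `HEAD`. The emphasized entries (the ones counted as Improved or Worsened, 15 and 9 here) all appear to lie beyond roughly ±5%. The sketch below reproduces that arithmetic for a few rows taken from the table. The function names and the 5% threshold are assumptions inferred from the numbers, not taken from the benchmark workflow itself.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on HEAD) pairs copied from the CPU table; units cancel out.
rows = {
    "test_items": (386.0623, 371.0531),
    "test_membership": (1.3686, 1.0927),
    "test_serialize_weights_pickle": (1.3797, 2.5002),
}

for name, (ops, ops_head) in rows.items():
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because both columns of a row share the same unit (Ops/s, KOps/s or MOps/s), the ratio can be computed directly from the printed values without converting them.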
github-actions[bot] commented 1 month ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 32. Worsened: 6.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.2060ms | 15.7175μs | 63.6233 KOps/s | 57.1696 KOps/s | **+11.29%** |
| test_plain_set_stack_nested | 34.8810μs | 15.8924μs | 62.9233 KOps/s | 56.0269 KOps/s | **+12.31%** |
| test_plain_set_nested_inplace | 0.2162ms | 16.9186μs | 59.1064 KOps/s | 52.8282 KOps/s | **+11.88%** |
| test_plain_set_stack_nested_inplace | 37.9600μs | 16.9206μs | 59.0994 KOps/s | 52.8633 KOps/s | **+11.80%** |
| test_items | 0.2300ms | 4.6902μs | 213.2120 KOps/s | 211.4908 KOps/s | +0.81% |
| test_items_nested | 0.6047ms | 0.3943ms | 2.5359 KOps/s | 2.5469 KOps/s | -0.43% |
| test_items_nested_locked | 0.6009ms | 0.3954ms | 2.5289 KOps/s | 2.5314 KOps/s | -0.10% |
| test_items_nested_leaf | 0.2679ms | 86.4266μs | 11.5705 KOps/s | 11.6092 KOps/s | -0.33% |
| test_items_stack_nested | 0.5955ms | 0.3957ms | 2.5268 KOps/s | 2.5529 KOps/s | -1.02% |
| test_items_stack_nested_leaf | 0.1033ms | 87.7101μs | 11.4012 KOps/s | 11.4265 KOps/s | -0.22% |
| test_items_stack_nested_locked | 0.5913ms | 0.3944ms | 2.5356 KOps/s | 2.5341 KOps/s | +0.06% |
| test_keys | 0.2201ms | 4.3913μs | 227.7255 KOps/s | 227.3222 KOps/s | +0.18% |
| test_keys_nested | 0.2669ms | 66.4931μs | 15.0392 KOps/s | 14.9290 KOps/s | +0.74% |
| test_keys_nested_locked | 2.1109ms | 72.2084μs | 13.8488 KOps/s | 13.6334 KOps/s | +1.58% |
| test_keys_nested_leaf | 0.2491ms | 58.0139μs | 17.2372 KOps/s | 17.3957 KOps/s | -0.91% |
| test_keys_stack_nested | 0.2614ms | 66.9137μs | 14.9446 KOps/s | 14.9283 KOps/s | +0.11% |
| test_keys_stack_nested_leaf | 75.9410μs | 58.0578μs | 17.2242 KOps/s | 17.2355 KOps/s | -0.07% |
| test_keys_stack_nested_locked | 0.2590ms | 72.5753μs | 13.7788 KOps/s | 13.7174 KOps/s | +0.45% |
| test_values | 64.5277μs | 1.7725μs | 564.1907 KOps/s | 568.6093 KOps/s | -0.78% |
| test_values_nested | 0.3287ms | 33.7764μs | 29.6065 KOps/s | 29.8207 KOps/s | -0.72% |
| test_values_nested_locked | 0.2217ms | 35.7378μs | 27.9816 KOps/s | 28.0791 KOps/s | -0.35% |
| test_values_nested_leaf | 0.2350ms | 29.8748μs | 33.4731 KOps/s | 33.3727 KOps/s | +0.30% |
| test_values_stack_nested | 0.2226ms | 33.8456μs | 29.5459 KOps/s | 29.4288 KOps/s | +0.40% |
| test_values_stack_nested_leaf | 47.1010μs | 30.1079μs | 33.2139 KOps/s | 33.2652 KOps/s | -0.15% |
| test_values_stack_nested_locked | 0.2364ms | 35.8531μs | 27.8916 KOps/s | 28.0741 KOps/s | -0.65% |
| test_membership | 10.7957μs | 0.5479μs | 1.8251 MOps/s | 1.8361 MOps/s | -0.59% |
| test_membership_nested | 93.8970μs | 2.0158μs | 496.0867 KOps/s | 483.1771 KOps/s | +2.67% |
| test_membership_nested_leaf | 98.4370μs | 2.0021μs | 499.4778 KOps/s | 502.0628 KOps/s | -0.51% |
| test_membership_stacked_nested | 22.8410μs | 2.0564μs | 486.2940 KOps/s | 471.5019 KOps/s | +3.14% |
| test_membership_stacked_nested_leaf | 0.2242ms | 2.0627μs | 484.8063 KOps/s | 476.1683 KOps/s | +1.81% |
| test_membership_nested_last | 31.7100μs | 3.0541μs | 327.4323 KOps/s | 329.7391 KOps/s | -0.70% |
| test_membership_nested_leaf_last | 0.2043ms | 3.0435μs | 328.5683 KOps/s | 332.3512 KOps/s | -1.14% |
| test_membership_stacked_nested_last | 15.4900μs | 3.0289μs | 330.1492 KOps/s | 328.3920 KOps/s | +0.54% |
| test_membership_stacked_nested_leaf_last | 19.6900μs | 3.0064μs | 332.6193 KOps/s | 333.9815 KOps/s | -0.41% |
| test_nested_getleaf | 0.2124ms | 8.0988μs | 123.4746 KOps/s | 121.8064 KOps/s | +1.37% |
| test_nested_get | 0.1935ms | 7.6189μs | 131.2527 KOps/s | 129.0554 KOps/s | +1.70% |
| test_stacked_getleaf | 24.2010μs | 8.1300μs | 123.0014 KOps/s | 121.7784 KOps/s | +1.00% |
| test_stacked_get | 0.2270ms | 7.6538μs | 130.6534 KOps/s | 129.2572 KOps/s | +1.08% |
| test_nested_getitemleaf | 0.2137ms | 8.2378μs | 121.3912 KOps/s | 119.8549 KOps/s | +1.28% |
| test_nested_getitem | 23.7800μs | 7.7895μs | 128.3781 KOps/s | 126.8723 KOps/s | +1.19% |
| test_stacked_getitemleaf | 0.2082ms | 8.2496μs | 121.2181 KOps/s | 119.5961 KOps/s | +1.36% |
| test_stacked_getitem | 22.2700μs | 7.7960μs | 128.2706 KOps/s | 126.2890 KOps/s | +1.57% |
| test_lock_nested | 5.0199ms | 0.4801ms | 2.0827 KOps/s | 2.0824 KOps/s | +0.02% |
| test_lock_stack_nested | 0.5045ms | 0.4394ms | 2.2757 KOps/s | 2.2894 KOps/s | -0.60% |
| test_unlock_nested | 0.8432ms | 0.3951ms | 2.5309 KOps/s | 2.5271 KOps/s | +0.15% |
| test_unlock_stack_nested | 0.3866ms | 0.3580ms | 2.7934 KOps/s | 2.8071 KOps/s | -0.49% |
| test_flatten_speed | 0.3080ms | 0.1055ms | 9.4745 KOps/s | 9.4362 KOps/s | +0.41% |
| test_unflatten_speed | 0.4867ms | 0.2983ms | 3.3522 KOps/s | 3.3652 KOps/s | -0.39% |
| test_common_ops | 1.6670ms | 1.2893ms | 775.6341 Ops/s | 717.1197 Ops/s | **+8.16%** |
| test_creation | 20.6510μs | 2.0047μs | 498.8252 KOps/s | 497.9301 KOps/s | +0.18% |
| test_creation_empty | 0.2354ms | 15.2204μs | 65.7015 KOps/s | 53.8734 KOps/s | **+21.96%** |
| test_creation_nested_1 | 34.0200μs | 17.3221μs | 57.7297 KOps/s | 48.2051 KOps/s | **+19.76%** |
| test_creation_nested_2 | 0.2231ms | 20.3954μs | 49.0307 KOps/s | 42.2753 KOps/s | **+15.98%** |
| test_clone | 51.5510μs | 30.6543μs | 32.6219 KOps/s | 32.1801 KOps/s | +1.37% |
| test_getitem[int] | 1.2077ms | 17.6553μs | 56.6401 KOps/s | 59.0950 KOps/s | -4.15% |
| test_getitem[slice_int] | 0.1485ms | 29.0814μs | 34.3862 KOps/s | 34.4510 KOps/s | -0.19% |
| test_getitem[range] | 0.3511ms | 0.1183ms | 8.4541 KOps/s | 8.6973 KOps/s | -2.80% |
| test_getitem[tuple] | 0.1495ms | 25.6870μs | 38.9301 KOps/s | 39.0566 KOps/s | -0.32% |
| test_getitem[list] | 0.3455ms | 0.1061ms | 9.4269 KOps/s | 9.1487 KOps/s | +3.04% |
| test_setitem_dim[int] | 0.1750ms | 51.1388μs | 19.5546 KOps/s | 16.7833 KOps/s | **+16.51%** |
| test_setitem_dim[slice_int] | 95.3510μs | 75.4432μs | 13.2550 KOps/s | 12.7606 KOps/s | +3.87% |
| test_setitem_dim[range] | 0.2709ms | 0.1390ms | 7.1928 KOps/s | 7.0058 KOps/s | +2.67% |
| test_setitem_dim[tuple] | 92.3010μs | 68.9214μs | 14.5093 KOps/s | 13.8462 KOps/s | +4.79% |
| test_setitem | 0.2105ms | 42.5881μs | 23.4808 KOps/s | 22.4299 KOps/s | +4.68% |
| test_set | 0.2531ms | 41.2408μs | 24.2478 KOps/s | 21.4775 KOps/s | **+12.90%** |
| test_set_shared | 92.1938ms | 61.4133μs | 16.2831 KOps/s | 18.2730 KOps/s | **-10.89%** |
| test_update | 0.2619ms | 49.3313μs | 20.2711 KOps/s | 18.4674 KOps/s | **+9.77%** |
| test_update_nested | 0.2690ms | 56.6890μs | 17.6401 KOps/s | 15.5427 KOps/s | **+13.49%** |
| test_update__nested | 0.2610ms | 62.5311μs | 15.9920 KOps/s | 14.7091 KOps/s | **+8.72%** |
| test_set_nested | 0.1868ms | 43.9013μs | 22.7784 KOps/s | 19.6658 KOps/s | **+15.83%** |
| test_set_nested_new | 0.4983ms | 48.3549μs | 20.6804 KOps/s | 18.3438 KOps/s | **+12.74%** |
| test_select | 0.1043ms | 64.9116μs | 15.4056 KOps/s | 14.3067 KOps/s | **+7.68%** |
| test_select_nested | 0.2446ms | 52.0249μs | 19.2216 KOps/s | 19.0828 KOps/s | +0.73% |
| test_exclude_nested | 0.2621ms | 71.7151μs | 13.9441 KOps/s | 13.8067 KOps/s | +1.00% |
| test_empty[True] | 0.4796ms | 0.2954ms | 3.3850 KOps/s | 3.3250 KOps/s | +1.80% |
| test_empty[False] | 20.7123μs | 0.9564μs | 1.0456 MOps/s | 1.0759 MOps/s | -2.81% |
| test_to | 59.2410μs | 37.4306μs | 26.7161 KOps/s | 26.4734 KOps/s | +0.92% |
| test_to_nonblocking | 46.5810μs | 24.0054μs | 41.6573 KOps/s | 41.2358 KOps/s | +1.02% |
| test_unbind_speed | 1.3119ms | 0.2991ms | 3.3435 KOps/s | 3.2751 KOps/s | +2.09% |
| test_unbind_speed_stack0 | 0.5106ms | 0.2995ms | 3.3389 KOps/s | 3.2789 KOps/s | +1.83% |
| test_unbind_speed_stack1 | 91.2092ms | 0.7928ms | 1.2613 KOps/s | 1.2747 KOps/s | -1.05% |
| test_split | 93.5992ms | 2.3602ms | 423.6907 Ops/s | 429.1197 Ops/s | -1.27% |
| test_chunk | 93.6948ms | 2.3648ms | 422.8673 Ops/s | 427.5689 Ops/s | -1.10% |
| test_creation[device0] | 0.3428ms | 0.1112ms | 8.9899 KOps/s | 9.6659 KOps/s | **-6.99%** |
| test_creation_from_tensor | 0.2825ms | 0.1075ms | 9.3062 KOps/s | 9.3394 KOps/s | -0.35% |
| test_add_one[memmap_tensor0] | 0.1406ms | 8.8143μs | 113.4523 KOps/s | 102.9182 KOps/s | **+10.24%** |
| test_contiguous[memmap_tensor0] | 0.1916ms | 2.1657μs | 461.7367 KOps/s | 452.7243 KOps/s | +1.99% |
| test_stack[memmap_tensor0] | 60.0110μs | 6.8650μs | 145.6666 KOps/s | 145.0987 KOps/s | +0.39% |
| test_memmaptd_index | 1.1713ms | 0.4345ms | 2.3014 KOps/s | 2.3138 KOps/s | -0.54% |
| test_memmaptd_index_astensor | 0.7611ms | 0.4978ms | 2.0087 KOps/s | 2.0225 KOps/s | -0.68% |
| test_memmaptd_index_op | 1.4062ms | 1.0069ms | 993.1548 Ops/s | 935.5185 Ops/s | **+6.16%** |
| test_serialize_model | 0.1011s | 96.3487ms | 10.3790 Ops/s | 10.0472 Ops/s | +3.30% |
| test_serialize_model_pickle | 1.3695s | 1.2392s | 0.8069 Ops/s | 0.8077 Ops/s | -0.09% |
| test_serialize_weights | 0.1882s | 0.1031s | 9.7005 Ops/s | 10.2811 Ops/s | **-5.65%** |
| test_serialize_weights_returnearly | 0.2930s | 90.4912ms | 11.0508 Ops/s | 12.1212 Ops/s | **-8.83%** |
| test_serialize_weights_pickle | 1.3513s | 1.2364s | 0.8088 Ops/s | 0.8091 Ops/s | -0.03% |
| test_reshape_pytree | 0.2474ms | 38.3151μs | 26.0994 KOps/s | 25.8222 KOps/s | +1.07% |
| test_reshape_td | 0.1007ms | 44.8938μs | 22.2748 KOps/s | 21.9878 KOps/s | +1.31% |
| test_view_pytree | 0.2346ms | 38.4427μs | 26.0127 KOps/s | 25.7893 KOps/s | +0.87% |
| test_view_td | 0.2281ms | 51.0772μs | 19.5782 KOps/s | 19.8490 KOps/s | -1.36% |
| test_unbind_pytree | 0.2957ms | 37.7313μs | 26.5032 KOps/s | 27.2396 KOps/s | -2.70% |
| test_unbind_td | 0.4416ms | 46.1731μs | 21.6577 KOps/s | 21.3683 KOps/s | +1.35% |
| test_split_pytree | 78.4920μs | 50.9988μs | 19.6083 KOps/s | 18.6622 KOps/s | **+5.07%** |
| test_split_td | 0.4796ms | 60.2943μs | 16.5853 KOps/s | 14.0106 KOps/s | **+18.38%** |
| test_add_pytree | 0.2974ms | 59.3595μs | 16.8465 KOps/s | 16.9852 KOps/s | -0.82% |
| test_add_td | 0.2343ms | 89.7838μs | 11.1379 KOps/s | 10.4073 KOps/s | **+7.02%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4092ms | 0.2106ms | 4.7492 KOps/s | 4.7607 KOps/s | -0.24% |
| test_compile_add_one_nested[tensordict-eager] | 0.3155ms | 0.1758ms | 5.6891 KOps/s | 5.7365 KOps/s | -0.83% |
| test_compile_add_one_nested[pytree-compile] | 0.2768ms | 0.1437ms | 6.9596 KOps/s | 6.9558 KOps/s | +0.05% |
| test_compile_add_one_nested[pytree-eager] | 0.3456ms | 0.1959ms | 5.1048 KOps/s | 5.1538 KOps/s | -0.95% |
| test_compile_copy_nested[tensordict-compile] | 0.1713ms | 22.2704μs | 44.9027 KOps/s | 45.2412 KOps/s | -0.75% |
| test_compile_copy_nested[tensordict-eager] | 84.1110μs | 48.2635μs | 20.7196 KOps/s | 20.4568 KOps/s | +1.28% |
| test_compile_copy_nested[pytree-compile] | 0.1911ms | 71.9327μs | 13.9019 KOps/s | 13.9965 KOps/s | -0.68% |
| test_compile_copy_nested[pytree-eager] | 88.8110μs | 59.9610μs | 16.6775 KOps/s | 16.8927 KOps/s | -1.27% |
| test_compile_add_one_flat[tensordict-compile] | 0.4967ms | 0.3196ms | 3.1288 KOps/s | 3.1208 KOps/s | +0.26% |
| test_compile_add_one_flat[tensordict-eager] | 0.3575ms | 0.2208ms | 4.5281 KOps/s | 4.4944 KOps/s | +0.75% |
| test_compile_add_one_flat[tensorclass-compile] | 0.3344ms | 0.1305ms | 7.6622 KOps/s | 7.8270 KOps/s | -2.10% |
| test_compile_add_one_flat[tensorclass-eager] | 0.2103ms | 63.0888μs | 15.8507 KOps/s | 15.8070 KOps/s | +0.28% |
| test_compile_add_one_flat[pytree-compile] | 0.4184ms | 0.3205ms | 3.1204 KOps/s | 3.1231 KOps/s | -0.09% |
| test_compile_add_one_flat[pytree-eager] | 0.8080ms | 0.6375ms | 1.5687 KOps/s | 1.5800 KOps/s | -0.71% |
| test_compile_add_self_flat[tensordict-eager] | 0.4066ms | 0.2705ms | 3.6975 KOps/s | 3.6937 KOps/s | +0.10% |
| test_compile_add_self_flat[tensordict-compile] | 0.4653ms | 0.3235ms | 3.0907 KOps/s | 3.0890 KOps/s | +0.06% |
| test_compile_add_self_flat[tensorclass-eager] | 0.2113ms | 76.0016μs | 13.1576 KOps/s | 13.1336 KOps/s | +0.18% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2558ms | 0.1304ms | 7.6681 KOps/s | 7.6784 KOps/s | -0.14% |
| test_compile_add_self_flat[pytree-eager] | 0.6887ms | 0.5338ms | 1.8733 KOps/s | 1.8417 KOps/s | +1.71% |
| test_compile_add_self_flat[pytree-compile] | 0.4546ms | 0.3194ms | 3.1308 KOps/s | 3.1245 KOps/s | +0.20% |
| test_compile_copy_flat[tensordict-compile] | 0.1449ms | 18.6557μs | 53.6028 KOps/s | 53.5734 KOps/s | +0.05% |
| test_compile_copy_flat[tensordict-eager] | 55.2910μs | 32.0492μs | 31.2020 KOps/s | 30.1754 KOps/s | +3.40% |
| test_compile_copy_flat[pytree-compile] | 0.1517ms | 75.3691μs | 13.2680 KOps/s | 13.2189 KOps/s | +0.37% |
| test_compile_copy_flat[pytree-eager] | 0.1823ms | 60.6132μs | 16.4981 KOps/s | 16.3040 KOps/s | +1.19% |
| test_compile_assign_and_add[tensordict-compile] | 2.5492ms | 0.9287ms | 1.0768 KOps/s | 1.0922 KOps/s | -1.41% |
| test_compile_assign_and_add[tensordict-eager] | 3.6139ms | 3.3749ms | 296.3073 Ops/s | 292.3578 Ops/s | +1.35% |
| test_compile_assign_and_add[pytree-compile] | 2.4915ms | 0.9041ms | 1.1061 KOps/s | 1.0499 KOps/s | **+5.35%** |
| test_compile_assign_and_add[pytree-eager] | 3.8140ms | 3.3703ms | 296.7125 Ops/s | 295.8981 Ops/s | +0.28% |
| test_compile_indexing[tensor-tensordict-compile] | 0.3488ms | 0.1141ms | 8.7651 KOps/s | 9.0841 KOps/s | -3.51% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2772ms | 66.2708μs | 15.0896 KOps/s | 15.8570 KOps/s | -4.84% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2312ms | 0.1034ms | 9.6679 KOps/s | 9.7781 KOps/s | -1.13% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2760ms | 48.1412μs | 20.7722 KOps/s | 21.9912 KOps/s | **-5.54%** |
| test_compile_indexing[tensor-pytree-compile] | 0.3132ms | 0.1069ms | 9.3535 KOps/s | 9.8249 KOps/s | -4.80% |
| test_compile_indexing[tensor-pytree-eager] | 0.2857ms | 48.0681μs | 20.8038 KOps/s | 22.1712 KOps/s | **-6.17%** |
| test_compile_indexing[slice-tensordict-compile] | 0.3459ms | 0.1389ms | 7.2019 KOps/s | 7.2277 KOps/s | -0.36% |
| test_compile_indexing[slice-tensordict-eager] | 0.3471ms | 26.3132μs | 38.0037 KOps/s | 37.9258 KOps/s | +0.21% |
| test_compile_indexing[slice-tensorclass-compile] | 0.3566ms | 0.1352ms | 7.3959 KOps/s | 7.7109 KOps/s | -4.08% |
| test_compile_indexing[slice-tensorclass-eager] | 56.6410μs | 22.3149μs | 44.8131 KOps/s | 45.1287 KOps/s | -0.70% |
| test_compile_indexing[slice-pytree-compile] | 0.3437ms | 0.1352ms | 7.3951 KOps/s | 7.6852 KOps/s | -3.77% |
| test_compile_indexing[slice-pytree-eager] | 0.2330ms | 22.3366μs | 44.7696 KOps/s | 41.9017 KOps/s | **+6.84%** |
| test_compile_indexing[int-tensordict-compile] | 0.3569ms | 0.1433ms | 6.9770 KOps/s | 7.2127 KOps/s | -3.27% |
| test_compile_indexing[int-tensordict-eager] | 0.5057ms | 26.6972μs | 37.4571 KOps/s | 37.7479 KOps/s | -0.77% |
| test_compile_indexing[int-tensorclass-compile] | 0.3527ms | 0.1352ms | 7.3964 KOps/s | 7.6371 KOps/s | -3.15% |
| test_compile_indexing[int-tensorclass-eager] | 0.2296ms | 22.3213μs | 
44.8002 KOps/s | 45.0505 KOps/s | $\color{#d91a1a}-0.56\\%$ | | test_compile_indexing[int-pytree-compile] | 0.3013ms | 0.1323ms | 7.5584 KOps/s | 7.6949 KOps/s | $\color{#d91a1a}-1.77\\%$ | | test_compile_indexing[int-pytree-eager] | 0.2520ms | 22.2166μs | 45.0114 KOps/s | 44.3224 KOps/s | $\color{#35bf28}+1.55\\%$ | | test_mod_add[eager] | 0.1630ms | 38.1668μs | 26.2008 KOps/s | 25.7947 KOps/s | $\color{#35bf28}+1.57\\%$ | | test_mod_add[compile] | 99.3110μs | 67.1530μs | 14.8914 KOps/s | 14.8829 KOps/s | $\color{#35bf28}+0.06\\%$ | | test_mod_add[compile-overhead] | 0.2613ms | 0.1454ms | 6.8791 KOps/s | 6.9650 KOps/s | $\color{#d91a1a}-1.23\\%$ | | test_mod_wrap[eager] | 0.4672ms | 0.2526ms | 3.9584 KOps/s | 3.8570 KOps/s | $\color{#35bf28}+2.63\\%$ | | test_mod_wrap[compile] | 1.2130ms | 0.2942ms | 3.3986 KOps/s | 3.3713 KOps/s | $\color{#35bf28}+0.81\\%$ | | test_mod_wrap[compile-overhead] | 8.0407ms | 4.2431ms | 235.6788 Ops/s | 230.7184 Ops/s | $\color{#35bf28}+2.15\\%$ | | test_mod_wrap_and_backward[eager] | 1.5915ms | 1.4477ms | 690.7313 Ops/s | 723.2797 Ops/s | $\color{#d91a1a}-4.50\\%$ | | test_mod_wrap_and_backward[compile] | 1.6293ms | 1.4632ms | 683.4192 Ops/s | 680.0693 Ops/s | $\color{#35bf28}+0.49\\%$ | | test_mod_wrap_and_backward[compile-overhead] | 1.4660ms | 0.9913ms | 1.0088 KOps/s | 984.7971 Ops/s | $\color{#35bf28}+2.44\\%$ | | test_seq_add[eager] | 0.2492ms | 0.1086ms | 9.2083 KOps/s | 8.8214 KOps/s | $\color{#35bf28}+4.39\\%$ | | test_seq_add[compile] | 0.2685ms | 86.9871μs | 11.4960 KOps/s | 11.7064 KOps/s | $\color{#d91a1a}-1.80\\%$ | | test_seq_add[compile-overhead] | 0.3207ms | 0.1227ms | 8.1532 KOps/s | 8.2800 KOps/s | $\color{#d91a1a}-1.53\\%$ | | test_seq_wrap[eager] | 0.6322ms | 0.4188ms | 2.3877 KOps/s | 2.2347 KOps/s | $\textbf{\color{#35bf28}+6.84\\%}$ | | test_seq_wrap[compile] | 1.4761ms | 0.3262ms | 3.0652 KOps/s | 3.0364 KOps/s | $\color{#35bf28}+0.95\\%$ | | test_seq_wrap[compile-overhead] | 0.3128s | 0.1489s | 6.7154 Ops/s 
| 6.6944 Ops/s | $\color{#35bf28}+0.31\\%$ | | test_func_call_runtime[False-eager] | 0.8981ms | 0.7504ms | 1.3327 KOps/s | 1.3318 KOps/s | $\color{#35bf28}+0.07\\%$ | | test_func_call_runtime[False-compile] | 0.9932ms | 0.8155ms | 1.2263 KOps/s | 1.2151 KOps/s | $\color{#35bf28}+0.92\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4982ms | 0.3572ms | 2.7993 KOps/s | 2.7832 KOps/s | $\color{#35bf28}+0.58\\%$ | | test_func_call_runtime[True-eager] | 1.1300ms | 0.9949ms | 1.0051 KOps/s | 997.4993 Ops/s | $\color{#35bf28}+0.77\\%$ | | test_func_call_runtime[True-compile] | 0.9624ms | 0.8549ms | 1.1697 KOps/s | 1.1672 KOps/s | $\color{#35bf28}+0.22\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4678ms | 0.4021ms | 2.4868 KOps/s | 2.5013 KOps/s | $\color{#d91a1a}-0.58\\%$ | | test_distributed | 2.4214ms | 72.6365μs | 13.7672 KOps/s | 11.4944 KOps/s | $\textbf{\color{#35bf28}+19.77\\%}$ | | test_tdmodule | 38.9700μs | 15.0769μs | 66.3266 KOps/s | 58.9220 KOps/s | $\textbf{\color{#35bf28}+12.57\\%}$ | | test_tdmodule_dispatch | 47.9410μs | 30.0109μs | 33.3213 KOps/s | 29.0605 KOps/s | $\textbf{\color{#35bf28}+14.66\\%}$ | | test_tdseq | 31.5010μs | 15.6538μs | 63.8822 KOps/s | 56.5111 KOps/s | $\textbf{\color{#35bf28}+13.04\\%}$ | | test_tdseq_dispatch | 49.1710μs | 32.2601μs | 30.9981 KOps/s | 27.4090 KOps/s | $\textbf{\color{#35bf28}+13.09\\%}$ | | test_instantiation_functorch | 2.0867ms | 1.9930ms | 501.7623 Ops/s | 497.2410 Ops/s | $\color{#35bf28}+0.91\\%$ | | test_instantiation_td | 2.0276ms | 1.3122ms | 762.0728 Ops/s | 763.8546 Ops/s | $\color{#d91a1a}-0.23\\%$ | | test_exec_functorch | 0.3758ms | 0.2267ms | 4.4106 KOps/s | 4.3806 KOps/s | $\color{#35bf28}+0.68\\%$ | | test_exec_functional_call | 0.3604ms | 0.2221ms | 4.5032 KOps/s | 4.4774 KOps/s | $\color{#35bf28}+0.58\\%$ | | test_exec_td | 0.2804ms | 0.2222ms | 4.5001 KOps/s | 4.4875 KOps/s | $\color{#35bf28}+0.28\\%$ | | test_exec_td_decorator | 0.6444ms | 0.2981ms | 3.3548 KOps/s | 
3.3333 KOps/s | $\color{#35bf28}+0.65\\%$ | | test_vmap_mlp_speed[True-True] | 0.8170ms | 0.6736ms | 1.4846 KOps/s | 1.4752 KOps/s | $\color{#35bf28}+0.64\\%$ | | test_vmap_mlp_speed[True-False] | 0.8233ms | 0.6689ms | 1.4950 KOps/s | 1.4794 KOps/s | $\color{#35bf28}+1.06\\%$ | | test_vmap_mlp_speed[False-True] | 0.7484ms | 0.5905ms | 1.6934 KOps/s | 1.6229 KOps/s | $\color{#35bf28}+4.35\\%$ | | test_vmap_mlp_speed[False-False] | 0.7605ms | 0.5906ms | 1.6931 KOps/s | 1.6953 KOps/s | $\color{#d91a1a}-0.13\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4017ms | 0.7527ms | 1.3286 KOps/s | 1.3234 KOps/s | $\color{#35bf28}+0.39\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9053ms | 0.7492ms | 1.3347 KOps/s | 1.3195 KOps/s | $\color{#35bf28}+1.15\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8004ms | 0.6554ms | 1.5259 KOps/s | 1.5224 KOps/s | $\color{#35bf28}+0.23\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.7946ms | 0.6546ms | 1.5276 KOps/s | 1.5221 KOps/s | $\color{#35bf28}+0.36\\%$ | | test_vmap_transformer_speed[True-True] | 9.0229ms | 8.8371ms | 113.1595 Ops/s | 112.3106 Ops/s | $\color{#35bf28}+0.76\\%$ | | test_vmap_transformer_speed[True-False] | 10.1253ms | 8.8124ms | 113.4768 Ops/s | 112.7515 Ops/s | $\color{#35bf28}+0.64\\%$ | | test_vmap_transformer_speed[False-True] | 8.9297ms | 8.7175ms | 114.7119 Ops/s | 113.4178 Ops/s | $\color{#35bf28}+1.14\\%$ | | test_vmap_transformer_speed[False-False] | 8.8875ms | 8.7269ms | 114.5888 Ops/s | 113.5793 Ops/s | $\color{#35bf28}+0.89\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 21.3191ms | 21.0845ms | 47.4281 Ops/s | 47.3062 Ops/s | $\color{#35bf28}+0.26\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.2998ms | 21.0556ms | 47.4934 Ops/s | 47.2519 Ops/s | $\color{#35bf28}+0.51\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.9247ms | 20.8237ms | 48.0222 Ops/s | 47.7384 Ops/s | $\color{#35bf28}+0.59\\%$ | | 
test_vmap_transformer_speed_decorator[False-False] | 21.0638ms | 20.8565ms | 47.9466 Ops/s | 47.7050 Ops/s | $\color{#35bf28}+0.51\\%$ | | test_to_module_speed[True] | 1.5725ms | 1.4762ms | 677.4163 Ops/s | 674.7740 Ops/s | $\color{#35bf28}+0.39\\%$ | | test_to_module_speed[False] | 1.5751ms | 1.4716ms | 679.5228 Ops/s | 683.6523 Ops/s | $\color{#d91a1a}-0.60\\%$ | | test_tc_init | 54.3910μs | 36.0432μs | 27.7445 KOps/s | 26.2005 KOps/s | $\textbf{\color{#35bf28}+5.89\\%}$ | | test_tc_init_nested | 0.1153ms | 71.2239μs | 14.0402 KOps/s | 13.0300 KOps/s | $\textbf{\color{#35bf28}+7.75\\%}$ | | test_tc_first_layer_tensor | 19.0400μs | 4.0300μs | 248.1391 KOps/s | 250.5494 KOps/s | $\color{#d91a1a}-0.96\\%$ | | test_tc_first_layer_nontensor | 19.1900μs | 4.0167μs | 248.9605 KOps/s | 248.8830 KOps/s | $\color{#35bf28}+0.03\\%$ | | test_tc_second_layer_tensor | 14.1052μs | 1.2932μs | 773.2935 KOps/s | 775.2856 KOps/s | $\color{#d91a1a}-0.26\\%$ | | test_tc_second_layer_nontensor | 24.2110μs | 4.6214μs | 216.3859 KOps/s | 217.3886 KOps/s | $\color{#d91a1a}-0.46\\%$ | | test_unbind | 0.3230s | 12.1517ms | 82.2931 Ops/s | 75.9071 Ops/s | $\textbf{\color{#35bf28}+8.41\\%}$ | | test_full_like | 0.7600ms | 0.5776ms | 1.7312 KOps/s | 1.7266 KOps/s | $\color{#35bf28}+0.27\\%$ | | test_zeros_like | 0.2736ms | 0.1978ms | 5.0552 KOps/s | 5.0534 KOps/s | $\color{#35bf28}+0.04\\%$ | | test_ones_like | 0.3425ms | 0.1977ms | 5.0583 KOps/s | 5.0612 KOps/s | $\color{#d91a1a}-0.06\\%$ | | test_clone | 0.4962ms | 0.4142ms | 2.4140 KOps/s | 2.4093 KOps/s | $\color{#35bf28}+0.19\\%$ | | test_squeeze | 31.2510μs | 11.6621μs | 85.7480 KOps/s | 83.8382 KOps/s | $\color{#35bf28}+2.28\\%$ | | test_unsqueeze | 0.2627ms | 82.7068μs | 12.0909 KOps/s | 11.8320 KOps/s | $\color{#35bf28}+2.19\\%$ | | test_split | 0.4820ms | 0.1785ms | 5.6022 KOps/s | 5.4328 KOps/s | $\color{#35bf28}+3.12\\%$ | | test_permute | 0.3013ms | 0.1895ms | 5.2773 KOps/s | 5.2204 KOps/s | $\color{#35bf28}+1.09\\%$ | | 
test_stack | 1.2824ms | 0.9107ms | 1.0980 KOps/s | 1.1108 KOps/s | $\color{#d91a1a}-1.15\\%$ | | test_cat | 1.3507ms | 1.2319ms | 811.7697 Ops/s | 811.8104 Ops/s | $-0.01\\%$ |