pytorch/tensordict
TensorDict is a pytorch dedicated tensor container.
[Feature] add_custom_mapping and NPE refactors #910
Closed (vmoens closed this 3 months ago)
vmoens commented 3 months ago
Description
- Adds an `add_custom_mapping` function for the `mappings` function
- Moves the `mappings` dict outside of the function to reduce overhead
- Documents the NPW deprecation
- Adds the `"none"` option in the mappings to allow for no transformation to occur.
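
To illustrate the pattern the description refers to, here is a minimal, self-contained sketch of a name-to-function registry held at module level, with a `"none"` identity entry and a registration helper. It is not the tensordict implementation: the registry name `_MAPPINGS`, the scalar `float` signatures, and the error type are assumptions made for illustration, and only the names `mappings`, `add_custom_mapping`, and `"none"` come from the PR description.

```python
import math
from typing import Callable, Dict

Mapping = Callable[[float], float]

# Hypothetical module-level registry. Building the dict once at import time
# (rather than inside `mappings`) avoids re-creating it on every lookup.
_MAPPINGS: Dict[str, Mapping] = {
    "softplus": lambda x: math.log1p(math.exp(x)),
    "exp": math.exp,
    "relu": lambda x: max(x, 0.0),
    "none": lambda x: x,  # identity: no transformation is applied
}


def add_custom_mapping(name: str, mapping: Mapping) -> None:
    """Register a user-defined mapping under `name` (illustrative signature)."""
    _MAPPINGS[name] = mapping


def mappings(key: str) -> Mapping:
    """Return the mapping registered under `key`."""
    try:
        return _MAPPINGS[key]
    except KeyError:
        raise NotImplementedError(f"Unknown mapping {key!r}") from None


# Usage: register a custom entry, then look it up by name.
add_custom_mapping("square", lambda x: x * x)
identity = mappings("none")
square = mappings("square")
```

Keeping the registry outside the lookup function means each call to `mappings` is a single dict access, which is the kind of overhead reduction the description mentions.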
github-actions[bot] commented 3 months ago
⚠ Warning: Result of CPU Benchmark Tests
Total Benchmarks: 144. Improved: 15. Worsened: 9.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 47.3890μs | 22.4666μs | 44.5105 KOps/s | 43.9410 KOps/s | +1.30% |
| test_plain_set_stack_nested | 63.2990μs | 22.1853μs | 45.0749 KOps/s | 44.2853 KOps/s | +1.78% |
| test_plain_set_nested_inplace | 64.5710μs | 24.4162μs | 40.9564 KOps/s | 40.4515 KOps/s | +1.25% |
| test_plain_set_stack_nested_inplace | 53.1600μs | 24.3109μs | 41.1338 KOps/s | 40.6412 KOps/s | +1.21% |
| test_items | 43.4910μs | 2.5903μs | 386.0623 KOps/s | 371.0531 KOps/s | +4.05% |
| test_items_nested | 0.4448ms | 0.3647ms | 2.7422 KOps/s | 2.7554 KOps/s | -0.48% |
| test_items_nested_locked | 0.4259ms | 0.3657ms | 2.7348 KOps/s | 2.7468 KOps/s | -0.44% |
| test_items_nested_leaf | 0.1241ms | 87.4485μs | 11.4353 KOps/s | 11.3720 KOps/s | +0.56% |
| test_items_stack_nested | 0.4188ms | 0.3638ms | 2.7490 KOps/s | 2.7493 KOps/s | -0.01% |
| test_items_stack_nested_leaf | 0.1881ms | 87.9099μs | 11.3753 KOps/s | 11.6969 KOps/s | -2.75% |
| test_items_stack_nested_locked | 0.5762ms | 0.3648ms | 2.7410 KOps/s | 2.7249 KOps/s | +0.59% |
| test_keys | 45.4660μs | 3.9887μs | 250.7068 KOps/s | 259.1482 KOps/s | -3.26% |
| test_keys_nested | 0.2424ms | 0.1433ms | 6.9778 KOps/s | 6.8615 KOps/s | +1.69% |
| test_keys_nested_locked | 0.7846ms | 0.1507ms | 6.6359 KOps/s | 6.5652 KOps/s | +1.08% |
| test_keys_nested_leaf | 0.2119ms | 0.1229ms | 8.1382 KOps/s | 8.0958 KOps/s | +0.52% |
| test_keys_stack_nested | 0.2390ms | 0.1448ms | 6.9078 KOps/s | 6.7632 KOps/s | +2.14% |
| test_keys_stack_nested_leaf | 0.2642ms | 0.1225ms | 8.1655 KOps/s | 8.0694 KOps/s | +1.19% |
| test_keys_stack_nested_locked | 0.2858ms | 0.1511ms | 6.6167 KOps/s | 6.6620 KOps/s | -0.68% |
| test_values | 11.3212μs | 1.1595μs | 862.4403 KOps/s | 867.2639 KOps/s | -0.56% |
| test_values_nested | 0.1256ms | 51.9042μs | 19.2663 KOps/s | 19.1341 KOps/s | +0.69% |
| test_values_nested_locked | 0.1036ms | 51.2623μs | 19.5075 KOps/s | 19.1678 KOps/s | +1.77% |
| test_values_nested_leaf | 0.1067ms | 46.6245μs | 21.4480 KOps/s | 21.3734 KOps/s | +0.35% |
| test_values_stack_nested | 0.1081ms | 52.5133μs | 19.0428 KOps/s | 18.6105 KOps/s | +2.32% |
| test_values_stack_nested_leaf | 92.7740μs | 46.8801μs | 21.3310 KOps/s | 21.5282 KOps/s | -0.92% |
| test_values_stack_nested_locked | 0.1100ms | 51.6840μs | 19.3483 KOps/s | 18.3372 KOps/s | **+5.51%** |
| test_membership | 2.8764μs | 0.7307μs | 1.3686 MOps/s | 1.0927 MOps/s | **+25.25%** |
| test_membership_nested | 30.1560μs | 2.6386μs | 378.9928 KOps/s | 378.3170 KOps/s | +0.18% |
| test_membership_nested_leaf | 43.0810μs | 2.6729μs | 374.1319 KOps/s | 374.1716 KOps/s | -0.01% |
| test_membership_stacked_nested | 39.4740μs | 2.6197μs | 381.7285 KOps/s | 373.7238 KOps/s | +2.14% |
| test_membership_stacked_nested_leaf | 36.7090μs | 2.6344μs | 379.5908 KOps/s | 376.1450 KOps/s | +0.92% |
| test_membership_nested_last | 27.3020μs | 4.0136μs | 249.1511 KOps/s | 251.4329 KOps/s | -0.91% |
| test_membership_nested_leaf_last | 20.6890μs | 4.0224μs | 248.6096 KOps/s | 250.3478 KOps/s | -0.69% |
| test_membership_stacked_nested_last | 48.8220μs | 4.6373μs | 215.6450 KOps/s | 75.3592 KOps/s | **+186.16%** |
| test_membership_stacked_nested_leaf_last | 39.7940μs | 4.6222μs | 216.3477 KOps/s | 76.8065 KOps/s | **+181.68%** |
| test_nested_getleaf | 54.0010μs | 10.9708μs | 91.1507 KOps/s | 91.1414 KOps/s | +0.01% |
| test_nested_get | 62.8480μs | 10.5030μs | 95.2110 KOps/s | 94.7726 KOps/s | +0.46% |
| test_stacked_getleaf | 34.7850μs | 10.9376μs | 91.4279 KOps/s | 90.4620 KOps/s | +1.07% |
| test_stacked_get | 49.7940μs | 10.4482μs | 95.7102 KOps/s | 96.9911 KOps/s | -1.32% |
| test_nested_getitemleaf | 51.7970μs | 11.3979μs | 87.7353 KOps/s | 86.5658 KOps/s | +1.35% |
| test_nested_getitem | 55.6340μs | 10.5406μs | 94.8710 KOps/s | 93.4494 KOps/s | +1.52% |
| test_stacked_getitemleaf | 39.7040μs | 11.3705μs | 87.9470 KOps/s | 87.1831 KOps/s | +0.88% |
| test_stacked_getitem | 49.7030μs | 10.5802μs | 94.5165 KOps/s | 94.0009 KOps/s | +0.55% |
| test_lock_nested | 1.3688ms | 0.5228ms | 1.9126 KOps/s | 1.6878 KOps/s | **+13.32%** |
| test_lock_stack_nested | 0.7191ms | 0.4910ms | 2.0368 KOps/s | 2.1692 KOps/s | **-6.11%** |
| test_unlock_nested | 0.8066ms | 0.4314ms | 2.3181 KOps/s | 2.3542 KOps/s | -1.53% |
| test_unlock_stack_nested | 0.6347ms | 0.4019ms | 2.4885 KOps/s | 2.6624 KOps/s | **-6.53%** |
| test_flatten_speed | 0.6522ms | 0.1080ms | 9.2604 KOps/s | 8.9780 KOps/s | +3.15% |
| test_unflatten_speed | 4.0111ms | 0.4477ms | 2.2339 KOps/s | 2.2156 KOps/s | +0.83% |
| test_common_ops | 4.9344ms | 1.1595ms | 862.4178 Ops/s | 871.7229 Ops/s | -1.07% |
| test_creation | 0.1081ms | 2.4668μs | 405.3900 KOps/s | 391.4879 KOps/s | +3.55% |
| test_creation_empty | 65.0120μs | 19.4693μs | 51.3629 KOps/s | 50.8409 KOps/s | +1.03% |
| test_creation_nested_1 | 58.7400μs | 23.1631μs | 43.1721 KOps/s | 43.4476 KOps/s | -0.63% |
| test_creation_nested_2 | 72.1750μs | 27.3025μs | 36.6267 KOps/s | 37.2181 KOps/s | -1.59% |
| test_clone | 0.1670ms | 17.6446μs | 56.6744 KOps/s | 53.8508 KOps/s | **+5.24%** |
| test_getitem[int] | 0.8016ms | 12.5733μs | 79.5335 KOps/s | 69.2917 KOps/s | **+14.78%** |
| test_getitem[slice_int] | 0.1795ms | 32.2193μs | 31.0373 KOps/s | 29.0890 KOps/s | **+6.70%** |
| test_getitem[range] | 0.3814ms | 56.9924μs | 17.5462 KOps/s | 16.9085 KOps/s | +3.77% |
| test_getitem[tuple] | 0.1346ms | 26.4712μs | 37.7769 KOps/s | 35.0566 KOps/s | **+7.76%** |
| test_getitem[list] | 0.3594ms | 52.7044μs | 18.9738 KOps/s | 18.1752 KOps/s | +4.39% |
| test_setitem_dim[int] | 58.5500μs | 34.8237μs | 28.7161 KOps/s | 29.0623 KOps/s | -1.19% |
| test_setitem_dim[slice_int] | 0.1215ms | 75.1120μs | 13.3134 KOps/s | 13.6408 KOps/s | -2.40% |
| test_setitem_dim[range] | 0.1784ms | 96.6304μs | 10.3487 KOps/s | 10.6997 KOps/s | -3.28% |
| test_setitem_dim[tuple] | 0.1051ms | 61.5215μs | 16.2545 KOps/s | 16.9308 KOps/s | -3.99% |
| test_setitem | 0.2393ms | 30.6625μs | 32.6131 KOps/s | 32.5977 KOps/s | +0.05% |
| test_set | 0.2092ms | 29.6476μs | 33.7296 KOps/s | 33.4678 KOps/s | +0.78% |
| test_set_shared | 2.2320ms | 0.2252ms | 4.4397 KOps/s | 4.5697 KOps/s | -2.84% |
| test_update | 0.2479ms | 37.7757μs | 26.4720 KOps/s | 26.5773 KOps/s | -0.40% |
| test_update_nested | 0.2411ms | 47.1886μs | 21.1916 KOps/s | 21.0185 KOps/s | +0.82% |
| test_update__nested | 0.1866ms | 35.9917μs | 27.7842 KOps/s | 27.6733 KOps/s | +0.40% |
| test_set_nested | 0.1899ms | 32.6762μs | 30.6033 KOps/s | 30.5772 KOps/s | +0.09% |
| test_set_nested_new | 0.1826ms | 37.5867μs | 26.6052 KOps/s | 26.4928 KOps/s | +0.42% |
| test_select | 1.1540ms | 55.2067μs | 18.1137 KOps/s | 18.2337 KOps/s | -0.66% |
| test_select_nested | 0.1151ms | 61.4412μs | 16.2757 KOps/s | 16.4115 KOps/s | -0.83% |
| test_exclude_nested | 0.1530ms | 81.2629μs | 12.3057 KOps/s | 12.3548 KOps/s | -0.40% |
| test_empty[True] | 0.6181ms | 0.3448ms | 2.9004 KOps/s | 2.6396 KOps/s | **+9.88%** |
| test_empty[False] | 13.0295μs | 1.2229μs | 817.7365 KOps/s | 765.6828 KOps/s | **+6.80%** |
| test_unbind_speed | 0.4721ms | 0.3228ms | 3.0976 KOps/s | 3.0484 KOps/s | +1.61% |
| test_unbind_speed_stack0 | 0.6762ms | 0.3208ms | 3.1175 KOps/s | 3.2300 KOps/s | -3.48% |
| test_unbind_speed_stack1 | 89.6761ms | 0.8429ms | 1.1863 KOps/s | 1.3280 KOps/s | **-10.67%** |
| test_split | 87.0424ms | 2.2558ms | 443.3041 Ops/s | 403.4063 Ops/s | **+9.89%** |
| test_chunk | 89.8301ms | 2.2700ms | 440.5331 Ops/s | 469.4761 Ops/s | **-6.16%** |
| test_creation[device0] | 4.1365ms | 0.1258ms | 7.9512 KOps/s | 8.1977 KOps/s | -3.01% |
| test_creation_from_tensor | 0.2821ms | 0.1216ms | 8.2254 KOps/s | 8.1120 KOps/s | +1.40% |
| test_add_one[memmap_tensor0] | 0.3029ms | 8.2978μs | 120.5136 KOps/s | 125.0686 KOps/s | -3.64% |
| test_contiguous[memmap_tensor0] | 37.4100μs | 2.2145μs | 451.5680 KOps/s | 441.6203 KOps/s | +2.25% |
| test_stack[memmap_tensor0] | 77.8370μs | 6.2146μs | 160.9124 KOps/s | 162.6747 KOps/s | -1.08% |
| test_memmaptd_index | 1.2642ms | 0.4318ms | 2.3160 KOps/s | 2.2492 KOps/s | +2.97% |
| test_memmaptd_index_astensor | 0.8090ms | 0.5097ms | 1.9618 KOps/s | 1.6500 KOps/s | **+18.90%** |
| test_memmaptd_index_op | 2.2353ms | 1.1047ms | 905.2529 Ops/s | 905.0582 Ops/s | +0.02% |
| test_serialize_model | 0.2116s | 0.1450s | 6.8979 Ops/s | 7.8825 Ops/s | **-12.49%** |
| test_serialize_model_pickle | 0.4667s | 0.3969s | 2.5196 Ops/s | 2.5111 Ops/s | +0.34% |
| test_serialize_weights | 0.1330s | 0.1294s | 7.7274 Ops/s | 6.9966 Ops/s | **+10.45%** |
| test_serialize_weights_returnearly | 0.2751s | 0.1834s | 5.4532 Ops/s | 6.1934 Ops/s | **-11.95%** |
| test_serialize_weights_pickle | 1.2594s | 0.7248s | 1.3797 Ops/s | 2.5002 Ops/s | **-44.82%** |
| test_serialize_weights_filesystem | 0.1532s | 0.1473s | 6.7877 Ops/s | 6.8726 Ops/s | -1.24% |
| test_serialize_model_filesystem | 0.1538s | 0.1486s | 6.7301 Ops/s | 5.8919 Ops/s | **+14.23%** |
| test_reshape_pytree | 92.5650μs | 39.2437μs | 25.4818 KOps/s | 25.1109 KOps/s | +1.48% |
| test_reshape_td | 97.6230μs | 48.2068μs | 20.7440 KOps/s | 19.7657 KOps/s | +4.95% |
| test_view_pytree | 0.1605ms | 38.9113μs | 25.6995 KOps/s | 25.6843 KOps/s | +0.06% |
| test_view_td | 0.1110ms | 54.4112μs | 18.3786 KOps/s | 17.5240 KOps/s | +4.88% |
| test_unbind_pytree | 0.1068ms | 36.0112μs | 27.7692 KOps/s | 27.8619 KOps/s | -0.33% |
| test_unbind_td | 0.4312ms | 47.4296μs | 21.0839 KOps/s | 20.7087 KOps/s | +1.81% |
| test_split_pytree | 0.1049ms | 39.0064μs | 25.6368 KOps/s | 25.7977 KOps/s | -0.62% |
| test_split_td | 0.2190ms | 60.5458μs | 16.5164 KOps/s | 16.0254 KOps/s | +3.06% |
| test_add_pytree | 0.1196ms | 45.3060μs | 22.0721 KOps/s | 22.5411 KOps/s | -2.08% |
| test_add_td | 0.1939ms | 89.9598μs | 11.1161 KOps/s | 11.5159 KOps/s | -3.47% |
| test_distributed | 0.3043ms | 0.1340ms | 7.4607 KOps/s | 7.5991 KOps/s | -1.82% |
| test_tdmodule | 0.1358ms | 18.2222μs | 54.8781 KOps/s | 56.4564 KOps/s | -2.80% |
| test_tdmodule_dispatch | 70.3620μs | 36.7107μs | 27.2400 KOps/s | 26.7950 KOps/s | +1.66% |
| test_tdseq | 56.4370μs | 19.5540μs | 51.1404 KOps/s | 51.2329 KOps/s | -0.18% |
| test_tdseq_dispatch | 66.2350μs | 40.8660μs | 24.4702 KOps/s | 24.3197 KOps/s | +0.62% |
| test_instantiation_functorch | 2.7077ms | 1.6461ms | 607.4872 Ops/s | 623.1942 Ops/s | -2.52% |
| test_instantiation_td | 1.8577ms | 1.1787ms | 848.4101 Ops/s | 848.8219 Ops/s | -0.05% |
| test_exec_functorch | 0.3420ms | 0.1873ms | 5.3401 KOps/s | 5.3934 KOps/s | -0.99% |
| test_exec_functional_call | 0.3153ms | 0.1786ms | 5.6004 KOps/s | 5.5277 KOps/s | +1.31% |
| test_exec_td | 0.3586ms | 0.1835ms | 5.4503 KOps/s | 5.4259 KOps/s | +0.45% |
| test_exec_td_decorator | 0.8917ms | 0.2668ms | 3.7485 KOps/s | 3.7740 KOps/s | -0.67% |
| test_vmap_mlp_speed[True-True] | 0.9505ms | 0.6333ms | 1.5791 KOps/s | 1.6151 KOps/s | -2.23% |
| test_vmap_mlp_speed[True-False] | 0.8685ms | 0.6099ms | 1.6396 KOps/s | 1.6222 KOps/s | +1.07% |
| test_vmap_mlp_speed[False-True] | 0.7158ms | 0.5061ms | 1.9759 KOps/s | 1.9713 KOps/s | +0.23% |
| test_vmap_mlp_speed[False-False] | 0.8086ms | 0.5114ms | 1.9555 KOps/s | 1.9708 KOps/s | -0.78% |
| test_vmap_mlp_speed_decorator[True-True] | 0.9813ms | 0.7000ms | 1.4286 KOps/s | 1.4191 KOps/s | +0.67% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1096ms | 0.6960ms | 1.4367 KOps/s | 1.4150 KOps/s | +1.54% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7825ms | 0.5762ms | 1.7356 KOps/s | 1.7035 KOps/s | +1.89% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7503ms | 0.5749ms | 1.7395 KOps/s | 1.7029 KOps/s | +2.15% |
| test_to_module_speed[True] | 2.9406ms | 1.8185ms | 549.8891 Ops/s | 557.1073 Ops/s | -1.30% |
| test_to_module_speed[False] | 2.3311ms | 1.7685ms | 565.4617 Ops/s | 569.7961 Ops/s | -0.76% |
| test_tc_init | 83.4570μs | 45.6724μs | 21.8951 KOps/s | 22.4488 KOps/s | -2.47% |
| test_tc_init_nested | 0.1951ms | 90.6586μs | 11.0304 KOps/s | 11.1056 KOps/s | -0.68% |
| test_tc_first_layer_tensor | 31.8700μs | 9.0474μs | 110.5289 KOps/s | 109.5269 KOps/s | +0.91% |
| test_tc_first_layer_nontensor | 59.8820μs | 9.0533μs | 110.4568 KOps/s | 110.3392 KOps/s | +0.11% |
| test_tc_second_layer_tensor | 44.2830μs | 2.8496μs | 350.9297 KOps/s | 354.0855 KOps/s | -0.89% |
| test_tc_second_layer_nontensor | 33.2530μs | 10.2028μs | 98.0122 KOps/s | 97.7017 KOps/s | +0.32% |
| test_unbind | 0.1082s | 14.9165ms | 67.0399 Ops/s | 70.1069 Ops/s | -4.37% |
| test_full_like | 20.8370ms | 13.8066ms | 72.4294 Ops/s | 127.1794 Ops/s | **-43.05%** |
| test_zeros_like | 13.9955ms | 7.9949ms | 125.0796 Ops/s | 140.2603 Ops/s | **-10.82%** |
| test_ones_like | 12.7224ms | 7.6286ms | 131.0848 Ops/s | 126.3117 Ops/s | +3.78% |
| test_clone | 16.2579ms | 9.3800ms | 106.6094 Ops/s | 104.3603 Ops/s | +2.16% |
| test_squeeze | 65.7430μs | 14.6913μs | 68.0673 KOps/s | 68.2987 KOps/s | -0.34% |
| test_unsqueeze | 0.3055ms | 97.1156μs | 10.2970 KOps/s | 10.1800 KOps/s | +1.15% |
| test_split | 0.4467ms | 0.2077ms | 4.8156 KOps/s | 4.7243 KOps/s | +1.93% |
| test_permute | 0.4509ms | 0.2323ms | 4.3041 KOps/s | 4.4282 KOps/s | -2.80% |
| test_stack | 33.3410ms | 26.3065ms | 38.0135 Ops/s | 37.9456 Ops/s | +0.18% |
| test_cat | 32.8345ms | 25.9465ms | 38.5409 Ops/s | 38.2969 Ops/s | +0.64% |
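
For readers checking the numbers by hand: the Change column above is consistent with the relative difference between the Ops column and the Ops on Repo `HEAD` column, and the rows the bot emphasizes all lie beyond roughly ±5%, which also matches the Improved and Worsened totals. The sketch below reproduces that arithmetic for three rows copied from the table. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark workflow itself.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to the HEAD throughput."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on HEAD) pairs copied from the CPU table; units cancel in the ratio.
rows = {
    "test_plain_set_nested": (44.5105, 43.9410),
    "test_membership": (1.3686, 1.0927),
    "test_serialize_weights_pickle": (1.3797, 2.5002),
}

# Map each benchmark name to its rounded percent change and its label.
report = {
    name: (round(relative_change(ops, head), 2), classify(relative_change(ops, head)))
    for name, (ops, head) in rows.items()
}
```

Because both columns share a unit within each row, the ratio can be computed directly from the printed values without converting between Ops/s, KOps/s, and MOps/s.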
github-actions[bot] commented 3 months ago
⚠ Warning: Result of GPU Benchmark Tests
Total Benchmarks: 219. Improved: 32. Worsened: 6.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.2060ms | 15.7175μs | 63.6233 KOps/s | 57.1696 KOps/s | **+11.29%** |
| test_plain_set_stack_nested | 34.8810μs | 15.8924μs | 62.9233 KOps/s | 56.0269 KOps/s | **+12.31%** |
| test_plain_set_nested_inplace | 0.2162ms | 16.9186μs | 59.1064 KOps/s | 52.8282 KOps/s | **+11.88%** |
| test_plain_set_stack_nested_inplace | 37.9600μs | 16.9206μs | 59.0994 KOps/s | 52.8633 KOps/s | **+11.80%** |
| test_items | 0.2300ms | 4.6902μs | 213.2120 KOps/s | 211.4908 KOps/s | +0.81% |
| test_items_nested | 0.6047ms | 0.3943ms | 2.5359 KOps/s | 2.5469 KOps/s | -0.43% |
| test_items_nested_locked | 0.6009ms | 0.3954ms | 2.5289 KOps/s | 2.5314 KOps/s | -0.10% |
| test_items_nested_leaf | 0.2679ms | 86.4266μs | 11.5705 KOps/s | 11.6092 KOps/s | -0.33% |
| test_items_stack_nested | 0.5955ms | 0.3957ms | 2.5268 KOps/s | 2.5529 KOps/s | -1.02% |
| test_items_stack_nested_leaf | 0.1033ms | 87.7101μs | 11.4012 KOps/s | 11.4265 KOps/s | -0.22% |
| test_items_stack_nested_locked | 0.5913ms | 0.3944ms | 2.5356 KOps/s | 2.5341 KOps/s | +0.06% |
| test_keys | 0.2201ms | 4.3913μs | 227.7255 KOps/s | 227.3222 KOps/s | +0.18% |
| test_keys_nested | 0.2669ms | 66.4931μs | 15.0392 KOps/s | 14.9290 KOps/s | +0.74% |
| test_keys_nested_locked | 2.1109ms | 72.2084μs | 13.8488 KOps/s | 13.6334 KOps/s | +1.58% |
| test_keys_nested_leaf | 0.2491ms | 58.0139μs | 17.2372 KOps/s | 17.3957 KOps/s | -0.91% |
| test_keys_stack_nested | 0.2614ms | 66.9137μs | 14.9446 KOps/s | 14.9283 KOps/s | +0.11% |
| test_keys_stack_nested_leaf | 75.9410μs | 58.0578μs | 17.2242 KOps/s | 17.2355 KOps/s | -0.07% |
| test_keys_stack_nested_locked | 0.2590ms | 72.5753μs | 13.7788 KOps/s | 13.7174 KOps/s | +0.45% |
| test_values | 64.5277μs | 1.7725μs | 564.1907 KOps/s | 568.6093 KOps/s | -0.78% |
| test_values_nested | 0.3287ms | 33.7764μs | 29.6065 KOps/s | 29.8207 KOps/s | -0.72% |
| test_values_nested_locked | 0.2217ms | 35.7378μs | 27.9816 KOps/s | 28.0791 KOps/s | -0.35% |
| test_values_nested_leaf | 0.2350ms | 29.8748μs | 33.4731 KOps/s | 33.3727 KOps/s | +0.30% |
| test_values_stack_nested | 0.2226ms | 33.8456μs | 29.5459 KOps/s | 29.4288 KOps/s | +0.40% |
| test_values_stack_nested_leaf | 47.1010μs | 30.1079μs | 33.2139 KOps/s | 33.2652 KOps/s | -0.15% |
| test_values_stack_nested_locked | 0.2364ms | 35.8531μs | 27.8916 KOps/s | 28.0741 KOps/s | -0.65% |
| test_membership | 10.7957μs | 0.5479μs | 1.8251 MOps/s | 1.8361 MOps/s | -0.59% |
| test_membership_nested | 93.8970μs | 2.0158μs | 496.0867 KOps/s | 483.1771 KOps/s | +2.67% |
| test_membership_nested_leaf | 98.4370μs | 2.0021μs | 499.4778 KOps/s | 502.0628 KOps/s | -0.51% |
| test_membership_stacked_nested | 22.8410μs | 2.0564μs | 486.2940 KOps/s | 471.5019 KOps/s | +3.14% |
| test_membership_stacked_nested_leaf | 0.2242ms | 2.0627μs | 484.8063 KOps/s | 476.1683 KOps/s | +1.81% |
| test_membership_nested_last | 31.7100μs | 3.0541μs | 327.4323 KOps/s | 329.7391 KOps/s | -0.70% |
| test_membership_nested_leaf_last | 0.2043ms | 3.0435μs | 328.5683 KOps/s | 332.3512 KOps/s | -1.14% |
| test_membership_stacked_nested_last | 15.4900μs | 3.0289μs | 330.1492 KOps/s | 328.3920 KOps/s | +0.54% |
| test_membership_stacked_nested_leaf_last | 19.6900μs | 3.0064μs | 332.6193 KOps/s | 333.9815 KOps/s | -0.41% |
| test_nested_getleaf | 0.2124ms | 8.0988μs | 123.4746 KOps/s | 121.8064 KOps/s | +1.37% |
| test_nested_get | 0.1935ms | 7.6189μs | 131.2527 KOps/s | 129.0554 KOps/s | +1.70% |
| test_stacked_getleaf | 24.2010μs | 8.1300μs | 123.0014 KOps/s | 121.7784 KOps/s | +1.00% |
| test_stacked_get | 0.2270ms | 7.6538μs | 130.6534 KOps/s | 129.2572 KOps/s | +1.08% |
| test_nested_getitemleaf | 0.2137ms | 8.2378μs | 121.3912 KOps/s | 119.8549 KOps/s | +1.28% |
| test_nested_getitem | 23.7800μs | 7.7895μs | 128.3781 KOps/s | 126.8723 KOps/s | +1.19% |
| test_stacked_getitemleaf | 0.2082ms | 8.2496μs | 121.2181 KOps/s | 119.5961 KOps/s | +1.36% |
| test_stacked_getitem | 22.2700μs | 7.7960μs | 128.2706 KOps/s | 126.2890 KOps/s | +1.57% |
| test_lock_nested | 5.0199ms | 0.4801ms | 2.0827 KOps/s | 2.0824 KOps/s | +0.02% |
| test_lock_stack_nested | 0.5045ms | 0.4394ms | 2.2757 KOps/s | 2.2894 KOps/s | -0.60% |
| test_unlock_nested | 0.8432ms | 0.3951ms | 2.5309 KOps/s | 2.5271 KOps/s | +0.15% |
| test_unlock_stack_nested | 0.3866ms | 0.3580ms | 2.7934 KOps/s | 2.8071 KOps/s | -0.49% |
| test_flatten_speed | 0.3080ms | 0.1055ms | 9.4745 KOps/s | 9.4362 KOps/s | +0.41% |
| test_unflatten_speed | 0.4867ms | 0.2983ms | 3.3522 KOps/s | 3.3652 KOps/s | -0.39% |
| test_common_ops | 1.6670ms | 1.2893ms | 775.6341 Ops/s | 717.1197 Ops/s | **+8.16%** |
| test_creation | 20.6510μs | 2.0047μs | 498.8252 KOps/s | 497.9301 KOps/s | +0.18% |
| test_creation_empty | 0.2354ms | 15.2204μs | 65.7015 KOps/s | 53.8734 KOps/s | **+21.96%** |
| test_creation_nested_1 | 34.0200μs | 17.3221μs | 57.7297 KOps/s | 48.2051 KOps/s | **+19.76%** |
| test_creation_nested_2 | 0.2231ms | 20.3954μs | 49.0307 KOps/s | 42.2753 KOps/s | **+15.98%** |
| test_clone | 51.5510μs | 30.6543μs | 32.6219 KOps/s | 32.1801 KOps/s | +1.37% |
| test_getitem[int] | 1.2077ms | 17.6553μs | 56.6401 KOps/s | 59.0950 KOps/s | -4.15% |
| test_getitem[slice_int] | 0.1485ms | 29.0814μs | 34.3862 KOps/s | 34.4510 KOps/s | -0.19% |
| test_getitem[range] | 0.3511ms | 0.1183ms | 8.4541 KOps/s | 8.6973 KOps/s | -2.80% |
| test_getitem[tuple] | 0.1495ms | 25.6870μs | 38.9301 KOps/s | 39.0566 KOps/s | -0.32% |
| test_getitem[list] | 0.3455ms | 0.1061ms | 9.4269 KOps/s | 9.1487 KOps/s | +3.04% |
| test_setitem_dim[int] | 0.1750ms | 51.1388μs | 19.5546 KOps/s | 16.7833 KOps/s | **+16.51%** |
| test_setitem_dim[slice_int] | 95.3510μs | 75.4432μs | 13.2550 KOps/s | 12.7606 KOps/s | +3.87% |
| test_setitem_dim[range] | 0.2709ms | 0.1390ms | 7.1928 KOps/s | 7.0058 KOps/s | +2.67% |
| test_setitem_dim[tuple] | 92.3010μs | 68.9214μs | 14.5093 KOps/s | 13.8462 KOps/s | +4.79% |
| test_setitem | 0.2105ms | 42.5881μs | 23.4808 KOps/s | 22.4299 KOps/s | +4.68% |
| test_set | 0.2531ms | 41.2408μs | 24.2478 KOps/s | 21.4775 KOps/s | **+12.90%** |
| test_set_shared | 92.1938ms | 61.4133μs | 16.2831 KOps/s | 18.2730 KOps/s | **-10.89%** |
| test_update | 0.2619ms | 49.3313μs | 20.2711 KOps/s | 18.4674 KOps/s | **+9.77%** |
| test_update_nested | 0.2690ms | 56.6890μs | 17.6401 KOps/s | 15.5427 KOps/s | **+13.49%** |
| test_update__nested | 0.2610ms | 62.5311μs | 15.9920 KOps/s | 14.7091 KOps/s | **+8.72%** |
| test_set_nested | 0.1868ms | 43.9013μs | 22.7784 KOps/s | 19.6658 KOps/s | **+15.83%** |
| test_set_nested_new | 0.4983ms | 48.3549μs | 20.6804 KOps/s | 18.3438 KOps/s | **+12.74%** |
| test_select | 0.1043ms | 64.9116μs | 15.4056 KOps/s | 14.3067 KOps/s | **+7.68%** |
| test_select_nested | 0.2446ms | 52.0249μs | 19.2216 KOps/s | 19.0828 KOps/s | +0.73% |
| test_exclude_nested | 0.2621ms | 71.7151μs | 13.9441 KOps/s | 13.8067 KOps/s | +1.00% |
| test_empty[True] | 0.4796ms | 0.2954ms | 3.3850 KOps/s | 3.3250 KOps/s | +1.80% |
| test_empty[False] | 20.7123μs | 0.9564μs | 1.0456 MOps/s | 1.0759 MOps/s | -2.81% |
| test_to | 59.2410μs | 37.4306μs | 26.7161 KOps/s | 26.4734 KOps/s | +0.92% |
| test_to_nonblocking | 46.5810μs | 24.0054μs | 41.6573 KOps/s | 41.2358 KOps/s | +1.02% |
| test_unbind_speed | 1.3119ms | 0.2991ms | 3.3435 KOps/s | 3.2751 KOps/s | +2.09% |
| test_unbind_speed_stack0 | 0.5106ms | 0.2995ms | 3.3389 KOps/s | 3.2789 KOps/s | +1.83% |
| test_unbind_speed_stack1 | 91.2092ms | 0.7928ms | 1.2613 KOps/s | 1.2747 KOps/s | -1.05% |
| test_split | 93.5992ms | 2.3602ms | 423.6907 Ops/s | 429.1197 Ops/s | -1.27% |
| test_chunk | 93.6948ms | 2.3648ms | 422.8673 Ops/s | 427.5689 Ops/s | -1.10% |
| test_creation[device0] | 0.3428ms | 0.1112ms | 8.9899 KOps/s | 9.6659 KOps/s | **-6.99%** |
| test_creation_from_tensor | 0.2825ms | 0.1075ms | 9.3062 KOps/s | 9.3394 KOps/s | -0.35% |
| test_add_one[memmap_tensor0] | 0.1406ms | 8.8143μs | 113.4523 KOps/s | 102.9182 KOps/s | **+10.24%** |
| test_contiguous[memmap_tensor0] | 0.1916ms | 2.1657μs | 461.7367 KOps/s | 452.7243 KOps/s | +1.99% |
| test_stack[memmap_tensor0] | 60.0110μs | 6.8650μs | 145.6666 KOps/s | 145.0987 KOps/s | +0.39% |
| test_memmaptd_index | 1.1713ms | 0.4345ms | 2.3014 KOps/s | 2.3138 KOps/s | -0.54% |
| test_memmaptd_index_astensor | 0.7611ms | 0.4978ms | 2.0087 KOps/s | 2.0225 KOps/s | -0.68% |
| test_memmaptd_index_op | 1.4062ms | 1.0069ms | 993.1548 Ops/s | 935.5185 Ops/s | **+6.16%** |
| test_serialize_model | 0.1011s | 96.3487ms | 10.3790 Ops/s | 10.0472 Ops/s | +3.30% |
| test_serialize_model_pickle | 1.3695s | 1.2392s | 0.8069 Ops/s | 0.8077 Ops/s | -0.09% |
| test_serialize_weights | 0.1882s | 0.1031s | 9.7005 Ops/s | 10.2811 Ops/s | **-5.65%** |
| test_serialize_weights_returnearly | 0.2930s | 90.4912ms | 11.0508 Ops/s | 12.1212 Ops/s | **-8.83%** |
| test_serialize_weights_pickle | 1.3513s | 1.2364s | 0.8088 Ops/s | 0.8091 Ops/s | -0.03% |
| test_reshape_pytree | 0.2474ms | 38.3151μs | 26.0994 KOps/s | 25.8222 KOps/s | +1.07% |
| test_reshape_td | 0.1007ms | 44.8938μs | 22.2748 KOps/s | 21.9878 KOps/s | +1.31% |
| test_view_pytree | 0.2346ms | 38.4427μs | 26.0127 KOps/s | 25.7893 KOps/s | +0.87% |
| test_view_td | 0.2281ms | 51.0772μs | 19.5782 KOps/s | 19.8490 KOps/s | -1.36% |
| test_unbind_pytree | 0.2957ms | 37.7313μs | 26.5032 KOps/s | 27.2396 KOps/s | -2.70% |
| test_unbind_td | 0.4416ms | 46.1731μs | 21.6577 KOps/s | 21.3683 KOps/s | +1.35% |
| test_split_pytree | 78.4920μs | 50.9988μs | 19.6083 KOps/s | 18.6622 KOps/s | **+5.07%** |
| test_split_td | 0.4796ms | 60.2943μs | 16.5853 KOps/s | 14.0106 KOps/s | **+18.38%** |
| test_add_pytree | 0.2974ms | 59.3595μs | 16.8465 KOps/s | 16.9852 KOps/s | -0.82% |
| test_add_td | 0.2343ms | 89.7838μs | 11.1379 KOps/s | 10.4073 KOps/s | **+7.02%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4092ms | 0.2106ms | 4.7492 KOps/s | 4.7607 KOps/s | -0.24% |
| test_compile_add_one_nested[tensordict-eager] | 0.3155ms | 0.1758ms | 5.6891 KOps/s | 5.7365 KOps/s | -0.83% |
| test_compile_add_one_nested[pytree-compile] | 0.2768ms | 0.1437ms | 6.9596 KOps/s | 6.9558 KOps/s | +0.05% |
| test_compile_add_one_nested[pytree-eager] | 0.3456ms | 0.1959ms | 5.1048 KOps/s | 5.1538 KOps/s | -0.95% |
| test_compile_copy_nested[tensordict-compile] | 0.1713ms | 22.2704μs | 44.9027 KOps/s | 45.2412 KOps/s | -0.75% |
| test_compile_copy_nested[tensordict-eager] | 84.1110μs | 48.2635μs | 20.7196 KOps/s | 20.4568 KOps/s | +1.28% |
| test_compile_copy_nested[pytree-compile] | 0.1911ms | 71.9327μs | 13.9019 KOps/s | 13.9965 KOps/s | -0.68% |
| test_compile_copy_nested[pytree-eager] | 88.8110μs | 59.9610μs | 16.6775 KOps/s | 16.8927 KOps/s | -1.27% |
| test_compile_add_one_flat[tensordict-compile] | 0.4967ms | 0.3196ms | 3.1288 KOps/s | 3.1208 KOps/s | +0.26% |
| test_compile_add_one_flat[tensordict-eager] | 0.3575ms | 0.2208ms | 4.5281 KOps/s | 4.4944 KOps/s | +0.75% |
| test_compile_add_one_flat[tensorclass-compile] | 0.3344ms | 0.1305ms | 7.6622 KOps/s | 7.8270 KOps/s | -2.10% |
| test_compile_add_one_flat[tensorclass-eager] | 0.2103ms | 63.0888μs | 15.8507 KOps/s | 15.8070 KOps/s | +0.28% |
| test_compile_add_one_flat[pytree-compile] | 0.4184ms | 0.3205ms | 3.1204 KOps/s | 3.1231 KOps/s | -0.09% |
| test_compile_add_one_flat[pytree-eager] | 0.8080ms | 0.6375ms | 1.5687 KOps/s | 1.5800 KOps/s | -0.71% |
| test_compile_add_self_flat[tensordict-eager] | 0.4066ms | 0.2705ms | 3.6975 KOps/s | 3.6937 KOps/s | +0.10% |
| test_compile_add_self_flat[tensordict-compile] | 0.4653ms | 0.3235ms | 3.0907 KOps/s | 3.0890 KOps/s | +0.06% |
| test_compile_add_self_flat[tensorclass-eager] | 0.2113ms | 76.0016μs | 13.1576 KOps/s | 13.1336 KOps/s | +0.18% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2558ms | 0.1304ms | 7.6681 KOps/s | 7.6784 KOps/s | -0.14% |
| test_compile_add_self_flat[pytree-eager] | 0.6887ms | 0.5338ms | 1.8733 KOps/s | 1.8417 KOps/s | +1.71% |
| test_compile_add_self_flat[pytree-compile] | 0.4546ms | 0.3194ms | 3.1308 KOps/s | 3.1245 KOps/s | +0.20% |
| test_compile_copy_flat[tensordict-compile] | 0.1449ms | 18.6557μs | 53.6028 KOps/s | 53.5734 KOps/s | +0.05% |
| test_compile_copy_flat[tensordict-eager] | 55.2910μs | 32.0492μs | 31.2020 KOps/s | 30.1754 KOps/s | +3.40% |
| test_compile_copy_flat[pytree-compile] | 0.1517ms | 75.3691μs | 13.2680 KOps/s | 13.2189 KOps/s | +0.37% |
| test_compile_copy_flat[pytree-eager] | 0.1823ms | 60.6132μs | 16.4981 KOps/s | 16.3040 KOps/s | +1.19% |
| test_compile_assign_and_add[tensordict-compile] | 2.5492ms | 0.9287ms | 1.0768 KOps/s | 1.0922 KOps/s | -1.41% |
| test_compile_assign_and_add[tensordict-eager] | 3.6139ms | 3.3749ms | 296.3073 Ops/s | 292.3578 Ops/s | +1.35% |
| test_compile_assign_and_add[pytree-compile] | 2.4915ms | 0.9041ms | 1.1061 KOps/s | 1.0499 KOps/s | **+5.35%** |
| test_compile_assign_and_add[pytree-eager] | 3.8140ms | 3.3703ms | 296.7125 Ops/s | 295.8981 Ops/s | +0.28% |
| test_compile_indexing[tensor-tensordict-compile] | 0.3488ms | 0.1141ms | 8.7651 KOps/s | 9.0841 KOps/s | -3.51% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2772ms | 66.2708μs | 15.0896 KOps/s | 15.8570 KOps/s | -4.84% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2312ms | 0.1034ms | 9.6679 KOps/s | 9.7781 KOps/s | -1.13% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2760ms | 48.1412μs | 20.7722 KOps/s | 21.9912 KOps/s | **-5.54%** |
| test_compile_indexing[tensor-pytree-compile] | 0.3132ms | 0.1069ms | 9.3535 KOps/s | 9.8249 KOps/s | -4.80% |
| test_compile_indexing[tensor-pytree-eager] | 0.2857ms | 48.0681μs | 20.8038 KOps/s | 22.1712 KOps/s | **-6.17%** |
| test_compile_indexing[slice-tensordict-compile] | 0.3459ms | 0.1389ms | 7.2019 KOps/s | 7.2277 KOps/s | -0.36% |
| test_compile_indexing[slice-tensordict-eager] | 0.3471ms | 26.3132μs | 38.0037 KOps/s | 37.9258 KOps/s | +0.21% |
| test_compile_indexing[slice-tensorclass-compile] | 0.3566ms | 0.1352ms | 7.3959 KOps/s | 7.7109 KOps/s | -4.08% |
| test_compile_indexing[slice-tensorclass-eager] | 56.6410μs | 22.3149μs | 44.8131 KOps/s | 45.1287 KOps/s | -0.70% |
| test_compile_indexing[slice-pytree-compile] | 0.3437ms | 0.1352ms | 7.3951 KOps/s | 7.6852 KOps/s | -3.77% |
| test_compile_indexing[slice-pytree-eager] | 0.2330ms | 22.3366μs | 44.7696 KOps/s | 41.9017 KOps/s | **+6.84%** |
| test_compile_indexing[int-tensordict-compile] | 0.3569ms | 0.1433ms | 6.9770 KOps/s | 7.2127 KOps/s | -3.27% |
| test_compile_indexing[int-tensordict-eager] | 0.5057ms | 26.6972μs | 37.4571 KOps/s | 37.7479 KOps/s | -0.77% |
| test_compile_indexing[int-tensorclass-compile] | 0.3527ms | 0.1352ms | 7.3964 KOps/s | 7.6371 KOps/s | -3.15% |
| test_compile_indexing[int-tensorclass-eager] | 0.2296ms | 22.3213μs | 44.8002 KOps/s | 45.0505 KOps/s | -0.56% |
| test_compile_indexing[int-pytree-compile] | 0.3013ms | 0.1323ms | 7.5584 KOps/s | 7.6949 KOps/s | -1.77% |
| test_compile_indexing[int-pytree-eager] | 0.2520ms | 22.2166μs | 45.0114 KOps/s | 44.3224 KOps/s | +1.55% |
| test_mod_add[eager] | 0.1630ms | 38.1668μs | 26.2008 KOps/s | 25.7947 KOps/s | +1.57% |
| test_mod_add[compile] | 99.3110μs | 67.1530μs | 14.8914 KOps/s | 14.8829 KOps/s | +0.06% |
| test_mod_add[compile-overhead] | 0.2613ms | 0.1454ms | 6.8791 KOps/s | 6.9650 KOps/s | -1.23% |
| test_mod_wrap[eager] | 0.4672ms | 0.2526ms | 3.9584 KOps/s | 3.8570 KOps/s | +2.63% |
| test_mod_wrap[compile] | 1.2130ms | 0.2942ms | 3.3986 KOps/s | 3.3713 KOps/s | +0.81% |
| test_mod_wrap[compile-overhead] | 8.0407ms | 4.2431ms | 235.6788 Ops/s | 230.7184 Ops/s | +2.15% |
| test_mod_wrap_and_backward[eager] | 1.5915ms | 1.4477ms | 690.7313 Ops/s | 723.2797 Ops/s | -4.50% |
| test_mod_wrap_and_backward[compile] | 1.6293ms | 1.4632ms | 683.4192 Ops/s | 680.0693 Ops/s | +0.49% |
| test_mod_wrap_and_backward[compile-overhead] | 1.4660ms | 0.9913ms | 1.0088 KOps/s | 984.7971 Ops/s | +2.44% |
| test_seq_add[eager] | 0.2492ms | 0.1086ms | 9.2083 KOps/s | 8.8214 KOps/s | +4.39% |
| test_seq_add[compile] | 0.2685ms | 86.9871μs | 11.4960 KOps/s | 11.7064 KOps/s | -1.80% |
| test_seq_add[compile-overhead] | 0.3207ms | 0.1227ms | 8.1532 KOps/s | 8.2800 KOps/s | -1.53% |
| test_seq_wrap[eager] | 0.6322ms | 0.4188ms | 2.3877 KOps/s | 2.2347 KOps/s | **+6.84%** |
| test_seq_wrap[compile] | 1.4761ms | 0.3262ms | 3.0652 KOps/s | 3.0364 KOps/s | +0.95% |
| test_seq_wrap[compile-overhead] | 0.3128s | 0.1489s | 6.7154 Ops/s | 6.6944 Ops/s | 
$\color{#35bf28}+0.31\\%$ | | test_func_call_runtime[False-eager] | 0.8981ms | 0.7504ms | 1.3327 KOps/s | 1.3318 KOps/s | $\color{#35bf28}+0.07\\%$ | | test_func_call_runtime[False-compile] | 0.9932ms | 0.8155ms | 1.2263 KOps/s | 1.2151 KOps/s | $\color{#35bf28}+0.92\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4982ms | 0.3572ms | 2.7993 KOps/s | 2.7832 KOps/s | $\color{#35bf28}+0.58\\%$ | | test_func_call_runtime[True-eager] | 1.1300ms | 0.9949ms | 1.0051 KOps/s | 997.4993 Ops/s | $\color{#35bf28}+0.77\\%$ | | test_func_call_runtime[True-compile] | 0.9624ms | 0.8549ms | 1.1697 KOps/s | 1.1672 KOps/s | $\color{#35bf28}+0.22\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4678ms | 0.4021ms | 2.4868 KOps/s | 2.5013 KOps/s | $\color{#d91a1a}-0.58\\%$ | | test_distributed | 2.4214ms | 72.6365μs | 13.7672 KOps/s | 11.4944 KOps/s | $\textbf{\color{#35bf28}+19.77\\%}$ | | test_tdmodule | 38.9700μs | 15.0769μs | 66.3266 KOps/s | 58.9220 KOps/s | $\textbf{\color{#35bf28}+12.57\\%}$ | | test_tdmodule_dispatch | 47.9410μs | 30.0109μs | 33.3213 KOps/s | 29.0605 KOps/s | $\textbf{\color{#35bf28}+14.66\\%}$ | | test_tdseq | 31.5010μs | 15.6538μs | 63.8822 KOps/s | 56.5111 KOps/s | $\textbf{\color{#35bf28}+13.04\\%}$ | | test_tdseq_dispatch | 49.1710μs | 32.2601μs | 30.9981 KOps/s | 27.4090 KOps/s | $\textbf{\color{#35bf28}+13.09\\%}$ | | test_instantiation_functorch | 2.0867ms | 1.9930ms | 501.7623 Ops/s | 497.2410 Ops/s | $\color{#35bf28}+0.91\\%$ | | test_instantiation_td | 2.0276ms | 1.3122ms | 762.0728 Ops/s | 763.8546 Ops/s | $\color{#d91a1a}-0.23\\%$ | | test_exec_functorch | 0.3758ms | 0.2267ms | 4.4106 KOps/s | 4.3806 KOps/s | $\color{#35bf28}+0.68\\%$ | | test_exec_functional_call | 0.3604ms | 0.2221ms | 4.5032 KOps/s | 4.4774 KOps/s | $\color{#35bf28}+0.58\\%$ | | test_exec_td | 0.2804ms | 0.2222ms | 4.5001 KOps/s | 4.4875 KOps/s | $\color{#35bf28}+0.28\\%$ | | test_exec_td_decorator | 0.6444ms | 0.2981ms | 3.3548 KOps/s | 3.3333 KOps/s | 
$\color{#35bf28}+0.65\\%$ | | test_vmap_mlp_speed[True-True] | 0.8170ms | 0.6736ms | 1.4846 KOps/s | 1.4752 KOps/s | $\color{#35bf28}+0.64\\%$ | | test_vmap_mlp_speed[True-False] | 0.8233ms | 0.6689ms | 1.4950 KOps/s | 1.4794 KOps/s | $\color{#35bf28}+1.06\\%$ | | test_vmap_mlp_speed[False-True] | 0.7484ms | 0.5905ms | 1.6934 KOps/s | 1.6229 KOps/s | $\color{#35bf28}+4.35\\%$ | | test_vmap_mlp_speed[False-False] | 0.7605ms | 0.5906ms | 1.6931 KOps/s | 1.6953 KOps/s | $\color{#d91a1a}-0.13\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4017ms | 0.7527ms | 1.3286 KOps/s | 1.3234 KOps/s | $\color{#35bf28}+0.39\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9053ms | 0.7492ms | 1.3347 KOps/s | 1.3195 KOps/s | $\color{#35bf28}+1.15\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8004ms | 0.6554ms | 1.5259 KOps/s | 1.5224 KOps/s | $\color{#35bf28}+0.23\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.7946ms | 0.6546ms | 1.5276 KOps/s | 1.5221 KOps/s | $\color{#35bf28}+0.36\\%$ | | test_vmap_transformer_speed[True-True] | 9.0229ms | 8.8371ms | 113.1595 Ops/s | 112.3106 Ops/s | $\color{#35bf28}+0.76\\%$ | | test_vmap_transformer_speed[True-False] | 10.1253ms | 8.8124ms | 113.4768 Ops/s | 112.7515 Ops/s | $\color{#35bf28}+0.64\\%$ | | test_vmap_transformer_speed[False-True] | 8.9297ms | 8.7175ms | 114.7119 Ops/s | 113.4178 Ops/s | $\color{#35bf28}+1.14\\%$ | | test_vmap_transformer_speed[False-False] | 8.8875ms | 8.7269ms | 114.5888 Ops/s | 113.5793 Ops/s | $\color{#35bf28}+0.89\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 21.3191ms | 21.0845ms | 47.4281 Ops/s | 47.3062 Ops/s | $\color{#35bf28}+0.26\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.2998ms | 21.0556ms | 47.4934 Ops/s | 47.2519 Ops/s | $\color{#35bf28}+0.51\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.9247ms | 20.8237ms | 48.0222 Ops/s | 47.7384 Ops/s | $\color{#35bf28}+0.59\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 
21.0638ms | 20.8565ms | 47.9466 Ops/s | 47.7050 Ops/s | $\color{#35bf28}+0.51\\%$ | | test_to_module_speed[True] | 1.5725ms | 1.4762ms | 677.4163 Ops/s | 674.7740 Ops/s | $\color{#35bf28}+0.39\\%$ | | test_to_module_speed[False] | 1.5751ms | 1.4716ms | 679.5228 Ops/s | 683.6523 Ops/s | $\color{#d91a1a}-0.60\\%$ | | test_tc_init | 54.3910μs | 36.0432μs | 27.7445 KOps/s | 26.2005 KOps/s | $\textbf{\color{#35bf28}+5.89\\%}$ | | test_tc_init_nested | 0.1153ms | 71.2239μs | 14.0402 KOps/s | 13.0300 KOps/s | $\textbf{\color{#35bf28}+7.75\\%}$ | | test_tc_first_layer_tensor | 19.0400μs | 4.0300μs | 248.1391 KOps/s | 250.5494 KOps/s | $\color{#d91a1a}-0.96\\%$ | | test_tc_first_layer_nontensor | 19.1900μs | 4.0167μs | 248.9605 KOps/s | 248.8830 KOps/s | $\color{#35bf28}+0.03\\%$ | | test_tc_second_layer_tensor | 14.1052μs | 1.2932μs | 773.2935 KOps/s | 775.2856 KOps/s | $\color{#d91a1a}-0.26\\%$ | | test_tc_second_layer_nontensor | 24.2110μs | 4.6214μs | 216.3859 KOps/s | 217.3886 KOps/s | $\color{#d91a1a}-0.46\\%$ | | test_unbind | 0.3230s | 12.1517ms | 82.2931 Ops/s | 75.9071 Ops/s | $\textbf{\color{#35bf28}+8.41\\%}$ | | test_full_like | 0.7600ms | 0.5776ms | 1.7312 KOps/s | 1.7266 KOps/s | $\color{#35bf28}+0.27\\%$ | | test_zeros_like | 0.2736ms | 0.1978ms | 5.0552 KOps/s | 5.0534 KOps/s | $\color{#35bf28}+0.04\\%$ | | test_ones_like | 0.3425ms | 0.1977ms | 5.0583 KOps/s | 5.0612 KOps/s | $\color{#d91a1a}-0.06\\%$ | | test_clone | 0.4962ms | 0.4142ms | 2.4140 KOps/s | 2.4093 KOps/s | $\color{#35bf28}+0.19\\%$ | | test_squeeze | 31.2510μs | 11.6621μs | 85.7480 KOps/s | 83.8382 KOps/s | $\color{#35bf28}+2.28\\%$ | | test_unsqueeze | 0.2627ms | 82.7068μs | 12.0909 KOps/s | 11.8320 KOps/s | $\color{#35bf28}+2.19\\%$ | | test_split | 0.4820ms | 0.1785ms | 5.6022 KOps/s | 5.4328 KOps/s | $\color{#35bf28}+3.12\\%$ | | test_permute | 0.3013ms | 0.1895ms | 5.2773 KOps/s | 5.2204 KOps/s | $\color{#35bf28}+1.09\\%$ | | test_stack | 1.2824ms | 0.9107ms | 1.0980 KOps/s | 1.1108 
KOps/s | $\color{#d91a1a}-1.15\\%$ | | test_cat | 1.3507ms | 1.2319ms | 811.7697 Ops/s | 811.8104 Ops/s | $-0.01\\%$ |
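
For anyone checking the numbers: the Change column appears to be the relative difference between Ops and Ops on Repo `HEAD`. The bot's own script is not shown in this thread, so the snippet below is only a sketch under that assumption, and `relative_change` is a hypothetical helper name, not part of the benchmark tooling.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change in throughput relative to the HEAD baseline.

    Assumption: both arguments are expressed in the same unit (e.g. KOps/s).
    """
    if ops_head == 0:
        raise ValueError("baseline throughput must be non-zero")
    return (ops - ops_head) / ops_head * 100.0


# Values copied from the test_add_td and test_distributed rows (KOps/s).
print(f"{relative_change(11.1379, 10.4073):+.2f}%")  # +7.02%, as in the table
print(f"{relative_change(13.7672, 11.4944):+.2f}%")  # +19.77%, as in the table
```

Rows that mix units (for example 1.0088 KOps/s against 984.7971 Ops/s) need both values converted to the same unit before the comparison.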