pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Feature] from_modules expand_identical kwarg #911

Closed: vmoens closed this 3 months ago

github-actions[bot] commented 3 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 10. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 47.9890μs | 23.1346μs | 43.2253 KOps/s | 42.5180 KOps/s | +1.66% |
| test_plain_set_stack_nested | 56.9160μs | 23.3865μs | 42.7597 KOps/s | 42.5296 KOps/s | +0.54% |
| test_plain_set_nested_inplace | 60.6030μs | 25.1798μs | 39.7145 KOps/s | 39.4347 KOps/s | +0.71% |
| test_plain_set_stack_nested_inplace | 60.5030μs | 25.1248μs | 39.8012 KOps/s | 39.6354 KOps/s | +0.42% |
| test_items | 16.9520μs | 2.6125μs | 382.7770 KOps/s | 378.1491 KOps/s | +1.22% |
| test_items_nested | 0.5156ms | 0.3667ms | 2.7272 KOps/s | 2.7661 KOps/s | -1.41% |
| test_items_nested_locked | 1.3875ms | 0.3663ms | 2.7301 KOps/s | 2.7566 KOps/s | -0.96% |
| test_items_nested_leaf | 0.1635ms | 86.7250μs | 11.5307 KOps/s | 11.2564 KOps/s | +2.44% |
| test_items_stack_nested | 0.6266ms | 0.3670ms | 2.7247 KOps/s | 2.7495 KOps/s | -0.90% |
| test_items_stack_nested_leaf | 0.1701ms | 88.4321μs | 11.3081 KOps/s | 11.3087 KOps/s | -0.01% |
| test_items_stack_nested_locked | 0.5832ms | 0.3679ms | 2.7185 KOps/s | 2.7511 KOps/s | -1.19% |
| test_keys | 17.8440μs | 3.9601μs | 252.5192 KOps/s | 250.4890 KOps/s | +0.81% |
| test_keys_nested | 0.2594ms | 0.1437ms | 6.9614 KOps/s | 6.8706 KOps/s | +1.32% |
| test_keys_nested_locked | 0.7761ms | 0.1510ms | 6.6246 KOps/s | 6.6266 KOps/s | -0.03% |
| test_keys_nested_leaf | 0.1894ms | 0.1229ms | 8.1367 KOps/s | 7.9478 KOps/s | +2.38% |
| test_keys_stack_nested | 0.3063ms | 0.1449ms | 6.9028 KOps/s | 6.7881 KOps/s | +1.69% |
| test_keys_stack_nested_leaf | 0.2081ms | 0.1244ms | 8.0366 KOps/s | 7.9696 KOps/s | +0.84% |
| test_keys_stack_nested_locked | 0.2939ms | 0.1519ms | 6.5837 KOps/s | 6.6516 KOps/s | -1.02% |
| test_values | 5.7557μs | 1.1477μs | 871.2892 KOps/s | 851.4647 KOps/s | +2.33% |
| test_values_nested | 94.5760μs | 50.0082μs | 19.9967 KOps/s | 19.8068 KOps/s | +0.96% |
| test_values_nested_locked | 98.4730μs | 50.2364μs | 19.9059 KOps/s | 19.7084 KOps/s | +1.00% |
| test_values_nested_leaf | 92.7430μs | 45.1688μs | 22.1392 KOps/s | 21.7956 KOps/s | +1.58% |
| test_values_stack_nested | 0.1057ms | 51.1355μs | 19.5559 KOps/s | 19.2068 KOps/s | +1.82% |
| test_values_stack_nested_leaf | 85.0880μs | 44.8937μs | 22.2748 KOps/s | 21.7305 KOps/s | +2.50% |
| test_values_stack_nested_locked | 0.1428ms | 50.9696μs | 19.6195 KOps/s | 19.4534 KOps/s | +0.85% |
| test_membership | 4.9507μs | 0.7548μs | 1.3249 MOps/s | 1.2420 MOps/s | **+6.67%** |
| test_membership_nested | 24.3650μs | 2.6213μs | 381.4947 KOps/s | 368.5423 KOps/s | +3.51% |
| test_membership_nested_leaf | 26.5400μs | 2.6576μs | 376.2806 KOps/s | 370.6125 KOps/s | +1.53% |
| test_membership_stacked_nested | 24.6360μs | 2.6349μs | 379.5157 KOps/s | 371.0964 KOps/s | +2.27% |
| test_membership_stacked_nested_leaf | 30.5160μs | 2.6703μs | 374.4855 KOps/s | 370.1891 KOps/s | +1.16% |
| test_membership_nested_last | 28.1530μs | 4.0000μs | 249.9980 KOps/s | 248.3669 KOps/s | +0.66% |
| test_membership_nested_leaf_last | 27.9920μs | 3.9836μs | 251.0265 KOps/s | 247.5793 KOps/s | +1.39% |
| test_membership_stacked_nested_last | 34.0240μs | 7.5332μs | 132.7454 KOps/s | 247.1101 KOps/s | **-46.28%** |
| test_membership_stacked_nested_leaf_last | 0.1428ms | 7.9847μs | 125.2393 KOps/s | 245.9261 KOps/s | **-49.07%** |
| test_nested_getleaf | 36.6680μs | 10.8825μs | 91.8905 KOps/s | 93.1421 KOps/s | -1.34% |
| test_nested_get | 32.2900μs | 10.3834μs | 96.3078 KOps/s | 95.4187 KOps/s | +0.93% |
| test_stacked_getleaf | 38.5620μs | 10.8248μs | 92.3806 KOps/s | 87.1255 KOps/s | **+6.03%** |
| test_stacked_get | 28.0320μs | 10.2522μs | 97.5400 KOps/s | 94.8731 KOps/s | +2.81% |
| test_nested_getitemleaf | 43.5710μs | 11.4157μs | 87.5986 KOps/s | 85.5973 KOps/s | +2.34% |
| test_nested_getitem | 49.0280μs | 10.4425μs | 95.7628 KOps/s | 92.8310 KOps/s | +3.16% |
| test_stacked_getitemleaf | 0.1702ms | 11.9067μs | 83.9860 KOps/s | 87.0051 KOps/s | -3.47% |
| test_stacked_getitem | 33.6330μs | 10.5165μs | 95.0884 KOps/s | 94.0309 KOps/s | +1.12% |
| test_lock_nested | 1.2542ms | 0.5138ms | 1.9464 KOps/s | 1.6824 KOps/s | **+15.69%** |
| test_lock_stack_nested | 1.8940ms | 0.4818ms | 2.0757 KOps/s | 2.0783 KOps/s | -0.12% |
| test_unlock_nested | 0.8027ms | 0.4350ms | 2.2988 KOps/s | 2.3399 KOps/s | -1.76% |
| test_unlock_stack_nested | 0.7131ms | 0.3951ms | 2.5312 KOps/s | 2.5407 KOps/s | -0.37% |
| test_flatten_speed | 0.6115ms | 0.1079ms | 9.2721 KOps/s | 9.3786 KOps/s | -1.14% |
| test_unflatten_speed | 1.0046ms | 0.4563ms | 2.1914 KOps/s | 2.1955 KOps/s | -0.19% |
| test_common_ops | 5.0869ms | 1.1801ms | 847.3864 Ops/s | 805.8173 Ops/s | **+5.16%** |
| test_creation | 15.4580μs | 2.5205μs | 396.7415 KOps/s | 399.5331 KOps/s | -0.70% |
| test_creation_empty | 55.0730μs | 20.5055μs | 48.7673 KOps/s | 48.4989 KOps/s | +0.55% |
| test_creation_nested_1 | 63.8480μs | 24.2766μs | 41.1919 KOps/s | 41.9261 KOps/s | -1.75% |
| test_creation_nested_2 | 92.2520μs | 28.1893μs | 35.4745 KOps/s | 36.0820 KOps/s | -1.68% |
| test_clone | 65.0910μs | 17.7550μs | 56.3223 KOps/s | 59.0401 KOps/s | -4.60% |
| test_getitem[int] | 1.3031ms | 12.9214μs | 77.3910 KOps/s | 80.1417 KOps/s | -3.43% |
| test_getitem[slice_int] | 0.1366ms | 34.3209μs | 29.1367 KOps/s | 30.5954 KOps/s | -4.77% |
| test_getitem[range] | 0.1619ms | 58.3166μs | 17.1478 KOps/s | 16.8105 KOps/s | +2.01% |
| test_getitem[tuple] | 0.1251ms | 27.4851μs | 36.3833 KOps/s | 37.3770 KOps/s | -2.66% |
| test_getitem[list] | 0.1813ms | 53.5021μs | 18.6909 KOps/s | 18.2725 KOps/s | +2.29% |
| test_setitem_dim[int] | 73.5270μs | 35.4770μs | 28.1873 KOps/s | 28.1081 KOps/s | +0.28% |
| test_setitem_dim[slice_int] | 0.1247ms | 75.3576μs | 13.2701 KOps/s | 13.3901 KOps/s | -0.90% |
| test_setitem_dim[range] | 0.1749ms | 94.0498μs | 10.6327 KOps/s | 10.3400 KOps/s | +2.83% |
| test_setitem_dim[tuple] | 0.1028ms | 61.7032μs | 16.2066 KOps/s | 16.1306 KOps/s | +0.47% |
| test_setitem | 84.9090μs | 32.0430μs | 31.2081 KOps/s | 32.7126 KOps/s | -4.60% |
| test_set | 80.2790μs | 30.8765μs | 32.3871 KOps/s | 33.5999 KOps/s | -3.61% |
| test_set_shared | 3.0448ms | 0.2164ms | 4.6204 KOps/s | 4.5231 KOps/s | +2.15% |
| test_update | 0.1410ms | 39.0498μs | 25.6083 KOps/s | 26.3296 KOps/s | -2.74% |
| test_update_nested | 0.1053ms | 50.2181μs | 19.9131 KOps/s | 20.2989 KOps/s | -1.90% |
| test_update__nested | 96.6300μs | 35.6208μs | 28.0735 KOps/s | 28.8125 KOps/s | -2.56% |
| test_set_nested | 83.4150μs | 33.2221μs | 30.1004 KOps/s | 31.4621 KOps/s | -4.33% |
| test_set_nested_new | 90.7090μs | 38.4862μs | 25.9833 KOps/s | 27.0837 KOps/s | -4.06% |
| test_select | 0.1556ms | 56.0216μs | 17.8502 KOps/s | 18.3562 KOps/s | -2.76% |
| test_select_nested | 0.9738ms | 61.5470μs | 16.2477 KOps/s | 16.4532 KOps/s | -1.25% |
| test_exclude_nested | 0.1586ms | 80.4035μs | 12.4373 KOps/s | 12.2985 KOps/s | +1.13% |
| test_empty[True] | 0.5368ms | 0.3417ms | 2.9269 KOps/s | 2.9012 KOps/s | +0.89% |
| test_empty[False] | 7.9497μs | 1.2262μs | 815.5287 KOps/s | 785.2401 KOps/s | +3.86% |
| test_unbind_speed | 0.5865ms | 0.3281ms | 3.0479 KOps/s | 3.0862 KOps/s | -1.24% |
| test_unbind_speed_stack0 | 0.4579ms | 0.3160ms | 3.1646 KOps/s | 3.1687 KOps/s | -0.13% |
| test_unbind_speed_stack1 | 78.9196ms | 0.8020ms | 1.2469 KOps/s | 1.3095 KOps/s | -4.78% |
| test_split | 77.4570ms | 2.2799ms | 438.6123 Ops/s | 438.6565 Ops/s | -0.01% |
| test_chunk | 77.6320ms | 2.2771ms | 439.1481 Ops/s | 442.0022 Ops/s | -0.65% |
| test_creation[device0] | 0.2350ms | 0.1210ms | 8.2675 KOps/s | 8.0849 KOps/s | +2.26% |
| test_creation_from_tensor | 4.4125ms | 0.1224ms | 8.1720 KOps/s | 8.2962 KOps/s | -1.50% |
| test_add_one[memmap_tensor0] | 0.1887ms | 8.1383μs | 122.8761 KOps/s | 125.8527 KOps/s | -2.37% |
| test_contiguous[memmap_tensor0] | 23.8250μs | 2.1979μs | 454.9825 KOps/s | 453.6411 KOps/s | +0.30% |
| test_stack[memmap_tensor0] | 34.0130μs | 6.1540μs | 162.4948 KOps/s | 169.2464 KOps/s | -3.99% |
| test_memmaptd_index | 1.1517ms | 0.4443ms | 2.2506 KOps/s | 2.2436 KOps/s | +0.31% |
| test_memmaptd_index_astensor | 0.7656ms | 0.5168ms | 1.9350 KOps/s | 1.8071 KOps/s | **+7.07%** |
| test_memmaptd_index_op | 1.8612ms | 1.1114ms | 899.7748 Ops/s | 921.5409 Ops/s | -2.36% |
| test_serialize_model | 0.2114s | 0.1408s | 7.1035 Ops/s | 7.7341 Ops/s | **-8.15%** |
| test_serialize_model_pickle | 0.5028s | 0.4106s | 2.4356 Ops/s | 2.5355 Ops/s | -3.94% |
| test_serialize_weights | 0.1426s | 0.1283s | 7.7924 Ops/s | 7.1624 Ops/s | **+8.80%** |
| test_serialize_weights_returnearly | 0.1776s | 0.1676s | 5.9673 Ops/s | 6.1107 Ops/s | -2.35% |
| test_serialize_weights_pickle | 0.4831s | 0.4015s | 2.4908 Ops/s | 2.5339 Ops/s | -1.70% |
| test_serialize_weights_filesystem | 0.1522s | 0.1424s | 7.0205 Ops/s | 6.9226 Ops/s | +1.41% |
| test_serialize_model_filesystem | 0.1613s | 0.1487s | 6.7259 Ops/s | 6.1264 Ops/s | **+9.78%** |
| test_reshape_pytree | 93.2830μs | 41.3964μs | 24.1567 KOps/s | 24.3423 KOps/s | -0.76% |
| test_reshape_td | 91.8510μs | 49.7429μs | 20.1034 KOps/s | 19.9925 KOps/s | +0.55% |
| test_view_pytree | 82.2230μs | 39.6692μs | 25.2084 KOps/s | 25.4828 KOps/s | -1.08% |
| test_view_td | 0.1225ms | 55.9412μs | 17.8759 KOps/s | 17.5068 KOps/s | +2.11% |
| test_unbind_pytree | 97.4810μs | 36.2415μs | 27.5927 KOps/s | 28.2273 KOps/s | -2.25% |
| test_unbind_td | 0.3863ms | 48.2591μs | 20.7215 KOps/s | 21.0779 KOps/s | -1.69% |
| test_split_pytree | 0.1834ms | 39.3918μs | 25.3860 KOps/s | 25.4532 KOps/s | -0.26% |
| test_split_td | 0.6360ms | 63.9391μs | 15.6399 KOps/s | 16.1949 KOps/s | -3.43% |
| test_add_pytree | 0.1121ms | 44.5795μs | 22.4318 KOps/s | 21.9943 KOps/s | +1.99% |
| test_add_td | 0.1760ms | 87.8301μs | 11.3856 KOps/s | 11.4732 KOps/s | -0.76% |
| test_distributed | 0.3688ms | 0.1333ms | 7.5020 KOps/s | 7.4955 KOps/s | +0.09% |
| test_tdmodule | 38.9630μs | 18.3845μs | 54.3936 KOps/s | 54.4706 KOps/s | -0.14% |
| test_tdmodule_dispatch | 73.2670μs | 38.1798μs | 26.1918 KOps/s | 26.5152 KOps/s | -1.22% |
| test_tdseq | 45.8550μs | 20.4464μs | 48.9084 KOps/s | 48.7261 KOps/s | +0.37% |
| test_tdseq_dispatch | 69.3490μs | 43.3697μs | 23.0576 KOps/s | 23.8518 KOps/s | -3.33% |
| test_instantiation_functorch | 1.8030ms | 1.5916ms | 628.3023 Ops/s | 611.2044 Ops/s | +2.80% |
| test_instantiation_td | 2.6964ms | 1.2087ms | 827.3201 Ops/s | 865.5087 Ops/s | -4.41% |
| test_exec_functorch | 0.2928ms | 0.1847ms | 5.4132 KOps/s | 5.5218 KOps/s | -1.97% |
| test_exec_functional_call | 5.8400ms | 0.1780ms | 5.6165 KOps/s | 5.7361 KOps/s | -2.09% |
| test_exec_td | 0.2954ms | 0.1855ms | 5.3904 KOps/s | 5.7570 KOps/s | **-6.37%** |
| test_exec_td_decorator | 0.7022ms | 0.2599ms | 3.8470 KOps/s | 3.8558 KOps/s | -0.23% |
| test_vmap_mlp_speed[True-True] | 1.8699ms | 0.6254ms | 1.5990 KOps/s | 1.6085 KOps/s | -0.59% |
| test_vmap_mlp_speed[True-False] | 0.7059ms | 0.5998ms | 1.6673 KOps/s | 1.6202 KOps/s | +2.91% |
| test_vmap_mlp_speed[False-True] | 0.7106ms | 0.4954ms | 2.0187 KOps/s | 1.9587 KOps/s | +3.06% |
| test_vmap_mlp_speed[False-False] | 0.7048ms | 0.4954ms | 2.0187 KOps/s | 1.9738 KOps/s | +2.28% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2764ms | 0.7012ms | 1.4261 KOps/s | 1.4044 KOps/s | +1.55% |
| test_vmap_mlp_speed_decorator[True-False] | 1.3146ms | 0.7048ms | 1.4187 KOps/s | 1.4095 KOps/s | +0.65% |
| test_vmap_mlp_speed_decorator[False-True] | 0.9198ms | 0.5834ms | 1.7141 KOps/s | 1.7105 KOps/s | +0.21% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9101ms | 0.5843ms | 1.7115 KOps/s | 1.7111 KOps/s | +0.02% |
| test_to_module_speed[True] | 2.4937ms | 1.8667ms | 535.7162 Ops/s | 553.8411 Ops/s | -3.27% |
| test_to_module_speed[False] | 2.0749ms | 1.8383ms | 543.9744 Ops/s | 567.2628 Ops/s | -4.11% |
| test_tc_init | 88.7050μs | 45.6718μs | 21.8954 KOps/s | 22.8922 KOps/s | -4.35% |
| test_tc_init_nested | 0.1735ms | 94.3546μs | 10.5983 KOps/s | 11.4463 KOps/s | **-7.41%** |
| test_tc_first_layer_tensor | 44.1920μs | 9.6152μs | 104.0025 KOps/s | 108.1275 KOps/s | -3.81% |
| test_tc_first_layer_nontensor | 57.1160μs | 9.3772μs | 106.6420 KOps/s | 108.6628 KOps/s | -1.86% |
| test_tc_second_layer_tensor | 42.3590μs | 2.9235μs | 342.0509 KOps/s | 348.4032 KOps/s | -1.82% |
| test_tc_second_layer_nontensor | 39.1130μs | 10.5855μs | 94.4689 KOps/s | 95.7683 KOps/s | -1.36% |
| test_unbind | 8.9434ms | 8.7289ms | 114.5621 Ops/s | 69.8449 Ops/s | **+64.02%** |
| test_full_like | 10.1084ms | 7.6558ms | 130.6198 Ops/s | 132.0580 Ops/s | -1.09% |
| test_zeros_like | 14.0229ms | 6.4522ms | 154.9856 Ops/s | 133.7796 Ops/s | **+15.85%** |
| test_ones_like | 13.6877ms | 7.6355ms | 130.9667 Ops/s | 132.2644 Ops/s | -0.98% |
| test_clone | 20.6080ms | 9.7000ms | 103.0927 Ops/s | 98.6638 Ops/s | +4.49% |
| test_squeeze | 85.4190μs | 15.1300μs | 66.0940 KOps/s | 68.9748 KOps/s | -4.18% |
| test_unsqueeze | 0.2007ms | 0.1012ms | 9.8861 KOps/s | 9.6478 KOps/s | +2.47% |
| test_split | 0.3666ms | 0.2105ms | 4.7502 KOps/s | 4.7259 KOps/s | +0.51% |
| test_permute | 0.3563ms | 0.2295ms | 4.3572 KOps/s | 4.3410 KOps/s | +0.37% |
| test_stack | 31.4041ms | 24.6276ms | 40.6048 Ops/s | 39.0892 Ops/s | +3.88% |
| test_cat | 31.7140ms | 24.3596ms | 41.0515 Ops/s | 39.0554 Ops/s | **+5.11%** |

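
For readers checking the CPU table above by hand: the `Ops` column appears to be the reciprocal of `Mean`, and `Change` appears to be the relative difference between `Ops` and `Ops on Repo HEAD`. The sketch below reproduces three rows under those assumptions. The function names and the 5% threshold are illustrative and not taken from the benchmark bot; the threshold is only inferred from the fact that the bolded rows (10 improved, 5 worsened) are exactly those whose change exceeds 5% in magnitude.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of this PR's throughput against the HEAD baseline, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (mean in seconds, Ops in ops/s, Ops on HEAD in ops/s), copied from the CPU table.
rows = {
    "test_plain_set_nested": (23.1346e-6, 43.2253e3, 42.5180e3),
    "test_membership_stacked_nested_last": (7.5332e-6, 132.7454e3, 247.1101e3),
    "test_unbind": (8.7289e-3, 114.5621, 69.8449),
}

for name, (mean_s, ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {ops_per_second(mean_s):.1f} ops/s, "
          f"{change:+.2f}% ({classify(change)})")
```

Throughput changes of a few percent in either direction are common between runs on shared CI machines, which is presumably why only the larger swings are counted in the summary line.
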
github-actions[bot] commented 3 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 10. Worsened: 2.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1503ms | 16.1151μs | 62.0536 KOps/s | 60.5532 KOps/s | +2.48% |
| test_plain_set_stack_nested | 44.0520μs | 16.3405μs | 61.1977 KOps/s | 60.8847 KOps/s | +0.51% |
| test_plain_set_nested_inplace | 43.2030μs | 17.1602μs | 58.2744 KOps/s | 56.8305 KOps/s | +2.54% |
| test_plain_set_stack_nested_inplace | 48.3930μs | 17.2750μs | 57.8873 KOps/s | 57.1290 KOps/s | +1.33% |
| test_items | 15.6510μs | 4.6197μs | 216.4662 KOps/s | 215.0275 KOps/s | +0.67% |
| test_items_nested | 0.5028ms | 0.3945ms | 2.5348 KOps/s | 2.5119 KOps/s | +0.91% |
| test_items_nested_locked | 0.4186ms | 0.3954ms | 2.5293 KOps/s | 2.5163 KOps/s | +0.51% |
| test_items_nested_leaf | 0.1022ms | 85.6951μs | 11.6693 KOps/s | 11.6972 KOps/s | -0.24% |
| test_items_stack_nested | 0.4446ms | 0.3965ms | 2.5219 KOps/s | 2.5682 KOps/s | -1.80% |
| test_items_stack_nested_leaf | 0.1038ms | 86.4380μs | 11.5690 KOps/s | 11.6552 KOps/s | -0.74% |
| test_items_stack_nested_locked | 0.4585ms | 0.3971ms | 2.5180 KOps/s | 2.5475 KOps/s | -1.16% |
| test_keys | 17.2610μs | 4.3697μs | 228.8481 KOps/s | 229.6237 KOps/s | -0.34% |
| test_keys_nested | 89.8450μs | 66.9640μs | 14.9334 KOps/s | 15.1213 KOps/s | -1.24% |
| test_keys_nested_locked | 2.1728ms | 72.5651μs | 13.7807 KOps/s | 13.6387 KOps/s | +1.04% |
| test_keys_nested_leaf | 76.5240μs | 57.1717μs | 17.4912 KOps/s | 17.2024 KOps/s | +1.68% |
| test_keys_stack_nested | 91.5550μs | 66.9998μs | 14.9254 KOps/s | 15.0768 KOps/s | -1.00% |
| test_keys_stack_nested_leaf | 75.4640μs | 57.4133μs | 17.4176 KOps/s | 17.3997 KOps/s | +0.10% |
| test_keys_stack_nested_locked | 95.4560μs | 71.1789μs | 14.0491 KOps/s | 14.1107 KOps/s | -0.44% |
| test_values | 8.5340μs | 1.7595μs | 568.3377 KOps/s | 566.7445 KOps/s | +0.28% |
| test_values_nested | 51.8430μs | 33.7316μs | 29.6458 KOps/s | 29.6333 KOps/s | +0.04% |
| test_values_nested_locked | 57.3930μs | 35.3748μs | 28.2687 KOps/s | 27.6907 KOps/s | +2.09% |
| test_values_nested_leaf | 50.6330μs | 29.9890μs | 33.3456 KOps/s | 33.3724 KOps/s | -0.08% |
| test_values_stack_nested | 51.3030μs | 34.0513μs | 29.3675 KOps/s | 28.8446 KOps/s | +1.81% |
| test_values_stack_nested_leaf | 54.1830μs | 30.2004μs | 33.1122 KOps/s | 32.6056 KOps/s | +1.55% |
| test_values_stack_nested_locked | 52.0720μs | 35.5123μs | 28.1593 KOps/s | 27.1221 KOps/s | +3.82% |
| test_membership | 1.5156μs | 0.5506μs | 1.8161 MOps/s | 1.7751 MOps/s | +2.31% |
| test_membership_nested | 25.3410μs | 2.0758μs | 481.7498 KOps/s | 505.6741 KOps/s | -4.73% |
| test_membership_nested_leaf | 12.4805μs | 2.0185μs | 495.4256 KOps/s | 518.1331 KOps/s | -4.38% |
| test_membership_stacked_nested | 13.8300μs | 2.0593μs | 485.5928 KOps/s | 498.8071 KOps/s | -2.65% |
| test_membership_stacked_nested_leaf | 21.2810μs | 2.0495μs | 487.9237 KOps/s | 496.0931 KOps/s | -1.65% |
| test_membership_nested_last | 15.6610μs | 2.9572μs | 338.1563 KOps/s | 337.7853 KOps/s | +0.11% |
| test_membership_nested_leaf_last | 31.0510μs | 2.9503μs | 338.9491 KOps/s | 339.1804 KOps/s | -0.07% |
| test_membership_stacked_nested_last | 22.3810μs | 2.9640μs | 337.3868 KOps/s | 108.6829 KOps/s | **+210.43%** |
| test_membership_stacked_nested_leaf_last | 31.8420μs | 2.9747μs | 336.1703 KOps/s | 109.0206 KOps/s | **+208.35%** |
| test_nested_getleaf | 30.5510μs | 7.9978μs | 125.0351 KOps/s | 124.3522 KOps/s | +0.55% |
| test_nested_get | 26.9120μs | 7.5349μs | 132.7154 KOps/s | 131.6263 KOps/s | +0.83% |
| test_stacked_getleaf | 34.2320μs | 8.0269μs | 124.5816 KOps/s | 123.1861 KOps/s | +1.13% |
| test_stacked_get | 20.7310μs | 7.5537μs | 132.3861 KOps/s | 131.6090 KOps/s | +0.59% |
| test_nested_getitemleaf | 30.5310μs | 8.2073μs | 121.8428 KOps/s | 121.7109 KOps/s | +0.11% |
| test_nested_getitem | 25.8210μs | 7.7179μs | 129.5696 KOps/s | 129.6423 KOps/s | -0.06% |
| test_stacked_getitemleaf | 24.1010μs | 8.1871μs | 122.1435 KOps/s | 120.1426 KOps/s | +1.67% |
| test_stacked_getitem | 29.3810μs | 7.7332μs | 129.3122 KOps/s | 128.8823 KOps/s | +0.33% |
| test_lock_nested | 7.0491ms | 0.4827ms | 2.0718 KOps/s | 2.0889 KOps/s | -0.82% |
| test_lock_stack_nested | 0.4663ms | 0.4342ms | 2.3029 KOps/s | 2.3717 KOps/s | -2.90% |
| test_unlock_nested | 0.8590ms | 0.3952ms | 2.5304 KOps/s | 2.5360 KOps/s | -0.22% |
| test_unlock_stack_nested | 0.3955ms | 0.3533ms | 2.8301 KOps/s | 2.9272 KOps/s | -3.32% |
| test_flatten_speed | 0.4030ms | 0.1064ms | 9.4024 KOps/s | 9.3839 KOps/s | +0.20% |
| test_unflatten_speed | 0.3181ms | 0.2931ms | 3.4119 KOps/s | 3.3725 KOps/s | +1.17% |
| test_common_ops | 1.5125ms | 1.2825ms | 779.7463 Ops/s | 749.2621 Ops/s | +4.07% |
| test_creation | 15.4110μs | 1.9727μs | 506.9118 KOps/s | 506.1312 KOps/s | +0.15% |
| test_creation_empty | 44.7530μs | 15.9105μs | 62.8517 KOps/s | 60.7270 KOps/s | +3.50% |
| test_creation_nested_1 | 44.4720μs | 17.7768μs | 56.2532 KOps/s | 54.4193 KOps/s | +3.37% |
| test_creation_nested_2 | 43.1320μs | 20.5124μs | 48.7511 KOps/s | 47.9906 KOps/s | +1.58% |
| test_clone | 90.9950μs | 29.1828μs | 34.2668 KOps/s | 31.5411 KOps/s | **+8.64%** |
| test_getitem[int] | 1.1373ms | 16.7624μs | 59.6573 KOps/s | 57.7972 KOps/s | +3.22% |
| test_getitem[slice_int] | 0.1568ms | 29.0661μs | 34.4043 KOps/s | 31.9388 KOps/s | **+7.72%** |
| test_getitem[range] | 0.2901ms | 0.1149ms | 8.7049 KOps/s | 8.7305 KOps/s | -0.29% |
| test_getitem[tuple] | 0.1562ms | 24.7785μs | 40.3575 KOps/s | 39.5676 KOps/s | +2.00% |
| test_getitem[list] | 0.2332ms | 0.1044ms | 9.5797 KOps/s | 9.6300 KOps/s | -0.52% |
| test_setitem_dim[int] | 70.1640μs | 51.3111μs | 19.4890 KOps/s | 19.9335 KOps/s | -2.23% |
| test_setitem_dim[slice_int] | 98.5760μs | 77.5812μs | 12.8897 KOps/s | 13.3516 KOps/s | -3.46% |
| test_setitem_dim[range] | 0.3087ms | 0.1402ms | 7.1308 KOps/s | 7.2756 KOps/s | -1.99% |
| test_setitem_dim[tuple] | 0.1073ms | 70.1597μs | 14.2532 KOps/s | 14.7160 KOps/s | -3.15% |
| test_setitem | 75.8040μs | 41.9050μs | 23.8635 KOps/s | 23.4159 KOps/s | +1.91% |
| test_set | 79.6940μs | 40.8289μs | 24.4924 KOps/s | 23.4228 KOps/s | +4.57% |
| test_set_shared | 0.3917ms | 52.9694μs | 18.8788 KOps/s | 18.8961 KOps/s | -0.09% |
| test_update | 76.3540μs | 49.0549μs | 20.3853 KOps/s | 20.0044 KOps/s | +1.90% |
| test_update_nested | 88.4350μs | 57.7283μs | 17.3225 KOps/s | 16.7255 KOps/s | +3.57% |
| test_update__nested | 85.6950μs | 60.3602μs | 16.5672 KOps/s | 15.5767 KOps/s | **+6.36%** |
| test_set_nested | 74.3240μs | 43.8046μs | 22.8287 KOps/s | 21.7203 KOps/s | **+5.10%** |
| test_set_nested_new | 74.8040μs | 46.9945μs | 21.2791 KOps/s | 20.0185 KOps/s | **+6.30%** |
| test_select | 0.1479ms | 62.4507μs | 16.0126 KOps/s | 15.3570 KOps/s | +4.27% |
| test_select_nested | 75.1240μs | 52.5391μs | 19.0335 KOps/s | 19.1845 KOps/s | -0.79% |
| test_exclude_nested | 91.6360μs | 71.9874μs | 13.8913 KOps/s | 14.1696 KOps/s | -1.96% |
| test_empty[True] | 0.3186ms | 0.2954ms | 3.3852 KOps/s | 3.4127 KOps/s | -0.81% |
| test_empty[False] | 2.9482μs | 0.9763μs | 1.0243 MOps/s | 1.0678 MOps/s | -4.07% |
| test_to | 69.5040μs | 36.2015μs | 27.6232 KOps/s | 27.0540 KOps/s | +2.10% |
| test_to_nonblocking | 49.1730μs | 23.0876μs | 43.3133 KOps/s | 43.7317 KOps/s | -0.96% |
| test_unbind_speed | 1.2864ms | 0.3000ms | 3.3336 KOps/s | 3.2666 KOps/s | +2.05% |
| test_unbind_speed_stack0 | 0.3459ms | 0.2925ms | 3.4184 KOps/s | 3.3618 KOps/s | +1.68% |
| test_unbind_speed_stack1 | 89.5938ms | 0.7807ms | 1.2809 KOps/s | 1.3126 KOps/s | -2.42% |
| test_split | 91.0505ms | 2.3208ms | 430.8933 Ops/s | 432.3662 Ops/s | -0.34% |
| test_chunk | 93.6567ms | 2.3320ms | 428.8196 Ops/s | 428.9476 Ops/s | -0.03% |
| test_creation[device0] | 0.1812ms | 0.1012ms | 9.8795 KOps/s | 9.7374 KOps/s | +1.46% |
| test_creation_from_tensor | 0.1543ms | 0.1013ms | 9.8728 KOps/s | 9.9989 KOps/s | -1.26% |
| test_add_one[memmap_tensor0] | 0.1546ms | 8.9755μs | 111.4147 KOps/s | 109.6679 KOps/s | +1.59% |
| test_contiguous[memmap_tensor0] | 28.8010μs | 2.1450μs | 466.1927 KOps/s | 456.0233 KOps/s | +2.23% |
| test_stack[memmap_tensor0] | 28.8820μs | 6.6566μs | 150.2265 KOps/s | 145.6105 KOps/s | +3.17% |
| test_memmaptd_index | 1.1215ms | 0.4228ms | 2.3651 KOps/s | 2.3795 KOps/s | -0.60% |
| test_memmaptd_index_astensor | 0.7444ms | 0.4881ms | 2.0489 KOps/s | 2.0453 KOps/s | +0.18% |
| test_memmaptd_index_op | 1.3928ms | 1.0096ms | 990.4644 Ops/s | 972.6125 Ops/s | +1.84% |
| test_serialize_model | 98.2257ms | 94.5216ms | 10.5796 Ops/s | 10.2507 Ops/s | +3.21% |
| test_serialize_model_pickle | 1.3481s | 1.2361s | 0.8090 Ops/s | 0.8073 Ops/s | +0.22% |
| test_serialize_weights | 0.1874s | 0.1025s | 9.7586 Ops/s | 9.3361 Ops/s | +4.53% |
| test_serialize_weights_returnearly | 0.2961s | 86.8233ms | 11.5176 Ops/s | 11.5587 Ops/s | -0.36% |
| test_serialize_weights_pickle | 1.3486s | 1.2365s | 0.8087 Ops/s | 0.8085 Ops/s | +0.03% |
| test_reshape_pytree | 67.8140μs | 38.4009μs | 26.0411 KOps/s | 26.0292 KOps/s | +0.05% |
| test_reshape_td | 67.4230μs | 43.4809μs | 22.9986 KOps/s | 22.9846 KOps/s | +0.06% |
| test_view_pytree | 59.5240μs | 37.1411μs | 26.9244 KOps/s | 26.4767 KOps/s | +1.69% |
| test_view_td | 70.5240μs | 47.8034μs | 20.9190 KOps/s | 20.6744 KOps/s | +1.18% |
| test_unbind_pytree | 60.8640μs | 36.4035μs | 27.4699 KOps/s | 27.1035 KOps/s | +1.35% |
| test_unbind_td | 0.3825ms | 45.5273μs | 21.9648 KOps/s | 21.4838 KOps/s | +2.24% |
| test_split_pytree | 0.3557ms | 50.3481μs | 19.8617 KOps/s | 19.8289 KOps/s | +0.17% |
| test_split_td | 0.1681ms | 58.1451μs | 17.1984 KOps/s | 16.9217 KOps/s | +1.63% |
| test_add_pytree | 93.0950μs | 59.8685μs | 16.7033 KOps/s | 16.9145 KOps/s | -1.25% |
| test_add_td | 0.1284ms | 91.0626μs | 10.9815 KOps/s | 9.8495 KOps/s | **+11.49%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4089ms | 0.2074ms | 4.8213 KOps/s | 4.8450 KOps/s | -0.49% |
| test_compile_add_one_nested[tensordict-eager] | 0.2620ms | 0.1765ms | 5.6644 KOps/s | 5.8389 KOps/s | -2.99% |
| test_compile_add_one_nested[pytree-compile] | 0.1832ms | 0.1427ms | 7.0083 KOps/s | 6.9756 KOps/s | +0.47% |
| test_compile_add_one_nested[pytree-eager] | 0.2556ms | 0.1951ms | 5.1253 KOps/s | 5.1692 KOps/s | -0.85% |
| test_compile_copy_nested[tensordict-compile] | 46.9920μs | 22.2679μs | 44.9076 KOps/s | 44.9561 KOps/s | -0.11% |
| test_compile_copy_nested[tensordict-eager] | 74.6640μs | 48.5825μs | 20.5836 KOps/s | 20.4688 KOps/s | +0.56% |
| test_compile_copy_nested[pytree-compile] | 0.1025ms | 71.8815μs | 13.9118 KOps/s | 14.0386 KOps/s | -0.90% |
| test_compile_copy_nested[pytree-eager] | 80.4340μs | 59.8818μs | 16.6996 KOps/s | 16.7849 KOps/s | -0.51% |
| test_compile_add_one_flat[tensordict-compile] | 0.4118ms | 0.3206ms | 3.1187 KOps/s | 3.1125 KOps/s | +0.20% |
| test_compile_add_one_flat[tensordict-eager] | 0.2692ms | 0.2217ms | 4.5104 KOps/s | 4.5411 KOps/s | -0.68% |
| test_compile_add_one_flat[tensorclass-compile] | 0.2061ms | 0.1288ms | 7.7658 KOps/s | 7.8060 KOps/s | -0.52% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1265ms | 64.1869μs | 15.5795 KOps/s | 15.9490 KOps/s | -2.32% |
| test_compile_add_one_flat[pytree-compile] | 0.3732ms | 0.3202ms | 3.1230 KOps/s | 3.1417 KOps/s | -0.59% |
| test_compile_add_one_flat[pytree-eager] | 0.6909ms | 0.6274ms | 1.5938 KOps/s | 1.5938 KOps/s | +0.00% |
| test_compile_add_self_flat[tensordict-eager] | 0.3194ms | 0.2719ms | 3.6782 KOps/s | 3.7439 KOps/s | -1.76% |
| test_compile_add_self_flat[tensordict-compile] | 0.3624ms | 0.3235ms | 3.0912 KOps/s | 3.0940 KOps/s | -0.09% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1722ms | 78.3853μs | 12.7575 KOps/s | 13.2040 KOps/s | -3.38% |
| test_compile_add_self_flat[tensorclass-compile] | 0.2591ms | 0.1290ms | 7.7546 KOps/s | 7.7577 KOps/s | -0.04% |
| test_compile_add_self_flat[pytree-eager] | 0.5887ms | 0.5307ms | 1.8844 KOps/s | 1.8606 KOps/s | +1.28% |
| test_compile_add_self_flat[pytree-compile] | 0.3646ms | 0.3206ms | 3.1192 KOps/s | 3.1353 KOps/s | -0.51% |
| test_compile_copy_flat[tensordict-compile] | 42.7620μs | 18.6984μs | 53.4804 KOps/s | 49.9210 KOps/s | **+7.13%** |
| test_compile_copy_flat[tensordict-eager] | 53.5530μs | 32.1989μs | 31.0570 KOps/s | 31.0680 KOps/s | -0.04% |
| test_compile_copy_flat[pytree-compile] | 0.1073ms | 74.9095μs | 13.3494 KOps/s | 13.3583 KOps/s | -0.07% |
| test_compile_copy_flat[pytree-eager] | 87.8050μs | 60.6165μs | 16.4972 KOps/s | 16.4582 KOps/s | +0.24% |
| test_compile_assign_and_add[tensordict-compile] | 2.5256ms | 0.9203ms | 1.0867 KOps/s | 1.0856 KOps/s | +0.10% |
| test_compile_assign_and_add[tensordict-eager] | 3.4465ms | 3.3030ms | 302.7596 Ops/s | 300.4911 Ops/s | +0.75% |
| test_compile_assign_and_add[pytree-compile] | 2.4870ms | 0.9023ms | 1.1082 KOps/s | 1.0923 KOps/s | +1.46% |
| test_compile_assign_and_add[pytree-eager] | 3.3601ms | 3.2918ms | 303.7806 Ops/s | 300.8587 Ops/s | +0.97% |
| test_compile_indexing[tensor-tensordict-compile] | 0.1383ms | 0.1094ms | 9.1445 KOps/s | 9.0807 KOps/s | +0.70% |
| test_compile_indexing[tensor-tensordict-eager] | 0.2340ms | 62.6558μs | 15.9602 KOps/s | 15.4238 KOps/s | +3.48% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1332ms | 0.1015ms | 9.8555 KOps/s | 9.7545 KOps/s | +1.04% |
| test_compile_indexing[tensor-tensorclass-eager] | 82.9550μs | 45.6193μs | 21.9205 KOps/s | 21.9240 KOps/s | -0.02% |
| test_compile_indexing[tensor-pytree-compile] | 0.1484ms | 0.1036ms | 9.6566 KOps/s | 9.6404 KOps/s | +0.17% |
| test_compile_indexing[tensor-pytree-eager] | 91.3150μs | 45.5959μs | 21.9318 KOps/s | 22.2766 KOps/s | -1.55% |
| test_compile_indexing[slice-tensordict-compile] | 0.1801ms | 0.1382ms | 7.2356 KOps/s | 7.2331 KOps/s | +0.03% |
| test_compile_indexing[slice-tensordict-eager] | 0.1885ms | 26.2222μs | 38.1356 KOps/s | 38.1632 KOps/s | -0.07% |
| test_compile_indexing[slice-tensorclass-compile] | 0.2305ms | 0.1299ms | 7.6967 KOps/s | 7.6960 KOps/s | +0.01% |
| test_compile_indexing[slice-tensorclass-eager] | 52.8730μs | 22.3485μs | 44.7457 KOps/s | 45.7893 KOps/s | -2.28% |
| test_compile_indexing[slice-pytree-compile] | 0.1704ms | 0.1295ms | 7.7245 KOps/s | 7.3912 KOps/s | +4.51% |
| test_compile_indexing[slice-pytree-eager] | 49.9730μs | 22.2336μs | 44.9770 KOps/s | 44.1257 KOps/s | +1.93% |
| test_compile_indexing[int-tensordict-compile] | 0.2058ms | 0.1373ms | 7.2831 KOps/s | 7.2632 KOps/s | +0.27% |
| test_compile_indexing[int-tensordict-eager] | 0.5280ms | 26.2072μs | 38.1574 KOps/s | 39.0316 KOps/s | -2.24% |
| test_compile_indexing[int-tensorclass-compile] | 0.1712ms | 0.1293ms | 7.7326 KOps/s | 7.4475 KOps/s | +3.83% |
| test_compile_indexing[int-tensorclass-eager] | 53.3830μs | 21.7753μs | 45.9237 KOps/s | 45.8961 KOps/s | +0.06% |
| test_compile_indexing[int-pytree-compile] | 0.1840ms | 0.1290ms | 7.7528 KOps/s | 7.5564 KOps/s | +2.60% |
| test_compile_indexing[int-pytree-eager] | 55.1730μs | 22.2112μs | 45.0224 KOps/s | 46.2377 KOps/s | -2.63% |
| test_mod_add[eager] | 81.9350μs | 37.4959μs | 26.6696 KOps/s | 26.1562 KOps/s | +1.96% |
| test_mod_add[compile] | 0.2223ms | 66.5021μs | 15.0371 KOps/s | 14.8957 KOps/s | +0.95% |
| test_mod_add[compile-overhead] | 0.2776ms | 0.1471ms | 6.7991 KOps/s | 6.8597 KOps/s | -0.88% |
| test_mod_wrap[eager] | 0.3496ms | 0.2536ms | 3.9438 KOps/s | 3.8417 KOps/s | +2.66% |
| test_mod_wrap[compile] | 1.2290ms | 0.2895ms | 3.4547 KOps/s | 3.3895 KOps/s | +1.92% |
| test_mod_wrap[compile-overhead] | 8.1905ms | 4.3093ms | 232.0582 Ops/s | 227.7363 Ops/s | +1.90% |
| test_mod_wrap_and_backward[eager] | 1.5305ms | 1.4265ms | 701.0390 Ops/s | 695.9611 Ops/s | +0.73% |
| test_mod_wrap_and_backward[compile] | 1.5696ms | 1.4356ms | 696.5696 Ops/s | 749.9832 Ops/s | **-7.12%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.4473ms | 0.9902ms | 1.0099 KOps/s | 1.0411 KOps/s | -3.00% |
| test_seq_add[eager] | 0.1585ms | 0.1083ms | 9.2319 KOps/s | 9.3196 KOps/s | -0.94% |
| test_seq_add[compile] | 0.2078ms | 84.4950μs | 11.8350 KOps/s | 12.0624 KOps/s | -1.89% |
| test_seq_add[compile-overhead] | 0.1592ms | 0.1215ms | 8.2290 KOps/s | 8.0952 KOps/s | +1.65% |
| test_seq_wrap[eager] | 0.4815ms | 0.4182ms | 2.3910 KOps/s | 2.3390 KOps/s | +2.22% |
| test_seq_wrap[compile] | 1.4694ms | 0.3193ms | 3.1321 KOps/s | 3.1191 KOps/s | +0.42% |
| test_seq_wrap[compile-overhead] | 0.3082s | 0.1475s | 6.7819 Ops/s | 6.8147 Ops/s | -0.48% |
| test_func_call_runtime[False-eager] | 0.7784ms | 0.7425ms | 1.3468 KOps/s | 1.3438 KOps/s |
$\color{#35bf28}+0.22\\%$ | | test_func_call_runtime[False-compile] | 0.8627ms | 0.8010ms | 1.2484 KOps/s | 1.2114 KOps/s | $\color{#35bf28}+3.05\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4103ms | 0.3564ms | 2.8061 KOps/s | 2.7911 KOps/s | $\color{#35bf28}+0.54\\%$ | | test_func_call_runtime[True-eager] | 1.0494ms | 0.9838ms | 1.0165 KOps/s | 993.1818 Ops/s | $\color{#35bf28}+2.35\\%$ | | test_func_call_runtime[True-compile] | 0.8865ms | 0.8416ms | 1.1882 KOps/s | 1.1683 KOps/s | $\color{#35bf28}+1.71\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.4539ms | 0.3975ms | 2.5158 KOps/s | 2.4943 KOps/s | $\color{#35bf28}+0.86\\%$ | | test_distributed | 0.2678ms | 67.9732μs | 14.7117 KOps/s | 13.9846 KOps/s | $\textbf{\color{#35bf28}+5.20\\%}$ | | test_tdmodule | 40.1520μs | 15.2450μs | 65.5954 KOps/s | 63.2390 KOps/s | $\color{#35bf28}+3.73\\%$ | | test_tdmodule_dispatch | 46.8520μs | 31.2317μs | 32.0188 KOps/s | 30.6596 KOps/s | $\color{#35bf28}+4.43\\%$ | | test_tdseq | 31.0420μs | 15.8082μs | 63.2585 KOps/s | 60.4443 KOps/s | $\color{#35bf28}+4.66\\%$ | | test_tdseq_dispatch | 53.1730μs | 33.0989μs | 30.2125 KOps/s | 29.1364 KOps/s | $\color{#35bf28}+3.69\\%$ | | test_instantiation_functorch | 2.0828ms | 2.0056ms | 498.6105 Ops/s | 502.3022 Ops/s | $\color{#d91a1a}-0.73\\%$ | | test_instantiation_td | 2.0019ms | 1.2939ms | 772.8581 Ops/s | 779.5555 Ops/s | $\color{#d91a1a}-0.86\\%$ | | test_exec_functorch | 0.2796ms | 0.2209ms | 4.5266 KOps/s | 4.5544 KOps/s | $\color{#d91a1a}-0.61\\%$ | | test_exec_functional_call | 0.3648ms | 0.2173ms | 4.6028 KOps/s | 4.5375 KOps/s | $\color{#35bf28}+1.44\\%$ | | test_exec_td | 0.2428ms | 0.2164ms | 4.6219 KOps/s | 4.4410 KOps/s | $\color{#35bf28}+4.07\\%$ | | test_exec_td_decorator | 1.0915ms | 0.2933ms | 3.4094 KOps/s | 3.3420 KOps/s | $\color{#35bf28}+2.02\\%$ | | test_vmap_mlp_speed[True-True] | 0.8101ms | 0.6694ms | 1.4938 KOps/s | 1.4714 KOps/s | $\color{#35bf28}+1.52\\%$ | | 
test_vmap_mlp_speed[True-False] | 0.7281ms | 0.6680ms | 1.4970 KOps/s | 1.4807 KOps/s | $\color{#35bf28}+1.10\\%$ | | test_vmap_mlp_speed[False-True] | 0.6295ms | 0.5879ms | 1.7009 KOps/s | 1.6971 KOps/s | $\color{#35bf28}+0.22\\%$ | | test_vmap_mlp_speed[False-False] | 0.6555ms | 0.5896ms | 1.6962 KOps/s | 1.6869 KOps/s | $\color{#35bf28}+0.55\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.2639ms | 0.7506ms | 1.3323 KOps/s | 1.3346 KOps/s | $\color{#d91a1a}-0.18\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.8644ms | 0.7495ms | 1.3342 KOps/s | 1.3397 KOps/s | $\color{#d91a1a}-0.41\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8871ms | 0.6546ms | 1.5276 KOps/s | 1.5254 KOps/s | $\color{#35bf28}+0.14\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8142ms | 0.6549ms | 1.5269 KOps/s | 1.5346 KOps/s | $\color{#d91a1a}-0.50\\%$ | | test_vmap_transformer_speed[True-True] | 8.8948ms | 8.8067ms | 113.5498 Ops/s | 113.4149 Ops/s | $\color{#35bf28}+0.12\\%$ | | test_vmap_transformer_speed[True-False] | 8.8767ms | 8.8076ms | 113.5377 Ops/s | 113.6520 Ops/s | $\color{#d91a1a}-0.10\\%$ | | test_vmap_transformer_speed[False-True] | 9.1055ms | 8.7650ms | 114.0904 Ops/s | 114.9823 Ops/s | $\color{#d91a1a}-0.78\\%$ | | test_vmap_transformer_speed[False-False] | 8.8007ms | 8.7055ms | 114.8702 Ops/s | 114.8687 Ops/s | $+0.00\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 21.0758ms | 21.0062ms | 47.6050 Ops/s | 47.4779 Ops/s | $\color{#35bf28}+0.27\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.0788ms | 20.9918ms | 47.6376 Ops/s | 47.5675 Ops/s | $\color{#35bf28}+0.15\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 21.2681ms | 20.8074ms | 48.0598 Ops/s | 48.0192 Ops/s | $\color{#35bf28}+0.08\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 20.9405ms | 20.8057ms | 48.0638 Ops/s | 47.9699 Ops/s | $\color{#35bf28}+0.20\\%$ | | test_to_module_speed[True] | 2.8941ms | 1.4815ms | 675.0067 Ops/s | 672.7747 Ops/s 
| $\color{#35bf28}+0.33\\%$ | | test_to_module_speed[False] | 1.9151ms | 1.4656ms | 682.3212 Ops/s | 682.1307 Ops/s | $\color{#35bf28}+0.03\\%$ | | test_tc_init | 51.0730μs | 33.8570μs | 29.5360 KOps/s | 28.8001 KOps/s | $\color{#35bf28}+2.56\\%$ | | test_tc_init_nested | 0.2006ms | 71.3604μs | 14.0134 KOps/s | 14.2723 KOps/s | $\color{#d91a1a}-1.81\\%$ | | test_tc_first_layer_tensor | 17.4910μs | 4.0045μs | 249.7209 KOps/s | 247.8738 KOps/s | $\color{#35bf28}+0.75\\%$ | | test_tc_first_layer_nontensor | 16.5110μs | 4.0213μs | 248.6787 KOps/s | 246.4074 KOps/s | $\color{#35bf28}+0.92\\%$ | | test_tc_second_layer_tensor | 30.7068μs | 1.3028μs | 767.5665 KOps/s | 775.0990 KOps/s | $\color{#d91a1a}-0.97\\%$ | | test_tc_second_layer_nontensor | 17.4910μs | 4.6231μs | 216.3066 KOps/s | 218.6052 KOps/s | $\color{#d91a1a}-1.05\\%$ | | test_unbind | 0.3183s | 12.9437ms | 77.2578 Ops/s | 82.5423 Ops/s | $\textbf{\color{#d91a1a}-6.40\\%}$ | | test_full_like | 0.6619ms | 0.5774ms | 1.7318 KOps/s | 1.7318 KOps/s | $+0.00\\%$ | | test_zeros_like | 0.2659ms | 0.1977ms | 5.0582 KOps/s | 5.0559 KOps/s | $\color{#35bf28}+0.05\\%$ | | test_ones_like | 0.2179ms | 0.1975ms | 5.0630 KOps/s | 5.0580 KOps/s | $\color{#35bf28}+0.10\\%$ | | test_clone | 0.4441ms | 0.4146ms | 2.4118 KOps/s | 2.4141 KOps/s | $\color{#d91a1a}-0.09\\%$ | | test_squeeze | 28.2210μs | 11.6828μs | 85.5963 KOps/s | 84.9616 KOps/s | $\color{#35bf28}+0.75\\%$ | | test_unsqueeze | 0.2612ms | 81.1712μs | 12.3196 KOps/s | 11.9810 KOps/s | $\color{#35bf28}+2.83\\%$ | | test_split | 0.4694ms | 0.1815ms | 5.5107 KOps/s | 5.5152 KOps/s | $\color{#d91a1a}-0.08\\%$ | | test_permute | 0.3005ms | 0.1919ms | 5.2119 KOps/s | 5.1288 KOps/s | $\color{#35bf28}+1.62\\%$ | | test_stack | 1.2518ms | 0.9072ms | 1.1023 KOps/s | 1.1345 KOps/s | $\color{#d91a1a}-2.84\\%$ | | test_cat | 1.2490ms | 1.2316ms | 811.9562 Ops/s | 812.0408 Ops/s | $\color{#d91a1a}-0.01\\%$ |