pytorch/tensordict
TensorDict is a pytorch dedicated tensor container.
[Feature] construct tds with kwargs (#905)
Status: Closed (closed by vmoens 3 months ago)
vmoens commented 3 months ago
cc @shagunsodhani
github-actions[bot] commented 3 months ago
⚠ Warning: Result of CPU Benchmark Tests
Total Benchmarks: 144. Improved: 10. Worsened: 10.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 59.2110μs | 24.7128μs | 40.4649 KOps/s | 44.5703 KOps/s | **-9.21%** |
| test_plain_set_stack_nested | 48.0400μs | 22.6658μs | 44.1192 KOps/s | 44.8510 KOps/s | -1.63% |
| test_plain_set_nested_inplace | 64.4820μs | 24.5780μs | 40.6868 KOps/s | 41.1400 KOps/s | -1.10% |
| test_plain_set_stack_nested_inplace | 58.3000μs | 24.6449μs | 40.5764 KOps/s | 41.0057 KOps/s | -1.05% |
| test_items | 21.4710μs | 2.6894μs | 371.8347 KOps/s | 372.3094 KOps/s | -0.13% |
| test_items_nested | 1.3829ms | 0.3606ms | 2.7728 KOps/s | 2.7373 KOps/s | +1.30% |
| test_items_nested_locked | 0.7446ms | 0.3570ms | 2.8008 KOps/s | 2.7397 KOps/s | +2.23% |
| test_items_nested_leaf | 0.1704ms | 88.2105μs | 11.3365 KOps/s | 11.5544 KOps/s | -1.89% |
| test_items_stack_nested | 1.4204ms | 0.3619ms | 2.7636 KOps/s | 2.7061 KOps/s | +2.12% |
| test_items_stack_nested_leaf | 0.1461ms | 87.6086μs | 11.4144 KOps/s | 11.6508 KOps/s | -2.03% |
| test_items_stack_nested_locked | 0.7029ms | 0.3629ms | 2.7557 KOps/s | 2.7060 KOps/s | +1.84% |
| test_keys | 38.8620μs | 3.9544μs | 252.8821 KOps/s | 251.4712 KOps/s | +0.56% |
| test_keys_nested | 0.2443ms | 0.1451ms | 6.8920 KOps/s | 6.9894 KOps/s | -1.39% |
| test_keys_nested_locked | 0.7586ms | 0.1526ms | 6.5521 KOps/s | 6.7631 KOps/s | -3.12% |
| test_keys_nested_leaf | 0.2138ms | 0.1259ms | 7.9444 KOps/s | 8.2372 KOps/s | -3.56% |
| test_keys_stack_nested | 0.2826ms | 0.1468ms | 6.8140 KOps/s | 7.0335 KOps/s | -3.12% |
| test_keys_stack_nested_leaf | 0.5682ms | 0.1358ms | 7.3647 KOps/s | 8.2763 KOps/s | **-11.01%** |
| test_keys_stack_nested_locked | 0.7555ms | 0.1570ms | 6.3679 KOps/s | 6.7976 KOps/s | **-6.32%** |
| test_values | 5.8985μs | 1.1702μs | 854.5910 KOps/s | 863.2432 KOps/s | -1.00% |
| test_values_nested | 97.5430μs | 49.4645μs | 20.2165 KOps/s | 20.1042 KOps/s | +0.56% |
| test_values_nested_locked | 0.1065ms | 50.0307μs | 19.9877 KOps/s | 20.0674 KOps/s | -0.40% |
| test_values_nested_leaf | 91.0510μs | 44.4701μs | 22.4870 KOps/s | 22.5635 KOps/s | -0.34% |
| test_values_stack_nested | 0.1020ms | 51.1892μs | 19.5354 KOps/s | 19.3391 KOps/s | +1.01% |
| test_values_stack_nested_leaf | 80.8820μs | 44.3776μs | 22.5339 KOps/s | 22.6599 KOps/s | -0.56% |
| test_values_stack_nested_locked | 0.3426ms | 51.1430μs | 19.5530 KOps/s | 19.7961 KOps/s | -1.23% |
| test_membership | 18.4159μs | 0.7316μs | 1.3668 MOps/s | 1.3820 MOps/s | -1.10% |
| test_membership_nested | 34.7250μs | 2.7325μs | 365.9647 KOps/s | 329.2802 KOps/s | **+11.14%** |
| test_membership_nested_leaf | 26.7900μs | 2.7365μs | 365.4308 KOps/s | 378.6977 KOps/s | -3.50% |
| test_membership_stacked_nested | 63.8090μs | 2.7175μs | 367.9919 KOps/s | 378.7418 KOps/s | -2.84% |
| test_membership_stacked_nested_leaf | 17.3930μs | 2.8053μs | 356.4653 KOps/s | 375.4961 KOps/s | **-5.07%** |
| test_membership_nested_last | 56.0660μs | 4.0711μs | 245.6336 KOps/s | 254.1154 KOps/s | -3.34% |
| test_membership_nested_leaf_last | 26.4400μs | 4.1159μs | 242.9574 KOps/s | 251.7331 KOps/s | -3.49% |
| test_membership_stacked_nested_last | 21.4610μs | 6.5828μs | 151.9120 KOps/s | 77.6034 KOps/s | **+95.75%** |
| test_membership_stacked_nested_leaf_last | 27.5720μs | 6.6584μs | 150.1863 KOps/s | 77.6917 KOps/s | **+93.31%** |
| test_nested_getleaf | 36.9400μs | 10.8980μs | 91.7598 KOps/s | 93.4812 KOps/s | -1.84% |
| test_nested_get | 34.1040μs | 10.3469μs | 96.6475 KOps/s | 98.7434 KOps/s | -2.12% |
| test_stacked_getleaf | 37.4410μs | 10.7903μs | 92.6762 KOps/s | 95.0233 KOps/s | -2.47% |
| test_stacked_get | 40.9870μs | 10.2874μs | 97.2062 KOps/s | 99.4094 KOps/s | -2.22% |
| test_nested_getitemleaf | 38.0510μs | 11.4249μs | 87.5281 KOps/s | 88.7426 KOps/s | -1.37% |
| test_nested_getitem | 44.2630μs | 10.4904μs | 95.3257 KOps/s | 96.6275 KOps/s | -1.35% |
| test_stacked_getitemleaf | 33.7730μs | 11.3837μs | 87.8449 KOps/s | 90.8373 KOps/s | -3.29% |
| test_stacked_getitem | 32.1110μs | 10.3688μs | 96.4431 KOps/s | 98.3504 KOps/s | -1.94% |
| test_lock_nested | 1.1819ms | 0.5132ms | 1.9486 KOps/s | 1.7103 KOps/s | **+13.94%** |
| test_lock_stack_nested | 1.0372ms | 0.4740ms | 2.1097 KOps/s | 2.1818 KOps/s | -3.31% |
| test_unlock_nested | 0.7467ms | 0.4330ms | 2.3096 KOps/s | 1.9902 KOps/s | **+16.05%** |
| test_unlock_stack_nested | 0.6126ms | 0.3807ms | 2.6265 KOps/s | 2.6781 KOps/s | -1.92% |
| test_flatten_speed | 0.1860ms | 0.1054ms | 9.4914 KOps/s | 9.4262 KOps/s | +0.69% |
| test_unflatten_speed | 0.7898ms | 0.4419ms | 2.2631 KOps/s | 2.2455 KOps/s | +0.78% |
| test_common_ops | 1.8340ms | 1.1163ms | 895.8543 Ops/s | 877.8409 Ops/s | +2.05% |
| test_creation | 72.2660μs | 2.4995μs | 400.0739 KOps/s | 412.7022 KOps/s | -3.06% |
| test_creation_empty | 52.1770μs | 19.4755μs | 51.3465 KOps/s | 51.4398 KOps/s | -0.18% |
| test_creation_nested_1 | 49.7630μs | 23.1096μs | 43.2720 KOps/s | 43.4008 KOps/s | -0.30% |
| test_creation_nested_2 | 80.8510μs | 27.0914μs | 36.9121 KOps/s | 37.5331 KOps/s | -1.65% |
| test_clone | 60.0530μs | 17.6662μs | 56.6052 KOps/s | 57.5933 KOps/s | -1.72% |
| test_getitem[int] | 0.9095ms | 12.5656μs | 79.5826 KOps/s | 77.9519 KOps/s | +2.09% |
| test_getitem[slice_int] | 0.1547ms | 32.5788μs | 30.6948 KOps/s | 30.2595 KOps/s | +1.44% |
| test_getitem[range] | 0.1570ms | 57.0841μs | 17.5180 KOps/s | 17.9279 KOps/s | -2.29% |
| test_getitem[tuple] | 0.1231ms | 27.3342μs | 36.5843 KOps/s | 37.7045 KOps/s | -2.97% |
| test_getitem[list] | 0.1583ms | 52.0157μs | 19.2250 KOps/s | 19.3339 KOps/s | -0.56% |
| test_setitem_dim[int] | 56.0960μs | 34.5355μs | 28.9557 KOps/s | 29.0821 KOps/s | -0.43% |
| test_setitem_dim[slice_int] | 0.1084ms | 71.4128μs | 14.0031 KOps/s | 13.9272 KOps/s | +0.54% |
| test_setitem_dim[range] | 0.1415ms | 91.4636μs | 10.9333 KOps/s | 10.7064 KOps/s | +2.12% |
| test_setitem_dim[tuple] | 96.8410μs | 58.0129μs | 17.2375 KOps/s | 16.7270 KOps/s | +3.05% |
| test_setitem | 81.2630μs | 30.0760μs | 33.2491 KOps/s | 33.5233 KOps/s | -0.82% |
| test_set | 72.5760μs | 29.2994μs | 34.1304 KOps/s | 34.4363 KOps/s | -0.89% |
| test_set_shared | 7.1371ms | 0.2152ms | 4.6471 KOps/s | 4.6206 KOps/s | +0.57% |
| test_update | 0.1374ms | 37.2151μs | 26.8708 KOps/s | 27.5811 KOps/s | -2.58% |
| test_update_nested | 0.1050ms | 47.0580μs | 21.2504 KOps/s | 21.2544 KOps/s | -0.02% |
| test_update__nested | 0.1004ms | 35.4786μs | 28.1860 KOps/s | 28.9575 KOps/s | -2.66% |
| test_set_nested | 83.4360μs | 32.4157μs | 30.8492 KOps/s | 30.7079 KOps/s | +0.46% |
| test_set_nested_new | 0.1050ms | 36.8206μs | 27.1587 KOps/s | 26.5946 KOps/s | +2.12% |
| test_select | 0.1181ms | 54.2929μs | 18.4186 KOps/s | 18.1743 KOps/s | +1.34% |
| test_select_nested | 0.9158ms | 61.4601μs | 16.2707 KOps/s | 16.3335 KOps/s | -0.38% |
| test_exclude_nested | 0.1670ms | 81.7788μs | 12.2281 KOps/s | 12.3904 KOps/s | -1.31% |
| test_empty[True] | 0.5527ms | 0.3439ms | 2.9075 KOps/s | 2.9186 KOps/s | -0.38% |
| test_empty[False] | 8.2005μs | 1.2339μs | 810.4230 KOps/s | 793.7493 KOps/s | +2.10% |
| test_unbind_speed | 0.3688ms | 0.3268ms | 3.0603 KOps/s | 3.0843 KOps/s | -0.78% |
| test_unbind_speed_stack0 | 0.4492ms | 0.3087ms | 3.2399 KOps/s | 3.2942 KOps/s | -1.65% |
| test_unbind_speed_stack1 | 74.6803ms | 0.8547ms | 1.1700 KOps/s | 1.3610 KOps/s | **-14.04%** |
| test_split | 74.7467ms | 2.2784ms | 438.8950 Ops/s | 419.0525 Ops/s | +4.74% |
| test_chunk | 73.1804ms | 2.2807ms | 438.4543 Ops/s | 481.0021 Ops/s | **-8.85%** |
| test_creation[device0] | 4.1911ms | 0.1215ms | 8.2308 KOps/s | 8.2046 KOps/s | +0.32% |
| test_creation_from_tensor | 0.2514ms | 0.1201ms | 8.3251 KOps/s | 8.0836 KOps/s | +2.99% |
| test_add_one[memmap_tensor0] | 0.1680ms | 8.1093μs | 123.3153 KOps/s | 129.1603 KOps/s | -4.53% |
| test_contiguous[memmap_tensor0] | 21.5410μs | 2.2262μs | 449.2033 KOps/s | 452.5625 KOps/s | -0.74% |
| test_stack[memmap_tensor0] | 45.1350μs | 6.0987μs | 163.9690 KOps/s | 165.0888 KOps/s | -0.68% |
| test_memmaptd_index | 1.1263ms | 0.4403ms | 2.2710 KOps/s | 2.0489 KOps/s | **+10.84%** |
| test_memmaptd_index_astensor | 0.9390ms | 0.5208ms | 1.9202 KOps/s | 1.9692 KOps/s | -2.49% |
| test_memmaptd_index_op | 1.8670ms | 1.0999ms | 909.1989 Ops/s | 938.0953 Ops/s | -3.08% |
| test_serialize_model | 0.1441s | 0.1288s | 7.7622 Ops/s | 7.7494 Ops/s | +0.16% |
| test_serialize_model_pickle | 0.4617s | 0.3915s | 2.5544 Ops/s | 2.5026 Ops/s | +2.07% |
| test_serialize_weights | 0.1336s | 0.1243s | 8.0446 Ops/s | 8.0282 Ops/s | +0.20% |
| test_serialize_weights_returnearly | 0.1916s | 0.1694s | 5.9016 Ops/s | 6.0989 Ops/s | -3.24% |
| test_serialize_weights_pickle | 0.4405s | 0.3961s | 2.5248 Ops/s | 2.4727 Ops/s | +2.11% |
| test_serialize_weights_filesystem | 0.2064s | 0.1522s | 6.5713 Ops/s | 6.7198 Ops/s | -2.21% |
| test_serialize_model_filesystem | 0.1650s | 0.1510s | 6.6211 Ops/s | 6.1462 Ops/s | **+7.73%** |
| test_reshape_pytree | 94.6170μs | 39.8971μs | 25.0645 KOps/s | 25.7236 KOps/s | -2.56% |
| test_reshape_td | 0.1301ms | 50.3225μs | 19.8718 KOps/s | 19.9370 KOps/s | -0.33% |
| test_view_pytree | 94.7380μs | 39.5240μs | 25.3011 KOps/s | 26.1834 KOps/s | -3.37% |
| test_view_td | 0.1195ms | 55.9446μs | 17.8748 KOps/s | 18.1045 KOps/s | -1.27% |
| test_unbind_pytree | 73.7980μs | 36.5348μs | 27.3712 KOps/s | 27.7282 KOps/s | -1.29% |
| test_unbind_td | 0.3897ms | 48.4462μs | 20.6414 KOps/s | 21.0461 KOps/s | -1.92% |
| test_split_pytree | 0.1218ms | 40.1753μs | 24.8909 KOps/s | 26.2401 KOps/s | **-5.14%** |
| test_split_td | 0.2255ms | 62.9236μs | 15.8923 KOps/s | 16.7276 KOps/s | -4.99% |
| test_add_pytree | 0.1207ms | 46.6428μs | 21.4395 KOps/s | 22.1447 KOps/s | -3.18% |
| test_add_td | 0.1853ms | 86.7765μs | 11.5239 KOps/s | 11.7868 KOps/s | -2.23% |
| test_distributed | 0.2366ms | 0.1277ms | 7.8281 KOps/s | 7.6811 KOps/s | +1.91% |
| test_tdmodule | 44.7540μs | 17.1490μs | 58.3124 KOps/s | 54.2998 KOps/s | **+7.39%** |
| test_tdmodule_dispatch | 55.0930μs | 36.1987μs | 27.6253 KOps/s | 27.8633 KOps/s | -0.85% |
| test_tdseq | 38.8030μs | 18.9893μs | 52.6613 KOps/s | 50.8433 KOps/s | +3.58% |
| test_tdseq_dispatch | 72.5960μs | 40.5201μs | 24.6791 KOps/s | 24.5161 KOps/s | +0.66% |
| test_instantiation_functorch | 1.8378ms | 1.5770ms | 634.0996 Ops/s | 640.4111 Ops/s | -0.99% |
| test_instantiation_td | 2.0541ms | 1.1589ms | 862.8998 Ops/s | 874.9198 Ops/s | -1.37% |
| test_exec_functorch | 0.3431ms | 0.1818ms | 5.5006 KOps/s | 5.4857 KOps/s | +0.27% |
| test_exec_functional_call | 0.3316ms | 0.1718ms | 5.8191 KOps/s | 5.7342 KOps/s | +1.48% |
| test_exec_td | 0.3210ms | 0.1693ms | 5.9081 KOps/s | 5.7784 KOps/s | +2.24% |
| test_exec_td_decorator | 0.6832ms | 0.2591ms | 3.8602 KOps/s | 3.9159 KOps/s | -1.42% |
| test_vmap_mlp_speed[True-True] | 0.9250ms | 0.6099ms | 1.6395 KOps/s | 1.6362 KOps/s | +0.20% |
| test_vmap_mlp_speed[True-False] | 0.8669ms | 0.6010ms | 1.6640 KOps/s | 1.6504 KOps/s | +0.82% |
| test_vmap_mlp_speed[False-True] | 0.7971ms | 0.4939ms | 2.0247 KOps/s | 2.0109 KOps/s | +0.69% |
| test_vmap_mlp_speed[False-False] | 0.8319ms | 0.4976ms | 2.0096 KOps/s | 2.0040 KOps/s | +0.28% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2064ms | 0.7023ms | 1.4239 KOps/s | 1.4227 KOps/s | +0.09% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0794ms | 0.6989ms | 1.4309 KOps/s | 1.4261 KOps/s | +0.34% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7683ms | 0.5753ms | 1.7382 KOps/s | 1.7362 KOps/s | +0.11% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8927ms | 0.5788ms | 1.7278 KOps/s | 1.7370 KOps/s | -0.53% |
| test_to_module_speed[True] | 1.9359ms | 1.8250ms | 547.9411 Ops/s | 557.9302 Ops/s | -1.79% |
| test_to_module_speed[False] | 2.5905ms | 1.7933ms | 557.6423 Ops/s | 568.8645 Ops/s | -1.97% |
| test_tc_init | 91.8820μs | 44.9755μs | 22.2343 KOps/s | 21.0874 KOps/s | **+5.44%** |
| test_tc_init_nested | 0.1729ms | 89.9637μs | 11.1156 KOps/s | 10.7069 KOps/s | +3.82% |
| test_tc_first_layer_tensor | 31.7500μs | 9.4583μs | 105.7270 KOps/s | 106.0861 KOps/s | -0.34% |
| test_tc_first_layer_nontensor | 45.1450μs | 9.3746μs | 106.6710 KOps/s | 106.6608 KOps/s | +0.01% |
| test_tc_second_layer_tensor | 19.1760μs | 2.8760μs | 347.7061 KOps/s | 322.0044 KOps/s | **+7.98%** |
| test_tc_second_layer_nontensor | 50.7550μs | 10.3295μs | 96.8102 KOps/s | 93.0959 KOps/s | +3.99% |
| test_unbind | 96.8791ms | 13.7051ms | 72.9657 Ops/s | 73.9247 Ops/s | -1.30% |
| test_full_like | 15.2566ms | 11.2641ms | 88.7777 Ops/s | 142.1328 Ops/s | **-37.54%** |
| test_zeros_like | 11.3651ms | 7.1958ms | 138.9693 Ops/s | 150.2620 Ops/s | **-7.52%** |
| test_ones_like | 14.4524ms | 7.6742ms | 130.3073 Ops/s | 137.7774 Ops/s | **-5.42%** |
| test_clone | 14.5175ms | 9.2613ms | 107.9759 Ops/s | 105.1128 Ops/s | +2.72% |
| test_squeeze | 74.5400μs | 14.0334μs | 71.2584 KOps/s | 69.7920 KOps/s | +2.10% |
| test_unsqueeze | 0.1863ms | 96.3483μs | 10.3790 KOps/s | 10.2368 KOps/s | +1.39% |
| test_split | 0.4215ms | 0.2038ms | 4.9075 KOps/s | 4.9427 KOps/s | -0.71% |
| test_permute | 0.3838ms | 0.2257ms | 4.4312 KOps/s | 4.4695 KOps/s | -0.86% |
| test_stack | 30.9242ms | 23.4712ms | 42.6054 Ops/s | 40.8231 Ops/s | +4.37% |
| test_cat | 32.9757ms | 23.3577ms | 42.8124 Ops/s | 41.6321 Ops/s | +2.83% |
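
The Change column appears to be the relative difference between the Ops column and the Ops on Repo `HEAD` column. This formula is inferred from the reported numbers, not taken from the benchmark tooling, and the helper name `relative_change` is hypothetical. A minimal sketch that recomputes two CPU rows under that assumption:

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR run relative to HEAD (assumed formula)."""
    return (ops_pr / ops_head - 1.0) * 100.0


# Values copied from the CPU table (both columns in KOps/s, so the units cancel).
rows = {
    "test_plain_set_nested": (40.4649, 44.5703),
    "test_membership_stacked_nested_last": (151.9120, 77.6034),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Both recomputed values agree with the table to two decimals (-9.21% and +95.75%), which supports the assumed formula for these rows.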
github-actions[bot] commented 3 months ago
⚠ Warning: Result of GPU Benchmark Tests
Total Benchmarks: 219. Improved: 29. Worsened: 11.
| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.5090ms | 16.6195μs | 60.1703 KOps/s | 56.5551 KOps/s | **+6.39%** |
| test_plain_set_stack_nested | 35.4200μs | 16.4517μs | 60.7839 KOps/s | 56.0119 KOps/s | **+8.52%** |
| test_plain_set_nested_inplace | 56.0010μs | 17.5223μs | 57.0702 KOps/s | 52.9905 KOps/s | **+7.70%** |
| test_plain_set_stack_nested_inplace | 0.1106ms | 17.4168μs | 57.4158 KOps/s | 53.0402 KOps/s | **+8.25%** |
| test_items | 21.2000μs | 4.7507μs | 210.4965 KOps/s | 213.8858 KOps/s | -1.58% |
| test_items_nested | 0.5384ms | 0.3937ms | 2.5400 KOps/s | 2.5762 KOps/s | -1.41% |
| test_items_nested_locked | 0.4167ms | 0.3932ms | 2.5435 KOps/s | 2.5485 KOps/s | -0.20% |
| test_items_nested_leaf | 0.1504ms | 85.7604μs | 11.6604 KOps/s | 11.6039 KOps/s | +0.49% |
| test_items_stack_nested | 0.4976ms | 0.3933ms | 2.5424 KOps/s | 2.5186 KOps/s | +0.95% |
| test_items_stack_nested_leaf | 0.1017ms | 85.7451μs | 11.6625 KOps/s | 11.6197 KOps/s | +0.37% |
| test_items_stack_nested_locked | 0.4428ms | 0.3967ms | 2.5208 KOps/s | 2.4861 KOps/s | +1.40% |
| test_keys | 0.1952ms | 4.3755μs | 228.5435 KOps/s | 229.1246 KOps/s | -0.25% |
| test_keys_nested | 0.1200ms | 65.7932μs | 15.1991 KOps/s | 15.1717 KOps/s | +0.18% |
| test_keys_nested_locked | 2.0356ms | 70.6319μs | 14.1579 KOps/s | 13.9934 KOps/s | +1.18% |
| test_keys_nested_leaf | 81.2820μs | 56.3798μs | 17.7368 KOps/s | 17.7549 KOps/s | -0.10% |
| test_keys_stack_nested | 0.1848ms | 65.1245μs | 15.3552 KOps/s | 15.1783 KOps/s | +1.17% |
| test_keys_stack_nested_leaf | 0.1582ms | 55.9691μs | 17.8670 KOps/s | 17.5410 KOps/s | +1.86% |
| test_keys_stack_nested_locked | 0.2278ms | 70.8603μs | 14.1123 KOps/s | 13.7722 KOps/s | +2.47% |
| test_values | 10.8270μs | 1.7594μs | 568.3635 KOps/s | 568.4983 KOps/s | -0.02% |
| test_values_nested | 0.2166ms | 33.6101μs | 29.7529 KOps/s | 29.9493 KOps/s | -0.66% |
| test_values_nested_locked | 0.1906ms | 35.8029μs | 27.9307 KOps/s | 28.2112 KOps/s | -0.99% |
| test_values_nested_leaf | 0.2065ms | 30.0882μs | 33.2356 KOps/s | 33.4866 KOps/s | -0.75% |
| test_values_stack_nested | 0.2043ms | 34.5320μs | 28.9586 KOps/s | 29.2019 KOps/s | -0.83% |
| test_values_stack_nested_leaf | 0.2052ms | 30.6739μs | 32.6010 KOps/s | 32.5686 KOps/s | +0.10% |
| test_values_stack_nested_locked | 0.2312ms | 36.4584μs | 27.4285 KOps/s | 27.7621 KOps/s | -1.20% |
| test_membership | 10.3742μs | 0.5528μs | 1.8091 MOps/s | 1.8767 MOps/s | -3.61% |
| test_membership_nested | 69.3410μs | 2.0549μs | 486.6524 KOps/s | 482.3404 KOps/s | +0.89% |
| test_membership_nested_leaf | 14.9600μs | 1.9653μs | 508.8179 KOps/s | 506.4709 KOps/s | +0.46% |
| test_membership_stacked_nested | 24.3900μs | 2.0177μs | 495.6194 KOps/s | 485.0108 KOps/s | +2.19% |
| test_membership_stacked_nested_leaf | 19.8900μs | 2.0413μs | 489.8861 KOps/s | 490.9202 KOps/s | -0.21% |
| test_membership_nested_last | 19.6400μs | 2.9639μs | 337.3933 KOps/s | 338.2473 KOps/s | -0.25% |
| test_membership_nested_leaf_last | 24.3910μs | 2.9731μs | 336.3518 KOps/s | 336.0680 KOps/s | +0.08% |
| test_membership_stacked_nested_last | 28.9500μs | 9.1200μs | 109.6495 KOps/s | 330.6756 KOps/s | **-66.84%** |
| test_membership_stacked_nested_leaf_last | 28.6210μs | 9.1175μs | 109.6794 KOps/s | 331.6173 KOps/s | **-66.93%** |
| test_nested_getleaf | 23.4700μs | 7.9712μs | 125.4511 KOps/s | 124.4719 KOps/s | +0.79% |
| test_nested_get | 71.6210μs | 7.5153μs | 133.0613 KOps/s | 133.0599 KOps/s | +0.00% |
| test_stacked_getleaf | 37.6400μs | 7.9946μs | 125.0847 KOps/s | 124.2547 KOps/s | +0.67% |
| test_stacked_get | 40.9310μs | 7.5377μs | 132.6664 KOps/s | 132.8505 KOps/s | -0.14% |
| test_nested_getitemleaf | 32.1110μs | 8.1646μs | 122.4794 KOps/s | 122.5207 KOps/s | -0.03% |
| test_nested_getitem | 35.5300μs | 7.6889μs | 130.0577 KOps/s | 129.6568 KOps/s | +0.31% |
| test_stacked_getitemleaf | 25.2000μs | 8.1119μs | 123.2755 KOps/s | 122.3839 KOps/s | +0.73% |
| test_stacked_getitem | 24.9100μs | 7.6642μs | 130.4775 KOps/s | 129.7410 KOps/s | +0.57% |
| test_lock_nested | 2.6036ms | 0.4827ms | 2.0715 KOps/s | 2.1146 KOps/s | -2.04% |
| test_lock_stack_nested | 0.5373ms | 0.4319ms | 2.3155 KOps/s | 2.2889 KOps/s | +1.16% |
| test_unlock_nested | 0.8346ms | 0.3933ms | 2.5426 KOps/s | 2.5442 KOps/s | -0.06% |
| test_unlock_stack_nested | 0.3665ms | 0.3417ms | 2.9265 KOps/s | 2.8277 KOps/s | +3.49% |
| test_flatten_speed | 0.1979ms | 0.1044ms | 9.5742 KOps/s | 9.5202 KOps/s | +0.57% |
| test_unflatten_speed | 0.4221ms | 0.2920ms | 3.4245 KOps/s | 3.4734 KOps/s | -1.41% |
| test_common_ops | 1.6774ms | 1.2998ms | 769.3560 Ops/s | 748.9320 Ops/s | +2.73% |
| test_creation | 20.7100μs | 2.0015μs | 499.6217 KOps/s | 522.9320 KOps/s | -4.46% |
| test_creation_empty | 0.8107ms | 16.5613μs | 60.3818 KOps/s | 53.7273 KOps/s | **+12.39%** |
| test_creation_nested_1 | 37.2100μs | 18.2748μs | 54.7201 KOps/s | 48.0875 KOps/s | **+13.79%** |
| test_creation_nested_2 | 46.1310μs | 21.1642μs | 47.2496 KOps/s | 42.3135 KOps/s | **+11.67%** |
| test_clone | 0.1855ms | 30.0577μs | 33.2693 KOps/s | 33.6692 KOps/s | -1.19% |
| test_getitem[int] | 1.0671ms | 16.7242μs | 59.7937 KOps/s | 58.9602 KOps/s | +1.41% |
| test_getitem[slice_int] | 0.1574ms | 28.9169μs | 34.5818 KOps/s | 35.1288 KOps/s | -1.56% |
| test_getitem[range] | 0.2531ms | 0.1183ms | 8.4549 KOps/s | 8.6323 KOps/s | -2.06% |
| test_getitem[tuple] | 0.1591ms | 25.7139μs | 38.8895 KOps/s | 39.8493 KOps/s | -2.41% |
| test_getitem[list] | 0.2760ms | 0.1077ms | 9.2842 KOps/s | 9.3694 KOps/s | -0.91% |
| test_setitem_dim[int] | 0.1876ms | 51.7044μs | 19.3407 KOps/s | 18.0753 KOps/s | **+7.00%** |
| test_setitem_dim[slice_int] | 0.2234ms | 77.0044μs | 12.9863 KOps/s | 12.5027 KOps/s | +3.87% |
| test_setitem_dim[range] | 0.2812ms | 0.1409ms | 7.0978 KOps/s | 6.9361 KOps/s | +2.33% |
| test_setitem_dim[tuple] | 0.2063ms | 70.5581μs | 14.1727 KOps/s | 13.8072 KOps/s | +2.65% |
| test_setitem | 0.2052ms | 42.8966μs | 23.3119 KOps/s | 22.2709 KOps/s | +4.67% |
| test_set | 0.2038ms | 42.1535μs | 23.7228 KOps/s | 21.9773 KOps/s | **+7.94%** |
| test_set_shared | 0.3854ms | 53.9451μs | 18.5374 KOps/s | 18.7183 KOps/s | -0.97% |
| test_update | 0.2231ms | 50.7841μs | 19.6912 KOps/s | 17.5773 KOps/s | **+12.03%** |
| test_update_nested | 0.2372ms | 57.8015μs | 17.3006 KOps/s | 15.5072 KOps/s | **+11.56%** |
| test_update__nested | 0.2335ms | 60.1519μs | 16.6246 KOps/s | 15.5088 KOps/s | **+7.19%** |
| test_set_nested | 0.2018ms | 44.6793μs | 22.3817 KOps/s | 20.0758 KOps/s | **+11.49%** |
| test_set_nested_new | 0.2486ms | 48.4487μs | 20.6404 KOps/s | 18.7764 KOps/s | **+9.93%** |
| test_select | 0.2270ms | 63.0433μs | 15.8621 KOps/s | 14.6141 KOps/s | **+8.54%** |
| test_select_nested | 0.2514ms | 52.6195μs | 19.0043 KOps/s | 18.7721 KOps/s | +1.24% |
| test_exclude_nested | 0.2709ms | 71.6363μs | 13.9594 KOps/s | 14.0182 KOps/s | -0.42% |
| test_empty[True] | 0.4839ms | 0.2964ms | 3.3742 KOps/s | 3.3812 KOps/s | -0.21% |
| test_empty[False] | 2.4201μs | 0.9099μs | 1.0990 MOps/s | 1.1015 MOps/s | -0.23% |
| test_to | 0.2247ms | 37.9842μs | 26.3267 KOps/s | 26.1547 KOps/s | +0.66% |
| test_to_nonblocking | 0.1886ms | 23.3342μs | 42.8556 KOps/s | 41.6065 KOps/s | +3.00% |
| test_unbind_speed | 0.4036ms | 0.2994ms | 3.3402 KOps/s | 3.2964 KOps/s | +1.33% |
| test_unbind_speed_stack0 | 0.3479ms | 0.2964ms | 3.3739 KOps/s | 3.2924 KOps/s | +2.48% |
| test_unbind_speed_stack1 | 97.1569ms | 0.8455ms | 1.1827 KOps/s | 1.2780 KOps/s | **-7.46%** |
| test_split | 93.7280ms | 2.2989ms | 434.9996 Ops/s | 428.4256 Ops/s | +1.53% |
| test_chunk | 2.2575ms | 2.1164ms | 472.5112 Ops/s | 426.6657 Ops/s | **+10.75%** |
| test_creation[device0] | 0.2539ms | 0.1052ms | 9.5094 KOps/s | 9.4852 KOps/s | +0.25% |
| test_creation_from_tensor | 0.2495ms | 0.1008ms | 9.9247 KOps/s | 9.7338 KOps/s | +1.96% |
| test_add_one[memmap_tensor0] | 22.4210μs | 9.4786μs | 105.5013 KOps/s | 109.0492 KOps/s | -3.25% |
| test_contiguous[memmap_tensor0] | 26.8710μs | 2.1957μs | 455.4316 KOps/s | 456.8826 KOps/s | -0.32% |
| test_stack[memmap_tensor0] | 34.7500μs | 6.4592μs | 154.8185 KOps/s | 150.4926 KOps/s | +2.87% |
| test_memmaptd_index | 1.1571ms | 0.4292ms | 2.3298 KOps/s | 2.3259 KOps/s | +0.17% |
| test_memmaptd_index_astensor | 0.7590ms | 0.5019ms | 1.9925 KOps/s | 2.0148 KOps/s | -1.11% |
| test_memmaptd_index_op | 1.4719ms | 1.0483ms | 953.8894 Ops/s | 932.1557 Ops/s | +2.33% |
| test_serialize_model | 0.1001s | 97.1267ms | 10.2958 Ops/s | 9.9985 Ops/s | +2.97% |
| test_serialize_model_pickle | 1.3504s | 1.2376s | 0.8080 Ops/s | 0.8177 Ops/s | -1.19% |
| test_serialize_weights | 96.8207ms | 92.3820ms | 10.8246 Ops/s | 9.0342 Ops/s | **+19.82%** |
| test_serialize_weights_returnearly | 0.2036s | 79.0797ms | 12.6455 Ops/s | 14.4021 Ops/s | **-12.20%** |
| test_serialize_weights_pickle | 1.4122s | 1.2465s | 0.8023 Ops/s | 0.8022 Ops/s | +0.00% |
| test_reshape_pytree | 0.1630ms | 38.7517μs | 25.8053 KOps/s | 25.6538 KOps/s | +0.59% |
| test_reshape_td | 0.1840ms | 44.1936μs | 22.6277 KOps/s | 22.8092 KOps/s | -0.80% |
| test_view_pytree | 0.2490ms | 38.5554μs | 25.9367 KOps/s | 26.6282 KOps/s | -2.60% |
| test_view_td | 0.1881ms | 51.4872μs | 19.4223 KOps/s | 19.7999 KOps/s | -1.91% |
| test_unbind_pytree | 0.2392ms | 37.0791μs | 26.9694 KOps/s | 26.8659 KOps/s | +0.39% |
| test_unbind_td | 0.4093ms | 44.9330μs | 22.2553 KOps/s | 19.0493 KOps/s | **+16.83%** |
| test_split_pytree | 0.2469ms | 49.6447μs | 20.1432 KOps/s | 20.3918 KOps/s | -1.22% |
| test_split_td | 98.1091ms | 70.8286μs | 14.1186 KOps/s | 16.8354 KOps/s | **-16.14%** |
| test_add_pytree | 0.3008ms | 58.4332μs | 17.1136 KOps/s | 16.6272 KOps/s | +2.93% |
| test_add_td | 0.2946ms | 89.4064μs | 11.1849 KOps/s | 9.7973 KOps/s | **+14.16%** |
| test_compile_add_one_nested[tensordict-compile] | 0.4130ms | 0.2102ms | 4.7581 KOps/s | 4.7382 KOps/s | +0.42% |
| test_compile_add_one_nested[tensordict-eager] | 0.3022ms | 0.1715ms | 5.8292 KOps/s | 5.6963 KOps/s | +2.33% |
| test_compile_add_one_nested[pytree-compile] | 0.2952ms | 0.1449ms | 6.9017 KOps/s | 6.8911 KOps/s | +0.15% |
| test_compile_add_one_nested[pytree-eager] | 0.3584ms | 0.1925ms | 5.1945 KOps/s | 5.1628 KOps/s | +0.61% |
| test_compile_copy_nested[tensordict-compile] | 0.1491ms | 21.3573μs | 46.8225 KOps/s | 46.4775 KOps/s | +0.74% |
| test_compile_copy_nested[tensordict-eager] | 0.1348ms | 48.3221μs | 20.6944 KOps/s | 20.4275 KOps/s | +1.31% |
| test_compile_copy_nested[pytree-compile] | 0.1427ms | 71.5975μs | 13.9670 KOps/s | 13.8072 KOps/s | +1.16% |
| test_compile_copy_nested[pytree-eager] | 84.6810μs | 59.1848μs | 16.8962 KOps/s | 16.7231 KOps/s | +1.04% |
| test_compile_add_one_flat[tensordict-compile] | 0.4700ms | 0.3257ms | 3.0702 KOps/s | 3.0882 KOps/s | -0.58% |
| test_compile_add_one_flat[tensordict-eager] | 0.3679ms | 0.2249ms | 4.4456 KOps/s | 4.4966 KOps/s | -1.13% |
| test_compile_add_one_flat[tensorclass-compile] | 0.2746ms | 0.1298ms | 7.7042 KOps/s | 7.6946 KOps/s | +0.12% |
| test_compile_add_one_flat[tensorclass-eager] | 0.2481ms | 65.9970μs | 15.1522 KOps/s | 15.6580 KOps/s | -3.23% |
| test_compile_add_one_flat[pytree-compile] | 0.4783ms | 0.3240ms | 3.0861 KOps/s | 3.1137 KOps/s | -0.89% |
| test_compile_add_one_flat[pytree-eager] | 0.8854ms | 0.6982ms | 1.4323 KOps/s | 1.5838 KOps/s | **-9.56%** |
| test_compile_add_self_flat[tensordict-eager] | 0.4223ms | 0.2750ms | 3.6369 KOps/s | 3.6433 KOps/s | -0.17% |
| test_compile_add_self_flat[tensordict-compile] | 0.4892ms | 0.3286ms | 3.0431 KOps/s | 3.0737 KOps/s | -1.00% |
| test_compile_add_self_flat[tensorclass-eager] | 0.2455ms | 78.7893μs | 12.6921 KOps/s | 12.8166 KOps/s | -0.97% |
| test_compile_add_self_flat[tensorclass-compile] | 0.3117ms | 0.1349ms | 7.4129 KOps/s | 7.5587 KOps/s | -1.93% |
| test_compile_add_self_flat[pytree-eager] | 0.8012ms | 0.5925ms | 1.6877 KOps/s | 1.8342 KOps/s | **-7.98%** |
| test_compile_add_self_flat[pytree-compile] | 0.5218ms | 0.3242ms | 3.0846 KOps/s | 3.0829 KOps/s | +0.06% |
| test_compile_copy_flat[tensordict-compile] | 0.2184ms | 18.8474μs | 53.0576 KOps/s | 53.2477 KOps/s | -0.36% |
| test_compile_copy_flat[tensordict-eager] | 0.2274ms | 32.0889μs | 31.1634 KOps/s | 31.3219 KOps/s | -0.51% |
| test_compile_copy_flat[pytree-compile] | 0.2724ms | 74.5280μs | 13.4178 KOps/s | 13.3429 KOps/s | +0.56% |
| test_compile_copy_flat[pytree-eager] | 0.1784ms | 60.5621μs | 16.5120 KOps/s | 16.4809 KOps/s | +0.19% |
| test_compile_assign_and_add[tensordict-compile] | 2.6890ms | 0.9581ms | 1.0438 KOps/s | 1.0302 KOps/s | +1.32% |
| test_compile_assign_and_add[tensordict-eager] | 3.6678ms | 3.3929ms | 294.7356 Ops/s | 292.2913 Ops/s | +0.84% |
| test_compile_assign_and_add[pytree-compile] | 2.6272ms | 0.9352ms | 1.0693 KOps/s | 1.0615 KOps/s | +0.74% |
| test_compile_assign_and_add[pytree-eager] | 3.6307ms | 3.3647ms | 297.2038 Ops/s | 297.3196 Ops/s | -0.04% |
| test_compile_indexing[tensor-tensordict-compile] | 0.2623ms | 0.1110ms | 9.0109 KOps/s | 9.0501 KOps/s | -0.43% |
| test_compile_indexing[tensor-tensordict-eager] | 0.3093ms | 63.4986μs | 15.7484 KOps/s | 15.7859 KOps/s | -0.24% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2487ms | 0.1025ms | 9.7519 KOps/s | 9.7170 KOps/s | +0.36% |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1961ms | 45.1393μs | 22.1536 KOps/s | 21.7991 KOps/s | +1.63% |
| test_compile_indexing[tensor-pytree-compile] | 0.2835ms | 0.1031ms | 9.7031 KOps/s | 9.4049 KOps/s | +3.17% |
| test_compile_indexing[tensor-pytree-eager] | 0.2489ms | 45.3281μs | 22.0614 KOps/s | 21.8703 KOps/s | +0.87% |
| test_compile_indexing[slice-tensordict-compile] | 0.2981ms | 0.1392ms | 7.1844 KOps/s | 7.1848 KOps/s | -0.01% |
| test_compile_indexing[slice-tensordict-eager] | 0.2881ms | 25.8694μs | 38.6557 KOps/s | 38.8792 KOps/s | -0.57% |
| test_compile_indexing[slice-tensorclass-compile] | 0.2890ms | 0.1318ms | 7.5871 KOps/s | 7.6339 KOps/s | -0.61% |
| test_compile_indexing[slice-tensorclass-eager] | 0.1083ms | 21.8464μs | 45.7741 KOps/s | 44.9772 KOps/s | +1.77% |
| test_compile_indexing[slice-pytree-compile] | 0.2989ms | 0.1312ms | 7.6199 KOps/s | 7.4630 KOps/s | +2.10% |
| test_compile_indexing[slice-pytree-eager] | 0.1134ms | 21.8939μs | 45.6748 KOps/s | 44.9439 KOps/s | +1.63% |
| test_compile_indexing[int-tensordict-compile] | 0.2984ms | 0.1381ms | 7.2401 KOps/s | 7.1725 KOps/s | +0.94% |
| test_compile_indexing[int-tensordict-eager] | 0.5153ms | 25.7502μs | 38.8346 KOps/s | 38.4640 KOps/s | +0.96% |
| test_compile_indexing[int-tensorclass-compile] | 0.3011ms | 0.1315ms | 7.6040 KOps/s | 7.6362 KOps/s | -0.42% |
| test_compile_indexing[int-tensorclass-eager] | 0.2048ms | 21.9396μs | 45.5796 KOps/s | 44.5488 KOps/s | +2.31% |
| test_compile_indexing[int-pytree-compile] | 0.3132ms | 0.1315ms | 7.6021 KOps/s | 7.6406 KOps/s | -0.50% |
| test_compile_indexing[int-pytree-eager] | 86.5120μs | 21.7859μs | 45.9012 KOps/s | 44.9884 KOps/s | +2.03% |
| test_mod_add[eager] | 0.2391ms | 37.5338μs | 26.6426 KOps/s | 25.6598 KOps/s | +3.83% |
| test_mod_add[compile] | 0.2633ms | 70.4331μs | 14.1979 KOps/s | 14.4562 KOps/s | -1.79% |
| test_mod_add[compile-overhead] | 0.2681ms | 0.1484ms | 6.7393 KOps/s | 6.7559 KOps/s | -0.25% |
| test_mod_wrap[eager] | 0.4617ms | 0.2816ms | 3.5508 KOps/s | 3.8369 KOps/s | **-7.45%** |
| test_mod_wrap[compile] | 0.5281ms | 0.3206ms | 3.1191 KOps/s | 3.1889 KOps/s | -2.19% |
| test_mod_wrap[compile-overhead] | 8.1451ms | 4.3167ms | 231.6581 Ops/s | 229.4009 Ops/s | +0.98% |
| test_mod_wrap_and_backward[eager] | 1.8105ms | 1.4904ms | 670.9462 Ops/s | 670.4979 Ops/s | +0.07% |
| test_mod_wrap_and_backward[compile] | 1.7778ms | 1.4998ms | 666.7633 Ops/s | 719.6270 Ops/s | **-7.35%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.4929ms | 1.0191ms | 981.2845 Ops/s | 1.0144 KOps/s | -3.26% |
| test_seq_add[eager] | 0.2547ms | 0.1091ms | 9.1640 KOps/s | 8.8223 KOps/s | +3.87% |
| test_seq_add[compile] | 0.2197ms | 85.3169μs | 11.7210 KOps/s | 11.6965 KOps/s | +0.21% |
| test_seq_add[compile-overhead] | 0.2666ms | 0.1230ms | 8.1286 KOps/s | 8.0563 KOps/s | +0.90% |
| test_seq_wrap[eager] | 0.6643ms | 0.4513ms | 2.2161 KOps/s | 2.2653 KOps/s | -2.17% |
| test_seq_wrap[compile] | 1.5764ms | 0.3364ms | 2.9728 KOps/s | 2.8739 KOps/s | +3.44% |
| test_seq_wrap[compile-overhead] | 0.3204s | 0.1523s | 6.5672 Ops/s | 6.5363 Ops/s | +0.47% |
test_func_call_runtime[False-eager] | 0.9169ms | 0.7619ms | 1.3124 KOps/s | 1.3011 KOps/s | $\color{#35bf28}+0.87\\%$ | | test_func_call_runtime[False-compile] | 1.0538ms | 0.8532ms | 1.1721 KOps/s | 1.1876 KOps/s | $\color{#d91a1a}-1.31\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.5221ms | 0.3760ms | 2.6593 KOps/s | 2.6885 KOps/s | $\color{#d91a1a}-1.09\\%$ | | test_func_call_runtime[True-eager] | 1.1750ms | 1.0197ms | 980.7043 Ops/s | 977.4349 Ops/s | $\color{#35bf28}+0.33\\%$ | | test_func_call_runtime[True-compile] | 1.0405ms | 0.8797ms | 1.1367 KOps/s | 1.1438 KOps/s | $\color{#d91a1a}-0.62\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.5539ms | 0.4159ms | 2.4047 KOps/s | 2.4233 KOps/s | $\color{#d91a1a}-0.77\\%$ | | test_distributed | 0.3212ms | 68.5427μs | 14.5894 KOps/s | 13.2109 KOps/s | $\textbf{\color{#35bf28}+10.44\\%}$ | | test_tdmodule | 0.1477ms | 15.2184μs | 65.7101 KOps/s | 57.0015 KOps/s | $\textbf{\color{#35bf28}+15.28\\%}$ | | test_tdmodule_dispatch | 48.5110μs | 31.3029μs | 31.9459 KOps/s | 27.7744 KOps/s | $\textbf{\color{#35bf28}+15.02\\%}$ | | test_tdseq | 32.5210μs | 16.0448μs | 62.3256 KOps/s | 56.9918 KOps/s | $\textbf{\color{#35bf28}+9.36\\%}$ | | test_tdseq_dispatch | 53.8320μs | 33.8467μs | 29.5450 KOps/s | 27.1236 KOps/s | $\textbf{\color{#35bf28}+8.93\\%}$ | | test_instantiation_functorch | 2.3068ms | 2.0062ms | 498.4563 Ops/s | 505.3960 Ops/s | $\color{#d91a1a}-1.37\\%$ | | test_instantiation_td | 2.0655ms | 1.3037ms | 767.0336 Ops/s | 763.2129 Ops/s | $\color{#35bf28}+0.50\\%$ | | test_exec_functorch | 0.4247ms | 0.2279ms | 4.3871 KOps/s | 4.1780 KOps/s | $\textbf{\color{#35bf28}+5.00\\%}$ | | test_exec_functional_call | 0.3964ms | 0.2248ms | 4.4484 KOps/s | 4.1863 KOps/s | $\textbf{\color{#35bf28}+6.26\\%}$ | | test_exec_td | 0.3842ms | 0.2262ms | 4.4218 KOps/s | 4.1476 KOps/s | $\textbf{\color{#35bf28}+6.61\\%}$ | | test_exec_td_decorator | 0.8703ms | 0.3024ms | 3.3071 KOps/s | 3.2590 KOps/s | 
$\color{#35bf28}+1.48\\%$ | | test_vmap_mlp_speed[True-True] | 0.8756ms | 0.6760ms | 1.4792 KOps/s | 1.4376 KOps/s | $\color{#35bf28}+2.90\\%$ | | test_vmap_mlp_speed[True-False] | 0.8744ms | 0.6813ms | 1.4678 KOps/s | 1.4504 KOps/s | $\color{#35bf28}+1.20\\%$ | | test_vmap_mlp_speed[False-True] | 0.8272ms | 0.6350ms | 1.5747 KOps/s | 1.6209 KOps/s | $\color{#d91a1a}-2.85\\%$ | | test_vmap_mlp_speed[False-False] | 0.8232ms | 0.6362ms | 1.5718 KOps/s | 1.5835 KOps/s | $\color{#d91a1a}-0.74\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4714ms | 0.7671ms | 1.3036 KOps/s | 1.2940 KOps/s | $\color{#35bf28}+0.74\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.9988ms | 0.7841ms | 1.2753 KOps/s | 1.3074 KOps/s | $\color{#d91a1a}-2.45\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.8655ms | 0.6779ms | 1.4751 KOps/s | 1.4738 KOps/s | $\color{#35bf28}+0.08\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.8908ms | 0.6827ms | 1.4649 KOps/s | 1.4532 KOps/s | $\color{#35bf28}+0.80\\%$ | | test_vmap_transformer_speed[True-True] | 9.4473ms | 8.9293ms | 111.9906 Ops/s | 110.5081 Ops/s | $\color{#35bf28}+1.34\\%$ | | test_vmap_transformer_speed[True-False] | 9.1825ms | 8.9347ms | 111.9234 Ops/s | 111.6223 Ops/s | $\color{#35bf28}+0.27\\%$ | | test_vmap_transformer_speed[False-True] | 9.3670ms | 8.8584ms | 112.8872 Ops/s | 111.9201 Ops/s | $\color{#35bf28}+0.86\\%$ | | test_vmap_transformer_speed[False-False] | 9.3869ms | 8.8691ms | 112.7508 Ops/s | 111.7533 Ops/s | $\color{#35bf28}+0.89\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 22.3558ms | 21.4006ms | 46.7276 Ops/s | 46.8078 Ops/s | $\color{#d91a1a}-0.17\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 21.6197ms | 21.3439ms | 46.8518 Ops/s | 46.7803 Ops/s | $\color{#35bf28}+0.15\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 22.4020ms | 21.1918ms | 47.1881 Ops/s | 46.7166 Ops/s | $\color{#35bf28}+1.01\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 
21.4249ms | 21.1274ms | 47.3320 Ops/s | 47.3618 Ops/s | $\color{#d91a1a}-0.06\\%$ | | test_to_module_speed[True] | 1.6079ms | 1.4781ms | 676.5656 Ops/s | 682.2409 Ops/s | $\color{#d91a1a}-0.83\\%$ | | test_to_module_speed[False] | 1.6016ms | 1.4792ms | 676.0542 Ops/s | 686.3647 Ops/s | $\color{#d91a1a}-1.50\\%$ | | test_tc_init | 0.1707ms | 35.9799μs | 27.7933 KOps/s | 25.7408 KOps/s | $\textbf{\color{#35bf28}+7.97\\%}$ | | test_tc_init_nested | 0.1049ms | 72.3905μs | 13.8140 KOps/s | 12.6096 KOps/s | $\textbf{\color{#35bf28}+9.55\\%}$ | | test_tc_first_layer_tensor | 21.0010μs | 3.9739μs | 251.6431 KOps/s | 252.1159 KOps/s | $\color{#d91a1a}-0.19\\%$ | | test_tc_first_layer_nontensor | 19.4900μs | 3.9848μs | 250.9541 KOps/s | 250.7549 KOps/s | $\color{#35bf28}+0.08\\%$ | | test_tc_second_layer_tensor | 32.5980μs | 1.2851μs | 778.1625 KOps/s | 773.8295 KOps/s | $\color{#35bf28}+0.56\\%$ | | test_tc_second_layer_nontensor | 20.9910μs | 4.5814μs | 218.2727 KOps/s | 218.9633 KOps/s | $\color{#d91a1a}-0.32\\%$ | | test_unbind | 0.3322s | 13.5363ms | 73.8753 Ops/s | 75.1806 Ops/s | $\color{#d91a1a}-1.74\\%$ | | test_full_like | 0.7619ms | 0.5782ms | 1.7296 KOps/s | 1.7317 KOps/s | $\color{#d91a1a}-0.12\\%$ | | test_zeros_like | 0.3437ms | 0.1982ms | 5.0446 KOps/s | 5.0446 KOps/s | $+0.00\\%$ | | test_ones_like | 0.3476ms | 0.1980ms | 5.0497 KOps/s | 5.0505 KOps/s | $\color{#d91a1a}-0.02\\%$ | | test_clone | 0.5672ms | 0.4146ms | 2.4122 KOps/s | 2.4050 KOps/s | $\color{#35bf28}+0.30\\%$ | | test_squeeze | 0.1668ms | 12.8108μs | 78.0589 KOps/s | 84.5515 KOps/s | $\textbf{\color{#d91a1a}-7.68\\%}$ | | test_unsqueeze | 0.2581ms | 85.6629μs | 11.6737 KOps/s | 11.9451 KOps/s | $\color{#d91a1a}-2.27\\%$ | | test_split | 0.4821ms | 0.1829ms | 5.4674 KOps/s | 5.7737 KOps/s | $\textbf{\color{#d91a1a}-5.30\\%}$ | | test_permute | 0.3236ms | 0.1994ms | 5.0142 KOps/s | 5.2550 KOps/s | $\color{#d91a1a}-4.58\\%$ | | test_stack | 1.3772ms | 0.8964ms | 1.1156 KOps/s | 1.0927 KOps/s | 
$\color{#35bf28}+2.10\\%$ | | test_cat | 1.3844ms | 1.2325ms | 811.3293 Ops/s | 811.2078 Ops/s | $\color{#35bf28}+0.01\\%$ |
cc @shagunsodhani