pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License
832 stars, 74 forks

[Feature] default for add, sub, pow, clamp_min, clamp_max, div and mul #921

Closed: vmoens closed this 3 months ago

vmoens commented 3 months ago

Stack from ghstack (oldest at bottom):

github-actions[bot] commented 3 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 219. Improved: 22. Worsened: 8.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 53.5500μs | 22.0632μs | 45.3242 KOps/s | 45.2906 KOps/s | +0.07% |
| test_plain_set_stack_nested | 60.2130μs | 22.3821μs | 44.6785 KOps/s | 44.5506 KOps/s | +0.29% |
| test_plain_set_nested_inplace | 79.4490μs | 24.2385μs | 41.2566 KOps/s | 41.2798 KOps/s | -0.06% |
| test_plain_set_stack_nested_inplace | 70.3620μs | 24.3140μs | 41.1286 KOps/s | 41.2423 KOps/s | -0.28% |
| test_items | 16.1300μs | 2.8085μs | 356.0616 KOps/s | 381.8184 KOps/s | **-6.75%** |
| test_items_nested | 0.7140ms | 0.3396ms | 2.9450 KOps/s | 2.9928 KOps/s | -1.60% |
| test_items_nested_locked | 0.5667ms | 0.3402ms | 2.9393 KOps/s | 2.9966 KOps/s | -1.91% |
| test_items_nested_leaf | 0.1620ms | 83.6689μs | 11.9519 KOps/s | 11.9825 KOps/s | -0.26% |
| test_items_stack_nested | 0.4848ms | 0.3426ms | 2.9189 KOps/s | 2.9698 KOps/s | -1.71% |
| test_items_stack_nested_leaf | 0.1647ms | 85.2717μs | 11.7272 KOps/s | 11.9241 KOps/s | -1.65% |
| test_items_stack_nested_locked | 0.7052ms | 0.3460ms | 2.8902 KOps/s | 2.9898 KOps/s | -3.33% |
| test_keys | 26.2990μs | 4.5792μs | 218.3801 KOps/s | 256.4534 KOps/s | **-14.85%** |
| test_keys_nested | 0.2779ms | 0.1463ms | 6.8352 KOps/s | 6.8997 KOps/s | -0.93% |
| test_keys_nested_locked | 0.6993ms | 0.1495ms | 6.6906 KOps/s | 6.6941 KOps/s | -0.05% |
| test_keys_nested_leaf | 0.4138ms | 0.1252ms | 7.9897 KOps/s | 7.9738 KOps/s | +0.20% |
| test_keys_stack_nested | 0.3042ms | 0.1450ms | 6.8962 KOps/s | 6.9754 KOps/s | -1.14% |
| test_keys_stack_nested_leaf | 0.2414ms | 0.1245ms | 8.0293 KOps/s | 8.1696 KOps/s | -1.72% |
| test_keys_stack_nested_locked | 0.2609ms | 0.1496ms | 6.6849 KOps/s | 6.5085 KOps/s | +2.71% |
| test_values | 17.1848μs | 1.1494μs | 870.0075 KOps/s | 841.2045 KOps/s | +3.42% |
| test_values_nested | 92.1520μs | 50.3088μs | 19.8773 KOps/s | 20.0949 KOps/s | -1.08% |
| test_values_nested_locked | 0.1011ms | 50.4086μs | 19.8379 KOps/s | 20.0089 KOps/s | -0.85% |
| test_values_nested_leaf | 0.1152ms | 45.5286μs | 21.9642 KOps/s | 22.2070 KOps/s | -1.09% |
| test_values_stack_nested | 0.1141ms | 51.3667μs | 19.4679 KOps/s | 19.6997 KOps/s | -1.18% |
| test_values_stack_nested_leaf | 98.3640μs | 45.2032μs | 22.1223 KOps/s | 22.2087 KOps/s | -0.39% |
| test_values_stack_nested_locked | 0.1034ms | 52.8536μs | 18.9202 KOps/s | 19.3444 KOps/s | -2.19% |
| test_membership | 17.4830μs | 0.9354μs | 1.0691 MOps/s | 1.3014 MOps/s | **-17.85%** |
| test_membership_nested | 46.7770μs | 2.5832μs | 387.1120 KOps/s | 386.2761 KOps/s | +0.22% |
| test_membership_nested_leaf | 27.6020μs | 2.6114μs | 382.9353 KOps/s | 382.8412 KOps/s | +0.02% |
| test_membership_stacked_nested | 30.0870μs | 2.5996μs | 384.6812 KOps/s | 386.1958 KOps/s | -0.39% |
| test_membership_stacked_nested_leaf | 19.3970μs | 2.6088μs | 383.3109 KOps/s | 382.8578 KOps/s | +0.12% |
| test_membership_nested_last | 41.6270μs | 3.9292μs | 254.5053 KOps/s | 254.0937 KOps/s | +0.16% |
| test_membership_nested_leaf_last | 42.6430μs | 3.9154μs | 255.4042 KOps/s | 256.8543 KOps/s | -0.56% |
| test_membership_stacked_nested_last | 32.4800μs | 4.5181μs | 221.3302 KOps/s | 145.8002 KOps/s | **+51.80%** |
| test_membership_stacked_nested_leaf_last | 20.0070μs | 4.5582μs | 219.3843 KOps/s | 146.8077 KOps/s | **+49.44%** |
| test_nested_getleaf | 65.8430μs | 10.4210μs | 95.9605 KOps/s | 96.2879 KOps/s | -0.34% |
| test_nested_get | 52.7790μs | 9.8059μs | 101.9797 KOps/s | 102.5660 KOps/s | -0.57% |
| test_stacked_getleaf | 24.7760μs | 10.2973μs | 97.1130 KOps/s | 98.2355 KOps/s | -1.14% |
| test_stacked_get | 35.1350μs | 9.7761μs | 102.2907 KOps/s | 102.7901 KOps/s | -0.49% |
| test_nested_getitemleaf | 32.7210μs | 11.0143μs | 90.7911 KOps/s | 92.9255 KOps/s | -2.30% |
| test_nested_getitem | 45.8050μs | 10.1585μs | 98.4393 KOps/s | 101.0219 KOps/s | -2.56% |
| test_stacked_getitemleaf | 40.1750μs | 10.7814μs | 92.7523 KOps/s | 93.6839 KOps/s | -0.99% |
| test_stacked_getitem | 53.2900μs | 10.0305μs | 99.6959 KOps/s | 100.0361 KOps/s | -0.34% |
| test_lock_nested | 85.7392ms | 0.5804ms | 1.7229 KOps/s | 2.0109 KOps/s | **-14.32%** |
| test_lock_stack_nested | 0.7077ms | 0.4616ms | 2.1663 KOps/s | 2.1926 KOps/s | -1.20% |
| test_unlock_nested | 87.1936ms | 0.5062ms | 1.9754 KOps/s | 2.4065 KOps/s | **-17.91%** |
| test_unlock_stack_nested | 0.5883ms | 0.3759ms | 2.6604 KOps/s | 2.6778 KOps/s | -0.65% |
| test_flatten_speed | 0.5561ms | 0.1041ms | 9.6016 KOps/s | 9.8245 KOps/s | -2.27% |
| test_unflatten_speed | 0.9714ms | 0.4332ms | 2.3086 KOps/s | 2.3365 KOps/s | -1.19% |
| test_common_ops | 4.6293ms | 1.1058ms | 904.3169 Ops/s | 899.8118 Ops/s | +0.50% |
| test_creation | 41.5780μs | 2.0385μs | 490.5609 KOps/s | 497.5989 KOps/s | -1.41% |
| test_creation_empty | 45.8460μs | 18.9809μs | 52.6847 KOps/s | 50.5077 KOps/s | +4.31% |
| test_creation_nested_1 | 73.8080μs | 22.5457μs | 44.3544 KOps/s | 44.6096 KOps/s | -0.57% |
| test_creation_nested_2 | 73.2660μs | 25.8408μs | 38.6985 KOps/s | 38.8912 KOps/s | -0.50% |
| test_clone | 66.0440μs | 16.4790μs | 60.6833 KOps/s | 59.6483 KOps/s | +1.74% |
| test_getitem[int] | 1.2335ms | 16.3808μs | 61.0470 KOps/s | 59.4883 KOps/s | +2.62% |
| test_getitem[slice_int] | 0.1347ms | 31.2450μs | 32.0052 KOps/s | 31.8743 KOps/s | +0.41% |
| test_getitem[range] | 0.1909ms | 56.1761μs | 17.8012 KOps/s | 17.9093 KOps/s | -0.60% |
| test_getitem[tuple] | 0.1171ms | 25.0878μs | 39.8600 KOps/s | 39.6674 KOps/s | +0.49% |
| test_getitem[list] | 0.2127ms | 50.7928μs | 19.6878 KOps/s | 19.6222 KOps/s | +0.33% |
| test_setitem_dim[int] | 84.3080μs | 42.1788μs | 23.7086 KOps/s | 23.3900 KOps/s | +1.36% |
| test_setitem_dim[slice_int] | 0.1447ms | 72.2802μs | 13.8350 KOps/s | 13.6414 KOps/s | +1.42% |
| test_setitem_dim[range] | 0.1498ms | 92.7737μs | 10.7789 KOps/s | 10.8526 KOps/s | -0.68% |
| test_setitem_dim[tuple] | 0.1090ms | 59.3511μs | 16.8489 KOps/s | 16.4625 KOps/s | +2.35% |
| test_setitem | 95.6690μs | 29.5899μs | 33.7953 KOps/s | 32.8926 KOps/s | +2.74% |
| test_set | 0.1023ms | 28.5776μs | 34.9924 KOps/s | 33.3710 KOps/s | +4.86% |
| test_set_shared | 3.0909ms | 0.2176ms | 4.5956 KOps/s | 4.6145 KOps/s | -0.41% |
| test_update | 0.1937ms | 36.7938μs | 27.1785 KOps/s | 26.2462 KOps/s | +3.55% |
| test_update_nested | 0.1966ms | 46.3176μs | 21.5901 KOps/s | 20.8711 KOps/s | +3.44% |
| test_update__nested | 92.8140μs | 33.3202μs | 30.0118 KOps/s | 28.9989 KOps/s | +3.49% |
| test_set_nested | 0.1468ms | 31.3113μs | 31.9374 KOps/s | 30.6515 KOps/s | +4.20% |
| test_set_nested_new | 0.1964ms | 36.2237μs | 27.6062 KOps/s | 26.9820 KOps/s | +2.31% |
| test_select | 0.1896ms | 52.8783μs | 18.9114 KOps/s | 18.8030 KOps/s | +0.58% |
| test_select_nested | 0.1266ms | 58.3701μs | 17.1321 KOps/s | 17.1475 KOps/s | -0.09% |
| test_exclude_nested | 0.1632ms | 76.6998μs | 13.0378 KOps/s | 13.0309 KOps/s | +0.05% |
| test_empty[True] | 0.3622ms | 0.3186ms | 3.1389 KOps/s | 3.1032 KOps/s | +1.15% |
| test_empty[False] | 10.7675μs | 1.1553μs | 865.5967 KOps/s | 860.2120 KOps/s | +0.63% |
| test_unbind_speed | 0.5124ms | 0.3065ms | 3.2622 KOps/s | 3.1604 KOps/s | +3.22% |
| test_unbind_speed_stack0 | 0.6113ms | 0.2991ms | 3.3435 KOps/s | 3.3405 KOps/s | +0.09% |
| test_unbind_speed_stack1 | 91.4239ms | 0.7824ms | 1.2782 KOps/s | 1.5108 KOps/s | **-15.39%** |
| test_split | 89.0250ms | 2.1346ms | 468.4742 Ops/s | 465.0925 Ops/s | +0.73% |
| test_chunk | 91.4661ms | 2.1359ms | 468.1858 Ops/s | 462.7433 Ops/s | +1.18% |
| test_creation[device0] | 0.2844ms | 0.1176ms | 8.5056 KOps/s | 8.3911 KOps/s | +1.37% |
| test_creation_from_tensor | 4.1504ms | 0.1198ms | 8.3482 KOps/s | 8.2447 KOps/s | +1.26% |
| test_add_one[memmap_tensor0] | 0.2866ms | 7.6710μs | 130.3611 KOps/s | 132.1619 KOps/s | -1.36% |
| test_contiguous[memmap_tensor0] | 32.1700μs | 1.9633μs | 509.3441 KOps/s | 499.1077 KOps/s | +2.05% |
| test_stack[memmap_tensor0] | 40.9870μs | 5.5887μs | 178.9314 KOps/s | 173.2577 KOps/s | +3.27% |
| test_memmaptd_index | 1.1556ms | 0.4058ms | 2.4643 KOps/s | 2.4779 KOps/s | -0.55% |
| test_memmaptd_index_astensor | 0.7505ms | 0.4832ms | 2.0696 KOps/s | 2.0677 KOps/s | +0.09% |
| test_memmaptd_index_op | 1.7174ms | 1.0587ms | 944.5388 Ops/s | 926.5642 Ops/s | +1.94% |
| test_serialize_model | 0.1355s | 0.1203s | 8.3120 Ops/s | 7.6292 Ops/s | **+8.95%** |
| test_serialize_model_pickle | 0.4775s | 0.3932s | 2.5434 Ops/s | 2.5094 Ops/s | +1.35% |
| test_serialize_weights | 0.1194s | 0.1163s | 8.5959 Ops/s | 8.3804 Ops/s | +2.57% |
| test_serialize_weights_returnearly | 0.1783s | 0.1597s | 6.2615 Ops/s | 6.3123 Ops/s | -0.81% |
| test_serialize_weights_pickle | 0.5086s | 0.4318s | 2.3160 Ops/s | 2.4759 Ops/s | **-6.46%** |
| test_serialize_weights_filesystem | 0.1533s | 0.1406s | 7.1147 Ops/s | 6.8676 Ops/s | +3.60% |
| test_serialize_model_filesystem | 0.1539s | 0.1471s | 6.7998 Ops/s | 6.6851 Ops/s | +1.72% |
| test_reshape_pytree | 81.2820μs | 39.4908μs | 25.3224 KOps/s | 25.2099 KOps/s | +0.45% |
| test_reshape_td | 0.1127ms | 48.2731μs | 20.7155 KOps/s | 21.5373 KOps/s | -3.82% |
| test_view_pytree | 0.1169ms | 39.4281μs | 25.3626 KOps/s | 24.8947 KOps/s | +1.88% |
| test_view_td | 0.1262ms | 53.7053μs | 18.6201 KOps/s | 18.5278 KOps/s | +0.50% |
| test_unbind_pytree | 79.0870μs | 37.0855μs | 26.9647 KOps/s | 27.0870 KOps/s | -0.45% |
| test_unbind_td | 0.3643ms | 45.4431μs | 22.0055 KOps/s | 21.0684 KOps/s | +4.45% |
| test_split_pytree | 84.7890μs | 39.1310μs | 25.5552 KOps/s | 25.2078 KOps/s | +1.38% |
| test_split_td | 0.5017ms | 58.3226μs | 17.1460 KOps/s | 17.2962 KOps/s | -0.87% |
| test_add_pytree | 0.1095ms | 47.0137μs | 21.2704 KOps/s | 21.4040 KOps/s | -0.62% |
| test_add_td | 0.2145ms | 85.5111μs | 11.6944 KOps/s | 11.5768 KOps/s | +1.02% |
| test_compile_add_one_nested[tensordict-compile] | 0.1075ms | 53.1913μs | 18.8001 KOps/s | 18.8762 KOps/s | -0.40% |
| test_compile_add_one_nested[tensordict-eager] | 0.3939ms | 0.1902ms | 5.2566 KOps/s | 4.5214 KOps/s | **+16.26%** |
| test_compile_add_one_nested[pytree-compile] | 0.1150ms | 53.8375μs | 18.5744 KOps/s | 18.5539 KOps/s | +0.11% |
| test_compile_add_one_nested[pytree-eager] | 0.3375ms | 0.1444ms | 6.9236 KOps/s | 6.8323 KOps/s | +1.34% |
| test_compile_copy_nested[tensordict-compile] | 83.4460μs | 20.3269μs | 49.1959 KOps/s | 49.6793 KOps/s | -0.97% |
| test_compile_copy_nested[tensordict-eager] | 0.1442ms | 64.9540μs | 15.3955 KOps/s | 15.4685 KOps/s | -0.47% |
| test_compile_copy_nested[pytree-compile] | 0.1851ms | 80.0095μs | 12.4985 KOps/s | 12.7853 KOps/s | -2.24% |
| test_compile_copy_nested[pytree-eager] | 0.1356ms | 71.7163μs | 13.9438 KOps/s | 14.0090 KOps/s | -0.47% |
| test_compile_add_one_flat[tensordict-compile] | 0.2543ms | 0.1702ms | 5.8757 KOps/s | 5.7589 KOps/s | +2.03% |
| test_compile_add_one_flat[tensordict-eager] | 0.4076ms | 0.1918ms | 5.2138 KOps/s | 5.2314 KOps/s | -0.34% |
| test_compile_add_one_flat[tensorclass-compile] | 92.3920μs | 37.5765μs | 26.6124 KOps/s | 25.6550 KOps/s | +3.73% |
| test_compile_add_one_flat[tensorclass-eager] | 0.5382ms | 68.7411μs | 14.5473 KOps/s | 14.2045 KOps/s | +2.41% |
| test_compile_add_one_flat[pytree-compile] | 0.3606ms | 0.1703ms | 5.8725 KOps/s | 5.7200 KOps/s | +2.67% |
| test_compile_add_one_flat[pytree-eager] | 0.4276ms | 0.2886ms | 3.4651 KOps/s | 3.3739 KOps/s | +2.70% |
| test_compile_add_self_flat[tensordict-eager] | 0.4465ms | 0.2039ms | 4.9045 KOps/s | 4.9162 KOps/s | -0.24% |
| test_compile_add_self_flat[tensordict-compile] | 0.4518ms | 0.1710ms | 5.8475 KOps/s | 5.7186 KOps/s | +2.25% |
| test_compile_add_self_flat[tensorclass-eager] | 0.1400ms | 61.2041μs | 16.3388 KOps/s | 15.9952 KOps/s | +2.15% |
| test_compile_add_self_flat[tensorclass-compile] | 0.1174ms | 38.1252μs | 26.2294 KOps/s | 24.6461 KOps/s | **+6.42%** |
| test_compile_add_self_flat[pytree-eager] | 0.3975ms | 0.2366ms | 4.2267 KOps/s | 4.1215 KOps/s | +2.55% |
| test_compile_add_self_flat[pytree-compile] | 0.2909ms | 0.1716ms | 5.8273 KOps/s | 5.7718 KOps/s | +0.96% |
| test_compile_copy_flat[tensordict-compile] | 0.2039ms | 0.1055ms | 9.4823 KOps/s | 9.3155 KOps/s | +1.79% |
| test_compile_copy_flat[tensordict-eager] | 0.1347ms | 56.1424μs | 17.8119 KOps/s | 17.7570 KOps/s | +0.31% |
| test_compile_copy_flat[pytree-compile] | 0.1911ms | 79.5758μs | 12.5666 KOps/s | 12.5493 KOps/s | +0.14% |
| test_compile_copy_flat[pytree-eager] | 0.1343ms | 70.8373μs | 14.1169 KOps/s | 13.9437 KOps/s | +1.24% |
| test_compile_assign_and_add[tensordict-compile] | 0.2859ms | 0.1863ms | 5.3667 KOps/s | 5.3026 KOps/s | +1.21% |
| test_compile_assign_and_add[tensordict-eager] | 2.5846ms | 1.6534ms | 604.8324 Ops/s | 596.9397 Ops/s | +1.32% |
| test_compile_assign_and_add[pytree-compile] | 0.2830ms | 0.1827ms | 5.4738 KOps/s | 5.3153 KOps/s | +2.98% |
| test_compile_assign_and_add[pytree-eager] | 1.3158ms | 1.0786ms | 927.1040 Ops/s | 919.2338 Ops/s | +0.86% |
| test_compile_assign_and_add_stack[compile] | 0.4979ms | 0.3963ms | 2.5235 KOps/s | 2.3588 KOps/s | **+6.98%** |
| test_compile_assign_and_add_stack[eager] | 3.9579ms | 3.8149ms | 262.1296 Ops/s | 255.2078 Ops/s | +2.71% |
| test_compile_indexing[tensor-tensordict-compile] | 77.7060μs | 31.8489μs | 31.3982 KOps/s | 31.3095 KOps/s | +0.28% |
| test_compile_indexing[tensor-tensordict-eager] | 0.7629ms | 47.6239μs | 20.9979 KOps/s | 20.6637 KOps/s | +1.62% |
| test_compile_indexing[tensor-tensorclass-compile] | 76.3830μs | 27.8010μs | 35.9699 KOps/s | 34.8544 KOps/s | +3.20% |
| test_compile_indexing[tensor-tensorclass-eager] | 75.4310μs | 30.5043μs | 32.7822 KOps/s | 32.7864 KOps/s | -0.01% |
| test_compile_indexing[tensor-pytree-compile] | 0.1168ms | 27.3147μs | 36.6104 KOps/s | 35.4346 KOps/s | +3.32% |
| test_compile_indexing[tensor-pytree-eager] | 76.9940μs | 30.0595μs | 33.2674 KOps/s | 32.8831 KOps/s | +1.17% |
| test_compile_indexing[slice-tensordict-compile] | 0.1449ms | 70.5686μs | 14.1706 KOps/s | 13.8352 KOps/s | +2.42% |
| test_compile_indexing[slice-tensordict-eager] | 0.3466ms | 27.6085μs | 36.2207 KOps/s | 35.5791 KOps/s | +1.80% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1871ms | 66.9261μs | 14.9419 KOps/s | 14.7517 KOps/s | +1.29% |
| test_compile_indexing[slice-tensorclass-eager] | 79.0680μs | 23.8997μs | 41.8416 KOps/s | 39.6272 KOps/s | **+5.59%** |
| test_compile_indexing[slice-pytree-compile] | 0.1520ms | 66.4576μs | 15.0472 KOps/s | 14.8225 KOps/s | +1.52% |
| test_compile_indexing[slice-pytree-eager] | 73.0570μs | 24.0981μs | 41.4971 KOps/s | 40.1676 KOps/s | +3.31% |
| test_compile_indexing[int-tensordict-compile] | 0.1600ms | 70.8684μs | 14.1107 KOps/s | 13.9266 KOps/s | +1.32% |
| test_compile_indexing[int-tensordict-eager] | 1.1052ms | 27.8834μs | 35.8636 KOps/s | 35.8424 KOps/s | +0.06% |
| test_compile_indexing[int-tensorclass-compile] | 0.1311ms | 66.3355μs | 15.0749 KOps/s | 14.9426 KOps/s | +0.89% |
| test_compile_indexing[int-tensorclass-eager] | 71.6040μs | 24.1404μs | 41.4244 KOps/s | 40.9083 KOps/s | +1.26% |
| test_compile_indexing[int-pytree-compile] | 0.1290ms | 65.9184μs | 15.1703 KOps/s | 14.9262 KOps/s | +1.64% |
| test_compile_indexing[int-pytree-eager] | 69.1400μs | 23.9100μs | 41.8235 KOps/s | 41.1272 KOps/s | +1.69% |
| test_mod_add[eager] | 99.0350μs | 25.0329μs | 39.9475 KOps/s | 38.7008 KOps/s | +3.22% |
| test_mod_add[compile] | 95.6490μs | 35.0912μs | 28.4972 KOps/s | 26.9549 KOps/s | **+5.72%** |
| test_mod_add[compile-overhead] | 75.7320μs | 35.5615μs | 28.1203 KOps/s | 27.5240 KOps/s | +2.17% |
| test_mod_wrap[eager] | 0.4009ms | 0.2070ms | 4.8298 KOps/s | 4.8244 KOps/s | +0.11% |
| test_mod_wrap[compile] | 1.5497ms | 0.2246ms | 4.4515 KOps/s | 4.4141 KOps/s | +0.85% |
| test_mod_wrap[compile-overhead] | 0.3447ms | 0.2201ms | 4.5431 KOps/s | 4.5177 KOps/s | +0.56% |
| test_mod_wrap_and_backward[eager] | 13.3145ms | 11.7731ms | 84.9395 Ops/s | 90.7533 Ops/s | **-6.41%** |
| test_mod_wrap_and_backward[compile] | 16.9860ms | 11.6758ms | 85.6471 Ops/s | 86.2809 Ops/s | -0.73% |
| test_mod_wrap_and_backward[compile-overhead] | 14.9347ms | 11.4604ms | 87.2569 Ops/s | 83.6167 Ops/s | +4.35% |
| test_seq_add[eager] | 0.1727ms | 86.9443μs | 11.5016 KOps/s | 10.9721 KOps/s | +4.83% |
| test_seq_add[compile] | 0.1477ms | 58.5400μs | 17.0823 KOps/s | 15.8921 KOps/s | **+7.49%** |
| test_seq_add[compile-overhead] | 0.1451ms | 58.0856μs | 17.2160 KOps/s | 16.5428 KOps/s | +4.07% |
| test_seq_wrap[eager] | 0.6011ms | 0.3710ms | 2.6954 KOps/s | 2.5905 KOps/s | +4.05% |
| test_seq_wrap[compile] | 0.6217ms | 0.2565ms | 3.8979 KOps/s | 3.7942 KOps/s | +2.74% |
| test_seq_wrap[compile-overhead] | 0.3563ms | 0.2564ms | 3.8995 KOps/s | 3.7795 KOps/s | +3.18% |
| test_func_call_runtime[False-eager] | 0.6594ms | 0.5134ms | 1.9478 KOps/s | 1.8757 KOps/s | +3.85% |
| test_func_call_runtime[False-compile] | 0.9043ms | 0.4874ms | 2.0516 KOps/s | 2.0059 KOps/s | +2.28% |
| test_func_call_runtime[False-compile-overhead] | 0.6630ms | 0.4851ms | 2.0612 KOps/s | 2.0168 KOps/s | +2.20% |
| test_func_call_runtime[True-eager] | 1.5644ms | 0.7367ms | 1.3575 KOps/s | 1.3337 KOps/s | +1.79% |
| test_func_call_runtime[True-compile] | 0.7176ms | 0.4983ms | 2.0070 KOps/s | 1.9598 KOps/s | +2.41% |
| test_func_call_runtime[True-compile-overhead] | 0.6443ms | 0.4989ms | 2.0045 KOps/s | 1.9534 KOps/s | +2.62% |
| test_func_call_cm_runtime[False-eager] | 0.8512ms | 0.5057ms | 1.9773 KOps/s | 1.8818 KOps/s | **+5.07%** |
| test_func_call_cm_runtime[False-compile] | 0.8223ms | 0.4891ms | 2.0445 KOps/s | 2.0171 KOps/s | +1.36% |
| test_func_call_cm_runtime[False-compile-overhead] | 0.7559ms | 0.4842ms | 2.0653 KOps/s | 2.0160 KOps/s | +2.44% |
| test_func_call_cm_runtime[True-eager] | 1.0262ms | 0.8599ms | 1.1629 KOps/s | 1.1307 KOps/s | +2.85% |
| test_func_call_cm_runtime[True-compile] | 0.9530ms | 0.8023ms | 1.2464 KOps/s | 1.1893 KOps/s | +4.80% |
| test_func_call_cm_runtime[True-compile-overhead] | 1.3600ms | 0.8056ms | 1.2414 KOps/s | 1.1846 KOps/s | +4.80% |
| test_distributed | 0.2389ms | 0.1299ms | 7.6970 KOps/s | 7.5031 KOps/s | +2.59% |
| test_tdmodule | 37.2100μs | 17.0995μs | 58.4813 KOps/s | 57.1184 KOps/s | +2.39% |
| test_tdmodule_dispatch | 61.1550μs | 36.2661μs | 27.5740 KOps/s | 26.3916 KOps/s | +4.48% |
| test_tdseq | 35.3660μs | 19.7350μs | 50.6713 KOps/s | 51.1539 KOps/s | -0.94% |
| test_tdseq_dispatch | 59.6210μs | 40.7149μs | 24.5610 KOps/s | 23.7816 KOps/s | +3.28% |
| test_instantiation_functorch | 2.2729ms | 1.6590ms | 602.7762 Ops/s | 611.4203 Ops/s | -1.41% |
| test_instantiation_td | 1.9548ms | 1.1783ms | 848.6993 Ops/s | 841.6533 Ops/s | +0.84% |
| test_exec_functorch | 0.4323ms | 0.1809ms | 5.5285 KOps/s | 5.5364 KOps/s | -0.14% |
| test_exec_functional_call | 0.3154ms | 0.1669ms | 5.9921 KOps/s | 5.9488 KOps/s | +0.73% |
| test_exec_td | 0.3152ms | 0.1709ms | 5.8508 KOps/s | 5.5392 KOps/s | **+5.63%** |
| test_exec_td_decorator | 0.6445ms | 0.2223ms | 4.4979 KOps/s | 4.4954 KOps/s | +0.06% |
| test_vmap_mlp_speed[True-True] | 0.9017ms | 0.5902ms | 1.6943 KOps/s | 1.6741 KOps/s | +1.21% |
| test_vmap_mlp_speed[True-False] | 0.8500ms | 0.5821ms | 1.7178 KOps/s | 1.6897 KOps/s | +1.66% |
| test_vmap_mlp_speed[False-True] | 0.7336ms | 0.4839ms | 2.0667 KOps/s | 2.0358 KOps/s | +1.52% |
| test_vmap_mlp_speed[False-False] | 0.7112ms | 0.4833ms | 2.0690 KOps/s | 2.0360 KOps/s | +1.62% |
| test_vmap_mlp_speed_decorator[True-True] | 1.5303ms | 0.6466ms | 1.5466 KOps/s | 1.4901 KOps/s | +3.79% |
| test_vmap_mlp_speed_decorator[True-False] | 1.1214ms | 0.6464ms | 1.5471 KOps/s | 1.5305 KOps/s | +1.08% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7289ms | 0.5286ms | 1.8918 KOps/s | 1.8777 KOps/s | +0.75% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8298ms | 0.5296ms | 1.8881 KOps/s | 1.8671 KOps/s | +1.13% |
| test_to_module_speed[True] | 2.1655ms | 1.3390ms | 746.8285 Ops/s | 754.1209 Ops/s | -0.97% |
| test_to_module_speed[False] | 1.4486ms | 1.2864ms | 777.3339 Ops/s | 782.2832 Ops/s | -0.63% |
| test_tc_init | 83.2160μs | 44.1567μs | 22.6466 KOps/s | 21.4522 KOps/s | **+5.57%** |
| test_tc_init_nested | 0.1576ms | 88.6967μs | 11.2744 KOps/s | 10.7316 KOps/s | **+5.06%** |
| test_tc_first_layer_tensor | 18.5450μs | 1.4817μs | 674.9011 KOps/s | 613.7225 KOps/s | **+9.97%** |
| test_tc_first_layer_nontensor | 37.6210μs | 4.3341μs | 230.7261 KOps/s | 225.0799 KOps/s | +2.51% |
| test_tc_second_layer_tensor | 34.7140μs | 2.7170μs | 368.0556 KOps/s | 347.8153 KOps/s | **+5.82%** |
| test_tc_second_layer_nontensor | 36.6190μs | 5.5987μs | 178.6124 KOps/s | 173.5279 KOps/s | +2.93% |
| test_unbind | 0.4409s | 15.0064ms | 66.6384 Ops/s | 61.5491 Ops/s | **+8.27%** |
| test_full_like | 8.5198ms | 7.2992ms | 137.0022 Ops/s | 128.6552 Ops/s | **+6.49%** |
| test_zeros_like | 10.3871ms | 6.3400ms | 157.7284 Ops/s | 130.1458 Ops/s | **+21.19%** |
| test_ones_like | 12.4441ms | 7.5559ms | 132.3468 Ops/s | 129.5867 Ops/s | +2.13% |
| test_clone | 13.1233ms | 9.1783ms | 108.9523 Ops/s | 101.3393 Ops/s | **+7.51%** |
| test_squeeze | 62.4170μs | 12.6837μs | 78.8413 KOps/s | 74.7077 KOps/s | **+5.53%** |
| test_unsqueeze | 0.3176ms | 94.9518μs | 10.5317 KOps/s | 10.8469 KOps/s | -2.91% |
| test_split | 0.3461ms | 0.1969ms | 5.0779 KOps/s | 4.9684 KOps/s | +2.20% |
| test_permute | 0.4520ms | 0.2226ms | 4.4931 KOps/s | 4.4783 KOps/s | +0.33% |
| test_stack | 28.3107ms | 24.3006ms | 41.1512 Ops/s | 37.4810 Ops/s | **+9.79%** |
| test_cat | 27.8110ms | 24.0074ms | 41.6539 Ops/s | 38.8521 Ops/s | **+7.21%** |

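
For readers checking the CPU numbers by hand: the `Change` column is consistent with the relative difference between `Ops` and `Ops on Repo HEAD`, and the rows counted as "Improved" or "Worsened" in the summary all sit at or beyond roughly ±5%. The sketch below reproduces that arithmetic for three rows taken from the table. The function names and the 5% cutoff are assumptions inferred from the reported figures, not taken from the benchmark tooling itself.

```python
def relative_change(ops_new: float, ops_head: float) -> float:
    """Percentage change in throughput relative to the HEAD baseline.

    Both values must be in the same unit (e.g. both KOps/s).
    """
    return (ops_new - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops on this PR, Ops on HEAD), both in KOps/s, copied from the CPU table.
rows = {
    "test_keys": (218.3801, 256.4534),
    "test_membership_stacked_nested_last": (221.3302, 145.8002),
    "test_plain_set_nested": (45.3242, 45.2906),
}

for name, (ops_new, ops_head) in rows.items():
    pct = relative_change(ops_new, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the change is computed on throughput rather than on mean time, a positive value means the PR branch is faster than `HEAD`.
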
github-actions[bot] commented 3 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 225. Improved: 21. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1573ms | 15.6694μs | 63.8186 KOps/s | 59.6173 KOps/s | **+7.05%** |
| test_plain_set_stack_nested | 31.9200μs | 15.8094μs | 63.2534 KOps/s | 59.3180 KOps/s | **+6.63%** |
| test_plain_set_nested_inplace | 0.1022ms | 16.8484μs | 59.3528 KOps/s | 55.6664 KOps/s | **+6.62%** |
| test_plain_set_stack_nested_inplace | 45.8310μs | 16.8663μs | 59.2898 KOps/s | 55.8427 KOps/s | **+6.17%** |
| test_items | 19.7410μs | 4.6668μs | 214.2801 KOps/s | 214.7748 KOps/s | -0.23% |
| test_items_nested | 0.3942ms | 0.3621ms | 2.7614 KOps/s | 2.7822 KOps/s | -0.75% |
| test_items_nested_locked | 0.4020ms | 0.3625ms | 2.7585 KOps/s | 2.7665 KOps/s | -0.29% |
| test_items_nested_leaf | 0.1022ms | 83.6640μs | 11.9526 KOps/s | 11.7459 KOps/s | +1.76% |
| test_items_stack_nested | 0.4041ms | 0.3680ms | 2.7173 KOps/s | 2.7670 KOps/s | -1.80% |
| test_items_stack_nested_leaf | 0.1051ms | 84.7146μs | 11.8043 KOps/s | 11.7113 KOps/s | +0.79% |
| test_items_stack_nested_locked | 0.4115ms | 0.3640ms | 2.7475 KOps/s | 2.7535 KOps/s | -0.22% |
| test_keys | 19.7300μs | 4.3564μs | 229.5469 KOps/s | 229.2419 KOps/s | +0.13% |
| test_keys_nested | 85.5720μs | 65.4881μs | 15.2699 KOps/s | 15.2254 KOps/s | +0.29% |
| test_keys_nested_locked | 0.7206ms | 71.8386μs | 13.9201 KOps/s | 13.6883 KOps/s | +1.69% |
| test_keys_nested_leaf | 78.7320μs | 56.9674μs | 17.5539 KOps/s | 17.3350 KOps/s | +1.26% |
| test_keys_stack_nested | 82.8510μs | 66.0254μs | 15.1457 KOps/s | 15.1264 KOps/s | +0.13% |
| test_keys_stack_nested_leaf | 0.1449ms | 56.6286μs | 17.6589 KOps/s | 17.7042 KOps/s | -0.26% |
| test_keys_stack_nested_locked | 0.2568ms | 70.9282μs | 14.0988 KOps/s | 13.8693 KOps/s | +1.65% |
| test_values | 65.5950μs | 1.7529μs | 570.4822 KOps/s | 569.3841 KOps/s | +0.19% |
| test_values_nested | 74.1820μs | 33.7625μs | 29.6186 KOps/s | 29.6439 KOps/s | -0.09% |
| test_values_nested_locked | 0.1948ms | 35.6819μs | 28.0254 KOps/s | 28.1857 KOps/s | -0.57% |
| test_values_nested_leaf | 46.9410μs | 30.0184μs | 33.3129 KOps/s | 33.5611 KOps/s | -0.74% |
| test_values_stack_nested | 52.2110μs | 34.4334μs | 29.0415 KOps/s | 29.0074 KOps/s | +0.12% |
| test_values_stack_nested_leaf | 61.0610μs | 30.7536μs | 32.5165 KOps/s | 32.6295 KOps/s | -0.35% |
| test_values_stack_nested_locked | 58.0810μs | 36.2704μs | 27.5707 KOps/s | 27.4961 KOps/s | +0.27% |
| test_membership | 1.3140μs | 0.5419μs | 1.8454 MOps/s | 1.8293 MOps/s | +0.88% |
| test_membership_nested | 13.1705μs | 1.9412μs | 515.1371 KOps/s | 527.7133 KOps/s | -2.38% |
| test_membership_nested_leaf | 15.0655μs | 1.9316μs | 517.6957 KOps/s | 531.0122 KOps/s | -2.51% |
| test_membership_stacked_nested | 21.0300μs | 1.9913μs | 502.1935 KOps/s | 510.5696 KOps/s | -1.64% |
| test_membership_stacked_nested_leaf | 19.7210μs | 1.9811μs | 504.7647 KOps/s | 513.5709 KOps/s | -1.71% |
| test_membership_nested_last | 32.6100μs | 2.8877μs | 346.2999 KOps/s | 344.4310 KOps/s | +0.54% |
| test_membership_nested_leaf_last | 30.9810μs | 2.9214μs | 342.3048 KOps/s | 346.0130 KOps/s | -1.07% |
| test_membership_stacked_nested_last | 27.9000μs | 8.2594μs | 121.0740 KOps/s | 273.9548 KOps/s | **-55.81%** |
| test_membership_stacked_nested_leaf_last | 24.2310μs | 8.3274μs | 120.0854 KOps/s | 277.4890 KOps/s | **-56.72%** |
| test_nested_getleaf | 45.7710μs | 7.8835μs | 126.8466 KOps/s | 126.9366 KOps/s | -0.07% |
| test_nested_get | 33.6510μs | 7.4298μs | 134.5934 KOps/s | 134.9431 KOps/s | -0.26% |
| test_stacked_getleaf | 32.8500μs | 7.9525μs | 125.7463 KOps/s | 126.4556 KOps/s | -0.56% |
| test_stacked_get | 22.2110μs | 7.4359μs | 134.4819 KOps/s | 134.4617 KOps/s | +0.02% |
| test_nested_getitemleaf | 32.5410μs | 8.0957μs | 123.5228 KOps/s | 124.9457 KOps/s | -1.14% |
| test_nested_getitem | 22.6500μs | 7.5727μs | 132.0541 KOps/s | 132.1474 KOps/s | -0.07% |
| test_stacked_getitemleaf | 32.4800μs | 8.0520μs | 124.1928 KOps/s | 123.4818 KOps/s | +0.58% |
| test_stacked_getitem | 29.7310μs | 7.5673μs | 132.1483 KOps/s | 131.9189 KOps/s | +0.17% |
| test_lock_nested | 0.9364ms | 0.4602ms | 2.1730 KOps/s | 2.1570 KOps/s | +0.74% |
| test_lock_stack_nested | 0.5080ms | 0.4221ms | 2.3689 KOps/s | 2.3246 KOps/s | +1.91% |
| test_unlock_nested | 0.8163ms | 0.3809ms | 2.6254 KOps/s | 2.6149 KOps/s | +0.40% |
| test_unlock_stack_nested | 0.4524ms | 0.3418ms | 2.9261 KOps/s | 2.8592 KOps/s | +2.34% |
| test_flatten_speed | 0.6031ms | 0.1038ms | 9.6325 KOps/s | 9.7124 KOps/s | -0.82% |
| test_unflatten_speed | 0.3085ms | 0.2817ms | 3.5504 KOps/s | 3.5513 KOps/s | -0.02% |
| test_common_ops | 1.5909ms | 1.2567ms | 795.7476 Ops/s | 782.1745 Ops/s | +1.74% |
| test_creation | 14.7910μs | 1.6329μs | 612.4228 KOps/s | 602.7459 KOps/s | +1.61% |
| test_creation_empty | 32.0110μs | 14.8428μs | 67.3729 KOps/s | 59.1326 KOps/s | **+13.94%** |
| test_creation_nested_1 | 42.5710μs | 16.8084μs | 59.4941 KOps/s | 52.5583 KOps/s | **+13.20%** |
| test_creation_nested_2 | 43.5610μs | 19.2004μs | 52.0823 KOps/s | 47.3439 KOps/s | **+10.01%** |
| test_clone | 0.2280ms | 29.7817μs | 33.5777 KOps/s | 32.9888 KOps/s | +1.79% |
| test_getitem[int] | 1.2938ms | 16.7338μs | 59.7592 KOps/s | 59.0591 KOps/s | +1.19% |
| test_getitem[slice_int] | 0.1540ms | 28.5049μs | 35.0817 KOps/s | 34.4848 KOps/s | +1.73% |
| test_getitem[range] | 0.2450ms | 0.1119ms | 8.9343 KOps/s | 8.9630 KOps/s | -0.32% |
| test_getitem[tuple] | 0.1565ms | 24.5548μs | 40.7252 KOps/s | 39.9585 KOps/s | +1.92% |
| test_getitem[list] | 93.0819ms | 0.1189ms | 8.4071 KOps/s | 9.7821 KOps/s | **-14.06%** |
| test_setitem_dim[int] | 71.6920μs | 51.1054μs | 19.5674 KOps/s | 18.9548 KOps/s | +3.23% |
| test_setitem_dim[slice_int] | 0.2297ms | 74.3905μs | 13.4426 KOps/s | 12.9641 KOps/s | +3.69% |
| test_setitem_dim[range] | 0.3038ms | 0.1379ms | 7.2514 KOps/s | 7.2665 KOps/s | -0.21% |
| test_setitem_dim[tuple] | 0.2101ms | 67.5459μs | 14.8047 KOps/s | 14.3299 KOps/s | +3.31% |
| test_setitem | 0.1905ms | 41.1660μs | 24.2919 KOps/s | 23.6419 KOps/s | +2.75% |
| test_set | 0.1897ms | 40.7539μs | 24.5375 KOps/s | 23.8798 KOps/s | +2.75% |
| test_set_shared | 0.3733ms | 53.4701μs | 18.7021 KOps/s | 18.9639 KOps/s | -1.38% |
| test_update | 0.2261ms | 48.2930μs | 20.7069 KOps/s | 19.9582 KOps/s | +3.75% |
| test_update_nested | 0.2401ms | 55.4626μs | 18.0302 KOps/s | 17.2613 KOps/s | +4.45% |
| test_update__nested | 0.2483ms | 60.8611μs | 16.4309 KOps/s | 16.9384 KOps/s | -3.00% |
| test_set_nested | 0.1986ms | 42.3790μs | 23.5966 KOps/s | 22.6934 KOps/s | +3.98% |
| test_set_nested_new | 0.1967ms | 47.1553μs | 21.2065 KOps/s | 21.0308 KOps/s | +0.84% |
| test_select | 0.2376ms | 60.5347μs | 16.5194 KOps/s | 16.0189 KOps/s | +3.12% |
| test_select_nested | 0.3917ms | 51.7289μs | 19.3316 KOps/s | 19.8437 KOps/s | -2.58% |
| test_exclude_nested | 96.3620μs | 67.6554μs | 14.7808 KOps/s | 14.7328 KOps/s | +0.33% |
| test_empty[True] | 0.3210ms | 0.2785ms | 3.5907 KOps/s | 3.5985 KOps/s | -0.22% |
| test_empty[False] | 2.4780μs | 0.8461μs | 1.1819 MOps/s | 1.1523 MOps/s | +2.57% |
| test_to | 0.1305ms | 39.2948μs | 25.4486 KOps/s | 25.8968 KOps/s | -1.73% |
| test_to_nonblocking | 0.2127ms | 25.3608μs | 39.4309 KOps/s | 40.5081 KOps/s | -2.66% |
| test_unbind_speed | 0.9682ms | 0.2908ms | 3.4392 KOps/s | 3.3633 KOps/s | +2.26% |
| test_unbind_speed_stack0 | 0.3349ms | 0.2892ms | 3.4578 KOps/s | 3.4102 KOps/s | +1.40% |
| test_unbind_speed_stack1 | 93.3672ms | 0.7488ms | 1.3355 KOps/s | 1.3131 KOps/s | +1.70% |
| test_split | 95.1972ms | 2.2847ms | 437.6905 Ops/s | 426.8380 Ops/s | +2.54% |
| test_chunk | 94.7383ms | 2.3015ms | 434.4988 Ops/s | 424.6943 Ops/s | +2.31% |
| test_creation[device0] | 0.2495ms | 0.1044ms | 9.5817 KOps/s | 9.5008 KOps/s | +0.85% |
| test_creation_from_tensor | 0.2548ms | 0.1007ms | 9.9259 KOps/s | 9.8778 KOps/s | +0.49% |
| test_add_one[memmap_tensor0] | 0.1865ms | 8.8413μs | 113.1053 KOps/s | 113.7384 KOps/s | -0.56% |
| test_contiguous[memmap_tensor0] | 27.3510μs | 2.1806μs | 458.5996 KOps/s | 455.2820 KOps/s | +0.73% |
| 
test_stack[memmap_tensor0] | 35.4800μs | 6.8035μs | 146.9828 KOps/s | 144.3149 KOps/s | $\color{#35bf28}+1.85\\%$ | | test_memmaptd_index | 1.1196ms | 0.4294ms | 2.3288 KOps/s | 2.3460 KOps/s | $\color{#d91a1a}-0.73\\%$ | | test_memmaptd_index_astensor | 0.8286ms | 0.4919ms | 2.0329 KOps/s | 2.0262 KOps/s | $\color{#35bf28}+0.33\\%$ | | test_memmaptd_index_op | 1.4231ms | 1.0040ms | 995.9826 Ops/s | 967.6070 Ops/s | $\color{#35bf28}+2.93\\%$ | | test_serialize_model | 95.4702ms | 89.9607ms | 11.1160 Ops/s | 10.8847 Ops/s | $\color{#35bf28}+2.13\\%$ | | test_serialize_model_pickle | 1.3489s | 1.2368s | 0.8086 Ops/s | 0.7643 Ops/s | $\textbf{\color{#35bf28}+5.78\\%}$ | | test_serialize_weights | 90.0436ms | 86.5948ms | 11.5480 Ops/s | 9.7128 Ops/s | $\textbf{\color{#35bf28}+18.89\\%}$ | | test_serialize_weights_returnearly | 0.1867s | 62.6740ms | 15.9556 Ops/s | 14.8703 Ops/s | $\textbf{\color{#35bf28}+7.30\\%}$ | | test_serialize_weights_pickle | 1.4108s | 1.2442s | 0.8037 Ops/s | 0.8084 Ops/s | $\color{#d91a1a}-0.57\\%$ | | test_reshape_pytree | 94.7810μs | 37.7155μs | 26.5143 KOps/s | 26.1511 KOps/s | $\color{#35bf28}+1.39\\%$ | | test_reshape_td | 0.1053ms | 42.6814μs | 23.4294 KOps/s | 23.8288 KOps/s | $\color{#d91a1a}-1.68\\%$ | | test_view_pytree | 0.1504ms | 36.5831μs | 27.3350 KOps/s | 26.4040 KOps/s | $\color{#35bf28}+3.53\\%$ | | test_view_td | 0.1981ms | 48.5848μs | 20.5826 KOps/s | 20.7077 KOps/s | $\color{#d91a1a}-0.60\\%$ | | test_unbind_pytree | 0.1427ms | 35.4551μs | 28.2047 KOps/s | 27.3606 KOps/s | $\color{#35bf28}+3.08\\%$ | | test_unbind_td | 0.4572ms | 43.3041μs | 23.0925 KOps/s | 22.6504 KOps/s | $\color{#35bf28}+1.95\\%$ | | test_split_pytree | 0.3568ms | 48.6415μs | 20.5586 KOps/s | 20.5283 KOps/s | $\color{#35bf28}+0.15\\%$ | | test_split_td | 93.9266ms | 70.4348μs | 14.1975 KOps/s | 16.6169 KOps/s | $\textbf{\color{#d91a1a}-14.56\\%}$ | | test_add_pytree | 0.2455ms | 59.1757μs | 16.8988 KOps/s | 16.8767 KOps/s | $\color{#35bf28}+0.13\\%$ | 
| test_add_td | 0.2536ms | 88.8415μs | 11.2560 KOps/s | 10.7551 KOps/s | $\color{#35bf28}+4.66\\%$ | | test_compile_add_one_nested[tensordict-compile] | 0.4152ms | 0.2109ms | 4.7418 KOps/s | 4.6024 KOps/s | $\color{#35bf28}+3.03\\%$ | | test_compile_add_one_nested[tensordict-eager] | 0.3201ms | 0.1744ms | 5.7346 KOps/s | 5.7307 KOps/s | $\color{#35bf28}+0.07\\%$ | | test_compile_add_one_nested[pytree-compile] | 0.3011ms | 0.1474ms | 6.7834 KOps/s | 6.7158 KOps/s | $\color{#35bf28}+1.01\\%$ | | test_compile_add_one_nested[pytree-eager] | 0.3648ms | 0.1940ms | 5.1541 KOps/s | 5.1149 KOps/s | $\color{#35bf28}+0.77\\%$ | | test_compile_copy_nested[tensordict-compile] | 0.1931ms | 21.8767μs | 45.7107 KOps/s | 44.1398 KOps/s | $\color{#35bf28}+3.56\\%$ | | test_compile_copy_nested[tensordict-eager] | 0.2493ms | 47.2749μs | 21.1529 KOps/s | 20.9161 KOps/s | $\color{#35bf28}+1.13\\%$ | | test_compile_copy_nested[pytree-compile] | 0.2518ms | 72.8893μs | 13.7194 KOps/s | 13.8059 KOps/s | $\color{#d91a1a}-0.63\\%$ | | test_compile_copy_nested[pytree-eager] | 0.2643ms | 61.6739μs | 16.2143 KOps/s | 16.7320 KOps/s | $\color{#d91a1a}-3.09\\%$ | | test_compile_add_one_flat[tensordict-compile] | 0.5131ms | 0.3338ms | 2.9960 KOps/s | 3.0192 KOps/s | $\color{#d91a1a}-0.77\\%$ | | test_compile_add_one_flat[tensordict-eager] | 0.3671ms | 0.2249ms | 4.4455 KOps/s | 4.4838 KOps/s | $\color{#d91a1a}-0.85\\%$ | | test_compile_add_one_flat[tensorclass-compile] | 0.3057ms | 0.1407ms | 7.1078 KOps/s | 7.5629 KOps/s | $\textbf{\color{#d91a1a}-6.02\\%}$ | | test_compile_add_one_flat[tensorclass-eager] | 0.2372ms | 65.2466μs | 15.3265 KOps/s | 15.3391 KOps/s | $\color{#d91a1a}-0.08\\%$ | | test_compile_add_one_flat[pytree-compile] | 0.4732ms | 0.3279ms | 3.0497 KOps/s | 3.0162 KOps/s | $\color{#35bf28}+1.11\\%$ | | test_compile_add_one_flat[pytree-eager] | 0.8156ms | 0.6413ms | 1.5593 KOps/s | 1.5615 KOps/s | $\color{#d91a1a}-0.14\\%$ | | test_compile_add_self_flat[tensordict-eager] | 0.4229ms 
| 0.2727ms | 3.6674 KOps/s | 3.6769 KOps/s | $\color{#d91a1a}-0.26\\%$ | | test_compile_add_self_flat[tensordict-compile] | 0.4735ms | 0.3308ms | 3.0227 KOps/s | 3.0121 KOps/s | $\color{#35bf28}+0.35\\%$ | | test_compile_add_self_flat[tensorclass-eager] | 0.2241ms | 74.4321μs | 13.4351 KOps/s | 13.2188 KOps/s | $\color{#35bf28}+1.64\\%$ | | test_compile_add_self_flat[tensorclass-compile] | 0.2769ms | 0.1334ms | 7.4989 KOps/s | 7.3241 KOps/s | $\color{#35bf28}+2.39\\%$ | | test_compile_add_self_flat[pytree-eager] | 0.7435ms | 0.5537ms | 1.8060 KOps/s | 1.8571 KOps/s | $\color{#d91a1a}-2.75\\%$ | | test_compile_add_self_flat[pytree-compile] | 0.4818ms | 0.3256ms | 3.0711 KOps/s | 3.0189 KOps/s | $\color{#35bf28}+1.73\\%$ | | test_compile_copy_flat[tensordict-compile] | 0.1539ms | 17.9990μs | 55.5585 KOps/s | 52.4857 KOps/s | $\textbf{\color{#35bf28}+5.85\\%}$ | | test_compile_copy_flat[tensordict-eager] | 0.1378ms | 31.5228μs | 31.7231 KOps/s | 31.8165 KOps/s | $\color{#d91a1a}-0.29\\%$ | | test_compile_copy_flat[pytree-compile] | 0.1012ms | 75.5163μs | 13.2422 KOps/s | 13.2552 KOps/s | $\color{#d91a1a}-0.10\\%$ | | test_compile_copy_flat[pytree-eager] | 0.1702ms | 59.9960μs | 16.6678 KOps/s | 16.5928 KOps/s | $\color{#35bf28}+0.45\\%$ | | test_compile_assign_and_add[tensordict-compile] | 2.4951ms | 0.9217ms | 1.0849 KOps/s | 1.0770 KOps/s | $\color{#35bf28}+0.74\\%$ | | test_compile_assign_and_add[tensordict-eager] | 3.7202ms | 3.3519ms | 298.3377 Ops/s | 297.3389 Ops/s | $\color{#35bf28}+0.34\\%$ | | test_compile_assign_and_add[pytree-compile] | 2.6209ms | 0.9393ms | 1.0646 KOps/s | 1.0591 KOps/s | $\color{#35bf28}+0.52\\%$ | | test_compile_assign_and_add[pytree-eager] | 3.6108ms | 3.3240ms | 300.8402 Ops/s | 298.3927 Ops/s | $\color{#35bf28}+0.82\\%$ | | test_compile_indexing[tensor-tensordict-compile] | 0.2653ms | 0.1120ms | 8.9297 KOps/s | 8.8295 KOps/s | $\color{#35bf28}+1.14\\%$ | | test_compile_indexing[tensor-tensordict-eager] | 0.2517ms | 63.0512μs | 
15.8601 KOps/s | 14.8537 KOps/s | $\textbf{\color{#35bf28}+6.78\\%}$ | | test_compile_indexing[tensor-tensorclass-compile] | 0.2502ms | 0.1045ms | 9.5690 KOps/s | 9.4419 KOps/s | $\color{#35bf28}+1.35\\%$ | | test_compile_indexing[tensor-tensorclass-eager] | 0.2213ms | 47.2325μs | 21.1719 KOps/s | 20.5127 KOps/s | $\color{#35bf28}+3.21\\%$ | | test_compile_indexing[tensor-pytree-compile] | 0.2607ms | 0.1093ms | 9.1484 KOps/s | 9.1812 KOps/s | $\color{#d91a1a}-0.36\\%$ | | test_compile_indexing[tensor-pytree-eager] | 0.2241ms | 48.9166μs | 20.4430 KOps/s | 20.5764 KOps/s | $\color{#d91a1a}-0.65\\%$ | | test_compile_indexing[slice-tensordict-compile] | 0.3133ms | 0.1426ms | 7.0127 KOps/s | 7.1225 KOps/s | $\color{#d91a1a}-1.54\\%$ | | test_compile_indexing[slice-tensordict-eager] | 0.3177ms | 25.0502μs | 39.9199 KOps/s | 38.2315 KOps/s | $\color{#35bf28}+4.42\\%$ | | test_compile_indexing[slice-tensorclass-compile] | 0.3092ms | 0.1323ms | 7.5578 KOps/s | 7.5226 KOps/s | $\color{#35bf28}+0.47\\%$ | | test_compile_indexing[slice-tensorclass-eager] | 0.2178ms | 22.1542μs | 45.1382 KOps/s | 42.2761 KOps/s | $\textbf{\color{#35bf28}+6.77\\%}$ | | test_compile_indexing[slice-pytree-compile] | 0.3157ms | 0.1343ms | 7.4446 KOps/s | 7.5638 KOps/s | $\color{#d91a1a}-1.58\\%$ | | test_compile_indexing[slice-pytree-eager] | 0.1563ms | 22.4003μs | 44.6422 KOps/s | 43.1155 KOps/s | $\color{#35bf28}+3.54\\%$ | | test_compile_indexing[int-tensordict-compile] | 0.2924ms | 0.1390ms | 7.1964 KOps/s | 7.0718 KOps/s | $\color{#35bf28}+1.76\\%$ | | test_compile_indexing[int-tensordict-eager] | 0.4988ms | 25.2723μs | 39.5690 KOps/s | 37.9158 KOps/s | $\color{#35bf28}+4.36\\%$ | | test_compile_indexing[int-tensorclass-compile] | 0.2828ms | 0.1320ms | 7.5759 KOps/s | 7.5028 KOps/s | $\color{#35bf28}+0.98\\%$ | | test_compile_indexing[int-tensorclass-eager] | 0.1503ms | 21.9353μs | 45.5887 KOps/s | 44.7317 KOps/s | $\color{#35bf28}+1.92\\%$ | | test_compile_indexing[int-pytree-compile] | 
0.3119ms | 0.1338ms | 7.4747 KOps/s | 7.5189 KOps/s | $\color{#d91a1a}-0.59\\%$ | | test_compile_indexing[int-pytree-eager] | 0.1325ms | 22.9348μs | 43.6019 KOps/s | 42.9464 KOps/s | $\color{#35bf28}+1.53\\%$ | | test_mod_add[eager] | 0.1875ms | 37.1530μs | 26.9157 KOps/s | 25.9227 KOps/s | $\color{#35bf28}+3.83\\%$ | | test_mod_add[compile] | 0.2180ms | 68.6759μs | 14.5612 KOps/s | 14.4408 KOps/s | $\color{#35bf28}+0.83\\%$ | | test_mod_add[compile-overhead] | 0.2602ms | 0.1466ms | 6.8206 KOps/s | 6.6484 KOps/s | $\color{#35bf28}+2.59\\%$ | | test_mod_wrap[eager] | 0.4152ms | 0.2475ms | 4.0399 KOps/s | 3.8954 KOps/s | $\color{#35bf28}+3.71\\%$ | | test_mod_wrap[compile] | 1.4446ms | 0.2862ms | 3.4945 KOps/s | 3.4855 KOps/s | $\color{#35bf28}+0.26\\%$ | | test_mod_wrap[compile-overhead] | 8.2208ms | 4.3831ms | 228.1506 Ops/s | 227.9730 Ops/s | $\color{#35bf28}+0.08\\%$ | | test_mod_wrap_and_backward[eager] | 1.6155ms | 1.4332ms | 697.7571 Ops/s | 687.8461 Ops/s | $\color{#35bf28}+1.44\\%$ | | test_mod_wrap_and_backward[compile] | 1.6347ms | 1.4308ms | 698.9151 Ops/s | 693.7477 Ops/s | $\color{#35bf28}+0.74\\%$ | | test_mod_wrap_and_backward[compile-overhead] | 1.4617ms | 0.9993ms | 1.0007 KOps/s | 995.8551 Ops/s | $\color{#35bf28}+0.49\\%$ | | test_seq_add[eager] | 0.2587ms | 0.1051ms | 9.5145 KOps/s | 8.9863 KOps/s | $\textbf{\color{#35bf28}+5.88\\%}$ | | test_seq_add[compile] | 0.2413ms | 84.6919μs | 11.8075 KOps/s | 11.4165 KOps/s | $\color{#35bf28}+3.43\\%$ | | test_seq_add[compile-overhead] | 0.2719ms | 0.1228ms | 8.1466 KOps/s | 8.1494 KOps/s | $\color{#d91a1a}-0.03\\%$ | | test_seq_wrap[eager] | 0.6116ms | 0.4068ms | 2.4579 KOps/s | 2.3908 KOps/s | $\color{#35bf28}+2.81\\%$ | | test_seq_wrap[compile] | 1.5641ms | 0.3172ms | 3.1530 KOps/s | 3.1220 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_seq_wrap[compile-overhead] | 0.3197s | 0.1512s | 6.6133 Ops/s | 6.5905 Ops/s | $\color{#35bf28}+0.35\\%$ | | test_func_call_runtime[False-eager] | 0.9392ms | 0.7623ms | 
1.3118 KOps/s | 1.3254 KOps/s | $\color{#d91a1a}-1.02\\%$ | | test_func_call_runtime[False-compile] | 1.0185ms | 0.7917ms | 1.2631 KOps/s | 1.2449 KOps/s | $\color{#35bf28}+1.46\\%$ | | test_func_call_runtime[False-compile-overhead] | 0.4997ms | 0.3622ms | 2.7605 KOps/s | 2.7621 KOps/s | $\color{#d91a1a}-0.06\\%$ | | test_func_call_runtime[True-eager] | 1.1031ms | 0.9319ms | 1.0730 KOps/s | 1.0628 KOps/s | $\color{#35bf28}+0.96\\%$ | | test_func_call_runtime[True-compile] | 1.0401ms | 0.8395ms | 1.1911 KOps/s | 1.1825 KOps/s | $\color{#35bf28}+0.73\\%$ | | test_func_call_runtime[True-compile-overhead] | 0.6077ms | 0.4080ms | 2.4508 KOps/s | 2.4731 KOps/s | $\color{#d91a1a}-0.90\\%$ | | test_func_call_cm_runtime[False-eager] | 1.0146ms | 0.7478ms | 1.3373 KOps/s | 1.2967 KOps/s | $\color{#35bf28}+3.13\\%$ | | test_func_call_cm_runtime[False-compile] | 0.9780ms | 0.7942ms | 1.2591 KOps/s | 1.2444 KOps/s | $\color{#35bf28}+1.18\\%$ | | test_func_call_cm_runtime[False-compile-overhead] | 0.5129ms | 0.3621ms | 2.7615 KOps/s | 2.7476 KOps/s | $\color{#35bf28}+0.51\\%$ | | test_func_call_cm_runtime[True-eager] | 1.2453ms | 1.0360ms | 965.2388 Ops/s | 958.0478 Ops/s | $\color{#35bf28}+0.75\\%$ | | test_func_call_cm_runtime[True-compile] | 1.2275ms | 1.0131ms | 987.0419 Ops/s | 977.7383 Ops/s | $\color{#35bf28}+0.95\\%$ | | test_func_call_cm_runtime[True-compile-overhead] | 1.2282ms | 1.0054ms | 994.6380 Ops/s | 952.7504 Ops/s | $\color{#35bf28}+4.40\\%$ | | test_distributed | 1.5609ms | 86.4938μs | 11.5615 KOps/s | 11.1131 KOps/s | $\color{#35bf28}+4.03\\%$ | | test_tdmodule | 33.8010μs | 14.9075μs | 67.0802 KOps/s | 61.5761 KOps/s | $\textbf{\color{#35bf28}+8.94\\%}$ | | test_tdmodule_dispatch | 48.4910μs | 29.5135μs | 33.8828 KOps/s | 31.6119 KOps/s | $\textbf{\color{#35bf28}+7.18\\%}$ | | test_tdseq | 30.7110μs | 15.2668μs | 65.5014 KOps/s | 60.5871 KOps/s | $\textbf{\color{#35bf28}+8.11\\%}$ | | test_tdseq_dispatch | 48.9810μs | 32.1559μs | 31.0985 KOps/s | 28.7450 
KOps/s | $\textbf{\color{#35bf28}+8.19\\%}$ | | test_instantiation_functorch | 2.2119ms | 2.0293ms | 492.7719 Ops/s | 497.3688 Ops/s | $\color{#d91a1a}-0.92\\%$ | | test_instantiation_td | 2.1780ms | 1.2975ms | 770.7398 Ops/s | 766.7922 Ops/s | $\color{#35bf28}+0.51\\%$ | | test_exec_functorch | 0.3659ms | 0.2225ms | 4.4953 KOps/s | 4.4577 KOps/s | $\color{#35bf28}+0.84\\%$ | | test_exec_functional_call | 0.3665ms | 0.2136ms | 4.6810 KOps/s | 4.4772 KOps/s | $\color{#35bf28}+4.55\\%$ | | test_exec_td | 0.3642ms | 0.2161ms | 4.6278 KOps/s | 4.5215 KOps/s | $\color{#35bf28}+2.35\\%$ | | test_exec_td_decorator | 0.6445ms | 0.2709ms | 3.6912 KOps/s | 3.6370 KOps/s | $\color{#35bf28}+1.49\\%$ | | test_vmap_mlp_speed[True-True] | 0.8201ms | 0.6504ms | 1.5375 KOps/s | 1.5185 KOps/s | $\color{#35bf28}+1.25\\%$ | | test_vmap_mlp_speed[True-False] | 0.8572ms | 0.6477ms | 1.5440 KOps/s | 1.5275 KOps/s | $\color{#35bf28}+1.08\\%$ | | test_vmap_mlp_speed[False-True] | 0.7306ms | 0.5686ms | 1.7586 KOps/s | 1.7580 KOps/s | $\color{#35bf28}+0.03\\%$ | | test_vmap_mlp_speed[False-False] | 0.7289ms | 0.5689ms | 1.7578 KOps/s | 1.7541 KOps/s | $\color{#35bf28}+0.21\\%$ | | test_vmap_mlp_speed_decorator[True-True] | 1.4937ms | 0.6970ms | 1.4347 KOps/s | 1.4226 KOps/s | $\color{#35bf28}+0.86\\%$ | | test_vmap_mlp_speed_decorator[True-False] | 0.8800ms | 0.6987ms | 1.4313 KOps/s | 1.4224 KOps/s | $\color{#35bf28}+0.62\\%$ | | test_vmap_mlp_speed_decorator[False-True] | 0.7804ms | 0.6145ms | 1.6273 KOps/s | 1.6340 KOps/s | $\color{#d91a1a}-0.42\\%$ | | test_vmap_mlp_speed_decorator[False-False] | 0.7776ms | 0.6145ms | 1.6274 KOps/s | 1.6317 KOps/s | $\color{#d91a1a}-0.26\\%$ | | test_vmap_transformer_speed[True-True] | 8.8813ms | 8.6200ms | 116.0094 Ops/s | 115.8962 Ops/s | $\color{#35bf28}+0.10\\%$ | | test_vmap_transformer_speed[True-False] | 8.8978ms | 8.6244ms | 115.9505 Ops/s | 115.9817 Ops/s | $\color{#d91a1a}-0.03\\%$ | | test_vmap_transformer_speed[False-True] | 8.8187ms | 
8.4888ms | 117.8029 Ops/s | 117.6347 Ops/s | $\color{#35bf28}+0.14\\%$ | | test_vmap_transformer_speed[False-False] | 8.7236ms | 8.4979ms | 117.6758 Ops/s | 117.2657 Ops/s | $\color{#35bf28}+0.35\\%$ | | test_vmap_transformer_speed_decorator[True-True] | 20.6052ms | 20.2257ms | 49.4421 Ops/s | 49.4681 Ops/s | $\color{#d91a1a}-0.05\\%$ | | test_vmap_transformer_speed_decorator[True-False] | 20.5775ms | 20.2431ms | 49.3995 Ops/s | 49.5651 Ops/s | $\color{#d91a1a}-0.33\\%$ | | test_vmap_transformer_speed_decorator[False-True] | 20.4568ms | 20.0827ms | 49.7940 Ops/s | 50.0264 Ops/s | $\color{#d91a1a}-0.46\\%$ | | test_vmap_transformer_speed_decorator[False-False] | 20.3565ms | 20.0489ms | 49.8780 Ops/s | 50.0385 Ops/s | $\color{#d91a1a}-0.32\\%$ | | test_to_module_speed[True] | 1.2464ms | 1.1351ms | 880.9838 Ops/s | 881.3647 Ops/s | $\color{#d91a1a}-0.04\\%$ | | test_to_module_speed[False] | 1.2238ms | 1.1089ms | 901.8268 Ops/s | 903.3768 Ops/s | $\color{#d91a1a}-0.17\\%$ | | test_tc_init | 79.6410μs | 37.1266μs | 26.9349 KOps/s | 25.3725 KOps/s | $\textbf{\color{#35bf28}+6.16\\%}$ | | test_tc_init_nested | 0.1395ms | 71.8109μs | 13.9255 KOps/s | 12.2480 KOps/s | $\textbf{\color{#35bf28}+13.70\\%}$ | | test_tc_first_layer_tensor | 4.2785μs | 0.8013μs | 1.2480 MOps/s | 1.2535 MOps/s | $\color{#d91a1a}-0.44\\%$ | | test_tc_first_layer_nontensor | 18.4310μs | 2.5597μs | 390.6690 KOps/s | 392.5910 KOps/s | $\color{#d91a1a}-0.49\\%$ | | test_tc_second_layer_tensor | 6.7670μs | 1.6375μs | 610.6693 KOps/s | 614.6223 KOps/s | $\color{#d91a1a}-0.64\\%$ | | test_tc_second_layer_nontensor | 21.2500μs | 3.3735μs | 296.4302 KOps/s | 295.4588 KOps/s | $\color{#35bf28}+0.33\\%$ | | test_unbind | 0.3370s | 11.8061ms | 84.7020 Ops/s | 78.4989 Ops/s | $\textbf{\color{#35bf28}+7.90\\%}$ | | test_full_like | 0.7606ms | 0.5792ms | 1.7266 KOps/s | 1.7246 KOps/s | $\color{#35bf28}+0.12\\%$ | | test_zeros_like | 0.3751ms | 0.1981ms | 5.0484 KOps/s | 5.0491 KOps/s | $\color{#d91a1a}-0.01\\%$ | 
| test_ones_like | 0.3466ms | 0.1979ms | 5.0534 KOps/s | 5.0523 KOps/s | $\color{#35bf28}+0.02\\%$ | | test_clone | 0.5935ms | 0.4145ms | 2.4125 KOps/s | 2.4046 KOps/s | $\color{#35bf28}+0.33\\%$ | | test_squeeze | 0.1258ms | 10.9599μs | 91.2418 KOps/s | 90.6859 KOps/s | $\color{#35bf28}+0.61\\%$ | | test_unsqueeze | 0.2804ms | 77.7881μs | 12.8554 KOps/s | 12.7288 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_split | 0.5424ms | 0.1714ms | 5.8335 KOps/s | 5.9166 KOps/s | $\color{#d91a1a}-1.40\\%$ | | test_permute | 0.3183ms | 0.1812ms | 5.5197 KOps/s | 5.5744 KOps/s | $\color{#d91a1a}-0.98\\%$ | | test_stack | 1.3675ms | 0.9135ms | 1.0947 KOps/s | 1.1067 KOps/s | $\color{#d91a1a}-1.09\\%$ | | test_cat | 1.3443ms | 1.2321ms | 811.6333 Ops/s | 811.5021 Ops/s | $\color{#35bf28}+0.02\\%$ |
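
The bot comment does not say how the Change column is derived. The rows are consistent with a plain relative difference between the Ops value and the Ops on Repo `HEAD` value, once both are converted to the same unit. Here is a minimal sketch under that assumption; the helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark tooling.

```python
# Assumed unit scales for the throughput cells shown in the table.
_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.0007 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * _SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percentage change of the PR throughput relative to HEAD (assumed formula)."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


# Rows copied from the table above, including one with mixed units.
rows = [
    ("test_creation_empty", "67.3729 KOps/s", "59.1326 KOps/s"),
    ("test_membership_stacked_nested_last", "121.0740 KOps/s", "273.9548 KOps/s"),
    ("test_mod_wrap_and_backward[compile-overhead]", "1.0007 KOps/s", "995.8551 Ops/s"),
]
for name, ops_pr, ops_head in rows:
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

With these inputs the sketch reproduces the reported +13.94%, -55.81% and +0.49%. Because the table shows rounded throughputs, other rows may differ from the reported figure in the last digit.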