pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[BugFix] Fix `_make_dtype_promotion` backward compat #842

Closed: vmoens closed this 4 months ago

github-actions[bot] commented 4 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 2. Worsened: 17.

Detailed results (changes counted as improved or worsened are in bold):

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 55.3540μs | 17.7103μs | 56.4644 KOps/s | 60.0864 KOps/s | **-6.03%** |
| test_plain_set_stack_nested | 63.0680μs | 17.7788μs | 56.2466 KOps/s | 60.0870 KOps/s | **-6.39%** |
| test_plain_set_nested_inplace | 45.7660μs | 19.7439μs | 50.6486 KOps/s | 52.6535 KOps/s | -3.81% |
| test_plain_set_stack_nested_inplace | 67.1860μs | 19.7532μs | 50.6246 KOps/s | 53.0655 KOps/s | -4.60% |
| test_items | 27.7420μs | 2.6286μs | 380.4258 KOps/s | 362.4521 KOps/s | +4.96% |
| test_items_nested | 0.4451ms | 0.2649ms | 3.7743 KOps/s | 3.8171 KOps/s | -1.12% |
| test_items_nested_locked | 1.3731ms | 0.2655ms | 3.7661 KOps/s | 3.7867 KOps/s | -0.54% |
| test_items_nested_leaf | 0.1553ms | 76.4910μs | 13.0734 KOps/s | 12.8656 KOps/s | +1.62% |
| test_items_stack_nested | 0.5304ms | 0.2651ms | 3.7721 KOps/s | 3.7818 KOps/s | -0.26% |
| test_items_stack_nested_leaf | 0.1345ms | 77.3703μs | 12.9249 KOps/s | 12.5115 KOps/s | +3.30% |
| test_items_stack_nested_locked | 1.2965ms | 0.2653ms | 3.7688 KOps/s | 3.7486 KOps/s | +0.54% |
| test_keys | 19.8470μs | 3.8461μs | 260.0016 KOps/s | 256.8036 KOps/s | +1.25% |
| test_keys_nested | 0.2711ms | 0.1400ms | 7.1420 KOps/s | 7.1726 KOps/s | -0.43% |
| test_keys_nested_locked | 0.6855ms | 0.1452ms | 6.8865 KOps/s | 6.9369 KOps/s | -0.73% |
| test_keys_nested_leaf | 0.2021ms | 0.1192ms | 8.3891 KOps/s | 8.5311 KOps/s | -1.66% |
| test_keys_stack_nested | 0.3509ms | 0.1412ms | 7.0839 KOps/s | 7.2263 KOps/s | -1.97% |
| test_keys_stack_nested_leaf | 0.2171ms | 0.1194ms | 8.3767 KOps/s | 8.5227 KOps/s | -1.71% |
| test_keys_stack_nested_locked | 0.2727ms | 0.1450ms | 6.8966 KOps/s | 7.0494 KOps/s | -2.17% |
| test_values | 8.4940μs | 1.1295μs | 885.3185 KOps/s | 867.1205 KOps/s | +2.10% |
| test_values_nested | 0.2329ms | 51.6144μs | 19.3744 KOps/s | 19.5584 KOps/s | -0.94% |
| test_values_nested_locked | 0.1043ms | 51.1561μs | 19.5480 KOps/s | 19.7291 KOps/s | -0.92% |
| test_values_nested_leaf | 92.3430μs | 46.1655μs | 21.6612 KOps/s | 21.8144 KOps/s | -0.70% |
| test_values_stack_nested | 0.1049ms | 51.7077μs | 19.3395 KOps/s | 19.1167 KOps/s | +1.17% |
| test_values_stack_nested_leaf | 95.0890μs | 46.5551μs | 21.4799 KOps/s | 21.6953 KOps/s | -0.99% |
| test_values_stack_nested_locked | 99.3370μs | 51.7264μs | 19.3325 KOps/s | 19.0811 KOps/s | +1.32% |
| test_membership | 16.9720μs | 1.3425μs | 744.8538 KOps/s | 749.1682 KOps/s | -0.58% |
| test_membership_nested | 39.7040μs | 3.4151μs | 292.8194 KOps/s | 283.6397 KOps/s | +3.24% |
| test_membership_nested_leaf | 26.3400μs | 3.4368μs | 290.9677 KOps/s | 290.4522 KOps/s | +0.18% |
| test_membership_stacked_nested | 40.3760μs | 3.3864μs | 295.2984 KOps/s | 286.6216 KOps/s | +3.03% |
| test_membership_stacked_nested_leaf | 23.2240μs | 3.4162μs | 292.7258 KOps/s | 296.4102 KOps/s | -1.24% |
| test_membership_nested_last | 39.3340μs | 4.1707μs | 239.7695 KOps/s | 241.5917 KOps/s | -0.75% |
| test_membership_nested_leaf_last | 32.1900μs | 4.2040μs | 237.8680 KOps/s | 238.4623 KOps/s | -0.25% |
| test_membership_stacked_nested_last | 22.5120μs | 4.1620μs | 240.2674 KOps/s | 243.9938 KOps/s | -1.53% |
| test_membership_stacked_nested_leaf_last | 29.8160μs | 4.0712μs | 245.6257 KOps/s | 241.9504 KOps/s | +1.52% |
| test_nested_getleaf | 49.6740μs | 10.6022μs | 94.3200 KOps/s | 96.6881 KOps/s | -2.45% |
| test_nested_get | 39.9950μs | 10.1757μs | 98.2731 KOps/s | 101.4749 KOps/s | -3.16% |
| test_stacked_getleaf | 26.5000μs | 10.5995μs | 94.3442 KOps/s | 97.2695 KOps/s | -3.01% |
| test_stacked_get | 44.4750μs | 9.9757μs | 100.2436 KOps/s | 102.0948 KOps/s | -1.81% |
| test_nested_getitemleaf | 36.2080μs | 11.2487μs | 88.8993 KOps/s | 91.2769 KOps/s | -2.60% |
| test_nested_getitem | 50.1640μs | 10.4380μs | 95.8034 KOps/s | 99.1377 KOps/s | -3.36% |
| test_stacked_getitemleaf | 77.7660μs | 11.1530μs | 89.6622 KOps/s | 91.2473 KOps/s | -1.74% |
| test_stacked_getitem | 50.9150μs | 10.3734μs | 96.4007 KOps/s | 100.6281 KOps/s | -4.20% |
| test_lock_nested | 48.0267ms | 0.3930ms | 2.5448 KOps/s | 2.9545 KOps/s | **-13.87%** |
| test_lock_stack_nested | 0.7173ms | 0.3158ms | 3.1663 KOps/s | 3.2830 KOps/s | -3.55% |
| test_unlock_nested | 0.6665ms | 0.3460ms | 2.8903 KOps/s | 2.9081 KOps/s | -0.61% |
| test_unlock_stack_nested | 0.6596ms | 0.3244ms | 3.0831 KOps/s | 3.1829 KOps/s | -3.14% |
| test_flatten_speed | 0.1967ms | 95.9391μs | 10.4233 KOps/s | 10.4457 KOps/s | -0.21% |
| test_unflatten_speed | 0.7220ms | 0.4165ms | 2.4008 KOps/s | 2.4253 KOps/s | -1.01% |
| test_common_ops | 3.8861ms | 0.7522ms | 1.3295 KOps/s | 1.3956 KOps/s | -4.73% |
| test_creation | 49.7930μs | 1.9619μs | 509.7128 KOps/s | 532.4408 KOps/s | -4.27% |
| test_creation_empty | 31.8400μs | 11.5150μs | 86.8432 KOps/s | 100.4176 KOps/s | **-13.52%** |
| test_creation_nested_1 | 57.1270μs | 14.3477μs | 69.6974 KOps/s | 78.3909 KOps/s | **-11.09%** |
| test_creation_nested_2 | 43.6720μs | 17.3746μs | 57.5552 KOps/s | 62.9438 KOps/s | **-8.56%** |
| test_clone | 77.1450μs | 13.3070μs | 75.1483 KOps/s | 74.0075 KOps/s | +1.54% |
| test_getitem[int] | 47.3190μs | 11.5925μs | 86.2627 KOps/s | 88.3975 KOps/s | -2.41% |
| test_getitem[slice_int] | 80.8330μs | 22.8693μs | 43.7267 KOps/s | 44.8287 KOps/s | -2.46% |
| test_getitem[range] | 79.2090μs | 58.7404μs | 17.0241 KOps/s | 16.8546 KOps/s | +1.01% |
| test_getitem[tuple] | 53.4000μs | 18.9926μs | 52.6521 KOps/s | 53.2898 KOps/s | -1.20% |
| test_getitem[list] | 0.1069ms | 40.7528μs | 24.5382 KOps/s | 24.2163 KOps/s | +1.33% |
| test_setitem_dim[int] | 96.0200μs | 37.1283μs | 26.9336 KOps/s | 29.2298 KOps/s | **-7.86%** |
| test_setitem_dim[slice_int] | 0.1017ms | 64.8672μs | 15.4161 KOps/s | 16.5797 KOps/s | **-7.02%** |
| test_setitem_dim[range] | 0.1476ms | 87.0964μs | 11.4815 KOps/s | 12.0045 KOps/s | -4.36% |
| test_setitem_dim[tuple] | 0.1009ms | 53.1101μs | 18.8288 KOps/s | 20.3930 KOps/s | **-7.67%** |
| test_setitem | 80.4200μs | 21.5403μs | 46.4245 KOps/s | 48.7903 KOps/s | -4.85% |
| test_set | 85.4700μs | 20.7183μs | 48.2666 KOps/s | 50.1943 KOps/s | -3.84% |
| test_set_shared | 3.5246ms | 0.1463ms | 6.8363 KOps/s | 6.9299 KOps/s | -1.35% |
| test_update | 81.3130μs | 23.9184μs | 41.8088 KOps/s | 45.4550 KOps/s | **-8.02%** |
| test_update_nested | 78.4870μs | 32.7385μs | 30.5451 KOps/s | 31.7734 KOps/s | -3.87% |
| test_update__nested | 79.9500μs | 25.7129μs | 38.8911 KOps/s | 39.2786 KOps/s | -0.99% |
| test_set_nested | 58.4000μs | 22.3441μs | 44.7544 KOps/s | 46.3686 KOps/s | -3.48% |
| test_set_nested_new | 88.9260μs | 26.5642μs | 37.6447 KOps/s | 38.8877 KOps/s | -3.20% |
| test_select | 0.1026ms | 42.6779μs | 23.4314 KOps/s | 24.2199 KOps/s | -3.26% |
| test_select_nested | 0.1231ms | 61.0495μs | 16.3801 KOps/s | 16.6230 KOps/s | -1.46% |
| test_exclude_nested | 0.2637ms | 0.1225ms | 8.1649 KOps/s | 8.2924 KOps/s | -1.54% |
| test_empty[True] | 6.4191ms | 0.4054ms | 2.4665 KOps/s | 2.5388 KOps/s | -2.85% |
| test_empty[False] | 8.5285μs | 1.1689μs | 855.5265 KOps/s | 883.4319 KOps/s | -3.16% |
| test_unbind_speed | 0.3333ms | 0.2609ms | 3.8324 KOps/s | 3.9025 KOps/s | -1.80% |
| test_unbind_speed_stack0 | 0.4723ms | 0.2588ms | 3.8647 KOps/s | 4.0350 KOps/s | -4.22% |
| test_unbind_speed_stack1 | 65.4266ms | 0.7319ms | 1.3663 KOps/s | 1.4080 KOps/s | -2.96% |
| test_split | 70.8134ms | 1.5999ms | 625.0545 Ops/s | 625.5295 Ops/s | -0.08% |
| test_chunk | 66.5128ms | 1.6001ms | 624.9447 Ops/s | 627.5064 Ops/s | -0.41% |
| test_creation[device0] | 0.1688ms | 84.6147μs | 11.8183 KOps/s | 11.8245 KOps/s | -0.05% |
| test_creation_from_tensor | 3.4892ms | 87.2343μs | 11.4634 KOps/s | 11.6895 KOps/s | -1.93% |
| test_add_one[memmap_tensor0] | 57.2470μs | 5.3478μs | 186.9912 KOps/s | 179.6468 KOps/s | +4.09% |
| test_contiguous[memmap_tensor0] | 19.6370μs | 0.6765μs | 1.4783 MOps/s | 1.5835 MOps/s | **-6.65%** |
| test_stack[memmap_tensor0] | 27.0200μs | 3.6185μs | 276.3542 KOps/s | 278.5017 KOps/s | -0.77% |
| test_memmaptd_index | 0.9465ms | 0.2577ms | 3.8804 KOps/s | 3.8898 KOps/s | -0.24% |
| test_memmaptd_index_astensor | 1.0549ms | 0.3348ms | 2.9867 KOps/s | 3.0124 KOps/s | -0.86% |
| test_memmaptd_index_op | 0.9225ms | 0.6334ms | 1.5787 KOps/s | 1.6559 KOps/s | -4.66% |
| test_serialize_model | 0.1641s | 0.1113s | 8.9875 Ops/s | 8.8126 Ops/s | +1.99% |
| test_serialize_model_pickle | 0.4479s | 0.3749s | 2.6674 Ops/s | 2.6364 Ops/s | +1.18% |
| test_serialize_weights | 0.1674s | 0.1130s | 8.8481 Ops/s | 9.2638 Ops/s | -4.49% |
| test_serialize_weights_returnearly | 0.1905s | 0.1338s | 7.4739 Ops/s | 7.2129 Ops/s | +3.62% |
| test_serialize_weights_pickle | 0.8452s | 0.5165s | 1.9362 Ops/s | 2.2778 Ops/s | **-15.00%** |
| test_serialize_weights_filesystem | 0.1520s | 97.6986ms | 10.2356 Ops/s | 10.4918 Ops/s | -2.44% |
| test_serialize_model_filesystem | 0.1003s | 92.9669ms | 10.7565 Ops/s | 9.7724 Ops/s | **+10.07%** |
| test_reshape_pytree | 91.3840μs | 25.9002μs | 38.6097 KOps/s | 38.6569 KOps/s | -0.12% |
| test_reshape_td | 78.2370μs | 34.7075μs | 28.8122 KOps/s | 29.2915 KOps/s | -1.64% |
| test_view_pytree | 57.9490μs | 25.5521μs | 39.1357 KOps/s | 38.8632 KOps/s | +0.70% |
| test_view_td | 83.5270μs | 40.0234μs | 24.9854 KOps/s | 25.5857 KOps/s | -2.35% |
| test_unbind_pytree | 77.9660μs | 29.6111μs | 33.7711 KOps/s | 33.4877 KOps/s | +0.85% |
| test_unbind_td | 0.4280ms | 38.1829μs | 26.1898 KOps/s | 26.5741 KOps/s | -1.45% |
| test_split_pytree | 69.7110μs | 29.3253μs | 34.1002 KOps/s | 33.3340 KOps/s | +2.30% |
| test_split_td | 0.5354ms | 40.7229μs | 24.5562 KOps/s | 24.7232 KOps/s | -0.68% |
| test_add_pytree | 79.6890μs | 35.3846μs | 28.2609 KOps/s | 27.9761 KOps/s | +1.02% |
| test_add_td | 0.1777ms | 59.6394μs | 16.7674 KOps/s | 18.4779 KOps/s | **-9.26%** |
| test_distributed | 0.2173ms | 99.8316μs | 10.0169 KOps/s | 9.7284 KOps/s | +2.97% |
| test_tdmodule | 41.7680μs | 18.0518μs | 55.3962 KOps/s | 56.3438 KOps/s | -1.68% |
| test_tdmodule_dispatch | 69.4000μs | 35.9701μs | 27.8008 KOps/s | 28.5292 KOps/s | -2.55% |
| test_tdseq | 40.7170μs | 20.7750μs | 48.1348 KOps/s | 47.9493 KOps/s | +0.39% |
| test_tdseq_dispatch | 75.5810μs | 40.3784μs | 24.7657 KOps/s | 24.6810 KOps/s | +0.34% |
| test_instantiation_functorch | 1.5059ms | 1.3138ms | 761.1327 Ops/s | 762.1626 Ops/s | -0.14% |
| test_instantiation_td | 1.8900ms | 1.0364ms | 964.8747 Ops/s | 977.2380 Ops/s | -1.27% |
| test_exec_functorch | 0.3597ms | 0.1589ms | 6.2950 KOps/s | 6.1273 KOps/s | +2.74% |
| test_exec_functional_call | 0.2970ms | 0.1504ms | 6.6489 KOps/s | 6.6297 KOps/s | +0.29% |
| test_exec_td | 0.2932ms | 0.1466ms | 6.8234 KOps/s | 6.8598 KOps/s | -0.53% |
| test_exec_td_decorator | 0.9000ms | 0.2237ms | 4.4712 KOps/s | 4.5075 KOps/s | -0.80% |
| test_vmap_mlp_speed[True-True] | 0.7853ms | 0.4928ms | 2.0293 KOps/s | 2.0365 KOps/s | -0.36% |
| test_vmap_mlp_speed[True-False] | 0.7630ms | 0.4914ms | 2.0349 KOps/s | 2.0655 KOps/s | -1.48% |
| test_vmap_mlp_speed[False-True] | 0.6016ms | 0.3987ms | 2.5081 KOps/s | 2.5003 KOps/s | +0.31% |
| test_vmap_mlp_speed[False-False] | 0.5918ms | 0.3979ms | 2.5130 KOps/s | 2.5136 KOps/s | -0.03% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0859ms | 0.5628ms | 1.7767 KOps/s | 1.7977 KOps/s | -1.17% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8932ms | 0.5653ms | 1.7688 KOps/s | 1.7915 KOps/s | -1.27% |
| test_vmap_mlp_speed_decorator[False-True] | 0.6652ms | 0.4601ms | 2.1734 KOps/s | 2.1808 KOps/s | -0.34% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7507ms | 0.4612ms | 2.1683 KOps/s | 2.1777 KOps/s | -0.43% |
| test_to_module_speed[True] | 73.6143ms | 1.8164ms | 550.5355 Ops/s | 603.7795 Ops/s | **-8.82%** |
| test_to_module_speed[False] | 2.2824ms | 1.6636ms | 601.1061 Ops/s | 617.5449 Ops/s | -2.66% |
| test_tc_init | 73.6880μs | 31.0885μs | 32.1662 KOps/s | 35.3131 KOps/s | **-8.91%** |
| test_tc_init_nested | 0.1515ms | 62.3082μs | 16.0493 KOps/s | 17.8798 KOps/s | **-10.24%** |
| test_tc_first_layer_tensor | 3.9646μs | 0.6756μs | 1.4802 MOps/s | 1.4587 MOps/s | +1.48% |
| test_tc_first_layer_nontensor | 2.1335μs | 0.6663μs | 1.5008 MOps/s | 1.5138 MOps/s | -0.86% |
| test_tc_second_layer_tensor | 16.8010μs | 1.8574μs | 538.3749 KOps/s | 552.1394 KOps/s | -2.49% |
| test_tc_second_layer_nontensor | 14.7980μs | 1.6459μs | 607.5569 KOps/s | 671.6788 KOps/s | **-9.55%** |
| test_unbind | 81.6286ms | 7.0817ms | 141.2092 Ops/s | 131.0674 Ops/s | **+7.74%** |
| test_full_like | 19.4655ms | 11.7688ms | 84.9704 Ops/s | 83.9510 Ops/s | +1.21% |
| test_zeros_like | 14.4214ms | 5.7658ms | 173.4372 Ops/s | 172.1134 Ops/s | +0.77% |
| test_ones_like | 6.9768ms | 6.1388ms | 162.8991 Ops/s | 157.7602 Ops/s | +3.26% |
| test_clone | 8.2621ms | 7.8583ms | 127.2545 Ops/s | 128.0410 Ops/s | -0.61% |
| test_squeeze | 59.9730μs | 14.3360μs | 69.7547 KOps/s | 70.8139 KOps/s | -1.50% |
| test_unsqueeze | 0.1953ms | 59.5149μs | 16.8025 KOps/s | 16.2912 KOps/s | +3.14% |
| test_split | 0.1699ms | 0.1124ms | 8.8995 KOps/s | 8.8785 KOps/s | +0.24% |
| test_permute | 0.1969ms | 0.1249ms | 8.0053 KOps/s | 7.8086 KOps/s | +2.52% |
| test_stack | 28.3265ms | 22.0195ms | 45.4143 Ops/s | 45.6400 Ops/s | -0.49% |
| test_cat | 26.7834ms | 21.9120ms | 45.6372 Ops/s | 45.3475 Ops/s | +0.64% |

github-actions[bot] commented 4 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 5. Worsened: 28.

Detailed results (changes counted as improved or worsened are in bold):

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 87.5220μs | 12.8023μs | 78.1109 KOps/s | 79.5164 KOps/s | -1.77% |
| test_plain_set_stack_nested | 29.1800μs | 12.8068μs | 78.0835 KOps/s | 80.5393 KOps/s | -3.05% |
| test_plain_set_nested_inplace | 38.2010μs | 14.2822μs | 70.0174 KOps/s | 73.8896 KOps/s | **-5.24%** |
| test_plain_set_stack_nested_inplace | 36.5410μs | 14.1789μs | 70.5272 KOps/s | 73.2504 KOps/s | -3.72% |
| test_items | 21.4010μs | 4.6703μs | 214.1182 KOps/s | 211.2732 KOps/s | +1.35% |
| test_items_nested | 0.3717ms | 0.3368ms | 2.9691 KOps/s | 2.9730 KOps/s | -0.13% |
| test_items_nested_locked | 0.3725ms | 0.3367ms | 2.9700 KOps/s | 2.9240 KOps/s | +1.57% |
| test_items_nested_leaf | 0.1084ms | 82.5074μs | 12.1201 KOps/s | 11.9790 KOps/s | +1.18% |
| test_items_stack_nested | 0.4305ms | 0.3379ms | 2.9592 KOps/s | 2.8994 KOps/s | +2.06% |
| test_items_stack_nested_leaf | 0.1267ms | 81.7209μs | 12.2368 KOps/s | 11.8661 KOps/s | +3.12% |
| test_items_stack_nested_locked | 0.3665ms | 0.3371ms | 2.9666 KOps/s | 2.9057 KOps/s | +2.10% |
| test_keys | 16.8700μs | 4.3639μs | 229.1531 KOps/s | 226.7030 KOps/s | +1.08% |
| test_keys_nested | 96.9120μs | 69.3988μs | 14.4095 KOps/s | 14.8087 KOps/s | -2.70% |
| test_keys_nested_locked | 2.0538ms | 74.1114μs | 13.4932 KOps/s | 13.2865 KOps/s | +1.56% |
| test_keys_nested_leaf | 0.1034ms | 58.9823μs | 16.9542 KOps/s | 16.8471 KOps/s | +0.64% |
| test_keys_stack_nested | 90.2820μs | 66.2229μs | 15.1005 KOps/s | 14.7450 KOps/s | +2.41% |
| test_keys_stack_nested_leaf | 82.9720μs | 58.8196μs | 17.0011 KOps/s | 17.2909 KOps/s | -1.68% |
| test_keys_stack_nested_locked | 91.8220μs | 72.2604μs | 13.8388 KOps/s | 13.5146 KOps/s | +2.40% |
| test_values | 7.5537μs | 1.8051μs | 553.9879 KOps/s | 548.5517 KOps/s | +0.99% |
| test_values_nested | 56.8020μs | 35.6511μs | 28.0496 KOps/s | 28.0581 KOps/s | -0.03% |
| test_values_nested_locked | 57.4810μs | 37.4235μs | 26.7212 KOps/s | 26.3341 KOps/s | +1.47% |
| test_values_nested_leaf | 45.3510μs | 31.3722μs | 31.8753 KOps/s | 31.5837 KOps/s | +0.92% |
| test_values_stack_nested | 52.9220μs | 35.9871μs | 27.7877 KOps/s | 27.6820 KOps/s | +0.38% |
| test_values_stack_nested_leaf | 50.7710μs | 31.8197μs | 31.4271 KOps/s | 30.9699 KOps/s | +1.48% |
| test_values_stack_nested_locked | 57.0810μs | 38.0167μs | 26.3042 KOps/s | 25.9711 KOps/s | +1.28% |
| test_membership | 13.9700μs | 0.8400μs | 1.1905 MOps/s | 1.4410 MOps/s | **-17.38%** |
| test_membership_nested | 18.3600μs | 2.5543μs | 391.4910 KOps/s | 390.9004 KOps/s | +0.15% |
| test_membership_nested_leaf | 16.8900μs | 2.5395μs | 393.7802 KOps/s | 387.9449 KOps/s | +1.50% |
| test_membership_stacked_nested | 23.9800μs | 2.5572μs | 391.0500 KOps/s | 394.5407 KOps/s | -0.88% |
| test_membership_stacked_nested_leaf | 16.4300μs | 2.5985μs | 384.8401 KOps/s | 394.7629 KOps/s | -2.51% |
| test_membership_nested_last | 16.9310μs | 3.1010μs | 322.4784 KOps/s | 324.2483 KOps/s | -0.55% |
| test_membership_nested_leaf_last | 25.4410μs | 3.0604μs | 326.7544 KOps/s | 325.9315 KOps/s | +0.25% |
| test_membership_stacked_nested_last | 34.2110μs | 9.6962μs | 103.1335 KOps/s | 260.5245 KOps/s | **-60.41%** |
| test_membership_stacked_nested_leaf_last | 28.4210μs | 9.6774μs | 103.3339 KOps/s | 259.0101 KOps/s | **-60.10%** |
| test_nested_getleaf | 31.0510μs | 8.4048μs | 118.9796 KOps/s | 119.5559 KOps/s | -0.48% |
| test_nested_get | 26.7810μs | 7.8611μs | 127.2093 KOps/s | 126.3257 KOps/s | +0.70% |
| test_stacked_getleaf | 25.1500μs | 8.3891μs | 119.2027 KOps/s | 119.6654 KOps/s | -0.39% |
| test_stacked_get | 24.4310μs | 7.8258μs | 127.7817 KOps/s | 127.7921 KOps/s | -0.01% |
| test_nested_getitemleaf | 25.8800μs | 8.5601μs | 116.8217 KOps/s | 116.8927 KOps/s | -0.06% |
| test_nested_getitem | 61.7310μs | 8.0296μs | 124.5399 KOps/s | 124.2295 KOps/s | +0.25% |
| test_stacked_getitemleaf | 31.8300μs | 8.5362μs | 117.1482 KOps/s | 116.1361 KOps/s | +0.87% |
| test_stacked_getitem | 20.5100μs | 8.0694μs | 123.9252 KOps/s | 124.1048 KOps/s | -0.14% |
| test_lock_nested | 57.8097ms | 0.4204ms | 2.3789 KOps/s | 2.4627 KOps/s | -3.40% |
| test_lock_stack_nested | 0.3427ms | 0.3057ms | 3.2717 KOps/s | 3.2412 KOps/s | +0.94% |
| test_unlock_nested | 60.0262ms | 0.4232ms | 2.3632 KOps/s | 2.4357 KOps/s | -2.98% |
| test_unlock_stack_nested | 0.3517ms | 0.3140ms | 3.1850 KOps/s | 3.1645 KOps/s | +0.65% |
| test_flatten_speed | 0.3605ms | 0.1019ms | 9.8149 KOps/s | 9.8066 KOps/s | +0.09% |
| test_unflatten_speed | 0.3745ms | 0.2937ms | 3.4046 KOps/s | 3.3957 KOps/s | +0.26% |
| test_common_ops | 1.0675ms | 0.6130ms | 1.6313 KOps/s | 1.7492 KOps/s | **-6.74%** |
| test_creation | 14.4500μs | 1.6394μs | 609.9814 KOps/s | 620.1143 KOps/s | -1.63% |
| test_creation_empty | 22.8800μs | 8.6274μs | 115.9104 KOps/s | 131.1975 KOps/s | **-11.65%** |
| test_creation_nested_1 | 30.5600μs | 10.5084μs | 95.1623 KOps/s | 104.8683 KOps/s | **-9.26%** |
| test_creation_nested_2 | 26.7800μs | 12.6177μs | 79.2536 KOps/s | 85.9652 KOps/s | **-7.81%** |
| test_clone | 66.8220μs | 12.3254μs | 81.1329 KOps/s | 83.3710 KOps/s | -2.68% |
| test_getitem[int] | 30.8110μs | 11.5169μs | 86.8288 KOps/s | 91.1464 KOps/s | -4.74% |
| test_getitem[slice_int] | 62.2420μs | 22.1884μs | 45.0685 KOps/s | 47.3698 KOps/s | -4.86% |
| test_getitem[range] | 68.4010μs | 49.1430μs | 20.3488 KOps/s | 19.8928 KOps/s | +2.29% |
| test_getitem[tuple] | 43.7310μs | 19.5926μs | 51.0396 KOps/s | 52.1331 KOps/s | -2.10% |
| test_getitem[list] | 0.1317ms | 35.9698μs | 27.8011 KOps/s | 29.2378 KOps/s | -4.91% |
| test_setitem_dim[int] | 62.3710μs | 31.0892μs | 32.1655 KOps/s | 33.8965 KOps/s | **-5.11%** |
| test_setitem_dim[slice_int] | 84.2620μs | 52.2250μs | 19.1479 KOps/s | 19.8416 KOps/s | -3.50% |
| test_setitem_dim[range] | 0.1003ms | 69.7116μs | 14.3448 KOps/s | 14.7635 KOps/s | -2.84% |
| test_setitem_dim[tuple] | 67.4510μs | 45.4694μs | 21.9928 KOps/s | 22.7867 KOps/s | -3.48% |
| test_setitem | 38.1910μs | 17.1573μs | 58.2844 KOps/s | 61.5869 KOps/s | **-5.36%** |
| test_set | 52.0620μs | 16.7957μs | 59.5390 KOps/s | 62.3723 KOps/s | -4.54% |
| test_set_shared | 1.6987ms | 0.1021ms | 9.7912 KOps/s | 10.0963 KOps/s | -3.02% |
| test_update | 86.4420μs | 19.7633μs | 50.5989 KOps/s | 55.2623 KOps/s | **-8.44%** |
| test_update_nested | 73.3120μs | 24.8417μs | 40.2548 KOps/s | 42.4132 KOps/s | **-5.09%** |
| test_update__nested | 50.9410μs | 22.7385μs | 43.9784 KOps/s | 44.2592 KOps/s | -0.63% |
| test_set_nested | 74.0020μs | 18.0084μs | 55.5297 KOps/s | 58.6926 KOps/s | **-5.39%** |
| test_set_nested_new | 67.6210μs | 20.9563μs | 47.7184 KOps/s | 50.0010 KOps/s | -4.57% |
| test_select | 66.9910μs | 33.4100μs | 29.9311 KOps/s | 29.6588 KOps/s | +0.92% |
| test_select_nested | 0.9297ms | 55.1038μs | 18.1476 KOps/s | 17.9937 KOps/s | +0.85% |
| test_exclude_nested | 0.1323ms | 0.1104ms | 9.0578 KOps/s | 9.0208 KOps/s | +0.41% |
| test_empty[True] | 0.3937ms | 0.3441ms | 2.9065 KOps/s | 2.8700 KOps/s | +1.27% |
| test_empty[False] | 3.0151μs | 0.9358μs | 1.0686 MOps/s | 1.0340 MOps/s | +3.34% |
| test_to | 0.1025ms | 76.9930μs | 12.9882 KOps/s | 12.9204 KOps/s | +0.52% |
| test_to_nonblocking | 89.5620μs | 60.9912μs | 16.3958 KOps/s | 15.2256 KOps/s | **+7.69%** |
| test_unbind_speed | 0.3069ms | 0.2729ms | 3.6646 KOps/s | 3.7104 KOps/s | -1.23% |
| test_unbind_speed_stack0 | 0.3040ms | 0.2699ms | 3.7049 KOps/s | 3.7117 KOps/s | -0.18% |
| test_unbind_speed_stack1 | 75.2053ms | 0.8093ms | 1.2357 KOps/s | 1.2283 KOps/s | +0.60% |
| test_split | 74.7313ms | 1.8115ms | 552.0395 Ops/s | 585.3615 Ops/s | **-5.69%** |
| test_chunk | 1.7376ms | 1.6812ms | 594.7974 Ops/s | 632.4595 Ops/s | **-5.95%** |
| test_creation[device0] | 0.1104ms | 59.4827μs | 16.8116 KOps/s | 17.2986 KOps/s | -2.82% |
| test_creation_from_tensor | 0.1300ms | 55.5165μs | 18.0127 KOps/s | 18.4992 KOps/s | -2.63% |
| test_add_one[memmap_tensor0] | 98.2820μs | 7.3204μs | 136.6039 KOps/s | 140.8750 KOps/s | -3.03% |
| test_contiguous[memmap_tensor0] | 24.5810μs | 0.6566μs | 1.5230 MOps/s | 1.4953 MOps/s | +1.85% |
| test_stack[memmap_tensor0] | 46.6310μs | 5.6625μs | 176.5991 KOps/s | 203.6556 KOps/s | **-13.29%** |
| test_memmaptd_index | 1.1673ms | 0.3113ms | 3.2121 KOps/s | 3.4602 KOps/s | **-7.17%** |
| test_memmaptd_index_astensor | 0.6912ms | 0.3839ms | 2.6049 KOps/s | 2.5530 KOps/s | +2.03% |
| test_memmaptd_index_op | 0.9606ms | 0.7017ms | 1.4252 KOps/s | 1.5305 KOps/s | **-6.88%** |
| test_serialize_model | 0.1061s | 0.1023s | 9.7716 Ops/s | 8.6348 Ops/s | **+13.17%** |
| test_serialize_model_pickle | 1.3546s | 1.2358s | 0.8092 Ops/s | 0.8082 Ops/s | +0.13% |
| test_serialize_weights | 0.1768s | 0.1082s | 9.2422 Ops/s | 8.7898 Ops/s | **+5.15%** |
| test_serialize_weights_returnearly | 0.3029s | 0.1079s | 9.2688 Ops/s | 10.0837 Ops/s | **-8.08%** |
| test_serialize_weights_pickle | 1.3557s | 1.2480s | 0.8013 Ops/s | 0.8010 Ops/s | +0.03% |
| test_reshape_pytree | 49.2110μs | 27.6701μs | 36.1401 KOps/s | 38.6931 KOps/s | **-6.60%** |
| test_reshape_td | 61.0210μs | 33.3350μs | 29.9985 KOps/s | 31.7889 KOps/s | **-5.63%** |
| test_view_pytree | 44.0310μs | 27.3235μs | 36.5985 KOps/s | 38.6034 KOps/s | **-5.19%** |
| test_view_td | 61.9310μs | 38.3145μs | 26.0998 KOps/s | 27.3027 KOps/s | -4.41% |
| test_unbind_pytree | 49.9510μs | 33.0302μs | 30.2753 KOps/s | 31.2149 KOps/s | -3.01% |
| test_unbind_td | 0.4433ms | 42.1747μs | 23.7109 KOps/s | 22.9616 KOps/s | +3.26% |
| test_split_pytree | 61.2320μs | 36.2210μs | 27.6083 KOps/s | 29.3036 KOps/s | **-5.79%** |
| test_split_td | 0.1127ms | 43.0764μs | 23.2146 KOps/s | 25.5578 KOps/s | **-9.17%** |
| test_add_pytree | 70.3820μs | 39.5575μs | 25.2797 KOps/s | 26.1169 KOps/s | -3.21% |
| test_add_td | 78.8920μs | 53.8155μs | 18.5820 KOps/s | 19.0353 KOps/s | -2.38% |
| test_distributed | 1.8195ms | 85.9707μs | 11.6319 KOps/s | 13.6805 KOps/s | **-14.97%** |
| test_tdmodule | 44.0910μs | 15.0152μs | 66.5994 KOps/s | 68.6932 KOps/s | -3.05% |
| test_tdmodule_dispatch | 49.4510μs | 29.5622μs | 33.8270 KOps/s | 36.3974 KOps/s | **-7.06%** |
| test_tdseq | 32.3710μs | 16.6628μs | 60.0140 KOps/s | 61.7432 KOps/s | -2.80% |
| test_tdseq_dispatch | 53.4910μs | 32.1934μs | 31.0623 KOps/s | 32.1156 KOps/s | -3.28% |
| test_instantiation_functorch | 1.5034ms | 1.4418ms | 693.5560 Ops/s | 712.8085 Ops/s | -2.70% |
| test_instantiation_td | 1.5004ms | 1.0004ms | 999.5900 Ops/s | 1.0254 KOps/s | -2.51% |
| test_exec_functorch | 0.1753ms | 0.1515ms | 6.5988 KOps/s | 6.8129 KOps/s | -3.14% |
| test_exec_functional_call | 0.1813ms | 0.1446ms | 6.9137 KOps/s | 6.9902 KOps/s | -1.09% |
| test_exec_td | 0.1834ms | 0.1436ms | 6.9646 KOps/s | 7.0066 KOps/s | -0.60% |
| test_exec_td_decorator | 0.5355ms | 0.2140ms | 4.6720 KOps/s | 4.6785 KOps/s | -0.14% |
| test_vmap_mlp_speed[True-True] | 0.6374ms | 0.5824ms | 1.7170 KOps/s | 1.6704 KOps/s | +2.79% |
| test_vmap_mlp_speed[True-False] | 0.6613ms | 0.5803ms | 1.7232 KOps/s | 1.6559 KOps/s | +4.07% |
| test_vmap_mlp_speed[False-True] | 0.5693ms | 0.5097ms | 1.9619 KOps/s | 1.8690 KOps/s | +4.97% |
| test_vmap_mlp_speed[False-False] | 0.5569ms | 0.5098ms | 1.9614 KOps/s | 1.8936 KOps/s | +3.58% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0885ms | 0.6450ms | 1.5503 KOps/s | 1.4691 KOps/s | **+5.53%** |
| test_vmap_mlp_speed_decorator[True-False] | 0.7662ms | 0.6436ms | 1.5538 KOps/s | 1.5620 KOps/s | -0.52% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7182ms | 0.5695ms | 1.7558 KOps/s | 1.7588 KOps/s | -0.17% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7267ms | 0.5665ms | 1.7652 KOps/s | 1.7594 KOps/s | +0.33% |
| test_vmap_transformer_speed[True-True] | 7.8220ms | 7.6796ms | 130.2149 Ops/s | 130.4374 Ops/s | -0.17% |
| test_vmap_transformer_speed[True-False] | 8.1165ms | 7.7405ms | 129.1902 Ops/s | 131.0419 Ops/s | -1.41% |
| test_vmap_transformer_speed[False-True] | 7.7088ms | 7.6076ms | 131.4467 Ops/s | 126.2359 Ops/s | +4.13% |
| test_vmap_transformer_speed[False-False] | 7.7216ms | 7.5961ms | 131.6459 Ops/s | 127.6240 Ops/s | +3.15% |
| test_vmap_transformer_speed_decorator[True-True] | 18.7468ms | 18.6074ms | 53.7421 Ops/s | 52.3667 Ops/s | +2.63% |
| test_vmap_transformer_speed_decorator[True-False] | 18.8321ms | 18.6696ms | 53.5630 Ops/s | 52.5522 Ops/s | +1.92% |
| test_vmap_transformer_speed_decorator[False-True] | 19.4536ms | 18.5460ms | 53.9198 Ops/s | 52.4877 Ops/s | +2.73% |
| test_vmap_transformer_speed_decorator[False-False] | 20.1042ms | 18.7696ms | 53.2776 Ops/s | 53.0530 Ops/s | +0.42% |
| test_to_module_speed[True] | 1.7542ms | 1.4968ms | 668.0759 Ops/s | 654.8441 Ops/s | +2.02% |
| test_to_module_speed[False] | 1.6729ms | 1.4815ms | 674.9688 Ops/s | 660.7338 Ops/s | +2.15% |
| test_tc_init | 48.4120μs | 24.6867μs | 40.5077 KOps/s | 43.7161 KOps/s | **-7.34%** |
| test_tc_init_nested | 83.1820μs | 45.8276μs | 21.8209 KOps/s | 21.0027 KOps/s | +3.90% |
| test_tc_first_layer_tensor | 5.1586μs | 0.3565μs | 2.8053 MOps/s | 2.7839 MOps/s | +0.77% |
| test_tc_first_layer_nontensor | 15.3003μs | 0.3836μs | 2.6069 MOps/s | 2.5991 MOps/s | +0.30% |
| test_tc_second_layer_tensor | 22.3800μs | 1.0637μs | 940.1427 KOps/s | 942.9432 KOps/s | -0.30% |
| test_tc_second_layer_nontensor | 34.0325μs | 0.8144μs | 1.2279 MOps/s | 1.2014 MOps/s | +2.21% |
| test_unbind | 0.1052s | 7.9783ms | 125.3393 Ops/s | 129.8900 Ops/s | -3.50% |
| test_full_like | 11.2983ms | 11.0747ms | 90.2962 Ops/s | 76.9203 Ops/s | **+17.39%** |
| test_zeros_like | 8.4412ms | 7.9443ms | 125.8757 Ops/s | 127.9243 Ops/s | -1.60% |
| test_ones_like | 8.2371ms | 7.9255ms | 126.1747 Ops/s | 127.8955 Ops/s | -1.35% |
| test_clone | 9.4028ms | 9.2478ms | 108.1337 Ops/s | 108.3867 Ops/s | -0.23% |
| test_squeeze | 60.5520μs | 11.3446μs | 88.1479 KOps/s | 91.6767 KOps/s | -3.85% |
| test_unsqueeze | 97.2220μs | 53.1403μs | 18.8181 KOps/s | 19.0047 KOps/s | -0.98% |
| test_split | 0.1590ms | 0.1013ms | 9.8670 KOps/s | 10.1907 KOps/s | -3.18% |
| test_permute | 0.1604ms | 0.1176ms | 8.5000 KOps/s | 9.0512 KOps/s | **-6.09%** |
| test_stack | 26.9449ms | 26.7960ms | 37.3190 Ops/s | 37.5533 Ops/s | -0.62% |
| test_cat | 27.1108ms | 26.7450ms | 37.3902 Ops/s | 37.6083 Ops/s | -0.58% |

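
The Change column in both tables appears consistent with the relative difference between the Ops and Ops on Repo `HEAD` columns. This is inferred from the rows themselves, not from the benchmark tooling. A minimal Python sketch of that arithmetic follows; the helper names `to_ops_per_second` and `relative_change_percent` are hypothetical, and the unit handling assumes only the three throughput units seen above (Ops/s, KOps/s, MOps/s).

```python
import re

# Multipliers for the throughput units that appear in the tables (assumed complete).
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(cell: str) -> float:
    """Parse a cell such as '78.1109 KOps/s' into plain operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(Ops/s|KOps/s|MOps/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change_percent(ops: str, ops_head: str) -> float:
    """Relative difference of this run versus HEAD, in percent (assumed formula)."""
    current = to_ops_per_second(ops)
    baseline = to_ops_per_second(ops_head)
    return (current - baseline) / baseline * 100.0


# GPU rows test_plain_set_nested and test_membership from the table above.
print(round(relative_change_percent("78.1109 KOps/s", "79.5164 KOps/s"), 2))  # -1.77
print(round(relative_change_percent("1.1905 MOps/s", "1.4410 MOps/s"), 2))    # -17.38
```

Normalizing units first matters for rows such as the GPU `test_instantiation_td`, where the two columns use different units (Ops/s and KOps/s). Because the table values are already rounded, a recomputed change can differ from the reported one in the last digit.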