pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Test] Test FC of memmap save and load #838

Closed: vmoens closed this 4 months ago

vmoens commented 4 months ago

cc @albanD @mikaylagawarecki

github-actions[bot] commented 4 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 12. Worsened: 11.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 43.4710μs | 17.0594μs | 58.6188 KOps/s | 57.3411 KOps/s | +2.23% |
| test_plain_set_stack_nested | 34.8850μs | 17.1263μs | 58.3896 KOps/s | 57.2677 KOps/s | +1.96% |
| test_plain_set_nested_inplace | 50.5150μs | 19.3412μs | 51.7030 KOps/s | 51.2798 KOps/s | +0.83% |
| test_plain_set_stack_nested_inplace | 40.5960μs | 19.2143μs | 52.0446 KOps/s | 51.2606 KOps/s | +1.53% |
| test_items | 31.6990μs | 2.5205μs | 396.7390 KOps/s | 354.8811 KOps/s | **+11.79%** |
| test_items_nested | 0.3638ms | 0.2665ms | 3.7526 KOps/s | 3.7641 KOps/s | -0.30% |
| test_items_nested_locked | 1.3320ms | 0.2663ms | 3.7548 KOps/s | 3.7621 KOps/s | -0.19% |
| test_items_nested_leaf | 0.1207ms | 77.1216μs | 12.9665 KOps/s | 13.0017 KOps/s | -0.27% |
| test_items_stack_nested | 1.3198ms | 0.2663ms | 3.7546 KOps/s | 3.6729 KOps/s | +2.22% |
| test_items_stack_nested_leaf | 0.1086ms | 74.6851μs | 13.3896 KOps/s | 12.7250 KOps/s | **+5.22%** |
| test_items_stack_nested_locked | 0.3152ms | 0.2678ms | 3.7340 KOps/s | 3.7427 KOps/s | -0.23% |
| test_keys | 38.4620μs | 3.9852μs | 250.9270 KOps/s | 258.1891 KOps/s | -2.81% |
| test_keys_nested | 0.2054ms | 0.1384ms | 7.2258 KOps/s | 7.2689 KOps/s | -0.59% |
| test_keys_nested_locked | 0.6434ms | 0.1444ms | 6.9268 KOps/s | 7.0292 KOps/s | -1.46% |
| test_keys_nested_leaf | 0.2046ms | 0.1169ms | 8.5551 KOps/s | 8.5420 KOps/s | +0.15% |
| test_keys_stack_nested | 0.2321ms | 0.1347ms | 7.4219 KOps/s | 7.2468 KOps/s | +2.42% |
| test_keys_stack_nested_leaf | 0.2106ms | 0.1148ms | 8.7079 KOps/s | 8.5272 KOps/s | +2.12% |
| test_keys_stack_nested_locked | 0.2430ms | 0.1399ms | 7.1468 KOps/s | 7.0278 KOps/s | +1.69% |
| test_values | 6.6400μs | 1.1989μs | 834.1038 KOps/s | 853.1381 KOps/s | -2.23% |
| test_values_nested | 0.1075ms | 50.4280μs | 19.8303 KOps/s | 19.8064 KOps/s | +0.12% |
| test_values_nested_locked | 0.1024ms | 50.5779μs | 19.7715 KOps/s | 19.7934 KOps/s | -0.11% |
| test_values_nested_leaf | 93.5550μs | 45.0523μs | 22.1964 KOps/s | 21.7826 KOps/s | +1.90% |
| test_values_stack_nested | 95.5380μs | 51.2788μs | 19.5012 KOps/s | 19.4369 KOps/s | +0.33% |
| test_values_stack_nested_leaf | 78.4770μs | 44.4893μs | 22.4773 KOps/s | 21.6323 KOps/s | +3.91% |
| test_values_stack_nested_locked | 97.1220μs | 51.1343μs | 19.5563 KOps/s | 19.4941 KOps/s | +0.32% |
| test_membership | 17.9030μs | 1.3554μs | 737.7849 KOps/s | 726.6856 KOps/s | +1.53% |
| test_membership_nested | 27.0410μs | 3.5125μs | 284.6979 KOps/s | 285.8424 KOps/s | -0.40% |
| test_membership_nested_leaf | 27.6920μs | 3.5204μs | 284.0611 KOps/s | 289.0514 KOps/s | -1.73% |
| test_membership_stacked_nested | 28.5730μs | 3.5072μs | 285.1307 KOps/s | 270.4604 KOps/s | **+5.42%** |
| test_membership_stacked_nested_leaf | 33.7230μs | 3.5300μs | 283.2897 KOps/s | 288.3471 KOps/s | -1.75% |
| test_membership_nested_last | 22.0710μs | 4.3176μs | 231.6101 KOps/s | 240.4191 KOps/s | -3.66% |
| test_membership_nested_leaf_last | 27.7210μs | 4.3818μs | 228.2158 KOps/s | 240.0077 KOps/s | -4.91% |
| test_membership_stacked_nested_last | 36.1570μs | 13.2491μs | 75.4770 KOps/s | 176.9485 KOps/s | **-57.35%** |
| test_membership_stacked_nested_leaf_last | 48.7720μs | 13.3677μs | 74.8069 KOps/s | 176.9615 KOps/s | **-57.73%** |
| test_nested_getleaf | 40.3660μs | 10.4918μs | 95.3124 KOps/s | 95.9531 KOps/s | -0.67% |
| test_nested_get | 33.1320μs | 9.8278μs | 101.7517 KOps/s | 101.3346 KOps/s | +0.41% |
| test_stacked_getleaf | 45.1940μs | 10.3020μs | 97.0683 KOps/s | 98.6953 KOps/s | -1.65% |
| test_stacked_get | 42.2590μs | 9.8757μs | 101.2591 KOps/s | 101.7492 KOps/s | -0.48% |
| test_nested_getitemleaf | 32.7110μs | 10.9638μs | 91.2091 KOps/s | 91.7966 KOps/s | -0.64% |
| test_nested_getitem | 51.6460μs | 10.1932μs | 98.1044 KOps/s | 99.1415 KOps/s | -1.05% |
| test_stacked_getitemleaf | 41.5180μs | 11.0541μs | 90.4639 KOps/s | 92.6848 KOps/s | -2.40% |
| test_stacked_getitem | 40.6460μs | 10.1703μs | 98.3251 KOps/s | 99.0096 KOps/s | -0.69% |
| test_lock_nested | 48.1291ms | 0.3905ms | 2.5611 KOps/s | 2.9680 KOps/s | **-13.71%** |
| test_lock_stack_nested | 0.4399ms | 0.3002ms | 3.3311 KOps/s | 3.2819 KOps/s | +1.50% |
| test_unlock_nested | 0.6835ms | 0.3467ms | 2.8847 KOps/s | 2.9315 KOps/s | -1.60% |
| test_unlock_stack_nested | 0.5127ms | 0.3080ms | 3.2465 KOps/s | 3.1761 KOps/s | +2.21% |
| test_flatten_speed | 0.2000ms | 95.4170μs | 10.4803 KOps/s | 10.5683 KOps/s | -0.83% |
| test_unflatten_speed | 0.7375ms | 0.4096ms | 2.4416 KOps/s | 2.4796 KOps/s | -1.53% |
| test_common_ops | 1.3907ms | 0.7248ms | 1.3797 KOps/s | 1.3393 KOps/s | +3.02% |
| test_creation | 27.4510μs | 1.9211μs | 520.5285 KOps/s | 526.5301 KOps/s | -1.14% |
| test_creation_empty | 43.7620μs | 10.7706μs | 92.8456 KOps/s | 83.5730 KOps/s | **+11.10%** |
| test_creation_nested_1 | 40.1850μs | 14.4679μs | 69.1184 KOps/s | 67.7273 KOps/s | +2.05% |
| test_creation_nested_2 | 43.5010μs | 16.8503μs | 59.3460 KOps/s | 56.4475 KOps/s | **+5.13%** |
| test_clone | 0.1168ms | 12.9785μs | 77.0505 KOps/s | 74.4293 KOps/s | +3.52% |
| test_getitem[int] | 37.6200μs | 11.3736μs | 87.9232 KOps/s | 88.9313 KOps/s | -1.13% |
| test_getitem[slice_int] | 62.2860μs | 22.5054μs | 44.4337 KOps/s | 45.8788 KOps/s | -3.15% |
| test_getitem[range] | 79.7590μs | 58.2210μs | 17.1759 KOps/s | 17.0789 KOps/s | +0.57% |
| test_getitem[tuple] | 58.8590μs | 18.8327μs | 53.0992 KOps/s | 54.2927 KOps/s | -2.20% |
| test_getitem[list] | 98.0830μs | 41.5202μs | 24.0847 KOps/s | 25.4038 KOps/s | **-5.19%** |
| test_setitem_dim[int] | 76.1720μs | 35.6438μs | 28.0554 KOps/s | 28.0215 KOps/s | +0.12% |
| test_setitem_dim[slice_int] | 0.1211ms | 62.0052μs | 16.1277 KOps/s | 16.1331 KOps/s | -0.03% |
| test_setitem_dim[range] | 0.1458ms | 85.4114μs | 11.7080 KOps/s | 11.8244 KOps/s | -0.98% |
| test_setitem_dim[tuple] | 93.5050μs | 51.5975μs | 19.3808 KOps/s | 19.7581 KOps/s | -1.91% |
| test_setitem | 59.8310μs | 20.1622μs | 49.5978 KOps/s | 47.7055 KOps/s | +3.97% |
| test_set | 98.2840μs | 20.0691μs | 49.8278 KOps/s | 49.3222 KOps/s | +1.03% |
| test_set_shared | 4.4174ms | 0.1443ms | 6.9319 KOps/s | 6.9269 KOps/s | +0.07% |
| test_update | 0.1355ms | 22.2057μs | 45.0334 KOps/s | 42.8739 KOps/s | **+5.04%** |
| test_update_nested | 84.8690μs | 30.5375μs | 32.7466 KOps/s | 31.4022 KOps/s | +4.28% |
| test_update__nested | 71.0130μs | 24.8446μs | 40.2502 KOps/s | 40.1205 KOps/s | +0.32% |
| test_set_nested | 80.3300μs | 21.6326μs | 46.2266 KOps/s | 45.0977 KOps/s | +2.50% |
| test_set_nested_new | 62.1670μs | 26.0023μs | 38.4581 KOps/s | 37.9121 KOps/s | +1.44% |
| test_select | 83.0750μs | 40.9500μs | 24.4200 KOps/s | 23.3287 KOps/s | +4.68% |
| test_select_nested | 0.1200ms | 60.0734μs | 16.6463 KOps/s | 16.6483 KOps/s | -0.01% |
| test_exclude_nested | 0.2269ms | 0.1205ms | 8.3013 KOps/s | 8.3725 KOps/s | -0.85% |
| test_empty[True] | 0.4628ms | 0.3966ms | 2.5214 KOps/s | 2.4816 KOps/s | +1.61% |
| test_empty[False] | 8.5835μs | 1.1907μs | 839.8118 KOps/s | 854.1641 KOps/s | -1.68% |
| test_unbind_speed | 0.6085ms | 0.2595ms | 3.8537 KOps/s | 3.8909 KOps/s | -0.96% |
| test_unbind_speed_stack0 | 0.3744ms | 0.2455ms | 4.0731 KOps/s | 4.0684 KOps/s | +0.11% |
| test_unbind_speed_stack1 | 65.6205ms | 0.7181ms | 1.3926 KOps/s | 1.3853 KOps/s | +0.52% |
| test_split | 66.0872ms | 1.6066ms | 622.4135 Ops/s | 634.3904 Ops/s | -1.89% |
| test_chunk | 65.9907ms | 1.6144ms | 619.4343 Ops/s | 637.0369 Ops/s | -2.76% |
| test_creation[device0] | 0.1817ms | 85.0102μs | 11.7633 KOps/s | 12.0407 KOps/s | -2.30% |
| test_creation_from_tensor | 0.1572ms | 83.5652μs | 11.9667 KOps/s | 11.9337 KOps/s | +0.28% |
| test_add_one[memmap_tensor0] | 60.8640μs | 5.4200μs | 184.5007 KOps/s | 181.1126 KOps/s | +1.87% |
| test_contiguous[memmap_tensor0] | 19.7170μs | 0.6526μs | 1.5324 MOps/s | 1.5885 MOps/s | -3.53% |
| test_stack[memmap_tensor0] | 30.8780μs | 3.5704μs | 280.0767 KOps/s | 280.0839 KOps/s | -0.00% |
| test_memmaptd_index | 0.4297ms | 0.2537ms | 3.9410 KOps/s | 3.9601 KOps/s | -0.48% |
| test_memmaptd_index_astensor | 0.7070ms | 0.3275ms | 3.0539 KOps/s | 3.0722 KOps/s | -0.60% |
| test_memmaptd_index_op | 0.8830ms | 0.6113ms | 1.6360 KOps/s | 1.6176 KOps/s | +1.13% |
| test_serialize_model | 0.1719s | 0.1155s | 8.6607 Ops/s | 9.6516 Ops/s | **-10.27%** |
| test_serialize_model_pickle | 0.4581s | 0.3771s | 2.6519 Ops/s | 2.6450 Ops/s | +0.26% |
| test_serialize_weights | 0.1612s | 0.1087s | 9.2016 Ops/s | 8.7409 Ops/s | **+5.27%** |
| test_serialize_weights_returnearly | 0.1869s | 0.1329s | 7.5235 Ops/s | 7.4734 Ops/s | +0.67% |
| test_serialize_weights_pickle | 1.1023s | 0.5675s | 1.7621 Ops/s | 2.2882 Ops/s | **-23.00%** |
| test_serialize_weights_filesystem | 99.9610ms | 90.6937ms | 11.0261 Ops/s | 10.5845 Ops/s | +4.17% |
| test_serialize_model_filesystem | 0.1570s | 99.9561ms | 10.0044 Ops/s | 10.6310 Ops/s | **-5.89%** |
| test_reshape_pytree | 66.6540μs | 25.5592μs | 39.1248 KOps/s | 39.2262 KOps/s | -0.26% |
| test_reshape_td | 73.5180μs | 34.1517μs | 29.2812 KOps/s | 29.2237 KOps/s | +0.20% |
| test_view_pytree | 61.0750μs | 25.2073μs | 39.6710 KOps/s | 39.5893 KOps/s | +0.21% |
| test_view_td | 84.0370μs | 39.3780μs | 25.3949 KOps/s | 25.5546 KOps/s | -0.63% |
| test_unbind_pytree | 62.4160μs | 29.5784μs | 33.8084 KOps/s | 34.2347 KOps/s | -1.25% |
| test_unbind_td | 0.3614ms | 38.2772μs | 26.1252 KOps/s | 26.4765 KOps/s | -1.33% |
| test_split_pytree | 73.6680μs | 29.7306μs | 33.6354 KOps/s | 34.6579 KOps/s | -2.95% |
| test_split_td | 0.1257ms | 40.7107μs | 24.5636 KOps/s | 25.0274 KOps/s | -1.85% |
| test_add_pytree | 94.1670μs | 34.2262μs | 29.2174 KOps/s | 28.6335 KOps/s | +2.04% |
| test_add_td | 0.1189ms | 55.2772μs | 18.0906 KOps/s | 17.2919 KOps/s | +4.62% |
| test_distributed | 0.2137ms | 0.1005ms | 9.9508 KOps/s | 9.7971 KOps/s | +1.57% |
| test_tdmodule | 75.1300μs | 18.2240μs | 54.8727 KOps/s | 55.5942 KOps/s | -1.30% |
| test_tdmodule_dispatch | 56.2460μs | 35.5376μs | 28.1392 KOps/s | 28.1787 KOps/s | -0.14% |
| test_tdseq | 37.9210μs | 20.3673μs | 49.0984 KOps/s | 49.1017 KOps/s | -0.01% |
| test_tdseq_dispatch | 65.2820μs | 39.9554μs | 25.0279 KOps/s | 24.7913 KOps/s | +0.95% |
| test_instantiation_functorch | 2.9221ms | 1.3101ms | 763.2771 Ops/s | 755.4591 Ops/s | +1.03% |
| test_instantiation_td | 1.5861ms | 0.9931ms | 1.0069 KOps/s | 991.4953 Ops/s | +1.56% |
| test_exec_functorch | 0.2922ms | 0.1609ms | 6.2137 KOps/s | 5.8080 KOps/s | **+6.98%** |
| test_exec_functional_call | 0.2386ms | 0.1475ms | 6.7805 KOps/s | 6.7574 KOps/s | +0.34% |
| test_exec_td | 0.2617ms | 0.1448ms | 6.9074 KOps/s | 6.8128 KOps/s | +1.39% |
| test_exec_td_decorator | 0.7862ms | 0.2185ms | 4.5761 KOps/s | 4.5580 KOps/s | +0.40% |
| test_vmap_mlp_speed[True-True] | 0.6834ms | 0.4795ms | 2.0856 KOps/s | 2.0651 KOps/s | +1.00% |
| test_vmap_mlp_speed[True-False] | 0.8384ms | 0.4781ms | 2.0918 KOps/s | 2.0670 KOps/s | +1.20% |
| test_vmap_mlp_speed[False-True] | 0.5752ms | 0.3901ms | 2.5632 KOps/s | 2.5330 KOps/s | +1.19% |
| test_vmap_mlp_speed[False-False] | 0.6606ms | 0.3911ms | 2.5572 KOps/s | 2.5381 KOps/s | +0.75% |
| test_vmap_mlp_speed_decorator[True-True] | 1.2504ms | 0.5519ms | 1.8120 KOps/s | 1.7992 KOps/s | +0.71% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7127ms | 0.5487ms | 1.8224 KOps/s | 1.8067 KOps/s | +0.87% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7648ms | 0.4530ms | 2.2076 KOps/s | 2.1712 KOps/s | +1.68% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6340ms | 0.4501ms | 2.2219 KOps/s | 2.1759 KOps/s | +2.11% |
| test_to_module_speed[True] | 2.3673ms | 1.6986ms | 588.7081 Ops/s | 591.6907 Ops/s | -0.50% |
| test_to_module_speed[False] | 2.6605ms | 1.6708ms | 598.5023 Ops/s | 601.3196 Ops/s | -0.47% |
| test_tc_init | 54.8430μs | 29.6277μs | 33.7522 KOps/s | 31.3156 KOps/s | **+7.78%** |
| test_tc_init_nested | 94.8680μs | 60.3847μs | 16.5605 KOps/s | 15.4642 KOps/s | **+7.09%** |
| test_tc_first_layer_tensor | 3.9574μs | 0.7081μs | 1.4122 MOps/s | 1.5148 MOps/s | **-6.77%** |
| test_tc_first_layer_nontensor | 2.0208μs | 0.6936μs | 1.4416 MOps/s | 1.5258 MOps/s | **-5.51%** |
| test_tc_second_layer_tensor | 26.0790μs | 1.9129μs | 522.7705 KOps/s | 527.3316 KOps/s | -0.86% |
| test_tc_second_layer_nontensor | 10.4197μs | 1.5709μs | 636.5798 KOps/s | 592.9092 KOps/s | **+7.37%** |
| test_unbind | 83.2872ms | 7.4674ms | 133.9149 Ops/s | 143.7652 Ops/s | **-6.85%** |
| test_full_like | 16.4957ms | 9.8400ms | 101.6264 Ops/s | 90.5487 Ops/s | **+12.23%** |
| test_zeros_like | 6.8495ms | 5.7046ms | 175.2959 Ops/s | 185.6292 Ops/s | **-5.57%** |
| test_ones_like | 6.8348ms | 6.1186ms | 163.4358 Ops/s | 170.5415 Ops/s | -4.17% |
| test_clone | 12.9777ms | 7.4805ms | 133.6811 Ops/s | 135.6000 Ops/s | -1.42% |
| test_squeeze | 58.0280μs | 13.8725μs | 72.0851 KOps/s | 74.8288 KOps/s | -3.67% |
| test_unsqueeze | 0.1148ms | 58.6714μs | 17.0441 KOps/s | 16.7453 KOps/s | +1.78% |
| test_split | 0.2471ms | 0.1110ms | 9.0112 KOps/s | 9.0333 KOps/s | -0.24% |
| test_permute | 0.2830ms | 0.1255ms | 7.9651 KOps/s | 8.0439 KOps/s | -0.98% |
| test_stack | 27.4364ms | 21.4038ms | 46.7206 Ops/s | 48.2336 Ops/s | -3.14% |
| test_cat | 27.6381ms | 21.2419ms | 47.0767 Ops/s | 48.2536 Ops/s | -2.44% |

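
For readers cross-checking the numbers, the Change column appears to be the relative difference between the Ops column and the Ops on Repo `HEAD` column. The sketch below reproduces three CPU rows under that assumption. The 5% cutoff for "improved" and "worsened" is an inference from the table (every highlighted change is at least 5% in magnitude, and the highlighted CPU rows number 12 and 11, matching the summary line); it is not documented in this thread. The function names are hypothetical, and both throughput values must be in the same unit before comparing (a few rows mix Ops/s and KOps/s).

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD), both in KOps/s, copied from the CPU table.
rows = {
    "test_items": (396.7390, 354.8811),
    "test_keys": (250.9270, 258.1891),
    "test_membership_stacked_nested_last": (75.4770, 176.9485),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under these assumptions the three rows come out as improved, unchanged, and worsened respectively, which agrees with how they are marked in the table.
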
github-actions[bot] commented 4 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 2. Worsened: 32.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 92.9650μs | 13.0932μs | 76.3758 KOps/s | 86.3623 KOps/s | **-11.56%** |
| test_plain_set_stack_nested | 27.4920μs | 13.2341μs | 75.5626 KOps/s | 85.3203 KOps/s | **-11.44%** |
| test_plain_set_nested_inplace | 42.8520μs | 14.4596μs | 69.1580 KOps/s | 77.8146 KOps/s | **-11.12%** |
| test_plain_set_stack_nested_inplace | 43.7320μs | 14.4516μs | 69.1966 KOps/s | 77.0017 KOps/s | **-10.14%** |
| test_items | 17.9010μs | 4.7403μs | 210.9550 KOps/s | 210.2899 KOps/s | +0.32% |
| test_items_nested | 0.3861ms | 0.3364ms | 2.9725 KOps/s | 2.9588 KOps/s | +0.47% |
| test_items_nested_locked | 0.3893ms | 0.3381ms | 2.9575 KOps/s | 2.8685 KOps/s | +3.10% |
| test_items_nested_leaf | 0.1034ms | 82.7954μs | 12.0780 KOps/s | 12.1639 KOps/s | -0.71% |
| test_items_stack_nested | 0.3960ms | 0.3394ms | 2.9466 KOps/s | 2.9453 KOps/s | +0.04% |
| test_items_stack_nested_leaf | 0.1124ms | 83.4532μs | 11.9828 KOps/s | 12.0488 KOps/s | -0.55% |
| test_items_stack_nested_locked | 0.4018ms | 0.3384ms | 2.9555 KOps/s | 2.9342 KOps/s | +0.72% |
| test_keys | 24.0910μs | 4.3448μs | 230.1616 KOps/s | 230.4462 KOps/s | -0.12% |
| test_keys_nested | 0.1381ms | 66.5911μs | 15.0170 KOps/s | 14.8638 KOps/s | +1.03% |
| test_keys_nested_locked | 2.1003ms | 71.9051μs | 13.9072 KOps/s | 13.7240 KOps/s | +1.34% |
| test_keys_nested_leaf | 79.1840μs | 57.0243μs | 17.5364 KOps/s | 17.2243 KOps/s | +1.81% |
| test_keys_stack_nested | 83.5040μs | 66.3810μs | 15.0645 KOps/s | 14.9407 KOps/s | +0.83% |
| test_keys_stack_nested_leaf | 80.5930μs | 57.0767μs | 17.5203 KOps/s | 17.2674 KOps/s | +1.46% |
| test_keys_stack_nested_locked | 95.7050μs | 70.7104μs | 14.1422 KOps/s | 13.9336 KOps/s | +1.50% |
| test_values | 10.9703μs | 1.8001μs | 555.5261 KOps/s | 551.7487 KOps/s | +0.68% |
| test_values_nested | 60.3130μs | 34.9169μs | 28.6394 KOps/s | 28.5660 KOps/s | +0.26% |
| test_values_nested_locked | 59.1430μs | 36.6780μs | 27.2643 KOps/s | 27.2087 KOps/s | +0.20% |
| test_values_nested_leaf | 50.2020μs | 30.9976μs | 32.2606 KOps/s | 32.1670 KOps/s | +0.29% |
| test_values_stack_nested | 67.7830μs | 35.7431μs | 27.9774 KOps/s | 28.4599 KOps/s | -1.70% |
| test_values_stack_nested_leaf | 61.2530μs | 31.7708μs | 31.4754 KOps/s | 31.8481 KOps/s | -1.17% |
| test_values_stack_nested_locked | 68.9340μs | 37.4985μs | 26.6677 KOps/s | 27.1659 KOps/s | -1.83% |
| test_membership | 3.4216μs | 0.7316μs | 1.3669 MOps/s | 1.3341 MOps/s | +2.46% |
| test_membership_nested | 31.0610μs | 2.5830μs | 387.1512 KOps/s | 390.2969 KOps/s | -0.81% |
| test_membership_nested_leaf | 24.9410μs | 2.5568μs | 391.1156 KOps/s | 393.1903 KOps/s | -0.53% |
| test_membership_stacked_nested | 23.5210μs | 2.6302μs | 380.2046 KOps/s | 392.8956 KOps/s | -3.23% |
| test_membership_stacked_nested_leaf | 32.3510μs | 2.5847μs | 386.8987 KOps/s | 394.0558 KOps/s | -1.82% |
| test_membership_nested_last | 21.7810μs | 3.0958μs | 323.0148 KOps/s | 324.1556 KOps/s | -0.35% |
| test_membership_nested_leaf_last | 33.5510μs | 3.0849μs | 324.1592 KOps/s | 327.7962 KOps/s | -1.11% |
| test_membership_stacked_nested_last | 20.8110μs | 3.8639μs | 258.8081 KOps/s | 325.7642 KOps/s | **-20.55%** |
| test_membership_stacked_nested_leaf_last | 34.2710μs | 3.9054μs | 256.0561 KOps/s | 325.6503 KOps/s | **-21.37%** |
| test_nested_getleaf | 25.4110μs | 8.3679μs | 119.5049 KOps/s | 119.4621 KOps/s | +0.04% |
| test_nested_get | 35.3530μs | 7.8858μs | 126.8099 KOps/s | 127.3604 KOps/s | -0.43% |
| test_stacked_getleaf | 37.4720μs | 8.3997μs | 119.0516 KOps/s | 119.5355 KOps/s | -0.40% |
| test_stacked_get | 24.2410μs | 7.9312μs | 126.0840 KOps/s | 126.7606 KOps/s | -0.53% |
| test_nested_getitemleaf | 39.1420μs | 8.5290μs | 117.2472 KOps/s | 117.1221 KOps/s | +0.11% |
| test_nested_getitem | 61.6130μs | 8.0470μs | 124.2701 KOps/s | 124.6169 KOps/s | -0.28% |
| test_stacked_getitemleaf | 31.8720μs | 8.5596μs | 116.8285 KOps/s | 117.3783 KOps/s | -0.47% |
| test_stacked_getitem | 36.4820μs | 8.0433μs | 124.3277 KOps/s | 124.1710 KOps/s | +0.13% |
| test_lock_nested | 59.0627ms | 0.4030ms | 2.4815 KOps/s | 2.4889 KOps/s | -0.30% |
| test_lock_stack_nested | 0.3316ms | 0.3001ms | 3.3318 KOps/s | 3.3317 KOps/s | +0.01% |
| test_unlock_nested | 60.9244ms | 0.4055ms | 2.4662 KOps/s | 2.4554 KOps/s | +0.44% |
| test_unlock_stack_nested | 0.3567ms | 0.3078ms | 3.2489 KOps/s | 3.2422 KOps/s | +0.21% |
| test_flatten_speed | 0.3541ms | 0.1009ms | 9.9093 KOps/s | 9.8782 KOps/s | +0.32% |
| test_unflatten_speed | 0.3329ms | 0.2876ms | 3.4770 KOps/s | 3.4577 KOps/s | +0.56% |
| test_common_ops | 1.0761ms | 0.5881ms | 1.7003 KOps/s | 1.8943 KOps/s | **-10.24%** |
| test_creation | 13.6900μs | 1.6386μs | 610.2794 KOps/s | 618.5483 KOps/s | -1.34% |
| test_creation_empty | 24.5020μs | 9.3211μs | 107.2834 KOps/s | 161.9405 KOps/s | **-33.75%** |
| test_creation_nested_1 | 41.7020μs | 11.1521μs | 89.6691 KOps/s | 124.7634 KOps/s | **-28.13%** |
| test_creation_nested_2 | 32.6520μs | 13.2558μs | 75.4387 KOps/s | 99.3411 KOps/s | **-24.06%** |
| test_clone | 36.5520μs | 11.9919μs | 83.3898 KOps/s | 84.6307 KOps/s | -1.47% |
| test_getitem[int] | 56.1830μs | 10.8433μs | 92.2230 KOps/s | 94.0944 KOps/s | -1.99% |
| test_getitem[slice_int] | 49.1130μs | 20.7275μs | 48.2452 KOps/s | 48.6733 KOps/s | -0.88% |
| test_getitem[range] | 66.6630μs | 51.6824μs | 19.3489 KOps/s | 21.4628 KOps/s | **-9.85%** |
| test_getitem[tuple] | 47.8620μs | 18.4220μs | 54.2830 KOps/s | 54.2612 KOps/s | +0.04% |
| test_getitem[list] | 0.1373ms | 33.9020μs | 29.4968 KOps/s | 31.3258 KOps/s | **-5.84%** |
| test_setitem_dim[int] | 48.8020μs | 29.2027μs | 34.2434 KOps/s | 37.5194 KOps/s | **-8.73%** |
| test_setitem_dim[slice_int] | 67.0430μs | 49.2385μs | 20.3093 KOps/s | 21.3829 KOps/s | **-5.02%** |
| test_setitem_dim[range] | 0.1043ms | 66.3057μs | 15.0817 KOps/s | 15.9065 KOps/s | **-5.19%** |
| test_setitem_dim[tuple] | 60.7730μs | 43.1262μs | 23.1877 KOps/s | 24.2188 KOps/s | -4.26% |
| test_setitem | 40.4820μs | 16.8943μs | 59.1914 KOps/s | 67.2053 KOps/s | **-11.92%** |
| test_set | 47.7830μs | 16.1118μs | 62.0663 KOps/s | 68.8897 KOps/s | **-9.90%** |
| test_set_shared | 1.6509ms | 99.7247μs | 10.0276 KOps/s | 10.1543 KOps/s | -1.25% |
| test_update | 69.6430μs | 19.4223μs | 51.4872 KOps/s | 63.1994 KOps/s | **-18.53%** |
| test_update_nested | 68.5630μs | 24.4970μs | 40.8213 KOps/s | 47.7140 KOps/s | **-14.45%** |
| test_update__nested | 56.3320μs | 22.7084μs | 44.0366 KOps/s | 45.5721 KOps/s | -3.37% |
| test_set_nested | 60.9230μs | 17.4627μs | 57.2649 KOps/s | 64.6696 KOps/s | **-11.45%** |
| test_set_nested_new | 58.0030μs | 19.9904μs | 50.0241 KOps/s | 55.3885 KOps/s | **-9.68%** |
| test_select | 65.2430μs | 32.5956μs | 30.6789 KOps/s | 31.8084 KOps/s | -3.55% |
| test_select_nested | 0.9430ms | 55.0771μs | 18.1564 KOps/s | 18.4845 KOps/s | -1.78% |
| test_exclude_nested | 0.1440ms | 0.1102ms | 9.0776 KOps/s | 9.1086 KOps/s | -0.34% |
| test_empty[True] | 0.3909ms | 0.3433ms | 2.9129 KOps/s | 2.8803 KOps/s | +1.13% |
| test_empty[False] | 3.2211μs | 0.9279μs | 1.0777 MOps/s | 1.0881 MOps/s | -0.95% |
| test_to | 0.1042ms | 78.3961μs | 12.7557 KOps/s | 12.8203 KOps/s | -0.50% |
| test_to_nonblocking | 95.4040μs | 62.1426μs | 16.0920 KOps/s | 16.2158 KOps/s | -0.76% |
| test_unbind_speed | 0.2952ms | 0.2633ms | 3.7981 KOps/s | 3.8514 KOps/s | -1.38% |
| test_unbind_speed_stack0 | 0.3229ms | 0.2667ms | 3.7495 KOps/s | 3.8279 KOps/s | -2.05% |
| test_unbind_speed_stack1 | 75.7494ms | 0.8108ms | 1.2333 KOps/s | 1.2300 KOps/s | +0.27% |
| test_split | 76.0605ms | 1.6785ms | 595.7728 Ops/s | 605.0600 Ops/s | -1.53% |
| test_chunk | 75.9622ms | 1.6696ms | 598.9603 Ops/s | 606.3781 Ops/s | -1.22% |
| test_creation[device0] | 0.1284ms | 57.0999μs | 17.5132 KOps/s | 16.8370 KOps/s | +4.02% |
| test_creation_from_tensor | 0.1302ms | 52.9706μs | 18.8784 KOps/s | 17.5763 KOps/s | **+7.41%** |
| test_add_one[memmap_tensor0] | 79.2040μs | 6.8524μs | 145.9340 KOps/s | 144.7500 KOps/s | +0.82% |
| test_contiguous[memmap_tensor0] | 10.5010μs | 0.6628μs | 1.5087 MOps/s | 1.4690 MOps/s | +2.71% |
| test_stack[memmap_tensor0] | 27.7110μs | 4.7752μs | 209.4167 KOps/s | 214.1503 KOps/s | -2.21% |
| test_memmaptd_index | 1.0556ms | 0.2932ms | 3.4102 KOps/s | 3.4471 KOps/s | -1.07% |
| test_memmaptd_index_astensor | 0.7159ms | 0.3592ms | 2.7841 KOps/s | 2.7957 KOps/s | -0.41% |
| test_memmaptd_index_op | 0.9458ms | 0.6685ms | 1.4959 KOps/s | 1.6261 KOps/s | **-8.00%** |
| test_serialize_model | 0.1823s | 0.1100s | 9.0908 Ops/s | 9.6225 Ops/s | **-5.53%** |
| test_serialize_model_pickle | 1.3497s | 1.2349s | 0.8098 Ops/s | 0.8090 Ops/s | +0.09% |
| test_serialize_weights | 0.1798s | 0.1078s | 9.2781 Ops/s | 8.8334 Ops/s | **+5.04%** |
| test_serialize_weights_returnearly | 0.2899s | 0.1008s | 9.9244 Ops/s | 10.0721 Ops/s | -1.47% |
| test_serialize_weights_pickle | 1.3535s | 1.2483s | 0.8011 Ops/s | 0.8011 Ops/s | -0.00% |
| test_reshape_pytree | 0.2248ms | 26.0385μs | 38.4047 KOps/s | 38.5496 KOps/s | -0.38% |
| test_reshape_td | 57.0230μs | 31.0161μs | 32.2414 KOps/s | 32.3691 KOps/s | -0.39% |
| test_view_pytree | 90.1840μs | 25.8163μs | 38.7353 KOps/s | 38.8371 KOps/s | -0.26% |
| test_view_td | 0.2534ms | 36.7618μs | 27.2022 KOps/s | 26.6079 KOps/s | +2.23% |
| test_unbind_pytree | 66.9540μs | 31.6038μs | 31.6418 KOps/s | 30.2366 KOps/s | +4.65% |
| test_unbind_td | 0.5003ms | 42.6429μs | 23.4506 KOps/s | 24.6029 KOps/s | -4.68% |
| test_split_pytree | 0.1673ms | 35.3429μs | 28.2942 KOps/s | 28.8348 KOps/s | -1.87% |
| test_split_td | 0.2526ms | 40.9963μs | 24.3925 KOps/s | 25.9965 KOps/s | **-6.17%** |
| test_add_pytree | 66.5030μs | 37.3061μs | 26.8052 KOps/s | 26.6335 KOps/s | +0.64% |
| test_add_td | 0.2552ms | 49.1382μs | 20.3507 KOps/s | 20.4660 KOps/s | -0.56% |
| test_distributed | 2.4276ms | 69.5977μs | 14.3683 KOps/s | 14.3977 KOps/s | -0.20% |
| test_tdmodule | 30.6810μs | 14.6701μs | 68.1659 KOps/s | 73.8308 KOps/s | **-7.67%** |
| test_tdmodule_dispatch | 46.5320μs | 29.3253μs | 34.1003 KOps/s | 37.9724 KOps/s | **-10.20%** |
| test_tdseq | 26.6410μs | 16.6049μs | 60.2230 KOps/s | 66.2328 KOps/s | **-9.07%** |
| test_tdseq_dispatch | 56.7730μs | 32.8087μs | 30.4798 KOps/s | 34.5744 KOps/s | **-11.84%** |
| test_instantiation_functorch | 1.6297ms | 1.5246ms | 655.9267 Ops/s | 662.1924 Ops/s | -0.95% |
| test_instantiation_td | 1.5557ms | 1.0404ms | 961.1259 Ops/s | 972.6590 Ops/s | -1.19% |
| test_exec_functorch | 0.1947ms | 0.1493ms | 6.6991 KOps/s | 6.5931 KOps/s | +1.61% |
| test_exec_functional_call | 0.2028ms | 0.1374ms | 7.2769 KOps/s | 7.2071 KOps/s | +0.97% |
| test_exec_td | 0.2188ms | 0.1367ms | 7.3166 KOps/s | 7.3457 KOps/s | -0.40% |
| test_exec_td_decorator | 0.4574ms | 0.2072ms | 4.8254 KOps/s | 4.7495 KOps/s | +1.60% |
| test_vmap_mlp_speed[True-True] | 0.6320ms | 0.5814ms | 1.7200 KOps/s | 1.7428 KOps/s | -1.31% |
| test_vmap_mlp_speed[True-False] | 0.6476ms | 0.5815ms | 1.7197 KOps/s | 1.7111 KOps/s | +0.50% |
| test_vmap_mlp_speed[False-True] | 0.5623ms | 0.5137ms | 1.9468 KOps/s | 1.9185 KOps/s | +1.48% |
| test_vmap_mlp_speed[False-False] | 0.5790ms | 0.5132ms | 1.9484 KOps/s | 1.8714 KOps/s | +4.11% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0465ms | 0.6493ms | 1.5401 KOps/s | 1.5696 KOps/s | -1.88% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7818ms | 0.6458ms | 1.5485 KOps/s | 1.5727 KOps/s | -1.54% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7136ms | 0.5719ms | 1.7486 KOps/s | 1.7559 KOps/s | -0.42% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7448ms | 0.5719ms | 1.7486 KOps/s | 1.7650 KOps/s | -0.93% |
| test_vmap_transformer_speed[True-True] | 7.7443ms | 7.6455ms | 130.7951 Ops/s | 130.9727 Ops/s | -0.14% |
| test_vmap_transformer_speed[True-False] | 7.7265ms | 7.6177ms | 131.2738 Ops/s | 126.0283 Ops/s | +4.16% |
| test_vmap_transformer_speed[False-True] | 7.9649ms | 7.6196ms | 131.2399 Ops/s | 128.0850 Ops/s | +2.46% |
| test_vmap_transformer_speed[False-False] | 7.6445ms | 7.5832ms | 131.8701 Ops/s | 130.0957 Ops/s | +1.36% |
| test_vmap_transformer_speed_decorator[True-True] | 18.7486ms | 18.6345ms | 53.6640 Ops/s | 53.6545 Ops/s | +0.02% |
| test_vmap_transformer_speed_decorator[True-False] | 18.6834ms | 18.6160ms | 53.7173 Ops/s | 53.8816 Ops/s | -0.30% |
| test_vmap_transformer_speed_decorator[False-True] | 18.6612ms | 18.5324ms | 53.9596 Ops/s | 54.0617 Ops/s | -0.19% |
| test_vmap_transformer_speed_decorator[False-False] | 19.0313ms | 18.5433ms | 53.9278 Ops/s | 54.2197 Ops/s | -0.54% |
| test_to_module_speed[True] | 1.6560ms | 1.5305ms | 653.3724 Ops/s | 651.4152 Ops/s | +0.30% |
| test_to_module_speed[False] | 1.6276ms | 1.5033ms | 665.2084 Ops/s | 661.4320 Ops/s | +0.57% |
| test_tc_init | 46.9720μs | 26.7016μs | 37.4509 KOps/s | 50.5623 KOps/s | **-25.93%** |
| test_tc_init_nested | 80.2550μs | 51.7701μs | 19.3162 KOps/s | 24.5863 KOps/s | **-21.44%** |
| test_tc_first_layer_tensor | 0.7618μs | 0.3638μs | 2.7486 MOps/s | 2.7470 MOps/s | +0.06% |
| test_tc_first_layer_nontensor | 2.8752μs | 0.3945μs | 2.5351 MOps/s | 2.5429 MOps/s | -0.31% |
| test_tc_second_layer_tensor | 12.3510μs | 1.0679μs | 936.3972 KOps/s | 925.2412 KOps/s | +1.21% |
| test_tc_second_layer_nontensor | 1.6156μs | 0.8043μs | 1.2433 MOps/s | 1.1973 MOps/s | +3.85% |
| test_unbind | 0.1117s | 6.8091ms | 146.8626 Ops/s | 195.9429 Ops/s | **-25.05%** |
| test_full_like | 13.5312ms | 13.1517ms | 76.0359 Ops/s | 89.0079 Ops/s | **-14.57%** |
| test_zeros_like | 8.3105ms | 7.8772ms | 126.9482 Ops/s | 126.6457 Ops/s | +0.24% |
| test_ones_like | 8.3604ms | 7.9084ms | 126.4477 Ops/s | 126.6837 Ops/s | -0.19% |
| test_clone | 9.4950ms | 9.2655ms | 107.9270 Ops/s | 108.1171 Ops/s | -0.18% |
| test_squeeze | 64.5130μs | 10.9768μs | 91.1015 KOps/s | 90.2684 KOps/s | +0.92% |
| test_unsqueeze | 0.1128ms | 51.2741μs | 19.5030 KOps/s | 18.9019 KOps/s | +3.18% |
| test_split | 0.1376ms | 95.7752μs | 10.4411 KOps/s | 10.2145 KOps/s | +2.22% |
| test_permute | 0.1405ms | 0.1102ms | 9.0754 KOps/s | 8.6941 KOps/s | +4.39% |
| test_stack | 27.0567ms | 26.7147ms | 37.4326 Ops/s | 37.4338 Ops/s | -0.00% |
| test_cat | 27.0756ms | 26.6812ms | 37.4796 Ops/s | 37.3677 Ops/s | +0.30% |

vmoens commented 4 months ago

Good point, @MateuszGuzek, thanks!

vmoens commented 4 months ago

@MateuszGuzek upon reflection, I'm not sure that's a good idea. We want to avoid people cloning a repo and then running into weird behaviour when they (or someone else) delete the local copy of the repo...