pytorch / tensordict

TensorDict is a PyTorch-dedicated tensor container.
MIT License

[Refactor] Make all leaves in tensorclass part of `_tensordict`, except for NonTensorData #841

Closed: vmoens closed this 4 months ago

vmoens commented 4 months ago

This PR is BC-breaking (backward-compatibility breaking) in the following ways:

  1. Non-tensor data are now proper leaves of the tensorclass:

    data = MyData(X=X, y=y, z="a string", batch_size=batch_size)
    assert "z" in data.keys()  # used to fail
  2. Non-tensor data are now compared when a tensorclass is compared to another tensorclass or TensorDict:

    # Previously, z was ignored during comparison
    z = "a string!"
    tensordict = TensorDict(
        {
            "X": X,
            "y": y,
        },
        batch_size=[3, 4],
    )
    data = MyData(X=X, y=y, z=z, batch_size=batch_size)
    assert (tensordict == data).all()
    # Now z needs to be part of the tensordict, as it is no longer ignored during comparison
    tensordict = TensorDict(
        {
            "X": X,
            "y": y,
            "z": z,  # added so that both sides of the comparison hold the same keys
        },
        batch_size=[3, 4],
    )
  3. The non-tensor entry of a comparison result is no longer None:

    data0 = MyData(X=X, y=y, z="a string", batch_size=batch_size)
    data1 = MyData(X=X, y=y, z="another string", batch_size=batch_size)
    (data0 == data1).z  # used to be None, now a TD with boolean values
  4. Setting non-tensor values in-place now raises a ValueError instead of a RuntimeError.

  5. The following now works, but it converts any NonTensorData in the data into a NonTensorStack (since values depend on their location in the batch):

    data0 = MyData(X=X, y=y, z="a string", batch_size=batch_size)
    data1 = MyData(X=X, y=y, z="another string", batch_size=batch_size)
    data0[:2] = data1[:2]
    data0.z  # used to be a string (z was ignored by __setitem__), now a list

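
To illustrate why item 5 has to turn a single value into a per-element container, here is a plain-Python toy model. It does not use tensordict at all: `assign_slice` and its parameters are hypothetical names introduced only for this sketch, the shared value stands in for the role of NonTensorData, and the returned list stands in for the role of NonTensorStack.

```python
from typing import Any, List


def assign_slice(shared: Any, batch_len: int, index: slice, incoming: Any) -> List[Any]:
    """Toy model of writing a non-tensor value into part of a batch.

    `shared` is one value that is valid for the whole batch; the result
    holds one value per batch element. This is an illustration only, not
    tensordict's implementation.
    """
    # Once some positions differ, a single shared value can no longer
    # describe the batch, so expand it to one entry per batch element.
    per_element = [shared] * batch_len
    for i in range(*index.indices(batch_len)):
        per_element[i] = incoming
    return per_element


z = assign_slice("a string", batch_len=3, index=slice(None, 2), incoming="another string")
print(z)  # ['another string', 'another string', 'a string']
```

After the partial write, the first two positions carry the incoming value and the last one keeps the original, which is why reading the field back yields a list rather than a single string.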
vmoens commented 4 months ago

@maximilianigl I think this will resolve #717 but the price to pay is some bc-breaking changes - hopefully for the better.

github-actions[bot] commented 4 months ago

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 9. Worsened: 25.

| Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 45.6360μs | 16.8935μs | 59.1944 KOps/s | 61.1181 KOps/s | -3.15% |
| test_plain_set_stack_nested | 83.7960μs | 17.0410μs | 58.6821 KOps/s | 60.5688 KOps/s | -3.12% |
| test_plain_set_nested_inplace | 49.7830μs | 19.3593μs | 51.6547 KOps/s | 54.0189 KOps/s | -4.38% |
| test_plain_set_stack_nested_inplace | 50.6040μs | 19.4057μs | 51.5312 KOps/s | 54.3718 KOps/s | **-5.22%** |
| test_items | 23.1830μs | 2.5853μs | 386.8093 KOps/s | 384.1491 KOps/s | +0.69% |
| test_items_nested | 1.0495ms | 0.2742ms | 3.6470 KOps/s | 3.6430 KOps/s | +0.11% |
| test_items_nested_locked | 0.5926ms | 0.2740ms | 3.6491 KOps/s | 3.6403 KOps/s | +0.24% |
| test_items_nested_leaf | 0.1500ms | 79.0817μs | 12.6451 KOps/s | 12.7574 KOps/s | -0.88% |
| test_items_stack_nested | 0.5175ms | 0.2766ms | 3.6147 KOps/s | 3.6195 KOps/s | -0.13% |
| test_items_stack_nested_leaf | 0.1780ms | 80.9080μs | 12.3597 KOps/s | 12.5693 KOps/s | -1.67% |
| test_items_stack_nested_locked | 1.1009ms | 0.2757ms | 3.6265 KOps/s | 3.6289 KOps/s | -0.07% |
| test_keys | 23.8150μs | 3.9916μs | 250.5238 KOps/s | 255.9985 KOps/s | -2.14% |
| test_keys_nested | 0.2400ms | 0.1405ms | 7.1166 KOps/s | 7.2041 KOps/s | -1.21% |
| test_keys_nested_locked | 0.6964ms | 0.1456ms | 6.8672 KOps/s | 6.8671 KOps/s | +0.00% |
| test_keys_nested_leaf | 0.1940ms | 0.1192ms | 8.3903 KOps/s | 8.4299 KOps/s | -0.47% |
| test_keys_stack_nested | 0.2827ms | 0.1410ms | 7.0922 KOps/s | 7.2511 KOps/s | -2.19% |
| test_keys_stack_nested_leaf | 0.2555ms | 0.1185ms | 8.4390 KOps/s | 8.2965 KOps/s | +1.72% |
| test_keys_stack_nested_locked | 0.2754ms | 0.1436ms | 6.9647 KOps/s | 6.9652 KOps/s | -0.01% |
| test_values | 9.4878μs | 1.3622μs | 734.1001 KOps/s | 870.7083 KOps/s | **-15.69%** |
| test_values_nested | 98.2430μs | 50.0145μs | 19.9942 KOps/s | 19.3721 KOps/s | +3.21% |
| test_values_nested_locked | 0.1062ms | 50.3982μs | 19.8420 KOps/s | 19.0337 KOps/s | +4.25% |
| test_values_nested_leaf | 88.9460μs | 45.4734μs | 21.9909 KOps/s | 21.7282 KOps/s | +1.21% |
| test_values_stack_nested | 0.1372ms | 51.9780μs | 19.2389 KOps/s | 19.2777 KOps/s | -0.20% |
| test_values_stack_nested_leaf | 0.1028ms | 45.5084μs | 21.9740 KOps/s | 21.8951 KOps/s | +0.36% |
| test_values_stack_nested_locked | 0.1058ms | 51.5804μs | 19.3872 KOps/s | 19.3715 KOps/s | +0.08% |
| test_membership | 30.2380μs | 1.3023μs | 767.8771 KOps/s | 729.0072 KOps/s | **+5.33%** |
| test_membership_nested | 19.3970μs | 3.4466μs | 290.1395 KOps/s | 292.7133 KOps/s | -0.88% |
| test_membership_nested_leaf | 29.6750μs | 3.4708μs | 288.1194 KOps/s | 290.5534 KOps/s | -0.84% |
| test_membership_stacked_nested | 21.4400μs | 3.4546μs | 289.4730 KOps/s | 293.2454 KOps/s | -1.29% |
| test_membership_stacked_nested_leaf | 33.7730μs | 3.4890μs | 286.6125 KOps/s | 289.9980 KOps/s | -1.17% |
| test_membership_nested_last | 35.9970μs | 4.2339μs | 236.1911 KOps/s | 239.0085 KOps/s | -1.18% |
| test_membership_nested_leaf_last | 26.3100μs | 4.2409μs | 235.7986 KOps/s | 239.4739 KOps/s | -1.53% |
| test_membership_stacked_nested_last | 26.4790μs | 4.2031μs | 237.9190 KOps/s | 209.1378 KOps/s | **+13.76%** |
| test_membership_stacked_nested_leaf_last | 45.8150μs | 4.2218μs | 236.8637 KOps/s | 205.4298 KOps/s | **+15.30%** |
| test_nested_getleaf | 59.1310μs | 10.6830μs | 93.6070 KOps/s | 92.3243 KOps/s | +1.39% |
| test_nested_get | 43.2810μs | 10.1189μs | 98.8249 KOps/s | 99.2740 KOps/s | -0.45% |
| test_stacked_getleaf | 95.7380μs | 10.6647μs | 93.7677 KOps/s | 91.9675 KOps/s | +1.96% |
| test_stacked_get | 27.5220μs | 9.9428μs | 100.5757 KOps/s | 98.0253 KOps/s | +2.60% |
| test_nested_getitemleaf | 51.5460μs | 11.3099μs | 88.4179 KOps/s | 87.2602 KOps/s | +1.33% |
| test_nested_getitem | 44.7840μs | 10.4349μs | 95.8323 KOps/s | 95.2395 KOps/s | +0.62% |
| test_stacked_getitemleaf | 34.0130μs | 11.1670μs | 89.5497 KOps/s | 91.6482 KOps/s | -2.29% |
| test_stacked_getitem | 36.8990μs | 10.2357μs | 97.6971 KOps/s | 96.0772 KOps/s | +1.69% |
| test_lock_nested | 0.9288ms | 0.3400ms | 2.9415 KOps/s | 2.9866 KOps/s | -1.51% |
| test_lock_stack_nested | 0.6067ms | 0.3041ms | 3.2883 KOps/s | 3.3463 KOps/s | -1.73% |
| test_unlock_nested | 0.7490ms | 0.3444ms | 2.9038 KOps/s | 2.9096 KOps/s | -0.20% |
| test_unlock_stack_nested | 0.4931ms | 0.3131ms | 3.1938 KOps/s | 3.2527 KOps/s | -1.81% |
| test_flatten_speed | 0.5934ms | 98.4019μs | 10.1624 KOps/s | 10.1982 KOps/s | -0.35% |
| test_unflatten_speed | 0.7085ms | 0.4107ms | 2.4348 KOps/s | 2.4728 KOps/s | -1.54% |
| test_common_ops | 3.0883ms | 0.7495ms | 1.3342 KOps/s | 1.4493 KOps/s | **-7.94%** |
| test_creation | 75.2210μs | 2.1124μs | 473.3913 KOps/s | 534.2178 KOps/s | **-11.39%** |
| test_creation_empty | 33.4320μs | 11.3941μs | 87.7647 KOps/s | 105.4938 KOps/s | **-16.81%** |
| test_creation_nested_1 | 51.8160μs | 14.1561μs | 70.6410 KOps/s | 81.9022 KOps/s | **-13.75%** |
| test_creation_nested_2 | 42.1290μs | 17.6398μs | 56.6901 KOps/s | 64.7164 KOps/s | **-12.40%** |
| test_clone | 93.4440μs | 12.8933μs | 77.5597 KOps/s | 75.8859 KOps/s | +2.21% |
| test_getitem[int] | 26.3490μs | 11.4108μs | 87.6364 KOps/s | 89.5399 KOps/s | -2.13% |
| test_getitem[slice_int] | 57.0060μs | 22.9643μs | 43.5459 KOps/s | 44.4743 KOps/s | -2.09% |
| test_getitem[range] | 80.7610μs | 57.7378μs | 17.3197 KOps/s | 16.4877 KOps/s | **+5.05%** |
| test_getitem[tuple] | 73.3760μs | 19.0318μs | 52.5435 KOps/s | 53.1897 KOps/s | -1.21% |
| test_getitem[list] | 0.1235ms | 40.1635μs | 24.8982 KOps/s | 24.2951 KOps/s | +2.48% |
| test_setitem_dim[int] | 84.3970μs | 33.2525μs | 30.0729 KOps/s | 30.2901 KOps/s | -0.72% |
| test_setitem_dim[slice_int] | 0.1082ms | 60.7947μs | 16.4488 KOps/s | 16.8447 KOps/s | -2.35% |
| test_setitem_dim[range] | 0.1155ms | 81.5430μs | 12.2635 KOps/s | 11.9985 KOps/s | +2.21% |
| test_setitem_dim[tuple] | 86.5910μs | 49.9772μs | 20.0091 KOps/s | 20.3549 KOps/s | -1.70% |
| test_setitem | 57.2970μs | 19.8162μs | 50.4637 KOps/s | 52.0902 KOps/s | -3.12% |
| test_set | 61.9960μs | 19.3842μs | 51.5885 KOps/s | 53.5790 KOps/s | -3.71% |
| test_set_shared | 1.5016ms | 0.1477ms | 6.7693 KOps/s | 7.1748 KOps/s | **-5.65%** |
| test_update | 0.1452ms | 22.3203μs | 44.8023 KOps/s | 46.7151 KOps/s | -4.09% |
| test_update_nested | 0.1438ms | 31.4715μs | 31.7748 KOps/s | 33.6567 KOps/s | **-5.59%** |
| test_update__nested | 96.0490μs | 24.8061μs | 40.3126 KOps/s | 40.1979 KOps/s | +0.29% |
| test_set_nested | 70.6210μs | 21.4981μs | 46.5157 KOps/s | 48.0622 KOps/s | -3.22% |
| test_set_nested_new | 76.0120μs | 25.4554μs | 39.2845 KOps/s | 40.2032 KOps/s | -2.29% |
| test_select | 97.8020μs | 40.6874μs | 24.5776 KOps/s | 25.3403 KOps/s | -3.01% |
| test_select_nested | 0.1317ms | 58.7094μs | 17.0331 KOps/s | 17.2388 KOps/s | -1.19% |
| test_exclude_nested | 0.2217ms | 0.1194ms | 8.3741 KOps/s | 8.5322 KOps/s | -1.85% |
| test_empty[True] | 0.5916ms | 0.4016ms | 2.4898 KOps/s | 2.5212 KOps/s | -1.25% |
| test_empty[False] | 5.4160μs | 1.0360μs | 965.2148 KOps/s | 985.9476 KOps/s | -2.10% |
| test_unbind_speed | 0.4517ms | 0.2481ms | 4.0313 KOps/s | 4.0531 KOps/s | -0.54% |
| test_unbind_speed_stack0 | 0.4842ms | 0.2470ms | 4.0493 KOps/s | 4.1303 KOps/s | -1.96% |
| test_unbind_speed_stack1 | 73.5439ms | 0.7277ms | 1.3741 KOps/s | 1.4302 KOps/s | -3.92% |
| test_split | 73.1228ms | 1.6108ms | 620.8169 Ops/s | 630.5732 Ops/s | -1.55% |
| test_chunk | 73.2610ms | 1.6251ms | 615.3361 Ops/s | 628.4813 Ops/s | -2.09% |
| test_creation[device0] | 0.2602ms | 86.1911μs | 11.6021 KOps/s | 11.5879 KOps/s | +0.12% |
| test_creation_from_tensor | 3.4010ms | 87.8761μs | 11.3797 KOps/s | 11.6607 KOps/s | -2.41% |
| test_add_one[memmap_tensor0] | 83.0850μs | 5.3408μs | 187.2386 KOps/s | 175.8883 KOps/s | **+6.45%** |
| test_contiguous[memmap_tensor0] | 18.1340μs | 0.6428μs | 1.5557 MOps/s | 1.5389 MOps/s | +1.10% |
| test_stack[memmap_tensor0] | 29.7960μs | 3.5845μs | 278.9798 KOps/s | 263.5372 KOps/s | **+5.86%** |
| test_memmaptd_index | 1.1122ms | 0.2618ms | 3.8203 KOps/s | 3.9488 KOps/s | -3.25% |
| test_memmaptd_index_astensor | 0.6034ms | 0.3363ms | 2.9732 KOps/s | 3.0585 KOps/s | -2.79% |
| test_memmaptd_index_op | 1.0490ms | 0.6288ms | 1.5904 KOps/s | 1.6878 KOps/s | **-5.77%** |
| test_serialize_model | 0.1010s | 97.1051ms | 10.2981 Ops/s | 10.2794 Ops/s | +0.18% |
| test_serialize_model_pickle | 0.4482s | 0.3809s | 2.6253 Ops/s | 2.6311 Ops/s | -0.22% |
| test_serialize_weights | 0.1027s | 95.1982ms | 10.5044 Ops/s | 9.6067 Ops/s | **+9.34%** |
| test_serialize_weights_returnearly | 0.1830s | 0.1276s | 7.8397 Ops/s | 7.8795 Ops/s | -0.50% |
| test_serialize_weights_pickle | 0.9455s | 0.5424s | 1.8435 Ops/s | 2.4451 Ops/s | **-24.60%** |
| test_serialize_weights_filesystem | 98.7438ms | 93.8593ms | 10.6542 Ops/s | 9.7707 Ops/s | **+9.04%** |
| test_serialize_model_filesystem | 0.1683s | 0.1023s | 9.7794 Ops/s | 10.5633 Ops/s | **-7.42%** |
| test_reshape_pytree | 70.2510μs | 26.1429μs | 38.2513 KOps/s | 38.7818 KOps/s | -1.37% |
| test_reshape_td | 72.1040μs | 33.8406μs | 29.5503 KOps/s | 29.2907 KOps/s | +0.89% |
| test_view_pytree | 58.3090μs | 25.5680μs | 39.1114 KOps/s | 39.1761 KOps/s | -0.17% |
| test_view_td | 0.1621ms | 39.8385μs | 25.1013 KOps/s | 25.0641 KOps/s | +0.15% |
| test_unbind_pytree | 0.1205ms | 29.1407μs | 34.3163 KOps/s | 33.3814 KOps/s | +2.80% |
| test_unbind_td | 0.4025ms | 36.3067μs | 27.5432 KOps/s | 27.1166 KOps/s | +1.57% |
| test_split_pytree | 86.5220μs | 29.5684μs | 33.8198 KOps/s | 34.3330 KOps/s | -1.49% |
| test_split_td | 0.1273ms | 40.6290μs | 24.6130 KOps/s | 25.4215 KOps/s | -3.18% |
| test_add_pytree | 0.1039ms | 34.9826μs | 28.5857 KOps/s | 28.6156 KOps/s | -0.10% |
| test_add_td | 0.1344ms | 55.2859μs | 18.0878 KOps/s | 19.1568 KOps/s | **-5.58%** |
| test_distributed | 0.2444ms | 0.1060ms | 9.4352 KOps/s | 9.6962 KOps/s | -2.69% |
| test_tdmodule | 30.3470μs | 17.5056μs | 57.1246 KOps/s | 58.2021 KOps/s | -1.85% |
| test_tdmodule_dispatch | 78.4670μs | 35.7923μs | 27.9390 KOps/s | 29.6051 KOps/s | **-5.63%** |
| test_tdseq | 39.3030μs | 20.6284μs | 48.4768 KOps/s | 49.6581 KOps/s | -2.38% |
| test_tdseq_dispatch | 67.5260μs | 40.5920μs | 24.6354 KOps/s | 25.3471 KOps/s | -2.81% |
| test_instantiation_functorch | 1.5917ms | 1.3388ms | 746.9534 Ops/s | 733.5448 Ops/s | +1.83% |
| test_instantiation_td | 2.4011ms | 1.0634ms | 940.3430 Ops/s | 985.8151 Ops/s | -4.61% |
| test_exec_functorch | 0.2968ms | 0.1632ms | 6.1289 KOps/s | 6.0583 KOps/s | +1.17% |
| test_exec_functional_call | 0.2988ms | 0.1492ms | 6.7005 KOps/s | 6.6710 KOps/s | +0.44% |
| test_exec_td | 0.3347ms | 0.1463ms | 6.8368 KOps/s | 6.9272 KOps/s | -1.30% |
| test_exec_td_decorator | 0.8110ms | 0.2217ms | 4.5097 KOps/s | 4.5541 KOps/s | -0.97% |
| test_vmap_mlp_speed[True-True] | 0.8276ms | 0.4963ms | 2.0149 KOps/s | 2.0415 KOps/s | -1.30% |
| test_vmap_mlp_speed[True-False] | 0.6937ms | 0.4841ms | 2.0655 KOps/s | 2.0902 KOps/s | -1.18% |
| test_vmap_mlp_speed[False-True] | 0.6863ms | 0.3934ms | 2.5416 KOps/s | 2.5164 KOps/s | +1.00% |
| test_vmap_mlp_speed[False-False] | 0.7647ms | 0.3943ms | 2.5359 KOps/s | 2.5121 KOps/s | +0.95% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1784ms | 0.5636ms | 1.7742 KOps/s | 1.6579 KOps/s | **+7.02%** |
| test_vmap_mlp_speed_decorator[True-False] | 1.1094ms | 0.5721ms | 1.7480 KOps/s | 1.7938 KOps/s | -2.55% |
| test_vmap_mlp_speed_decorator[False-True] | 0.9203ms | 0.4560ms | 2.1929 KOps/s | 2.1667 KOps/s | +1.21% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6700ms | 0.4542ms | 2.2018 KOps/s | 2.1662 KOps/s | +1.64% |
| test_to_module_speed[True] | 2.5935ms | 1.7094ms | 584.9997 Ops/s | 598.5670 Ops/s | -2.27% |
| test_to_module_speed[False] | 2.4291ms | 1.6626ms | 601.4599 Ops/s | 610.4616 Ops/s | -1.47% |
| test_tc_init | 0.1449ms | 61.2629μs | 16.3231 KOps/s | 37.0836 KOps/s | **-55.98%** |
| test_tc_init_nested | 0.1936ms | 0.1217ms | 8.2137 KOps/s | 18.5482 KOps/s | **-55.72%** |
| test_tc_first_layer_tensor | 29.9260μs | 8.0422μs | 124.3438 KOps/s | 1.3980 MOps/s | **-91.11%** |
| test_tc_first_layer_nontensor | 30.6470μs | 8.0614μs | 124.0484 KOps/s | 1.4495 MOps/s | **-91.44%** |
| test_tc_second_layer_tensor | 21.0900μs | 2.5644μs | 389.9566 KOps/s | 538.6275 KOps/s | **-27.60%** |
| test_tc_second_layer_nontensor | 40.3150μs | 9.1571μs | 109.2044 KOps/s | 602.4611 KOps/s | **-81.87%** |
| test_unbind | 9.5144ms | 9.1098ms | 109.7724 Ops/s | 149.4072 Ops/s | **-26.53%** |
| test_full_like | 17.0028ms | 11.7769ms | 84.9117 Ops/s | 92.4755 Ops/s | **-8.18%** |
| test_zeros_like | 11.4449ms | 5.8282ms | 171.5792 Ops/s | 171.4434 Ops/s | +0.08% |
| test_ones_like | 12.3536ms | 6.4631ms | 154.7233 Ops/s | 159.2340 Ops/s | -2.83% |
| test_clone | 13.4002ms | 8.0000ms | 125.0007 Ops/s | 128.3148 Ops/s | -2.58% |
| test_squeeze | 62.1760μs | 13.7219μs | 72.8761 KOps/s | 70.3017 KOps/s | +3.66% |
| test_unsqueeze | 0.1997ms | 98.8538μs | 10.1160 KOps/s | 16.6661 KOps/s | **-39.30%** |
| test_split | 0.5246ms | 0.2871ms | 3.4837 KOps/s | 9.1405 KOps/s | **-61.89%** |
| test_permute | 0.3446ms | 0.2244ms | 4.4560 KOps/s | 7.9751 KOps/s | **-44.13%** |
| test_stack | 26.3374ms | 22.4203ms | 44.6025 Ops/s | 46.7679 Ops/s | -4.63% |
| test_cat | 25.4964ms | 21.9432ms | 45.5722 Ops/s | 46.8564 Ops/s | -2.74% |

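
The Change column appears to be the relative difference between the Ops column and the Ops on Repo `HEAD` column. The bot's own code is not shown in this thread, so the formula below is an assumption, but it reproduces the reported values for the rows checked. The sketch also normalises units, since some rows mix KOps/s and MOps/s; `to_ops_per_second` and `percent_change` are hypothetical helper names.

```python
# Unit prefixes used in the "Ops" columns, expressed in Ops/s.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(cell: str) -> float:
    """Convert a table cell such as '1.3980 MOps/s' to plain Ops/s."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops_pr: str, ops_head: str) -> float:
    """Relative change of the PR throughput with respect to HEAD, in percent."""
    pr = to_ops_per_second(ops_pr)
    head = to_ops_per_second(ops_head)
    return (pr - head) / head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU benchmark table.
rows = {
    "test_values": ("734.1001 KOps/s", "870.7083 KOps/s"),
    "test_membership": ("767.8771 KOps/s", "729.0072 KOps/s"),
    "test_tc_init": ("16.3231 KOps/s", "37.0836 KOps/s"),
    "test_tc_first_layer_tensor": ("124.3438 KOps/s", "1.3980 MOps/s"),
}
for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
```

Run as is, this prints -15.69%, +5.33%, -55.98% and -91.11% for the four rows, matching the table. Negative values mean the PR branch completes fewer operations per second than `HEAD`.
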
github-actions[bot] commented 4 months ago

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 15. Worsened: 13.

Expand to view detailed results | Name | Max | Mean | Ops | Ops on Repo `HEAD` | Change | | -------------------------------------------------- | --------- | --------- | --------------- | ------------------ | ----------------------------------- | | test_plain_set_nested | 83.5940μs | 13.3420μs | 74.9511 KOps/s | 76.6086 KOps/s | $\color{#d91a1a}-2.16\\%$ | | test_plain_set_stack_nested | 29.7910μs | 13.5481μs | 73.8108 KOps/s | 76.3437 KOps/s | $\color{#d91a1a}-3.32\\%$ | | test_plain_set_nested_inplace | 40.5720μs | 14.7421μs | 67.8331 KOps/s | 68.6475 KOps/s | $\color{#d91a1a}-1.19\\%$ | | test_plain_set_stack_nested_inplace | 38.9120μs | 14.6557μs | 68.2327 KOps/s | 69.1319 KOps/s | $\color{#d91a1a}-1.30\\%$ | | test_items | 26.5220μs | 4.6395μs | 215.5415 KOps/s | 210.3198 KOps/s | $\color{#35bf28}+2.48\\%$ | | test_items_nested | 0.5023ms | 0.3403ms | 2.9384 KOps/s | 2.9197 KOps/s | $\color{#35bf28}+0.64\\%$ | | test_items_nested_locked | 0.3632ms | 0.3417ms | 2.9262 KOps/s | 2.8938 KOps/s | $\color{#35bf28}+1.12\\%$ | | test_items_nested_leaf | 0.1129ms | 83.2029μs | 12.0188 KOps/s | 12.0627 KOps/s | $\color{#d91a1a}-0.36\\%$ | | test_items_stack_nested | 0.3701ms | 0.3423ms | 2.9213 KOps/s | 2.8144 KOps/s | $\color{#35bf28}+3.80\\%$ | | test_items_stack_nested_leaf | 0.2420ms | 82.6237μs | 12.1031 KOps/s | 11.7513 KOps/s | $\color{#35bf28}+2.99\\%$ | | test_items_stack_nested_locked | 0.5270ms | 0.3426ms | 2.9187 KOps/s | 2.8685 KOps/s | $\color{#35bf28}+1.75\\%$ | | test_keys | 15.8910μs | 4.3214μs | 231.4082 KOps/s | 227.8323 KOps/s | $\color{#35bf28}+1.57\\%$ | | test_keys_nested | 91.3550μs | 69.1184μs | 14.4679 KOps/s | 14.3932 KOps/s | $\color{#35bf28}+0.52\\%$ | | test_keys_nested_locked | 2.4809ms | 74.9701μs | 13.3387 KOps/s | 13.2720 KOps/s | $\color{#35bf28}+0.50\\%$ | | test_keys_nested_leaf | 84.4640μs | 60.0927μs | 16.6410 KOps/s | 16.6038 KOps/s | $\color{#35bf28}+0.22\\%$ | | test_keys_stack_nested | 91.2050μs | 69.3658μs | 14.4163 KOps/s | 
14.4591 KOps/s | $\color{#d91a1a}-0.30\\%$ | | test_keys_stack_nested_leaf | 1.5924ms | 59.2897μs | 16.8663 KOps/s | 16.5063 KOps/s | $\color{#35bf28}+2.18\\%$ | | test_keys_stack_nested_locked | 0.2184ms | 74.6189μs | 13.4014 KOps/s | 13.2850 KOps/s | $\color{#35bf28}+0.88\\%$ | | test_values | 11.7473μs | 1.8115μs | 552.0181 KOps/s | 552.8963 KOps/s | $\color{#d91a1a}-0.16\\%$ | | test_values_nested | 59.8730μs | 35.0154μs | 28.5589 KOps/s | 27.9751 KOps/s | $\color{#35bf28}+2.09\\%$ | | test_values_nested_locked | 60.2540μs | 36.7913μs | 27.1803 KOps/s | 26.6161 KOps/s | $\color{#35bf28}+2.12\\%$ | | test_values_nested_leaf | 56.7530μs | 31.1705μs | 32.0816 KOps/s | 31.4205 KOps/s | $\color{#35bf28}+2.10\\%$ | | test_values_stack_nested | 1.6310ms | 35.7403μs | 27.9797 KOps/s | 27.2109 KOps/s | $\color{#35bf28}+2.83\\%$ | | test_values_stack_nested_leaf | 1.0727ms | 31.6007μs | 31.6448 KOps/s | 30.6140 KOps/s | $\color{#35bf28}+3.37\\%$ | | test_values_stack_nested_locked | 0.2154ms | 36.6646μs | 27.2743 KOps/s | 26.2496 KOps/s | $\color{#35bf28}+3.90\\%$ | | test_membership | 1.4336μs | 0.6980μs | 1.4326 MOps/s | 1.3946 MOps/s | $\color{#35bf28}+2.73\\%$ | | test_membership_nested | 18.3510μs | 2.5514μs | 391.9461 KOps/s | 386.4174 KOps/s | $\color{#35bf28}+1.43\\%$ | | test_membership_nested_leaf | 21.9110μs | 2.5776μs | 387.9527 KOps/s | 388.3064 KOps/s | $\color{#d91a1a}-0.09\\%$ | | test_membership_stacked_nested | 24.6610μs | 2.5926μs | 385.7164 KOps/s | 386.4290 KOps/s | $\color{#d91a1a}-0.18\\%$ | | test_membership_stacked_nested_leaf | 0.1879ms | 2.5692μs | 389.2270 KOps/s | 387.4450 KOps/s | $\color{#35bf28}+0.46\\%$ | | test_membership_nested_last | 0.1839ms | 3.1086μs | 321.6866 KOps/s | 317.3228 KOps/s | $\color{#35bf28}+1.38\\%$ | | test_membership_nested_leaf_last | 16.7210μs | 3.1258μs | 319.9190 KOps/s | 320.8717 KOps/s | $\color{#d91a1a}-0.30\\%$ | | test_membership_stacked_nested_last | 30.0920μs | 3.0831μs | 324.3506 KOps/s | 281.9570 KOps/s 
| $\textbf{\color{#35bf28}+15.04\\%}$ | | test_membership_stacked_nested_leaf_last | 19.2110μs | 3.1306μs | 319.4254 KOps/s | 276.8614 KOps/s | $\textbf{\color{#35bf28}+15.37\\%}$ | | test_nested_getleaf | 23.9210μs | 8.3276μs | 120.0821 KOps/s | 119.0976 KOps/s | $\color{#35bf28}+0.83\\%$ | | test_nested_get | 35.8820μs | 7.8534μs | 127.3335 KOps/s | 127.6502 KOps/s | $\color{#d91a1a}-0.25\\%$ | | test_stacked_getleaf | 31.2220μs | 8.3428μs | 119.8636 KOps/s | 119.6140 KOps/s | $\color{#35bf28}+0.21\\%$ | | test_stacked_get | 35.5120μs | 7.8575μs | 127.2676 KOps/s | 127.4493 KOps/s | $\color{#d91a1a}-0.14\\%$ | | test_nested_getitemleaf | 25.5320μs | 8.5472μs | 116.9980 KOps/s | 116.6315 KOps/s | $\color{#35bf28}+0.31\\%$ | | test_nested_getitem | 25.7020μs | 8.0162μs | 124.7473 KOps/s | 124.8072 KOps/s | $\color{#d91a1a}-0.05\\%$ | | test_stacked_getitemleaf | 24.8810μs | 8.5101μs | 117.5069 KOps/s | 116.9474 KOps/s | $\color{#35bf28}+0.48\\%$ | | test_stacked_getitem | 44.6630μs | 8.0544μs | 124.1561 KOps/s | 125.2819 KOps/s | $\color{#d91a1a}-0.90\\%$ | | test_lock_nested | 59.2302ms | 0.3920ms | 2.5510 KOps/s | 2.4814 KOps/s | $\color{#35bf28}+2.80\\%$ | | test_lock_stack_nested | 0.4201ms | 0.2912ms | 3.4342 KOps/s | 3.3644 KOps/s | $\color{#35bf28}+2.07\\%$ | | test_unlock_nested | 63.6856ms | 0.4005ms | 2.4967 KOps/s | 2.4777 KOps/s | $\color{#35bf28}+0.77\\%$ | | test_unlock_stack_nested | 0.4748ms | 0.3017ms | 3.3148 KOps/s | 3.2951 KOps/s | $\color{#35bf28}+0.60\\%$ | | test_flatten_speed | 0.3522ms | 0.1017ms | 9.8314 KOps/s | 9.7816 KOps/s | $\color{#35bf28}+0.51\\%$ | | test_unflatten_speed | 0.3518ms | 0.2924ms | 3.4195 KOps/s | 3.4300 KOps/s | $\color{#d91a1a}-0.31\\%$ | | test_common_ops | 1.0714ms | 0.5902ms | 1.6942 KOps/s | 1.6665 KOps/s | $\color{#35bf28}+1.66\\%$ | | test_creation | 18.5410μs | 1.6092μs | 621.4240 KOps/s | 629.5337 KOps/s | $\color{#d91a1a}-1.29\\%$ | | test_creation_empty | 26.2520μs | 9.6525μs | 103.5997 KOps/s | 112.3672 
KOps/s | $\textbf{\color{#d91a1a}-7.80\\%}$ | | test_creation_nested_1 | 33.0220μs | 11.4197μs | 87.5681 KOps/s | 93.3699 KOps/s | $\textbf{\color{#d91a1a}-6.21\\%}$ | | test_creation_nested_2 | 50.7530μs | 13.6839μs | 73.0787 KOps/s | 77.1406 KOps/s | $\textbf{\color{#d91a1a}-5.27\\%}$ | | test_clone | 74.1250μs | 11.1636μs | 89.5771 KOps/s | 87.8780 KOps/s | $\color{#35bf28}+1.93\\%$ | | test_getitem[int] | 24.9310μs | 10.5339μs | 94.9317 KOps/s | 93.6430 KOps/s | $\color{#35bf28}+1.38\\%$ | | test_getitem[slice_int] | 0.1014ms | 20.1061μs | 49.7362 KOps/s | 49.6674 KOps/s | $\color{#35bf28}+0.14\\%$ | | test_getitem[range] | 62.1930μs | 44.7506μs | 22.3461 KOps/s | 21.8470 KOps/s | $\color{#35bf28}+2.28\\%$ | | test_getitem[tuple] | 41.5720μs | 18.1193μs | 55.1898 KOps/s | 54.0100 KOps/s | $\color{#35bf28}+2.18\\%$ | | test_getitem[list] | 0.1274ms | 31.3908μs | 31.8565 KOps/s | 29.3418 KOps/s | $\textbf{\color{#35bf28}+8.57\\%}$ | | test_setitem_dim[int] | 45.8930μs | 27.0802μs | 36.9273 KOps/s | 32.1155 KOps/s | $\textbf{\color{#35bf28}+14.98\\%}$ | | test_setitem_dim[slice_int] | 64.8940μs | 47.7336μs | 20.9496 KOps/s | 18.9180 KOps/s | $\textbf{\color{#35bf28}+10.74\\%}$ | | test_setitem_dim[range] | 94.7860μs | 64.3461μs | 15.5410 KOps/s | 13.9895 KOps/s | $\textbf{\color{#35bf28}+11.09\\%}$ | | test_setitem_dim[tuple] | 63.1540μs | 41.6031μs | 24.0367 KOps/s | 22.0126 KOps/s | $\textbf{\color{#35bf28}+9.19\\%}$ | | test_setitem | 59.3830μs | 16.6870μs | 59.9269 KOps/s | 58.6963 KOps/s | $\color{#35bf28}+2.10\\%$ | | test_set | 53.5530μs | 15.9604μs | 62.6550 KOps/s | 59.8542 KOps/s | $\color{#35bf28}+4.68\\%$ | | test_set_shared | 1.5734ms | 96.0633μs | 10.4098 KOps/s | 10.1902 KOps/s | $\color{#35bf28}+2.16\\%$ | | test_update | 85.7450μs | 19.0549μs | 52.4798 KOps/s | 54.6935 KOps/s | $\color{#d91a1a}-4.05\\%$ | | test_update_nested | 77.5040μs | 24.3187μs | 41.1206 KOps/s | 41.5992 KOps/s | $\color{#d91a1a}-1.15\\%$ | | test_update__nested | 51.7730μs | 
21.2699μs | 47.0147 KOps/s | 46.6045 KOps/s | $\color{#35bf28}+0.88\\%$ | | test_set_nested | 80.1150μs | 16.7806μs | 59.5926 KOps/s | 58.9421 KOps/s | $\color{#35bf28}+1.10\\%$ | | test_set_nested_new | 94.1250μs | 19.3417μs | 51.7017 KOps/s | 51.7346 KOps/s | $\color{#d91a1a}-0.06\\%$ | | test_select | 0.1170ms | 31.9984μs | 31.2516 KOps/s | 30.2506 KOps/s | $\color{#35bf28}+3.31\\%$ | | test_select_nested | 0.8826ms | 52.7662μs | 18.9515 KOps/s | 19.3750 KOps/s | $\color{#d91a1a}-2.19\\%$ | | test_exclude_nested | 0.1846ms | 0.1088ms | 9.1916 KOps/s | 9.2969 KOps/s | $\color{#d91a1a}-1.13\\%$ | | test_empty[True] | 0.3818ms | 0.3399ms | 2.9418 KOps/s | 2.9217 KOps/s | $\color{#35bf28}+0.69\\%$ | | test_empty[False] | 2.3962μs | 0.8294μs | 1.2057 MOps/s | 1.2340 MOps/s | $\color{#d91a1a}-2.29\\%$ | | test_to | 88.5750μs | 58.1307μs | 17.2026 KOps/s | 15.4066 KOps/s | $\textbf{\color{#35bf28}+11.66\\%}$ | | test_to_nonblocking | 0.1991ms | 34.7957μs | 28.7392 KOps/s | 28.2022 KOps/s | $\color{#35bf28}+1.90\\%$ | | test_unbind_speed | 0.3910ms | 0.2534ms | 3.9465 KOps/s | 3.8934 KOps/s | $\color{#35bf28}+1.36\\%$ | | test_unbind_speed_stack0 | 0.3766ms | 0.2521ms | 3.9666 KOps/s | 3.8642 KOps/s | $\color{#35bf28}+2.65\\%$ | | test_unbind_speed_stack1 | 77.3432ms | 0.8259ms | 1.2108 KOps/s | 1.2642 KOps/s | $\color{#d91a1a}-4.22\\%$ | | test_split | 79.3484ms | 1.6371ms | 610.8465 Ops/s | 607.3634 Ops/s | $\color{#35bf28}+0.57\\%$ | | test_chunk | 1.5707ms | 1.5151ms | 660.0074 Ops/s | 608.5911 Ops/s | $\textbf{\color{#35bf28}+8.45\\%}$ | | test_creation[device0] | 0.2011ms | 56.3537μs | 17.7451 KOps/s | 17.6017 KOps/s | $\color{#35bf28}+0.81\\%$ | | test_creation_from_tensor | 0.1951ms | 51.8667μs | 19.2802 KOps/s | 19.0180 KOps/s | $\color{#35bf28}+1.38\\%$ | | test_add_one[memmap_tensor0] | 0.1241ms | 6.7076μs | 149.0843 KOps/s | 149.0900 KOps/s | $-0.00\\%$ | | test_contiguous[memmap_tensor0] | 12.5310μs | 0.6556μs | 1.5253 MOps/s | 1.5612 MOps/s | 
$\color{#d91a1a}-2.30\\%$ | | test_stack[memmap_tensor0] | 39.6220μs | 4.7566μs | 210.2330 KOps/s | 211.3520 KOps/s | $\color{#d91a1a}-0.53\\%$ | | test_memmaptd_index | 1.0964ms | 0.2645ms | 3.7803 KOps/s | 3.6795 KOps/s | $\color{#35bf28}+2.74\\%$ | | test_memmaptd_index_astensor | 0.5904ms | 0.3267ms | 3.0606 KOps/s | 3.0081 KOps/s | $\color{#35bf28}+1.75\\%$ | | test_memmaptd_index_op | 0.9201ms | 0.6359ms | 1.5725 KOps/s | 1.5893 KOps/s | $\color{#d91a1a}-1.06\\%$ | | test_serialize_model | 0.1814s | 0.1006s | 9.9402 Ops/s | 10.0370 Ops/s | $\color{#d91a1a}-0.96\\%$ | | test_serialize_model_pickle | 1.3509s | 1.2361s | 0.8090 Ops/s | 0.8082 Ops/s | $\color{#35bf28}+0.10\\%$ | | test_serialize_weights | 0.1713s | 98.2388ms | 10.1793 Ops/s | 9.2696 Ops/s | $\textbf{\color{#35bf28}+9.81\\%}$ | | test_serialize_weights_returnearly | 69.1916ms | 61.9161ms | 16.1509 Ops/s | 11.8837 Ops/s | $\textbf{\color{#35bf28}+35.91\\%}$ | | test_serialize_weights_pickle | 1.3513s | 1.2370s | 0.8084 Ops/s | 0.8085 Ops/s | $-0.00\\%$ | | test_reshape_pytree | 0.1594ms | 26.2666μs | 38.0711 KOps/s | 37.6967 KOps/s | $\color{#35bf28}+0.99\\%$ | | test_reshape_td | 0.2229ms | 31.5861μs | 31.6595 KOps/s | 31.3393 KOps/s | $\color{#35bf28}+1.02\\%$ | | test_view_pytree | 0.1659ms | 26.0587μs | 38.3749 KOps/s | 38.4750 KOps/s | $\color{#d91a1a}-0.26\\%$ | | test_view_td | 0.2593ms | 37.7481μs | 26.4914 KOps/s | 26.9158 KOps/s | $\color{#d91a1a}-1.58\\%$ | | test_unbind_pytree | 0.2503ms | 32.9264μs | 30.3707 KOps/s | 31.0321 KOps/s | $\color{#d91a1a}-2.13\\%$ | | test_unbind_td | 0.5297ms | 40.4303μs | 24.7339 KOps/s | 25.5934 KOps/s | $\color{#d91a1a}-3.36\\%$ | | test_split_pytree | 0.1403ms | 34.5043μs | 28.9819 KOps/s | 28.3911 KOps/s | $\color{#35bf28}+2.08\\%$ | | test_split_td | 0.1642ms | 38.0056μs | 26.3119 KOps/s | 26.2696 KOps/s | $\color{#35bf28}+0.16\\%$ | | test_add_pytree | 0.2624ms | 37.9798μs | 26.3298 KOps/s | 26.1495 KOps/s | $\color{#35bf28}+0.69\\%$ | | test_add_td 
| 0.2105ms | 54.0030μs | 18.5175 KOps/s | 18.2552 KOps/s | $\color{#35bf28}+1.44\\%$ |
| test_distributed | 1.9637ms | 71.9506μs | 13.8984 KOps/s | 14.0026 KOps/s | $\color{#d91a1a}-0.74\\%$ |
| test_tdmodule | 0.1472ms | 15.7593μs | 63.4547 KOps/s | 61.7103 KOps/s | $\color{#35bf28}+2.83\\%$ |
| test_tdmodule_dispatch | 54.7420μs | 29.9157μs | 33.4273 KOps/s | 32.2993 KOps/s | $\color{#35bf28}+3.49\\%$ |
| test_tdseq | 33.6020μs | 17.1018μs | 58.4732 KOps/s | 59.5634 KOps/s | $\color{#d91a1a}-1.83\\%$ |
| test_tdseq_dispatch | 54.4930μs | 33.3972μs | 29.9427 KOps/s | 29.5736 KOps/s | $\color{#35bf28}+1.25\\%$ |
| test_instantiation_functorch | 1.5573ms | 1.3794ms | 724.9712 Ops/s | 712.8947 Ops/s | $\color{#35bf28}+1.69\\%$ |
| test_instantiation_td | 1.4790ms | 0.9750ms | 1.0256 KOps/s | 1.0172 KOps/s | $\color{#35bf28}+0.82\\%$ |
| test_exec_functorch | 0.1875ms | 0.1418ms | 7.0536 KOps/s | 7.0809 KOps/s | $\color{#d91a1a}-0.39\\%$ |
| test_exec_functional_call | 0.2585ms | 0.1312ms | 7.6205 KOps/s | 7.7452 KOps/s | $\color{#d91a1a}-1.61\\%$ |
| test_exec_td | 0.3189ms | 0.1273ms | 7.8562 KOps/s | 7.5436 KOps/s | $\color{#35bf28}+4.14\\%$ |
| test_exec_td_decorator | 0.4498ms | 0.1991ms | 5.0217 KOps/s | 4.9567 KOps/s | $\color{#35bf28}+1.31\\%$ |
| test_vmap_mlp_speed[True-True] | 1.8309ms | 0.5671ms | 1.7633 KOps/s | 1.7815 KOps/s | $\color{#d91a1a}-1.02\\%$ |
| test_vmap_mlp_speed[True-False] | 0.7423ms | 0.5814ms | 1.7200 KOps/s | 1.7808 KOps/s | $\color{#d91a1a}-3.42\\%$ |
| test_vmap_mlp_speed[False-True] | 0.6656ms | 0.5086ms | 1.9662 KOps/s | 2.0194 KOps/s | $\color{#d91a1a}-2.64\\%$ |
| test_vmap_mlp_speed[False-False] | 0.6876ms | 0.5057ms | 1.9775 KOps/s | 1.9283 KOps/s | $\color{#35bf28}+2.55\\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.9839ms | 0.6265ms | 1.5961 KOps/s | 1.5829 KOps/s | $\color{#35bf28}+0.84\\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8126ms | 0.6252ms | 1.5996 KOps/s | 1.5730 KOps/s | $\color{#35bf28}+1.69\\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6961ms | 0.5457ms | 1.8324 KOps/s | 1.6876 KOps/s | $\textbf{\color{#35bf28}+8.58\\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7059ms | 0.5481ms | 1.8244 KOps/s | 1.6945 KOps/s | $\textbf{\color{#35bf28}+7.67\\%}$ |
| test_vmap_transformer_speed[True-True] | 7.8344ms | 7.3893ms | 135.3316 Ops/s | 132.2657 Ops/s | $\color{#35bf28}+2.32\\%$ |
| test_vmap_transformer_speed[True-False] | 7.6326ms | 7.3311ms | 136.4049 Ops/s | 132.4202 Ops/s | $\color{#35bf28}+3.01\\%$ |
| test_vmap_transformer_speed[False-True] | 7.6277ms | 7.2903ms | 137.1695 Ops/s | 134.5431 Ops/s | $\color{#35bf28}+1.95\\%$ |
| test_vmap_transformer_speed[False-False] | 8.0754ms | 7.5302ms | 132.7986 Ops/s | 137.4430 Ops/s | $\color{#d91a1a}-3.38\\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 18.2331ms | 17.8807ms | 55.9263 Ops/s | 56.3909 Ops/s | $\color{#d91a1a}-0.82\\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 18.1972ms | 17.8339ms | 56.0731 Ops/s | 56.4931 Ops/s | $\color{#d91a1a}-0.74\\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 18.6209ms | 17.8002ms | 56.1792 Ops/s | 56.4649 Ops/s | $\color{#d91a1a}-0.51\\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 18.2207ms | 17.7779ms | 56.2497 Ops/s | 56.5496 Ops/s | $\color{#d91a1a}-0.53\\%$ |
| test_to_module_speed[True] | 1.6278ms | 1.4918ms | 670.3377 Ops/s | 681.9274 Ops/s | $\color{#d91a1a}-1.70\\%$ |
| test_to_module_speed[False] | 1.5930ms | 1.4763ms | 677.3470 Ops/s | 692.3180 Ops/s | $\color{#d91a1a}-2.16\\%$ |
| test_tc_init | 0.1791ms | 54.0740μs | 18.4932 KOps/s | 38.6656 KOps/s | $\textbf{\color{#d91a1a}-52.17\\%}$ |
| test_tc_init_nested | 0.2319ms | 0.1059ms | 9.4422 KOps/s | 18.2337 KOps/s | $\textbf{\color{#d91a1a}-48.22\\%}$ |
| test_tc_first_layer_tensor | 24.6220μs | 3.6935μs | 270.7471 KOps/s | 2.7938 MOps/s | $\textbf{\color{#d91a1a}-90.31\\%}$ |
| test_tc_first_layer_nontensor | 17.9310μs | 3.7064μs | 269.8072 KOps/s | 2.6310 MOps/s | $\textbf{\color{#d91a1a}-89.75\\%}$ |
| test_tc_second_layer_tensor | 5.8902μs | 1.1778μs | 849.0588 KOps/s | 1.0264 MOps/s | $\textbf{\color{#d91a1a}-17.28\\%}$ |
| test_tc_second_layer_nontensor | 19.5810μs | 4.2161μs | 237.1879 KOps/s | 1.2070 MOps/s | $\textbf{\color{#d91a1a}-80.35\\%}$ |
| test_unbind | 0.1126s | 13.6666ms | 73.1710 Ops/s | 121.0664 Ops/s | $\textbf{\color{#d91a1a}-39.56\\%}$ |
| test_full_like | 11.4609ms | 10.2053ms | 97.9887 Ops/s | 83.2254 Ops/s | $\textbf{\color{#35bf28}+17.74\\%}$ |
| test_zeros_like | 8.4690ms | 8.0499ms | 124.2254 Ops/s | 124.7202 Ops/s | $\color{#d91a1a}-0.40\\%$ |
| test_ones_like | 8.8898ms | 8.1320ms | 122.9707 Ops/s | 123.5628 Ops/s | $\color{#d91a1a}-0.48\\%$ |
| test_clone | 10.9343ms | 10.1465ms | 98.5561 Ops/s | 101.9487 Ops/s | $\color{#d91a1a}-3.33\\%$ |
| test_squeeze | 0.1882ms | 10.9110μs | 91.6506 KOps/s | 86.8160 KOps/s | $\textbf{\color{#35bf28}+5.57\\%}$ |
| test_unsqueeze | 0.2569ms | 87.4934μs | 11.4294 KOps/s | 18.6366 KOps/s | $\textbf{\color{#d91a1a}-38.67\\%}$ |
| test_split | 0.1043s | 3.4961ms | 286.0350 Ops/s | 10.0738 KOps/s | $\textbf{\color{#d91a1a}-97.16\\%}$ |
| test_permute | 0.3721ms | 0.1984ms | 5.0392 KOps/s | 9.1877 KOps/s | $\textbf{\color{#d91a1a}-45.15\\%}$ |
| test_stack | 30.1227ms | 29.2468ms | 34.1918 Ops/s | 34.6668 Ops/s | $\color{#d91a1a}-1.37\\%$ |
| test_cat | 31.3382ms | 28.8500ms | 34.6620 Ops/s | 34.6503 Ops/s | $\color{#35bf28}+0.03\\%$ |
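The last column can be sanity-checked against the two throughput columns. The sketch below is not the benchmark bot's code: it assumes the fourth column is throughput for this PR, the fifth is throughput on the base commit, and the change is their relative difference. That reading matches the rows spot-checked here. The names `UNIT_SCALE`, `parse_ops`, and `relative_change` are hypothetical helpers introduced for illustration.

```python
# Assumption: change = (ops_pr - ops_head) / ops_head, inferred from the
# table values rather than taken from the benchmark tooling.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a throughput cell such as '10.0738 KOps/s' to ops per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput change of the PR against the base, in percent."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


# Rows copied from the table above: (name, PR throughput, base throughput).
rows = [
    ("test_tc_init", "18.4932 KOps/s", "38.6656 KOps/s"),
    ("test_split", "286.0350 Ops/s", "10.0738 KOps/s"),
    ("test_full_like", "97.9887 Ops/s", "83.2254 Ops/s"),
]

for name, ops_pr, ops_head in rows:
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Note that the two columns can use different unit prefixes (as in `test_split`), so the cells need to be converted to a common unit before comparing.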