hidet-org / hidet

An open-source, efficient deep learning framework/compiler, written in Python.
https://hidet.org
Apache License 2.0

[Tracking Issue] Benchmarks #154

Open yaoyaoding opened 1 year ago

yaoyaoding commented 1 year ago

This issue tracks the performance benchmarks of hidet against other dynamo backends in PyTorch.

The benchmark scripts that produce these reports are located at hidet/scripts/bench.

yaoyaoding commented 1 year ago

2023-04-05

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.656 3.261 3.302 1.481
resnet50 f16[1,3,224,224] 5.663 3.395 3.460 1.217
model/bert-base-uncased f32, bs=1, seq=128 6.060 3.095 2.920 2.335
model/bert-base-uncased f16, bs=1, seq=128 6.444 1.425 1.099 1.923

Time: 3.35 hours
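
A row of the report is easier to compare once each column is expressed as a ratio to eager. The following is a minimal sketch, not part of hidet/scripts/bench: it assumes the four numeric columns are latencies in a common unit where lower is better (the thread does not state the unit), and `parse_row` and `speedups` are hypothetical helper names introduced only for illustration.

```python
# Column names copied from the report header.
COLUMNS = ["eager", "reduce-overhead", "max-autotune", "hidet(2)"]


def parse_row(line):
    """Split one report row into (model, inputs, {column: value}).

    The last four whitespace-separated fields are the numeric columns,
    the first field is the model name, and everything in between is the
    inputs description (which may itself contain spaces).
    """
    parts = line.split()
    values = [float(x) for x in parts[-4:]]
    model = parts[0]
    inputs = " ".join(parts[1:-4])
    return model, inputs, dict(zip(COLUMNS, values))


def speedups(timings, baseline="eager"):
    """Return baseline / value per column (assumes lower values are better)."""
    base = timings[baseline]
    return {name: base / value for name, value in timings.items()}


# Rows copied verbatim from the 2023-04-05 report.
rows = [
    "resnet50 f32[1,3,224,224] 4.656 3.261 3.302 1.481",
    "resnet50 f16[1,3,224,224] 5.663 3.395 3.460 1.217",
    "model/bert-base-uncased f32, bs=1, seq=128 6.060 3.095 2.920 2.335",
    "model/bert-base-uncased f16, bs=1, seq=128 6.444 1.425 1.099 1.923",
]

for line in rows:
    model, inputs, timings = parse_row(line)
    ratios = speedups(timings)
    summary = ", ".join(f"{name}: {ratio:.2f}x" for name, ratio in ratios.items())
    print(f"{model} [{inputs}] -> {summary}")
```

Under that reading, hidet(2) has the lowest value in both resnet50 rows and in the f32 BERT row of this report, while max-autotune has the lowest value in the f16 BERT row.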

yaoyaoding commented 1 year ago

2023-04-07

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.542 1.408 1.407 1.291
resnet50 f16[1,3,224,224] 1.557 1.250 1.253 1.089
model/bert-base-uncased f32, bs=1, seq=128 3.037 2.727 2.631 2.012
model/bert-base-uncased f16, bs=1, seq=128 1.865 1.288 1.017 1.715

Time: 2.47 hours

yaoyaoding commented 1 year ago

2023-04-08

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.533 1.402 1.399 1.303
resnet50 f16[1,3,224,224] 1.591 1.250 1.294 1.122
model/bert-base-uncased f32, bs=1, seq=128 3.033 2.807 2.634 2.008
model/bert-base-uncased f16, bs=1, seq=128 1.830 1.289 1.014 1.683

Time: 2.45 hours
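
Since the point of the thread is to track results over time, consecutive reports can be compared column by column. The sketch below is an illustration only: the 2% threshold is an arbitrary assumption, `flag_changes` is a hypothetical helper, and the numbers are the hidet(2) column of the 2023-04-07 and 2023-04-08 reports above, again assuming lower values are better.

```python
def relative_change(previous, current):
    """Signed relative change from previous to current."""
    return (current - previous) / previous


def flag_changes(previous, current, threshold=0.02):
    """Return {key: change} for entries whose relative change exceeds threshold.

    `threshold` is an illustrative default (2%), not a value from the thread.
    """
    flagged = {}
    for key, old in previous.items():
        if key not in current:
            continue  # entry missing from the newer report; nothing to compare
        change = relative_change(old, current[key])
        if abs(change) > threshold:
            flagged[key] = change
    return flagged


# hidet(2) column, keyed by (model, dtype), copied from the two reports.
hidet_0407 = {
    ("resnet50", "f32"): 1.291,
    ("resnet50", "f16"): 1.089,
    ("bert-base-uncased", "f32"): 2.012,
    ("bert-base-uncased", "f16"): 1.715,
}
hidet_0408 = {
    ("resnet50", "f32"): 1.303,
    ("resnet50", "f16"): 1.122,
    ("bert-base-uncased", "f32"): 2.008,
    ("bert-base-uncased", "f16"): 1.683,
}

for (model, dtype), change in flag_changes(hidet_0407, hidet_0408).items():
    print(f"{model} {dtype}: {change:+.1%}")
```

A positive change means the newer value is larger. Whether a change of this size is noise or a real regression cannot be decided from two reports alone.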

yaoyaoding commented 1 year ago

2023-04-09

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.537 1.404 1.399 1.293
resnet50 f16[1,3,224,224] 1.578 1.234 1.258 1.055
model/bert-base-uncased f32, bs=1, seq=128 2.867 2.738 2.515 2.014
model/bert-base-uncased f16, bs=1, seq=128 1.863 1.287 1.014 1.688

Time: 2.45 hours

yaoyaoding commented 1 year ago

2023-04-10

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.541 1.406 1.404 1.280
resnet50 f16[1,3,224,224] 1.577 1.243 1.288 1.116
model/bert-base-uncased f32, bs=1, seq=128 2.958 2.809 2.612 2.015
model/bert-base-uncased f16, bs=1, seq=128 1.871 1.289 1.015 1.683

Time: 2.18 hours

yaoyaoding commented 1 year ago

2023-04-11

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.534 1.404 1.401 1.288
resnet50 f16[1,3,224,224] 1.614 1.261 1.280 1.040
model/bert-base-uncased f32, bs=1, seq=128 2.911 2.811 2.631 2.031
model/bert-base-uncased f16, bs=1, seq=128 1.843 1.289 1.013 1.574

Time: 2.18 hours

yaoyaoding commented 1 year ago

2023-04-11

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.584 3.263 3.316 1.512
resnet50 f16[1,3,224,224] 5.466 3.376 3.420 1.130
model/bert-base-uncased f32, bs=1, seq=128 6.085 3.093 2.898 2.354
model/bert-base-uncased f16, bs=1, seq=128 6.288 1.425 1.095 1.863

Time: 8.44 hours

yaoyaoding commented 1 year ago

2023-04-12

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.604 3.222 3.265 1.483
resnet50 f16[1,3,224,224] 5.401 3.365 3.412 1.141
model/bert-base-uncased f32, bs=1, seq=128 6.085 3.082 2.896 2.357
model/bert-base-uncased f16, bs=1, seq=128 6.224 1.426 1.097 1.829

Time: 7.91 hours

yaoyaoding commented 1 year ago

2023-04-14

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.534 1.400 1.319 1.279
resnet50 f16[1,3,224,224] 1.623 1.207 1.226 1.006
model/bert-base-uncased f32, bs=1, seq=128 2.941 2.716 2.593 2.006
model/bert-base-uncased f16, bs=1, seq=128 1.909 1.299 0.963 1.573

Time: 2.82 hours

yaoyaoding commented 1 year ago

2023-04-13

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.651 3.271 3.282 1.509
resnet50 f16[1,3,224,224] 5.466 3.403 3.446 1.143
model/bert-base-uncased f32, bs=1, seq=128 6.155 3.080 2.896 2.350
model/bert-base-uncased f16, bs=1, seq=128 6.249 1.426 1.097 1.831

Time: 7.95 hours

yaoyaoding commented 1 year ago

2023-04-15

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.539 1.409 1.404 1.291
resnet50 f16[1,3,224,224] 1.580 1.175 1.200 1.008
model/bert-base-uncased f32, bs=1, seq=128 3.045 2.720 2.677 2.003
model/bert-base-uncased f16, bs=1, seq=128 1.949 1.297 1.014 1.587

Time: 2.68 hours

yaoyaoding commented 1 year ago

2023-04-14

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.722 3.330 3.370 1.483
resnet50 f16[1,3,224,224] 5.482 3.421 3.434 1.141
model/bert-base-uncased f32, bs=1, seq=128 6.154 3.085 2.901 2.346
model/bert-base-uncased f16, bs=1, seq=128 6.346 1.426 1.097 1.828

Time: 8.10 hours

yaoyaoding commented 1 year ago

2023-04-16

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.532 1.401 1.396 1.289
resnet50 f16[1,3,224,224] 1.625 1.199 1.231 1.009
model/bert-base-uncased f32, bs=1, seq=128 3.078 2.861 2.682 2.005
model/bert-base-uncased f16, bs=1, seq=128 1.959 1.299 1.013 1.592

Time: 2.75 hours

yaoyaoding commented 1 year ago

2023-04-15

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.817 3.375 3.422 1.482
resnet50 f16[1,3,224,224] 5.511 3.415 3.452 1.142
model/bert-base-uncased f32, bs=1, seq=128 7.157 3.092 2.903 2.351
model/bert-base-uncased f16, bs=1, seq=128 6.398 1.426 1.098 1.829

Time: 8.09 hours

yaoyaoding commented 1 year ago

2023-04-17

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.542 1.402 1.404 1.280
resnet50 f16[1,3,224,224] 1.610 1.203 1.215 1.009
model/bert-base-uncased f32, bs=1, seq=128 2.954 2.710 2.678 1.962
model/bert-base-uncased f16, bs=1, seq=128 1.926 1.299 1.014 1.622

Time: 2.74 hours

yaoyaoding commented 1 year ago

2023-04-16

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.629 3.305 3.331 1.481
resnet50 f16[1,3,224,224] 5.719 3.485 3.522 1.144
model/bert-base-uncased f32, bs=1, seq=128 6.184 3.084 2.897 2.337
model/bert-base-uncased f16, bs=1, seq=128 6.372 1.426 1.096 1.838

Time: 8.08 hours

yaoyaoding commented 1 year ago

2023-04-18

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.543 1.410 1.406 1.281
resnet50 f16[1,3,224,224] 1.639 1.207 1.216 1.009
model/bert-base-uncased f32, bs=1, seq=128 3.082 2.855 2.556 2.021
model/bert-base-uncased f16, bs=1, seq=128 1.943 1.298 1.015 1.610

Time: 2.76 hours

yaoyaoding commented 1 year ago

2023-04-17

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.643 3.228 3.296 1.516
resnet50 f16[1,3,224,224] 5.600 3.431 3.448 1.146
model/bert-base-uncased f32, bs=1, seq=128 6.233 3.083 2.896 2.346
model/bert-base-uncased f16, bs=1, seq=128 6.154 1.424 1.095 1.828

Time: 8.54 hours

yaoyaoding commented 1 year ago

2023-04-18

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.707 3.267 3.308 1.506
resnet50 f16[1,3,224,224] 5.583 3.431 3.498 1.141
model/bert-base-uncased f32, bs=1, seq=128 6.188 3.084 2.898 2.339
model/bert-base-uncased f16, bs=1, seq=128 6.277 1.426 1.095 1.828

Time: 8.06 hours

yaoyaoding commented 1 year ago

2023-04-20

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.517 1.390 1.293 1.272
resnet50 f16[1,3,224,224] 1.623 1.197 1.228 1.042
model/bert-base-uncased f32, bs=1, seq=128 3.090 2.849 2.597 2.012
model/bert-base-uncased f16, bs=1, seq=128 1.922 1.297 0.990 1.572

Time: 2.55 hours

yaoyaoding commented 1 year ago

2023-04-21

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.535 1.409 1.403 1.282
resnet50 f16[1,3,224,224] 1.609 1.160 1.192 1.042
model/bert-base-uncased f32, bs=1, seq=128 2.907 2.809 2.650 2.046
model/bert-base-uncased f16, bs=1, seq=128 1.960 1.297 1.046 1.572

Time: 2.50 hours

yaoyaoding commented 1 year ago

2023-04-22

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.537 1.411 1.398 1.267
resnet50 f16[1,3,224,224] 1.645 1.211 1.233 1.044
model/bert-base-uncased f32, bs=1, seq=128 3.097 2.859 2.675 2.006
model/bert-base-uncased f16, bs=1, seq=128 1.951 1.298 1.046 1.577

Time: 2.56 hours

yaoyaoding commented 1 year ago

2023-04-22

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.780 3.355 3.410 1.506
resnet50 f16[1,3,224,224] 5.658 3.445 3.513 1.146
model/bert-base-uncased f32, bs=1, seq=128 6.238 3.083 2.895 2.350
model/bert-base-uncased f16, bs=1, seq=128 6.477 1.426 1.095 1.829

Time: 8.19 hours

yaoyaoding commented 1 year ago

2023-04-23

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.543 1.409 1.403 1.289
resnet50 f16[1,3,224,224] 1.620 1.192 1.218 1.045
model/bert-base-uncased f32, bs=1, seq=128 3.092 2.853 2.651 1.995
model/bert-base-uncased f16, bs=1, seq=128 1.939 1.299 1.046 1.545

Time: 2.61 hours

yaoyaoding commented 1 year ago

2023-04-23

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.851 3.350 3.398 1.480
resnet50 f16[1,3,224,224] 5.689 3.520 3.547 1.147
model/bert-base-uncased f32, bs=1, seq=128 6.228 3.082 2.898 2.334
model/bert-base-uncased f16, bs=1, seq=128 6.454 1.426 1.097 1.820

Time: 8.35 hours

yaoyaoding commented 1 year ago

2023-04-24

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.530 1.403 1.398 1.312
resnet50 f16[1,3,224,224] 1.605 1.191 1.231 1.045
model/bert-base-uncased f32, bs=1, seq=128 3.100 2.860 2.598 2.013
model/bert-base-uncased f16, bs=1, seq=128 1.926 1.298 1.047 1.546

Time: 2.47 hours

yaoyaoding commented 1 year ago

2023-04-24

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.863 3.351 3.393 1.505
resnet50 f16[1,3,224,224] 5.575 3.427 3.452 1.148
model/bert-base-uncased f32, bs=1, seq=128 6.187 3.083 2.897 2.341
model/bert-base-uncased f16, bs=1, seq=128 6.351 1.426 1.098 1.823

Time: 8.26 hours

yaoyaoding commented 1 year ago

2023-04-25

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.534 1.408 1.401 1.281
resnet50 f16[1,3,224,224] 1.622 1.195 1.214 1.045
model/bert-base-uncased f32, bs=1, seq=128 2.929 2.711 2.684 2.030
model/bert-base-uncased f16, bs=1, seq=128 1.928 1.298 1.045 1.550

Time: 2.48 hours

yaoyaoding commented 1 year ago

2023-04-25

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.791 3.366 3.390 1.508
resnet50 f16[1,3,224,224] 5.684 3.526 3.549 1.148
model/bert-base-uncased f32, bs=1, seq=128 6.243 3.083 2.898 2.355
model/bert-base-uncased f16, bs=1, seq=128 6.370 1.428 1.098 1.823

Time: 8.40 hours

yaoyaoding commented 1 year ago

2023-04-26

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.536 1.401 1.400 1.275
resnet50 f16[1,3,224,224] 1.662 1.218 1.236 1.043
model/bert-base-uncased f32, bs=1, seq=128 3.043 2.828 2.604 2.037
model/bert-base-uncased f16, bs=1, seq=128 1.915 1.298 1.046 1.544

Time: 2.49 hours

yaoyaoding commented 1 year ago

2023-04-26

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.791 3.338 3.374 1.479
resnet50 f16[1,3,224,224] 5.680 3.509 3.522 1.143
model/bert-base-uncased f32, bs=1, seq=128 6.283 3.082 2.898 2.337
model/bert-base-uncased f16, bs=1, seq=128 6.449 1.426 1.097 1.828

Time: 8.35 hours

yaoyaoding commented 1 year ago

2023-04-27

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.531 1.400 1.396 1.295
resnet50 f16[1,3,224,224] 1.635 1.194 1.228 1.046
model/bert-base-uncased f32, bs=1, seq=128 2.909 2.800 2.561 1.982
model/bert-base-uncased f16, bs=1, seq=128 1.917 1.298 1.048 1.547

Time: 2.48 hours

yaoyaoding commented 1 year ago

2023-04-27

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.768 3.347 3.379 1.501
resnet50 f16[1,3,224,224] 5.737 3.504 3.555 1.154
model/bert-base-uncased f32, bs=1, seq=128 6.356 3.086 2.899 2.341
model/bert-base-uncased f16, bs=1, seq=128 6.370 1.426 1.096 1.820

Time: 8.33 hours

yaoyaoding commented 1 year ago

2023-04-28

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.548 1.409 1.404 1.314
resnet50 f16[1,3,224,224] 1.616 1.200 1.222 1.046
model/bert-base-uncased f32, bs=1, seq=128 3.095 2.855 2.680 2.015
model/bert-base-uncased f16, bs=1, seq=128 1.946 1.297 1.046 1.544

Time: 2.48 hours

yaoyaoding commented 1 year ago

2023-04-28

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.797 3.364 3.390 1.507
resnet50 f16[1,3,224,224] 5.691 3.506 3.550 1.146
model/bert-base-uncased f32, bs=1, seq=128 6.270 3.083 2.900 2.340
model/bert-base-uncased f16, bs=1, seq=128 6.388 1.426 1.096 1.822

Time: 8.31 hours

yaoyaoding commented 1 year ago

2023-04-29

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.536 1.412 1.405 1.298
resnet50 f16[1,3,224,224] 1.614 1.185 1.272 1.008
model/bert-base-uncased f32, bs=1, seq=128 2.918 2.784 2.583 1.944
model/bert-base-uncased f16, bs=1, seq=128 1.960 1.297 1.045 1.616

Time: 2.39 hours

yaoyaoding commented 1 year ago

2023-04-29

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.706 3.335 3.351 1.479
resnet50 f16[1,3,224,224] 5.636 3.436 3.463 1.147
model/bert-base-uncased f32, bs=1, seq=128 6.115 3.082 2.897 2.348
model/bert-base-uncased f16, bs=1, seq=128 6.329 1.426 1.097 1.891

Time: 7.73 hours

yaoyaoding commented 1 year ago

2023-04-30

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.543 1.418 1.412 1.294
resnet50 f16[1,3,224,224] 1.702 1.335 1.366 1.044
model/bert-base-uncased f32, bs=1, seq=128 2.916 2.854 2.672 1.964
model/bert-base-uncased f16, bs=1, seq=128 1.946 1.299 1.047 1.607

Time: 2.66 hours

yaoyaoding commented 1 year ago

2023-04-30

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.709 3.314 3.365 1.477
resnet50 f16[1,3,224,224] 5.622 3.416 3.485 1.143
model/bert-base-uncased f32, bs=1, seq=128 6.157 3.082 2.897 2.356
model/bert-base-uncased f16, bs=1, seq=128 6.333 1.425 1.099 1.896

Time: 8.77 hours

yaoyaoding commented 1 year ago

2023-05-01

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.545 1.417 1.414 1.291
resnet50 f16[1,3,224,224] 1.710 1.331 1.383 1.041
model/bert-base-uncased f32, bs=1, seq=128 3.086 2.850 2.577 2.025
model/bert-base-uncased f16, bs=1, seq=128 1.953 1.300 1.048 1.608

Time: 2.65 hours

yaoyaoding commented 1 year ago

2023-05-01

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.820 3.326 3.360 1.479
resnet50 f16[1,3,224,224] 5.640 3.442 3.487 1.143
model/bert-base-uncased f32, bs=1, seq=128 6.292 3.085 2.901 2.366
model/bert-base-uncased f16, bs=1, seq=128 6.426 1.429 1.098 1.894

Time: 8.86 hours

yaoyaoding commented 1 year ago

2023-05-02

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.552 1.423 1.419 1.288
resnet50 f16[1,3,224,224] 1.744 1.337 1.357 1.010
model/bert-base-uncased f32, bs=1, seq=128 3.088 2.819 2.677 2.021
model/bert-base-uncased f16, bs=1, seq=128 1.948 1.300 1.046 1.605

Time: 2.69 hours

yaoyaoding commented 1 year ago

2023-05-02

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.733 3.347 3.367 1.481
resnet50 f16[1,3,224,224] 5.506 3.400 3.450 1.144
model/bert-base-uncased f32, bs=1, seq=128 6.109 3.079 2.895 2.352
model/bert-base-uncased f16, bs=1, seq=128 6.251 1.426 1.097 1.889

Time: 8.77 hours

yaoyaoding commented 1 year ago

2023-05-03

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.541 1.420 1.415 1.286
resnet50 f16[1,3,224,224] 1.718 1.346 1.397 1.042
model/bert-base-uncased f32, bs=1, seq=128 3.084 2.792 2.668 2.032
model/bert-base-uncased f16, bs=1, seq=128 1.958 1.299 1.047 1.621

Time: 2.68 hours

yaoyaoding commented 1 year ago

2023-05-03

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.667 3.340 3.375 1.479
resnet50 f16[1,3,224,224] 5.519 3.411 3.450 1.139
model/bert-base-uncased f32, bs=1, seq=128 6.129 3.081 2.897 2.354
model/bert-base-uncased f16, bs=1, seq=128 6.299 1.426 1.098 1.889

Time: 8.73 hours

yaoyaoding commented 1 year ago

2023-05-04

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.547 1.415 1.415 1.300
resnet50 f16[1,3,224,224] 1.702 1.332 1.366 1.042
model/bert-base-uncased f32, bs=1, seq=128 3.018 2.859 2.617 1.996
model/bert-base-uncased f16, bs=1, seq=128 1.971 1.298 1.046 1.610

Time: 2.62 hours

yaoyaoding commented 1 year ago

2023-05-04

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.619 3.263 3.301 1.479
resnet50 f16[1,3,224,224] 5.596 3.412 3.487 1.148
model/bert-base-uncased f32, bs=1, seq=128 6.120 3.095 2.900 2.356
model/bert-base-uncased f16, bs=1, seq=128 6.278 1.427 1.097 1.890

Time: 8.99 hours

yaoyaoding commented 1 year ago

2023-05-05

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.548 1.419 1.419 1.309
resnet50 f16[1,3,224,224] 1.729 1.341 1.356 1.007
model/bert-base-uncased f32, bs=1, seq=128 3.013 2.860 2.679 1.982
model/bert-base-uncased f16, bs=1, seq=128 2.002 1.299 1.046 1.551

Time: 2.15 hours

yaoyaoding commented 1 year ago

2023-05-05

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 4.607 3.284 3.301 1.504
resnet50 f16[1,3,224,224] 5.514 3.397 3.425 1.138
model/bert-base-uncased f32, bs=1, seq=128 6.078 3.083 2.900 2.329
model/bert-base-uncased f16, bs=1, seq=128 6.322 1.426 1.097 1.812

Time: 7.34 hours

yaoyaoding commented 1 year ago

2023-05-06

model inputs eager reduce-overhead max-autotune hidet(2)
resnet50 f32[1,3,224,224] 1.553 1.419 1.409 1.309
resnet50 f16[1,3,224,224] 1.718 1.345 1.362 1.039
model/bert-base-uncased f32, bs=1, seq=128 2.936 2.850 2.678 2.017
model/bert-base-uncased f16, bs=1, seq=128 2.013 1.300 1.048 1.550

Time: 2.16 hours