microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

performance of gpt2 inference with onnxruntime-gpu #6229

Open carter54 opened 3 years ago

carter54 commented 3 years ago

Describe the bug

When I load a GPT-2 model with onnxruntime-gpu, a lot of warnings appear. They indicate that some nodes will be computed on the CPU. Is this expected, or did I do something wrong when converting the GPT-2 model?

2020-12-29 15:11:18.089803597 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_5
2020-12-29 15:11:18.089838024 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_12
2020-12-29 15:11:18.089857576 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_13
2020-12-29 15:11:18.089880290 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_11
2020-12-29 15:11:18.089893351 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_15
2020-12-29 15:11:18.089908955 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_16
2020-12-29 15:11:18.089932518 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_152
2020-12-29 15:11:18.089945448 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_153
2020-12-29 15:11:18.089966887 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_158
2020-12-29 15:11:18.089979815 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_159
2020-12-29 15:11:18.089995017 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_160
2020-12-29 15:11:18.090007972 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_161
2020-12-29 15:11:18.090020324 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_165
2020-12-29 15:11:18.090048646 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_208
2020-12-29 15:11:18.090061429 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_209
2020-12-29 15:11:18.090074773 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_210
2020-12-29 15:11:18.090089709 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_211
2020-12-29 15:11:18.090106206 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_200
2020-12-29 15:11:18.090119508 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_214
2020-12-29 15:11:18.090134815 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_203
2020-12-29 15:11:18.090147149 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_215
2020-12-29 15:11:18.090167593 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_216
2020-12-29 15:11:18.090190651 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_400
2020-12-29 15:11:18.090203922 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_401
2020-12-29 15:11:18.090225770 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_406
2020-12-29 15:11:18.090239761 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_407
2020-12-29 15:11:18.090255294 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_408
2020-12-29 15:11:18.090268394 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_409
2020-12-29 15:11:18.090280816 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_413
2020-12-29 15:11:18.090308784 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_456
2020-12-29 15:11:18.090322815 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_457
2020-12-29 15:11:18.090335128 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_458
2020-12-29 15:11:18.090350296 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_459
2020-12-29 15:11:18.090367321 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_448
2020-12-29 15:11:18.090379888 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_462
2020-12-29 15:11:18.090395302 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_451
2020-12-29 15:11:18.090407821 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_463
2020-12-29 15:11:18.090425776 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_464
2020-12-29 15:11:18.090449830 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_648
2020-12-29 15:11:18.090463500 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_649
2020-12-29 15:11:18.090484324 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_654
2020-12-29 15:11:18.090496989 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_655
2020-12-29 15:11:18.090512316 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_656
2020-12-29 15:11:18.090524995 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_657
2020-12-29 15:11:18.090537516 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_661
2020-12-29 15:11:18.090566579 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_704
2020-12-29 15:11:18.090579379 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_705
2020-12-29 15:11:18.090593223 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_706
2020-12-29 15:11:18.090609245 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_707
2020-12-29 15:11:18.090625876 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_696
2020-12-29 15:11:18.090638517 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_710
2020-12-29 15:11:18.090655030 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_699
2020-12-29 15:11:18.090667658 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_711
2020-12-29 15:11:18.090685563 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_712
2020-12-29 15:11:18.090708405 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_896
2020-12-29 15:11:18.090721165 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_897
2020-12-29 15:11:18.090741836 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_902
2020-12-29 15:11:18.090754415 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_903
2020-12-29 15:11:18.090769522 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_904
2020-12-29 15:11:18.090781850 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_905
2020-12-29 15:11:18.090794142 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_909
2020-12-29 15:11:18.090821428 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_952
2020-12-29 15:11:18.090834009 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_953
2020-12-29 15:11:18.090846553 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_954
2020-12-29 15:11:18.090861810 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_955
2020-12-29 15:11:18.090878704 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_944
2020-12-29 15:11:18.090891250 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_958
2020-12-29 15:11:18.090906055 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_947
2020-12-29 15:11:18.090918265 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_959
2020-12-29 15:11:18.090935714 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_960
2020-12-29 15:11:18.090958588 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1144
2020-12-29 15:11:18.090974433 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1145
2020-12-29 15:11:18.090996190 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1150
2020-12-29 15:11:18.091009008 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1151
2020-12-29 15:11:18.091024385 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_1152
2020-12-29 15:11:18.091037290 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1153
2020-12-29 15:11:18.091049804 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1157
2020-12-29 15:11:18.091079737 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1200
2020-12-29 15:11:18.091093385 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1201
2020-12-29 15:11:18.091106021 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1202
2020-12-29 15:11:18.091121401 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1203
2020-12-29 15:11:18.091162087 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1192
2020-12-29 15:11:18.091175962 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1206
2020-12-29 15:11:18.091191418 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1195
2020-12-29 15:11:18.091204368 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1207
2020-12-29 15:11:18.091222116 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1208
2020-12-29 15:11:18.091245358 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1392
2020-12-29 15:11:18.091259049 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1393
2020-12-29 15:11:18.091280110 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1398
2020-12-29 15:11:18.091292640 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1399
2020-12-29 15:11:18.091308106 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_1400
2020-12-29 15:11:18.091339951 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1401
2020-12-29 15:11:18.091353579 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1405
2020-12-29 15:11:18.091381085 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1448
2020-12-29 15:11:18.091393664 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1449
2020-12-29 15:11:18.091406019 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1450
2020-12-29 15:11:18.091421237 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1451
2020-12-29 15:11:18.091440258 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1440
2020-12-29 15:11:18.091466109 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1454
2020-12-29 15:11:18.091481849 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1443
2020-12-29 15:11:18.091494356 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1455
2020-12-29 15:11:18.091512395 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1456
2020-12-29 15:11:18.091536664 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1640
2020-12-29 15:11:18.091549479 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1641
2020-12-29 15:11:18.091570499 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1646
2020-12-29 15:11:18.091583476 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1647
2020-12-29 15:11:18.091598468 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_1648
2020-12-29 15:11:18.091610837 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1649
2020-12-29 15:11:18.091624567 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1653
2020-12-29 15:11:18.091662032 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1696
2020-12-29 15:11:18.091675841 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1697
2020-12-29 15:11:18.091688426 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1698
2020-12-29 15:11:18.091703576 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1699
2020-12-29 15:11:18.091720123 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1688
2020-12-29 15:11:18.091734073 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1702
2020-12-29 15:11:18.091749102 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1691
2020-12-29 15:11:18.091761450 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1703
2020-12-29 15:11:18.091779164 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1704
2020-12-29 15:11:18.091803327 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1888
2020-12-29 15:11:18.091815909 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1889
2020-12-29 15:11:18.091836599 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1894
2020-12-29 15:11:18.091849089 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1895
2020-12-29 15:11:18.091864009 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_1896
2020-12-29 15:11:18.091876426 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1897
2020-12-29 15:11:18.091888956 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1901
2020-12-29 15:11:18.091916899 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_1944
2020-12-29 15:11:18.091929563 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_1945
2020-12-29 15:11:18.091941788 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1946
2020-12-29 15:11:18.091956872 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1947
2020-12-29 15:11:18.091974135 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1936
2020-12-29 15:11:18.091986756 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1950
2020-12-29 15:11:18.092001784 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_1939
2020-12-29 15:11:18.092014675 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_1951
2020-12-29 15:11:18.092032346 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_1952
2020-12-29 15:11:18.092055935 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2136
2020-12-29 15:11:18.092068643 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2137
2020-12-29 15:11:18.092089262 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2142
2020-12-29 15:11:18.092101741 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2143
2020-12-29 15:11:18.092116664 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_2144
2020-12-29 15:11:18.092129282 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2145
2020-12-29 15:11:18.092141395 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2149
2020-12-29 15:11:18.092181552 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2192
2020-12-29 15:11:18.092195461 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2193
2020-12-29 15:11:18.092207925 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2194
2020-12-29 15:11:18.092223344 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2195
2020-12-29 15:11:18.092239904 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2184
2020-12-29 15:11:18.092261512 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2198
2020-12-29 15:11:18.092277036 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2187
2020-12-29 15:11:18.092289570 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2199
2020-12-29 15:11:18.092307311 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2200
2020-12-29 15:11:18.092331529 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2384
2020-12-29 15:11:18.092343976 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2385
2020-12-29 15:11:18.092364884 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2390
2020-12-29 15:11:18.092377754 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2391
2020-12-29 15:11:18.092409982 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_2392
2020-12-29 15:11:18.092424173 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2393
2020-12-29 15:11:18.092436730 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2397
2020-12-29 15:11:18.092463953 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2440
2020-12-29 15:11:18.092476502 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2441
2020-12-29 15:11:18.092494123 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2442
2020-12-29 15:11:18.092509901 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2443
2020-12-29 15:11:18.092526763 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2432
2020-12-29 15:11:18.092539615 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2446
2020-12-29 15:11:18.092555125 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2435
2020-12-29 15:11:18.092579074 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2447
2020-12-29 15:11:18.092597751 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2448
2020-12-29 15:11:18.092620762 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2632
2020-12-29 15:11:18.092633555 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2633
2020-12-29 15:11:18.092654627 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2638
2020-12-29 15:11:18.092667565 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2639
2020-12-29 15:11:18.092682663 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_2640
2020-12-29 15:11:18.092695446 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2641
2020-12-29 15:11:18.092707578 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2645
2020-12-29 15:11:18.092751967 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2688
2020-12-29 15:11:18.092765826 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2689
2020-12-29 15:11:18.092778632 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2690
2020-12-29 15:11:18.092794129 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2691
2020-12-29 15:11:18.092812019 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2680
2020-12-29 15:11:18.092824536 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2694
2020-12-29 15:11:18.092839817 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2683
2020-12-29 15:11:18.092863079 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2695
2020-12-29 15:11:18.092882402 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2696
2020-12-29 15:11:18.092899259 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2
2020-12-29 15:11:18.092912058 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_3023
2020-12-29 15:11:18.092932888 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_34
2020-12-29 15:11:18.092945469 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_35
2020-12-29 15:11:18.092958013 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_3025
2020-12-29 15:11:18.092985999 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_3026
2020-12-29 15:11:18.093008424 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2880
2020-12-29 15:11:18.093021302 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2881
2020-12-29 15:11:18.093042428 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2886
2020-12-29 15:11:18.093055161 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2887
2020-12-29 15:11:18.093070900 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Sub_2888
2020-12-29 15:11:18.093083459 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2889
2020-12-29 15:11:18.093095756 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2893
2020-12-29 15:11:18.093133254 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Slice_2936
2020-12-29 15:11:18.093147028 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Squeeze_2937
2020-12-29 15:11:18.093167926 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2938
2020-12-29 15:11:18.093183940 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2939
2020-12-29 15:11:18.093206562 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2928
2020-12-29 15:11:18.093220038 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2942
2020-12-29 15:11:18.093235710 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_2931
2020-12-29 15:11:18.093260203 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Unsqueeze_2943
2020-12-29 15:11:18.093279424 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Concat_2944

To Reproduce I followed these steps to convert the gpt2 model from transformers (PyTorch version) to ONNX:

python -m onnxruntime_tools.transformers.convert_to_onnx -m gpt2 --model_class GPT2LMHeadModel --output gpt2.onnx -p fp32
python -m onnxruntime_tools.optimizer_cli --input gpt2.onnx --output gpt2_fp16.onnx --num_heads 12 --hidden_size 768 --float16

Inference setup: input sequence length 200, batch size 5, one output token, no concurrency, no IO binding. No past state is used, so the inference time covers computing the initial state from the 200 input tokens and predicting the next token. The average latency is about 300 ms over 100 runs, which is longer than I expected.
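For reference, an average over 100 runs can be measured along these lines. This is only a minimal sketch, not the script used in this issue: `average_latency_ms`, `run_once`, and the `warmup` parameter are hypothetical names, and `fake_inference` is a stand-in workload so the snippet runs without a model or GPU.

```python
import statistics
import time


def average_latency_ms(run_once, test_times=100, warmup=3):
    """Return the mean wall-clock latency of run_once() in milliseconds."""
    # Warm-up runs are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(test_times):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def fake_inference():
    # Stand-in workload (assumption); replace with a real inference call.
    sum(range(10_000))


latency = average_latency_ms(fake_inference, test_times=100)
print(f"average latency: {latency:.3f} ms")
```

With ONNX Runtime, `run_once` could be something like `lambda: session.run(None, ort_inputs)`, where `session` is an `onnxruntime.InferenceSession` and `ort_inputs` is the input feed dictionary (both names are placeholders here).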

tianleiwu commented 3 years ago

@carter54,

I suggest using convert_to_onnx to output the optimized fp16 GPT-2 model directly, like this:

python convert_to_onnx.py -m gpt2 --model_class GPT2LMHeadModel --output gpt2.onnx -p fp16 -o --use_gpu

I ran the benchmark on a V100 machine (I changed sequence_length=1 to sequence_length=200 in benchmark_gpt2.py in the master branch):

python benchmark_gpt2.py -m gpt2 --model_class GPT2LMHeadModel --test_times 100 -o --use_gpu -p fp16 -b 5 -s 0

The output is like the following:

Arguments:Namespace(batch_sizes=[5], cache_dir='./cache_models', include_copy_output_latency=False, model_class='GPT2LMHeadModel', model_name_or_path='gpt2', onnx_dir='./onnx_models', optimize_onnx=True, past_sequence_lengths=[0], precision=<Precision.FLOAT16: 'fp16'>, result_csv=None, test_times=100, thread_num=-1, torchscript=False, use_gpu=True, validate_onnx=False, verbose=False)
ATen/Parallel:
        at::get_num_threads() : 24
        at::get_num_interop_threads() : 12
OpenMP 201511 (a.k.a. OpenMP 4.5)
        omp_get_max_threads() : 24
Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
        mkl_get_max_threads() : 24
Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
std::thread::hardware_concurrency() : 24
Environment variables:
        OMP_NUM_THREADS : 16
        MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

PyTorch Version:1.6.0
Transformers Version:3.1.0
Onnxruntime Version:1.5.2
Shapes: input_ids=torch.Size([1, 1]) past=torch.Size([2, 1, 12, 1, 64]) output=torch.Size([1, 1, 50257]) present=torch.Size([2, 1, 12, 2, 64])
/bert_ort/tlwu/py36/lib/python3.6/site-packages/transformers/modeling_gpt2.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/bert_ort/tlwu/py36/lib/python3.6/site-packages/transformers/modeling_gpt2.py:165: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  w = w / (float(v.size(-1)) ** 0.5)
/bert_ort/tlwu/py36/lib/python3.6/site-packages/transformers/modeling_gpt2.py:170: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask = self.bias[:, :, ns - nd : ns, :ns]
Fused LayerNormalization count: 25
Fused FastGelu count: 12
Removed Reshape and Expand count: 0
Fused Attention(with past) count: 12
Graph pruned: 0 inputs, 0 outputs and 741 nodes are removed
Graph pruned: 0 inputs, 0 outputs and 312 nodes are removed
postprocess: remove Reshape count:48
Fused FastGelu(add bias) count: 12
opset verion: 11
Output model to ./onnx_models/gpt2_past_fp16.onnx
batch_size=5, past_sequence_length=0, torch_latency=64.92, ort_latency=92.17, ort_io_latency=17.29

In my test, the latency with IO Binding is 17ms, and without IO Binding is 92ms.
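A rough back-of-envelope check of why IO binding could matter this much. It rests on assumptions not confirmed in this thread: that a run without IO binding copies every output back to host memory, that outputs are float16 (2 bytes per element), and that the model has 12 layers with 12 heads of size 64 and a 50257-token vocabulary, as the shapes and fusion counts in the log suggest.

```python
from math import prod

BYTES_PER_FP16 = 2  # assumption: outputs are float16

batch_size = 5
sequence_length = 200   # the value set in benchmark_gpt2.py for this run
vocab_size = 50257      # from output=torch.Size([1, 1, 50257])
num_layers = 12         # assumption, from "Fused Attention(with past) count: 12"
num_heads = 12
head_size = 64

# Logits: one score per vocabulary entry for every token position.
logits_shape = (batch_size, sequence_length, vocab_size)
logits_bytes = prod(logits_shape) * BYTES_PER_FP16

# Present state: key and value tensors for each layer.
present_shape = (2, batch_size, num_heads, sequence_length, head_size)
present_bytes = prod(present_shape) * BYTES_PER_FP16 * num_layers

total_bytes = logits_bytes + present_bytes
print(f"logits:  {logits_bytes / 1e6:.1f} MB")
print(f"present: {present_bytes / 1e6:.1f} MB")
print(f"total:   {total_bytes / 1e6:.1f} MB per run")
```

Under these assumptions, each run produces well over 100 MB of output, so leaving the outputs on the GPU would plausibly account for a large part of the difference between the two latencies.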

carter54 commented 3 years ago

@tianleiwu Thanks for the reply. I can now reproduce a result similar to yours with this command:

python benchmark_gpt2.py -m gpt2 --model_class GPT2LMHeadModel --test_times 100 -o --use_gpu -p fp16 -b 5 -s 0

But the following error appeared when I ran

python benchmark_gpt2.py -m gpt2 --model_class GPT2LMHeadModel --test_times 100 -o --use_gpu -p fp16 -b 5 -s 200

after changing '-s 0' to '-s 200'

Arguments:Namespace(batch_sizes=[5], cache_dir='./cache_models', include_copy_output_latency=False, model_class='GPT2LMHeadModel', model_name_or_path='/home/hr/PycharmProjects/test/', onnx_dir='./onnx_models', optimize_onnx=True, past_sequence_lengths=[200], precision=<Precision.FLOAT16: 'fp16'>, result_csv=None, test_times=100, thread_num=-1, torchscript=False, use_gpu=True, validate_onnx=False, verbose=False)
ATen/Parallel:
        at::get_num_threads() : 24
        at::get_num_interop_threads() : 12
OpenMP 201511 (a.k.a. OpenMP 4.5)
        omp_get_max_threads() : 24
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
        mkl_get_max_threads() : 24
Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
std::thread::hardware_concurrency() : 24
Environment variables:
        OMP_NUM_THREADS : [not set]
        MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

PyTorch Version:1.7.1
Transformers Version:3.1.0
Onnxruntime Version:1.5.2
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:710: FutureWarning: The `past` argument is deprecated and will be removed in a future version, use `past_key_values` instead.
  warnings.warn(
Shapes: input_ids=torch.Size([1, 1]) past=torch.Size([2, 1, 12, 1, 64]) output=torch.Size([1, 1, 30000]) present=torch.Size([2, 1, 12, 2, 64])
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:165: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  w = w / (float(v.size(-1)) ** 0.5)
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:170: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask = self.bias[:, :, ns - nd : ns, :ns]
Fused LayerNormalization count: 25
Fused FastGelu count: 12
Fused Attention(with past) count: 12
Graph pruned: 0 inputs, 0 outputs and 741 nodes are removed
Graph pruned: 0 inputs, 0 outputs and 312 nodes are removed
postprocess: remove Reshape count:48
Fused FastGelu(add bias) count: 12
opset verion: 11
Output model to ./onnx_models/model_past_fp16.onnx
Exception
Traceback (most recent call last):
  File "benchmark_gpt2.py", line 214, in main
    ort_io_outputs, ort_io_latency = Gpt2Helper.onnxruntime_inference_with_binded_io(
  File "/home/hr/anaconda3/lib/python3.8/site-packages/onnxruntime/transformers/gpt2_helper.py", line 453, in onnxruntime_inference_with_binded_io
    io_binding = Gpt2Helper.prepare_io_binding(ort_session, inputs.input_ids, inputs.position_ids,
  File "/home/hr/anaconda3/lib/python3.8/site-packages/onnxruntime/transformers/gpt2_helper.py", line 410, in prepare_io_binding
    assert position_ids.is_contiguous()
AssertionError

If I remove line 410 in gpt2_helper.py

assert position_ids.is_contiguous()

the script works fine. I get the following result:

Arguments:Namespace(batch_sizes=[5], cache_dir='./cache_models', include_copy_output_latency=False, model_class='GPT2LMHeadModel', model_name_or_path='/home/hr/PycharmProjects/test/', onnx_dir='./onnx_models', optimize_onnx=True, past_sequence_lengths=[200], precision=<Precision.FLOAT16: 'fp16'>, result_csv=None, test_times=100, thread_num=-1, torchscript=False, use_gpu=True, validate_onnx=False, verbose=False)
ATen/Parallel:
        at::get_num_threads() : 24
        at::get_num_interop_threads() : 12
OpenMP 201511 (a.k.a. OpenMP 4.5)
        omp_get_max_threads() : 24
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
        mkl_get_max_threads() : 24
Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
std::thread::hardware_concurrency() : 24
Environment variables:
        OMP_NUM_THREADS : [not set]
        MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP

PyTorch Version:1.7.1
Transformers Version:3.1.0
Onnxruntime Version:1.5.2
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:710: FutureWarning: The `past` argument is deprecated and will be removed in a future version, use `past_key_values` instead.
  warnings.warn(
Shapes: input_ids=torch.Size([1, 1]) past=torch.Size([2, 1, 12, 1, 64]) output=torch.Size([1, 1, 30000]) present=torch.Size([2, 1, 12, 2, 64])
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:558: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:165: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  w = w / (float(v.size(-1)) ** 0.5)
/home/hr/anaconda3/lib/python3.8/site-packages/transformers/modeling_gpt2.py:170: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  mask = self.bias[:, :, ns - nd : ns, :ns]
Fused LayerNormalization count: 25
Fused FastGelu count: 12
Fused Attention(with past) count: 12
Graph pruned: 0 inputs, 0 outputs and 741 nodes are removed
Graph pruned: 0 inputs, 0 outputs and 312 nodes are removed
postprocess: remove Reshape count:48
Fused FastGelu(add bias) count: 12
opset verion: 11
Output model to ./onnx_models/test_past_fp16.onnx
batch_size=5, past_sequence_length=200, torch_latency=25.55, ort_latency=33.30, ort_io_latency=4.46
Results are saved to file benchmark_result_20210108-150937.csv
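A note on the removed assertion, as a sketch under stated assumptions. If prepare_io_binding hands ONNX Runtime a raw data pointer for position_ids, and position_ids is a column slice of a wider tensor (both are assumptions about gpt2_helper.py, not shown in this thread), then a non-contiguous tensor would be read as if it were densely packed. The pure-Python simulation below uses a hypothetical 5 x 201 buffer to show the effect. Calling `.contiguous()` on the tensor before binding is likely a safer fix than deleting the check.

```python
def is_contiguous(shape, strides):
    """Row-major contiguity check; dimensions of size 1 are ignored."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


# Hypothetical layout: 5 rows (batch) x 201 columns (200 past tokens + 1 new),
# where each stored value is its own column index, i.e. its position id.
rows, cols = 5, 201
buffer = [col for _ in range(rows) for col in range(cols)]

# A view of the last column only: shape (5, 1), starting at offset 200.
offset, shape, strides = 200, (5, 1), (cols, 1)

# Values the view really holds (following the strides).
strided_read = [buffer[offset + r * strides[0]] for r in range(rows)]

# Values seen when 5 consecutive elements are read from the raw pointer.
raw_pointer_read = buffer[offset:offset + rows]

print(is_contiguous(shape, strides))  # False
print(strided_read)                   # [200, 200, 200, 200, 200]
print(raw_pointer_read)               # [200, 0, 1, 2, 3]
```

If this is what happens in the benchmark, the latency numbers would still be meaningful, but the model would receive wrong position ids for every row except the first.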