Is your feature request related to a problem? Please describe.
When using mindnlp to run GPT2 inference, I found that generation is roughly 10x slower than the equivalent PyTorch implementation.
Here is the PyTorch implementation I compared against: https://github.com/graykode/gpt-2-Pytorch
The hardware I use is an Nvidia V100 GPU.
MindSpore versions tested: 2.2.12 and 2.1.1
PyTorch version: 2.2.0
Arguments I use: ms_dtype=mindspore.float16, use_cache=True
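For reference, this is the kind of timing harness I used to compare the two backends. It is a minimal, model-agnostic sketch: the `dummy_generate` stand-in and the commented `model.generate(...)` call are illustrative assumptions, not the exact mindnlp API.

```python
import time

def measure_throughput(generate_fn, num_tokens):
    """Time a single generation call and return tokens per second."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed

# Stand-in for the real generation call. With mindnlp this would be
# something like model.generate(input_ids, max_new_tokens=100,
# use_cache=True) -- hypothetical usage, actual call names may differ.
def dummy_generate():
    time.sleep(0.01)  # simulate generation latency

tps = measure_throughput(dummy_generate, num_tokens=100)
print(f"{tps:.1f} tokens/sec")
```

Running the same harness against both the mindnlp and PyTorch models (same prompt, same number of generated tokens) is how the ~10x gap was observed.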