[Feat] accelerate fp16 inference with cudnn library

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

MIT License

948 stars 158 forks

Closed: LeiWang1999 closed this 2 years ago

LeiWang1999 commented 2 years ago

The CUDA code that nnfusion generates doesn't use Tensor Cores.

This PR addresses that by using the cuDNN library for fp16 inference, which gives about a 2x speedup on fp16 models.