microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
MIT License
952 stars 158 forks

Fix FP16 codegen for ONNX models #396

Closed jlxue closed 2 years ago

jlxue commented 2 years ago

Use CUDAExecutionProvider by default to optimize FP16 models. This avoids unintentionally casting all FP16 inputs/weights to FP32.
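
For illustration only, here is a minimal sketch of what preferring CUDAExecutionProvider during an ONNX Runtime optimization pass could look like. It is not the code from this PR: the helper names `pick_providers` and `optimize_onnx_model` are hypothetical, and the sketch assumes the model is optimized by creating an `onnxruntime.InferenceSession` that writes out the optimized graph.

```python
def pick_providers(available):
    """Return execution providers in priority order (hypothetical helper).

    CUDA goes first when it is available, so FP16 nodes can be assigned to
    the CUDA provider before falling back to the CPU provider.
    """
    providers = []
    if "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # The CPU provider is always kept as the fallback.
    providers.append("CPUExecutionProvider")
    return providers


def optimize_onnx_model(model_path, optimized_path):
    """Run ONNX Runtime graph optimization and save the result (hypothetical helper)."""
    # Imported lazily so pick_providers stays usable without onnxruntime installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
    # ONNX Runtime serializes the optimized graph to this path at session creation.
    opts.optimized_model_filepath = optimized_path

    providers = pick_providers(ort.get_available_providers())
    ort.InferenceSession(model_path, sess_options=opts, providers=providers)
    return providers
```

Calling `optimize_onnx_model("model_fp16.onnx", "model_fp16.opt.onnx")` (placeholder file names) on a machine with a CUDA-enabled ONNX Runtime build would optimize with CUDA first in the provider list; on a CPU-only build it falls back to the CPU provider alone, which is the case the PR description warns can introduce FP32 casts.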