Fix FP16 codegen for ONNX models

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

MIT License

952 stars 158 forks

Closed. jlxue closed this 2 years ago

jlxue commented 2 years ago

Use CUDAExecutionProvider by default when optimizing FP16 models. This avoids unintentionally casting all FP16 inputs and weights to FP32.
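
For illustration, here is a minimal sketch of preferring the CUDA provider when running ONNX Runtime's graph optimizer. It is not the code from this PR. `pick_providers` and `optimize_model` are hypothetical helper names, the file paths are placeholders, and the choice of `ORT_ENABLE_BASIC` is an assumption. Only the `onnxruntime` calls themselves (`SessionOptions`, `InferenceSession`, `get_available_providers`) are real API.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str], prefer_cuda: bool = True) -> List[str]:
    """Hypothetical helper: list CUDA first when it is available, CPU as fallback."""
    providers: List[str] = []
    if prefer_cuda and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # Keep the CPU provider last so the session can still be created without a GPU.
    providers.append("CPUExecutionProvider")
    return providers


def optimize_model(model_path: str, out_path: str) -> List[str]:
    """Hypothetical helper: run ORT graph optimization and save the result.

    Returns the providers the session actually ended up with.
    """
    import onnxruntime as ort  # imported lazily so pick_providers works without ORT

    opts = ort.SessionOptions()
    # Assumption: basic optimizations only; the PR does not state which level is used.
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
    # ORT writes the optimized graph to this path when the session is created.
    opts.optimized_model_filepath = out_path

    providers = pick_providers(ort.get_available_providers())
    session = ort.InferenceSession(model_path, sess_options=opts, providers=providers)
    return session.get_providers()
```

Calling `optimize_model("model_fp16.onnx", "model_fp16.opt.onnx")` on a machine with a CUDA-enabled ONNX Runtime build would place `CUDAExecutionProvider` first in the session. According to the description above, that is what keeps FP16 inputs and weights from being cast to FP32 during optimization.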