microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[question] yolov5-onnx-float16 not improve on GPU #11151

Open MrRace opened 2 years ago

MrRace commented 2 years ago

I use torch.onnx.export to convert yolov5, which is a PyTorch model, to ONNX format. The raw ONNX model is model.onnx. Then I use onnxmltools to convert the float32 ONNX model to float16:

from onnx import load_model, save_model
from onnxmltools.utils import float16_converter

# Load the float32 model exported from PyTorch
onnx_model = load_model("./models/model.onnx")
# Convert weights and intermediate tensors to float16, keeping float32 inputs/outputs
trans_model = float16_converter.convert_float_to_float16(onnx_model, keep_io_types=True)
save_model(trans_model, "./models/model_fp16.onnx")

When I use onnxruntime-gpu (1.8) to run inference with model_fp16.onnx, it is no faster than the float32 model. Why?
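For reference, a minimal timing sketch to compare the two models directly (the input name "images" and the 1x3x640x640 shape are assumptions based on a typical yolov5 640x640 export, and with keep_io_types=True both models still take float32 input; adjust to the actual model):

import time
import numpy as np
import onnxruntime as rt

def benchmark(model_path, n_runs=50):
    # Prefer the CUDA execution provider, fall back to CPU if unavailable
    sess = rt.InferenceSession(model_path,
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
    # Assumed yolov5 input name and shape
    dummy = {"images": np.random.rand(1, 3, 640, 640).astype(np.float32)}
    sess.run(None, dummy)  # warm-up run
    start = time.perf_counter()
    for _ in range(n_runs):
        sess.run(None, dummy)
    return (time.perf_counter() - start) / n_runs

print("fp32 avg latency:", benchmark("./models/model.onnx"))
print("fp16 avg latency:", benchmark("./models/model_fp16.onnx"))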

skottmckay commented 2 years ago

Without info on how exactly you're running the model, it's hard to say.

MrRace commented 2 years ago

> Without info on how exactly you're running the model, it's hard to say.

Neither a T4 GPU nor a V100 GPU shows any improvement. My inference code:

import onnxruntime as rt
sess_options = rt.SessionOptions()
# Set graph optimization level
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# To enable model serialization after graph optimization set this
sess_options.optimized_model_filepath = "<model_output_path\optimized_model.onnx>"
session = rt.InferenceSession("<model_path>", sess_options)
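As a quick sanity check (not part of the original snippet), one can confirm that the session actually loaded the CUDA execution provider, since onnxruntime-gpu may fall back to CPU when the CUDA EP cannot be loaded:

import onnxruntime as rt
session = rt.InferenceSession("<model_path>", sess_options,
                              providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
# CUDAExecutionProvider should be listed first if it loaded successfully
print(session.get_providers())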

MrRace commented 2 years ago

@skottmckay Here are some previous experiments.

skottmckay commented 2 years ago

You could temporarily set the logging level to see which execution provider (EP) each node is assigned to, and check that they're all assigned to the CUDA EP. Look for 'Node placements' in the output.

sess_options.log_severity_level = 0  # VERBOSE level

If all nodes are assigned to the CUDA EP, it's possible some of the operators don't have an fp16 implementation, causing Cast nodes to be inserted to convert between fp32 and fp16. You could inspect the optimized model written to optimized_model_filepath to see if that is the case.
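A rough sketch of that inspection with the onnx Python package (the path is whatever was set as optimized_model_filepath):

import onnx
from collections import Counter

# Load the optimized model written out by ONNX Runtime
m = onnx.load("optimized_model.onnx")
op_counts = Counter(node.op_type for node in m.graph.node)
print("Cast nodes:", op_counts.get("Cast", 0))
print(op_counts.most_common(10))

A large number of Cast nodes surrounding compute-heavy operators is a sign that parts of the graph are still running in fp32.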