microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Inferencing FP16 model using onnxruntime #21737

Open navyverma opened 3 months ago

navyverma commented 3 months ago

Describe the issue

I have a detector with FP16 and FP32 weights (ONNX). Below is the code for FP32, which gives correct detections when running inference with the FP32 weights.

void process_image(cv::Mat& preProcessedImage, std::vector<Ort::Value>& outputTensors)
{
    std::vector<Ort::Value> inputTensors;

    // Copy the preprocessed float32 image into a contiguous buffer and wrap it as the input tensor.
    std::vector<float> inputTensorValues(inputTensorSize);
    std::copy(preProcessedImage.begin<float>(), preProcessedImage.end<float>(), inputTensorValues.begin());
    inputTensors.push_back(Ort::Value::CreateTensor<float>(
        memoryInfo, inputTensorValues.data(), inputTensorSize,
        inputTensorShape.data(), inputTensorShape.size()));

    // Run the model; outputs are returned as float32 tensors.
    outputTensors = session.Run(Ort::RunOptions{ nullptr },
        inputNames.data(),
        inputTensors.data(),
        session.GetInputCount(),
        outputNames.data(),
        session.GetOutputCount());
}

Below is the code for FP16, which gives garbage detections when running inference with the FP16 weights.

void i2v::OnnxRuntimeProcessor::process_image16(cv::Mat& preProcessedImage, std::vector<Ort::Value>& outputTensors)
{
    std::vector<Ort::Value> inputTensors;

    // Convert each float32 element to IEEE 754 half precision (Ort::Float16_t).
    std::vector<Ort::Float16_t> fp16_values;
    fp16_values.reserve(inputTensorSize);
    for (size_t i = 0; i < inputTensorSize; ++i)
    {
        auto a = preProcessedImage.at<float>(i);
        Ort::Float16_t b(a);
        fp16_values.push_back(b);
    }

    inputTensors.push_back(Ort::Value::CreateTensor<Ort::Float16_t>(
        memoryInfo, fp16_values.data(), inputTensorSize,
        inputTensorShape.data(), inputTensorShape.size()));

    outputTensors = session.Run(Ort::RunOptions{ nullptr },
        inputNames.data(),
        inputTensors.data(),
        session.GetInputCount(),
        outputNames.data(),
        session.GetOutputCount());
}

What could be the issue in FP16 inferencing?
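
One sanity check while debugging is to confirm what element type and shape the model actually declares for its first input. A minimal sketch against the C++ API, assuming the same session object as above (the function name is just illustrative):

#include <iostream>
#include <vector>
#include <onnxruntime_cxx_api.h>

// Print the element type and shape the model declares for input 0.
void print_input_info(Ort::Session& session)
{
    Ort::TypeInfo type_info = session.GetInputTypeInfo(0);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();

    // ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT == 1, ..._FLOAT16 == 10
    ONNXTensorElementDataType elem_type = tensor_info.GetElementType();
    std::vector<int64_t> shape = tensor_info.GetShape();

    std::cout << "input element type: " << elem_type << ", shape:";
    for (int64_t d : shape) std::cout << " " << d;
    std::cout << std::endl;
}

If the reported element type does not match the type of the tensor being created, that mismatch is the first thing to fix.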

To reproduce

Use the above process_image16 method to reproduce the issue.

Urgency

No response

Platform

Linux

OS Version

20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

yuslepukhin commented 3 months ago

That's something you would want to debug yourself, or share the model with us if possible. Otherwise, there is nothing to go on.

navyverma commented 2 months ago

@yuslepukhin I am asking whether the FP16 implementation itself is correct or not.

tianleiwu commented 2 months ago

@navyverma,

The CPU execution provider does not support FP16 for most operators, so it does not make sense to run float16 on CPU. Even though a float16 model can run (internally ORT will insert fp16 <-> fp32 conversions), a float32 model will likely run faster than a float16 model on CPU.

If you want to get the benefit of an fp16 model, try the CUDA execution provider on a GPU instead.
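
For reference, a minimal sketch of creating a session with the CUDA provider (assumes a GPU-enabled onnxruntime build; the model path and device id below are placeholders):

#include <onnxruntime_cxx_api.h>

int main()
{
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-demo");

    // Register the CUDA execution provider before creating the session.
    Ort::SessionOptions session_options;
    OrtCUDAProviderOptions cuda_options{};
    cuda_options.device_id = 0;  // which GPU to run on
    session_options.AppendExecutionProvider_CUDA(cuda_options);

    // Operators the CUDA provider does not support fall back to the CPU provider.
    Ort::Session session(env, "model_fp16.onnx", session_options);
    return 0;
}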

Also, try onnxruntime 1.19, which might have fixed some bugs.

navyverma commented 2 months ago

@tianleiwu Thanks for your response. I will try the latest onnxruntime.

majisama commented 2 months ago

> That's something you would want to debug yourself, or share the model with us if possible. Otherwise, there is nothing to go on.

Have you guys completely missed the fact that if the model input is float16, it doesn't work at all? The float16 model's output is completely wrong, mainly because of the data preprocessing stage, yet the same model works fine in Python.
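
If the preprocessing stage is the suspect, one alternative to the element-wise loop is to let OpenCV do the float32 -> float16 conversion in bulk. A rough sketch (assumes OpenCV >= 4.0 with CV_16F, and that OpenCV's half type uses the same IEEE 754 storage as Ort::Float16_t; the function name and parameters are just illustrative):

#include <onnxruntime_cxx_api.h>
#include <opencv2/core.hpp>
#include <vector>

// Convert a float32 blob to float16 and wrap it as an ORT input tensor.
// blob16 must outlive the returned tensor, since CreateTensor does not copy the data.
Ort::Value make_fp16_input(const cv::Mat& blob32,
                           const std::vector<int64_t>& shape,
                           const Ort::MemoryInfo& memoryInfo,
                           cv::Mat& blob16)
{
    blob32.convertTo(blob16, CV_16F);  // bulk float -> half conversion

    auto* data = reinterpret_cast<Ort::Float16_t*>(blob16.data);
    size_t count = blob16.total() * blob16.channels();

    return Ort::Value::CreateTensor<Ort::Float16_t>(
        memoryInfo, data, count, shape.data(), shape.size());
}

If this path produces the same garbage output as the per-element loop, the conversion itself is probably not the problem and attention can shift to the rest of the preprocessing.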