intel / onnxruntime

ONNX Runtime: cross-platform, high performance scoring engine for ML models

Fix an issue with applying FP16 transformations for the CPU device #346

Closed sspintel closed 6 months ago

sspintel commented 6 months ago

As illustrated here, the ConvertFP32ToFP16 pass is meant for converting FP32 weights to FP16 weights. The CPU device does not support inference in FP16 precision (see the supported inference data types on CPU).

The original PR intended this pass for the GPU, but using it is no longer advised when the inference_precision hint is set for the GPU.

The CPU device will assume FP32 precision even if the inference hint is explicitly set to FP16, because the device lacks support for FP16 inference precision (see inference precision).
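For context, a minimal sketch of the two approaches using the standard OpenVINO 2.0 C++ API (ov::pass::ConvertFP32ToFP16 and ov::hint::inference_precision); this is illustrative only and not the execution provider's actual code, and the model path is a placeholder:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/pass/manager.hpp>
#include <openvino/pass/convert_fp32_to_fp16.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.onnx");  // placeholder path

    // Old approach: explicitly rewrite FP32 weights to FP16 on the model.
    // Running this unconditionally is what breaks CPU execution, since the
    // CPU plugin cannot infer in FP16.
    // ov::pass::Manager manager;
    // manager.register_pass<ov::pass::ConvertFP32ToFP16>();
    // manager.run_passes(model);

    // Recommended approach for GPU: request FP16 via the precision hint.
    auto gpu_compiled = core.compile_model(
        model, "GPU", ov::hint::inference_precision(ov::element::f16));

    // On CPU the same hint is accepted, but execution stays in FP32 because
    // FP16 is not a supported CPU inference precision.
    auto cpu_compiled = core.compile_model(
        model, "CPU", ov::hint::inference_precision(ov::element::f16));
    return 0;
}
```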

This fix is required to run Whisper and potentially any such model in the (pseudo) CPU_FP16 mode.

preetha-intel commented 6 months ago

@sspintel I agree with your changes. We can remove the convert pass in read_model; I approve this PR. In addition, as a follow-up in another PR, we have to explore ways to support CPU_FP16: for now it is equivalent to CPU_FP32, and the CPU_FP16 device is currently a stale option with no changes in FEIL/FIL compared to CPU_FP32.
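A rough sketch of the change being approved here, with hypothetical function and variable names (the EP's actual read_model path may be structured differently): the unconditional FP16 conversion pass is dropped, and precision handling is left to the device and the inference_precision hint.

```cpp
#include <openvino/openvino.hpp>

#include <memory>
#include <string>

// Hypothetical helper mirroring the EP's model-reading path; names are
// illustrative, not the actual onnxruntime OpenVINO EP code.
std::shared_ptr<ov::Model> ReadModel(ov::Core& core, const std::string& model_path) {
    auto model = core.read_model(model_path);
    // The unconditional ConvertFP32ToFP16 pass that previously ran here is
    // removed; the device plugin and the inference_precision hint now decide
    // the execution precision instead.
    return model;
}
```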

sspintel commented 6 months ago

I agree that this precision has been a stale option for a while and I'm not sure why it was added in the first place. The alternative is to use this slot to enable BF16 wherever applicable.
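If the slot is repurposed that way, a minimal sketch of requesting BF16 through the same precision hint, assuming a CPU with bfloat16 support (e.g. AVX512_BF16 or AMX); purely illustrative:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.onnx");  // placeholder path

    // Request BF16 execution on the CPU via the precision hint; on hardware
    // without bfloat16 support the plugin is expected to fall back to FP32.
    auto compiled = core.compile_model(
        model, "CPU", ov::hint::inference_precision(ov::element::bf16));
    return 0;
}
```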

sspintel commented 6 months ago

In that case, I'll continue the work in a separate branch for Whisper. Thanks.

preetha-intel commented 6 months ago

Not pulling this in since it is not a priority for QDQ models until the March release, unless Preetha says it is useful for QDQ models.

@sfatimar - This is not related to QDQ; this is generally deprecated code and we can remove it. Since it helps Whisper and doesn't have any effect on the NPU, we can merge this and add it to the March release.

sspintel commented 6 months ago

Closing as this fix will be taken up in a later PR.