vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: Add precision control setting to OpenVINO #3494

Open hygehyge opened 12 hours ago

hygehyge commented 12 hours ago

Feature description

As described here, OpenVINO has precision control hints.
These let users choose which matters more, ACCURACY or PERFORMANCE.
In my environment (OpenVINO + Intel Iris Xe iGPU), these flags significantly affect the inference result.

Version Platform Description

SD.Next:hash=5c684cb0 branch=master

Disty0 commented 5 hours ago

OpenVINO will use FP16 by default on GPU or BF16 on CPU if the device supports them, otherwise FP32.

Also, is there any point in adding another option for FP32? You will double the VRAM usage and double the generation time for no reason; FP16 and FP32 outputs are basically the same. And almost every model on CivitAI or Huggingface is an FP16 model. You will do nothing but waste compute and memory by using FP32.
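To put a rough number on the "double the VRAM" point, here is a minimal sketch of the weight-memory arithmetic. The parameter count is an illustrative assumption (roughly the size of an SD 1.5 UNet), and the estimate covers weights only, not activations or other runtime buffers.

```python
# Rough weight-memory estimate; the parameter count is an illustrative assumption.
BYTES_PER_ELEMENT = {"fp16": 2, "bf16": 2, "fp32": 4}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_ELEMENT[precision] / 2**30


# Assumed figure: roughly 860 million parameters, about the size of an SD 1.5 UNet.
NUM_PARAMS = 860_000_000

for precision in ("fp16", "fp32"):
    print(f"{precision}: {weight_memory_gib(NUM_PARAMS, precision):.2f} GiB")
```

Whatever the parameter count, the FP32 figure is exactly twice the FP16 one, since each element goes from 2 bytes to 4.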

hygehyge commented 3 hours ago

@Disty0

> OpenVINO will use FP16 by default on GPU or BF16 on CPU if the device supports them, otherwise FP32.

Yes, that's right.
Because of this, if a model is trained in FP32 and is used with OpenVINO (which is my case), the inference result differs from what is expected.
Being able to control the precision hint on devices that support FP32 would fix this issue.

For example, my iGPU supports FP32 and INFERENCE_PRECISION_HINT is FP16.

[ INFO ] GPU :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES: 0
[ INFO ] DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.3.0
[ INFO ] FULL_DEVICE_NAME: Intel(R) Iris(R) Xe Graphics (iGPU)
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, EXPORT_IMPORT
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
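The same check can be done mechanically on a saved device listing. This is only a sketch: it parses the text format shown above rather than querying OpenVINO, and `parse_properties` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re

# Two lines copied from the device listing above.
SAMPLE = """\
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, EXPORT_IMPORT
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
"""


def parse_properties(text: str) -> dict[str, str]:
    """Map each 'NAME: value' entry of the device listing to its value string."""
    props = {}
    for line in text.splitlines():
        match = re.match(r"\[ INFO \]\s*([A-Z_]+):\s*(.+)", line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


props = parse_properties(SAMPLE)
capabilities = [c.strip() for c in props["OPTIMIZATION_CAPABILITIES"].split(",")]
default_is_fp16 = "float16" in props["INFERENCE_PRECISION_HINT"]

# True when the device can run FP32 but the default hint selects FP16.
fp32_available_but_unused = "FP32" in capabilities and default_is_fp16
print(fp32_available_but_unused)
```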

When I set execution_mode to ACCURACY as shown below, the result seems to be as expected.
(It took about 2 s/it, whereas FP16 mode took about 1 s/it, as you mentioned.)

# in modules/intel/openvino/__init__.py
import openvino.properties.hint as hints

# Ask the GPU plugin to favor accuracy over speed.
core.set_property(
    "GPU",
    {hints.execution_mode: hints.ExecutionMode.ACCURACY},
)

So I think adding a precision control setting would be useful.
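As a rough sketch of what such a setting could look like: the option names, `PRECISION_CHOICES`, `precision_properties`, and `apply_precision` are all hypothetical and not part of SD.Next. It assumes `core` is an existing `openvino.Core` instance and that `set_property` accepts the string-keyed form `{"EXECUTION_MODE_HINT": "ACCURACY"}` as an equivalent of the `hints.execution_mode` form used above.

```python
# Hypothetical setting values; SD.Next has no such option at the time of this issue.
PRECISION_CHOICES = {
    "Default": {},  # leave OpenVINO's per-device default untouched
    "Accuracy": {"EXECUTION_MODE_HINT": "ACCURACY"},
    "Performance": {"EXECUTION_MODE_HINT": "PERFORMANCE"},
}


def precision_properties(choice: str) -> dict[str, str]:
    """Translate a UI choice into an OpenVINO property dict (string keys)."""
    try:
        return dict(PRECISION_CHOICES[choice])
    except KeyError:
        raise ValueError(f"unknown precision setting: {choice!r}") from None


def apply_precision(core, device: str, choice: str) -> None:
    """Apply the chosen hint to `device` on an existing openvino Core instance."""
    props = precision_properties(choice)
    if props:  # "Default" maps to an empty dict, so nothing is overridden
        core.set_property(device, props)
```

With a real core this would be called as `apply_precision(core, "GPU", "Accuracy")`. Keeping "Default" as a no-op preserves the current behavior for everyone who does not change the setting.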