openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

`bf16_simulation`: Issues with BF16 simulation #15527

Closed lalith-mcw closed 1 year ago

lalith-mcw commented 1 year ago

Trying to run BF16 simulation on an Intel i7-1035G, which doesn't have a native avx512_bf16 implementation.

Reference: https://docs.openvino.ai/2021.4/openvino_docs_IE_DG_Bfloat16Inference.html?sw_type=switcher-python

But my model fails with the following error:

self._unet = ie.read_network(
File "ie_api.pyx", line 367, in openvino.inference_engine.ie_api.IECore.read_network
File "ie_api.pyx", line 410, in openvino.inference_engine.ie_api.IECore.read_network
RuntimeError: Check 'lower_bound[i] >= -1 && upper_bound[i] >= -1' failed at C:\j\workspace\private-ci\ie\build-windows-vs2019@3\b\repos\openvino\src\core\src\op\reshape.cpp:98:
While validating node 'v1::Reshape Reshape_28790 (/conv_in/Conv[0]:f32{2,320,64,64}, /down_blocks.0/resnets.0/norm1/Constant[0]:i64{3}) -> (f32{?,?,?})' with friendly_name 'Reshape_28790':
Dim size cannot be less than -1

When using dynamic shapes with default inference (FP32 inference using FP16 models) it doesn't fail, but with BF16 inference on the same FP16 models the model fails. Earlier I tried dynamic shapes in FP32 inference with an FP32 model, and that works fine without issues for dynamic shapes.
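(For context, a minimal sketch of how dynamic shapes are typically set with the API 2.0 reshape call; the path and dimensions below are placeholders, not the actual model's:)

import openvino.runtime as ov

core = ov.Core()
model = core.read_model("unet.xml")  # placeholder path
# -1 marks a dimension as dynamic; placeholder layout, not the real model's
model.reshape(ov.PartialShape([-1, 4, -1, -1]))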

lalith-mcw commented 1 year ago
# Enforce BF16 execution on CPU (legacy Inference Engine API)
bf16_config = {"ENFORCE_BF16": "YES"}
network = iecore.read_network(
    model=".\\unet-camvid-onnx-0001\\intel\\unet-camvid-onnx-0001\\FP16\\unet-camvid-onnx-0001.xml",
    weights=".\\unet-camvid-onnx-0001\\intel\\unet-camvid-onnx-0001\\FP16\\unet-camvid-onnx-0001.bin")
iecore.load_network(network=network, device_name="CPU", config=bf16_config)

I tried the same commands with a model available online; it reads and loads without any errors being thrown.
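For what it's worth, on newer releases (API 2.0) BF16 enforcement is requested via the inference precision hint rather than ENFORCE_BF16; a minimal sketch, assuming the same model files (exact property value handling may vary between releases):

import openvino.runtime as ov

core = ov.Core()
model = core.read_model(".\\unet-camvid-onnx-0001\\intel\\unet-camvid-onnx-0001\\FP16\\unet-camvid-onnx-0001.xml")
# Request BF16 execution on CPU; simulated on CPUs without native avx512_bf16
compiled = core.compile_model(model, "CPU", {"INFERENCE_PRECISION_HINT": "bf16"})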

lalith-mcw commented 1 year ago

No node in my model has a shape of -1, so I'm still unsure why the error is raised.

dmitry-gorokhov commented 1 year ago

Hi, @lalith-mcw. Is it possible to share the model which cannot be loaded? Also, could you please clarify which OpenVINO version you are using? I see the exception is thrown at the read_network stage, but ENFORCE_BF16 shouldn't affect this stage at all, so it is hard to see how the two could be connected.
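(For reference, a quick way to report the installed version on API 2.0 installs:)

from openvino.runtime import get_version
print(get_version())  # prints the build string of the installed OpenVINO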

lalith-mcw commented 1 year ago

@dmitry-gorokhov that was a mistake on my end: I was reading the FP16 weights file for an FP32 model.

I still have issues running the model on the iGPU without any precision hint settings, though.

dmitry-gorokhov commented 1 year ago

@lalith-mcw Could you please provide more details on the issues you are facing with the iGPU launch?

lalith-mcw commented 1 year ago
[ERR] 2023-02-07T08:06:35z core\src\util.cpp 87 malloc failed to allocate memory of size 1073741889
Traceback (most recent call last):
  File "C:\Users\amduser2\Documents\Lalith\openvino_fp16\fp16\demo.py", line 99, in <module>
    main(args)
  File "C:\Users\amduser2\Documents\Lalith\openvino_fp16\fp16\demo.py", line 34, in main
    engine = StableDiffusionEngine(
  File "C:\Users\amduser2\Documents\Lalith\openvino_fp16\fp16\stable_diffusion_engine.py", line 42, in __init__
    self.unet = self.core.compile_model(self._unet, device)
  File "C:\Users\amduser2\Documents\Lalith\check_fp16\lib\site-packages\openvino\runtime\ie_api.py", line 266, in compile_model
    super().compile_model(model, device_name, {} if config is None else config)
RuntimeError: bad allocation

My system does have 16GB of memory. This is regarding FP32 models on the iGPU of an i7-1165G7: unet_fp32_static.zip

Link for the binary file: https://drive.google.com/file/d/1J0aX9GlonZDy4PS2i24QIREEwvgZGM9J/view?usp=share_link
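(As a sanity check, the total memory visible to the GPU plugin can be queried via the GPU_DEVICE_TOTAL_MEM_SIZE property that also appears in the device dump below; a minimal sketch, assuming API 2.0:)

import openvino.runtime as ov

core = ov.Core()
# Device memory available to the GPU plugin, in bytes
print(core.get_property("GPU", "GPU_DEVICE_TOTAL_MEM_SIZE"))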

lalith-mcw commented 1 year ago

Whereas I was able to run FP16-compressed models without issues on the iGPU, the issue is only with FP32 models.

vladimir-paramuzov commented 1 year ago

@lalith-mcw If the FP16 model works, then most likely you don't have enough memory to run the FP32 model. As I can see, the weights size is ~3.6GB for FP32 and the intermediate tensors are also quite large (~2.5-3GB in total), so the GPU plugin requires ~6GB of memory to execute this model. So if your stable diffusion demo retains the original ov::Model after compilation (+3.6GB), loads other models, or uses streams/batching, then memory consumption may exceed 16GB, which may cause the bad-allocation exception.

Could you check whether a single unet model can be successfully loaded to the GPU plugin using benchmark_app on your machine?
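A minimal invocation for that check could look like the following (model path is a placeholder):

benchmark_app -m unet_fp32_static.xml -d GPU -niter 1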

Also, you can try to query memory statistics from the GPU plugin using the ov::intel_gpu::memory_statistics property to check how much memory is used at different pipeline stages (e.g. after compilation of each model).

lalith-mcw commented 1 year ago

@vladimir-paramuzov

Failed to set property to 'GPU' which is not found in the target devices list 'CPU'!

But with the query_device script it's failing, due to an undefined precision hint I suppose (see https://github.com/openvinotoolkit/openvino/issues/15552):

[ INFO ] Available devices:
[ INFO ] CPU :
[ INFO ]        SUPPORTED_PROPERTIES:
[ INFO ]                AVAILABLE_DEVICES:
[ INFO ]                RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
[ INFO ]                RANGE_FOR_STREAMS: 1, 8
[ INFO ]                FULL_DEVICE_NAME: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
[ INFO ]                OPTIMIZATION_CAPABILITIES: WINOGRAD, FP32, FP16, INT8, BIN, EXPORT_IMPORT
[ INFO ]                CACHING_PROPERTIES: {}
[ INFO ]                CACHE_DIR:
[ INFO ]                NUM_STREAMS: 1
[ INFO ]                AFFINITY: Affinity.NONE
[ INFO ]                INFERENCE_NUM_THREADS: 0
[ INFO ]                PERF_COUNT: False
[ INFO ]                INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]                PERFORMANCE_HINT: PerformanceMode.UNDEFINED
[ INFO ]                PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]
[ INFO ] GPU :
[ INFO ]        SUPPORTED_PROPERTIES:
[ INFO ]                AVAILABLE_DEVICES: 0
[ INFO ]                RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ]                RANGE_FOR_STREAMS: 1, 2
[ INFO ]                OPTIMAL_BATCH_SIZE: 1
[ INFO ]                MAX_BATCH_SIZE: 1
[ INFO ]                CACHING_PROPERTIES: {'GPU_UARCH_VERSION': 'RO', 'GPU_EXECUTION_UNITS_COUNT': 'RO', 'GPU_DRIVER_VERSION': 'RO', 'GPU_DEVICE_ID': 'RO'}
[ INFO ]                DEVICE_ARCHITECTURE: GPU: v12.0.0
[ INFO ]                FULL_DEVICE_NAME: Intel(R) Iris(R) Xe Graphics (iGPU)
[ INFO ]                DEVICE_UUID: UNSUPPORTED TYPE
[ INFO ]                DEVICE_TYPE: Type.INTEGRATED
[ INFO ]                DEVICE_GOPS: UNSUPPORTED TYPE
[ INFO ]                OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8
[ INFO ]                GPU_DEVICE_TOTAL_MEM_SIZE: UNSUPPORTED TYPE
[ INFO ]                GPU_UARCH_VERSION: 12.0.0
[ INFO ]                GPU_EXECUTION_UNITS_COUNT: 96
[ INFO ]                GPU_MEMORY_STATISTICS: UNSUPPORTED TYPE
[ INFO ]                PERF_COUNT: False
[ INFO ]                MODEL_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ]                GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ]                GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ]                CACHE_DIR:
[ INFO ]                PERFORMANCE_HINT: PerformanceMode.UNDEFINED
[ INFO ]                COMPILATION_NUM_THREADS: 8
[ INFO ]                NUM_STREAMS: 1
[ INFO ]                PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]                INFERENCE_PRECISION_HINT: <Type: 'undefined'>
[ INFO ]                DEVICE_ID: 0

vladimir-paramuzov commented 1 year ago

@lalith-mcw you need to call get_property(), not set_property(). Something like:

stat = core.get_property('GPU', 'GPU_MEMORY_STATISTICS')
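Expanded into a minimal runnable sketch (model path is a placeholder):

import openvino.runtime as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("unet.xml"), "GPU")
# Per-allocation usage (in bytes) tracked by the GPU plugin at this point
stat = core.get_property('GPU', 'GPU_MEMORY_STATISTICS')
print(stat)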

avitial commented 1 year ago

Closing this; I hope the previous responses were sufficient to help you proceed. Feel free to reopen to ask any questions related to this topic.