openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™
https://docs.openvino.ai/2024/ovms_what_is_openvino_model_server.html
Apache License 2.0

Model Server hangs when inference on Intel GPU #2336

Open geekboood opened 8 months ago

geekboood commented 8 months ago

Describe the bug

Inference hangs when running on an Intel Arc A770.

Logs

Server logs:

[2024-02-23 15:54:44.147][2184239][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush

[2024-02-23 15:55:08.725][2184301][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush

[2024-02-23 15:55:33.291][2184634][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush

Kernel logs:

[1331510.701350] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in ovms [809080]
[1331510.701372] i915 0000:03:00.0: [drm] ovms[809080] context reset due to GPU hang
[1331516.943270] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f0e!
[1331517.368428] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f16!
[1331517.543874] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f18!
[1331531.202434] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1a!
[1331531.204035] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1c!
[1331531.204263] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1e!
[1331531.204844] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f20!
[1331531.210043] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f22!
[1331531.210182] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f24!
[1331531.212604] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f26!
[1331531.212840] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f28!
[1331531.214194] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2a!
[1331531.214293] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2e!
[1331531.214379] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2c!
[1331531.218911] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f30!
[1331531.224320] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f32!
[1331531.224845] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f36!

Configuration

OpenVINO Model Server 2023.3.4e91aac76
OpenVINO backend 2023.3.0.13775.ceeafaf64f3
Bazel build flags: --strip=always --define MEDIAPIPE_DISABLE=0 --cxxopt=-DMEDIAPIPE_DISABLE=0 --define PYTHON_DISABLE=1 --cxxopt=-DPYTHON_DISABLE=1

mzegla commented 8 months ago

Could you check your model with the OpenVINO benchmark app (https://docs.openvino.ai/2023.3/openvino_sample_benchmark_tool.html)? Run it with the -d GPU option to execute on the GPU. Please also share the command you use to start OVMS.
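For reference, a minimal sketch of what those two commands might look like; the model path, model name, and port below are placeholders, not taken from this issue:

```shell
# Run the OpenVINO benchmark app against the same model on the GPU
# (assumes benchmark_app is on PATH and model.xml is the converted IR model)
benchmark_app -m /models/my_model/1/model.xml -d GPU

# A typical OVMS container start targeting the GPU; /dev/dri must be
# passed through so the container can reach the Intel GPU
docker run --rm -d --device=/dev/dri \
  -v /models/my_model:/models/my_model \
  -p 9000:9000 \
  openvino/model_server:latest \
  --model_path /models/my_model --model_name my_model \
  --target_device GPU --port 9000
```

If benchmark_app also hangs or triggers the same clFlush exception, the problem is likely in the driver or compute runtime rather than in OVMS itself.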

p-durandin commented 8 months ago

@geekboood please provide information about Linux kernel and GPU driver versions

geekboood commented 8 months ago

My environment is pretty complicated. My host server runs Debian with the i915 kernel driver. I pass the GPU through to an LXC container running Ubuntu 22.04 with the Intel GPU dependencies installed. I run multiple models on a single GPU (I tweaked a compute-runtime parameter to enable multi-CCS mode, which should help), and each model is part of an inference pipeline. When the pipeline is under high load, the model server sometimes hangs.
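For context, a hedged sketch of what a multi-CCS compute-runtime tweak can look like. Intel's compute-runtime (NEO) exposes a ZEX_NUMBER_OF_CCS environment variable for configuring compute command streamers per root device; whether this matches the exact parameter changed here is an assumption, since the original comment does not name it:

```shell
# Hypothetical example: expose 4 compute command streamers on root device 0
# before launching OVMS, so multiple models can share the GPU.
# Syntax is <root_device_index>:<ccs_count>; values here are illustrative.
export ZEX_NUMBER_OF_CCS=0:4

# Then start the model server as usual inside the container.
```

Sharing the exact variable and value used, plus the kernel and compute-runtime versions, would help narrow down whether the hang is specific to the multi-CCS configuration.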