intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Runtime hangs on DG2 (and Gen12 iGPU maybe?) #706

Open tazz4843 opened 7 months ago

tazz4843 commented 7 months ago

I'm running into random hangs in my app during normal use. They began several months ago, roughly September 2023. A stack trace is attached at the end of this post. While digging, I found this related comment with the exact same stack trace, although that report was only on DG2 and on an unsupported kernel, whereas I can occasionally reproduce this on Gen12 iGPUs and on a much more recent kernel version. I'm using whisper.cpp with its OpenCL backend to run speech-to-text on arbitrary input. If one thread hangs, all other runtime threads hang as well, spinning multiple cores at 100%.

I'm very new to all of this so please let me know if there's any information I can supply :)

Host details:

  • GPU: Arc A770
  • Arch Linux w/ kernel 6.7.3-arch1-1.1
  • intel-compute-runtime-23.48.27912.11-1

backtrace.txt

eero-t commented 7 months ago

Looking at the backtrace:

geekboood commented 7 months ago

I have the same issue when running OpenVINO Model Server.

tazz4843 commented 7 months ago

Sorry this took me so long to get back to.

Looking at the backtrace:

  • 1 "scripty_stt_ser" thread is Tokio Rust code directly hanging in futex_wait() syscall

From what I've seen of the code, this runtime worker appears to be waiting for compute runtime code to return, which makes me think this is the issue. Disabling the OpenCL runtime and falling back to CPU makes the issue disappear completely, even after weeks of uptime. With the OpenCL integration, it usually runs for at most 1 week before it locks up and starts spinning on CPU.
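For anyone hitting the same thing, here is a minimal sketch of a watchdog around the blocking GPU call, so a hang is at least detected and the request can fall back to CPU. It is written in Python for brevity and is not taken from my app or from whisper.cpp: `run_with_watchdog`, `transcribe`, `gpu_backend`, and `cpu_backend` are hypothetical names, and the timeout value is an assumption you would need to tune.

```python
import queue
import threading


def run_with_watchdog(blocking_call, timeout_s):
    """Run blocking_call in a daemon thread and wait at most timeout_s.

    Returns (True, result) if the call finished in time,
    or (False, None) if it is still blocked (a suspected hang).
    """
    results = queue.Queue(maxsize=1)

    def worker():
        # Forward both results and exceptions to the waiting caller.
        try:
            results.put((True, blocking_call()))
        except Exception as exc:
            results.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = results.get(timeout=timeout_s)
    except queue.Empty:
        # The worker cannot be killed, only abandoned; it may keep spinning.
        return False, None
    if not ok:
        raise value
    return True, value


def transcribe(audio, gpu_backend, cpu_backend, timeout_s=30.0):
    """Try the (hypothetical) GPU backend first, fall back to CPU on a hang."""
    finished, text = run_with_watchdog(lambda: gpu_backend(audio), timeout_s)
    if finished:
        return text
    return cpu_backend(audio)
```

This only works around the symptom. The stuck thread is abandoned rather than stopped, so it can keep a core busy until the process restarts, and it does nothing about the underlying hang in the runtime.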