intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Gemini Lake clpeak fails on Ubuntu 22.04 latest kernel #679

Open looi opened 9 months ago

looi commented 9 months ago

System: Dell Wyse 5070, Intel Celeron J4105 (Gemini Lake). Intel Compute Runtime 23.30.26918.9, installed following the official instructions.

What works

What doesn't work

Ar-Ray-code commented 7 months ago

I have the same problem on both the Celeron J4125 and the N4000. I see the following error during GPU inference with OpenVINO 2023.1.0:

terminate called after throwing an instance of 'InferenceEngine::GeneralError'
  what(): [ GENERAL_ERROR ] Check 'false' failed at src/plugins/intel_gpu/src/plugin/program.cpp:401:
GPU program build failed!
[GPU] clWaitForEvents, error code: -14

Note that Linux kernel 5.15 does not have the problem.
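
For readers decoding the log above: in the Khronos OpenCL headers, status code -14 is `CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST`, which means that a command the wait depended on finished with an error. It does not say what that underlying error was. A minimal Python sketch of such a lookup follows. The helper name `describe_cl_error` is hypothetical, and the table is only a small subset of the codes defined in `CL/cl.h`.

```python
# Small subset of OpenCL status codes as defined in the Khronos CL/cl.h header.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -14: "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST",
}


def describe_cl_error(code: int) -> str:
    """Hypothetical helper: map a numeric OpenCL status to its symbolic name."""
    return OPENCL_STATUS.get(code, f"unknown OpenCL status ({code})")


if __name__ == "__main__":
    # The code reported by clWaitForEvents in the log above.
    print(describe_cl_error(-14))
```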

nyanmisaka commented 7 months ago

I heard about this issue a year ago, but I no longer have a GLK device. This is a kernel regression, because Gemini Lake (GLK) only fails with the newer kernel.

@looi @Ar-Ray-code It would be better to file an issue with drm/intel: https://gitlab.freedesktop.org/drm/intel/-/issues/?label_name%5B%5D=Community

looi commented 6 months ago

I don't think this is necessarily a kernel regression because, as I stated above, Vulkan compute works fine.

Personally, I have switched to Vulkan. The performance is comparable (especially when making proper use of Vulkan subgroups), but more importantly, it seems to be much more stable on both Windows and Linux. Intel Compute Runtime / OpenCL has weird issues like this one. Vulkan also seems to work much better on non-Intel GPUs, especially NVIDIA's, where the vendor refuses to support basic features like subgroups and half-precision floats in OpenCL. So I feel like Vulkan is the future and OpenCL is dying anyway.

nyanmisaka commented 6 months ago

What works: Ubuntu 22.04, kernel 5.15. Doesn't work out of the box, but works with i915.enable_hangcheck=0

What doesn't work: Ubuntu 22.04, kernel 6.2

Your input suggests this is a kernel regression. The only difference between the working and the failing setup is the kernel version, so bisecting the commits between the two should find the culprit.

This isn't the first time I've seen an i915 regression. Last time it even broke both Vulkan compute and OpenCL.
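
The bisection suggested above is a binary search over an ordered history: test the midpoint between a known-good and a known-bad kernel, then discard the half that cannot contain the first bad change. A minimal Python sketch of that idea follows, assuming the history is ordered oldest to newest and that once the failure appears it stays. The names `first_bad` and `clpeak_fails` and the breaking point used in the demo are hypothetical. The thread does not identify the actual culprit, and a real bisection would run over individual commits with `git bisect`.

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad candidate.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and every candidate after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # failure already present: look earlier
        else:
            lo = mid  # still working: look later
    return candidates[hi]


# Mainline kernel releases between the two versions named in the thread.
releases = ["5.15", "5.16", "5.17", "5.18", "5.19", "6.0", "6.1", "6.2"]

# MADE-UP breaking point, used only to exercise the search.
PRETEND_FIRST_BAD = "5.19"


def clpeak_fails(release: str) -> bool:
    """Stand-in for booting the given kernel and running clpeak by hand."""
    return releases.index(release) >= releases.index(PRETEND_FIRST_BAD)


if __name__ == "__main__":
    print(first_bad(releases, clpeak_fails))
```

Each step halves the remaining range, so eight candidates need only three test boots.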

looi commented 6 months ago

I agree that a kernel change broke Intel Compute Runtime. Whether that counts as a kernel regression is a subjective question that depends on what exactly caused the breakage. Maybe Intel Compute Runtime is making incorrect assumptions about i915 or relying on undefined behavior, in which case it would not be a kernel regression. Given that Vulkan compute still works, I think that is a likely possibility.