Open pjaaskel opened 2 years ago
@pjaaskel Could you share more details about your NEO driver version?
It seems I have quite an old version (1.0.0). I was under the assumption that I'd get updates through the apt package `intel-oneapi-runtime-opencl`, but it seems that's only the CPU driver? Am I still supposed to upgrade the GPU OpenCL driver via the GitHub .debs? I'm confused. I'll try upgrading via the .debs to see if the latest version fixes it.
Please run `clinfo` and check the Driver Version.
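If it helps, the check above can be scripted. This is only a minimal sketch: it assumes `clinfo` is on the PATH and that it prints `Device Name` and `Driver Version` as left-aligned labels followed by padding spaces, which is how its default text output usually looks. The function names are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_driver_versions(clinfo_text):
    """Return (device name, driver version) pairs found in clinfo text output.

    Assumption: each field is printed as "<label><2+ spaces><value>".
    """
    pairs = []
    device = None
    for line in clinfo_text.splitlines():
        m = re.match(r"\s*Device Name\s{2,}(.+)", line)
        if m:
            device = m.group(1).strip()
            continue
        m = re.match(r"\s*Driver Version\s{2,}(.+)", line)
        if m:
            pairs.append((device, m.group(1).strip()))
    return pairs


def installed_driver_versions():
    """Run clinfo (if installed) and parse its output; empty list otherwise."""
    if shutil.which("clinfo") is None:
        return []
    result = subprocess.run(["clinfo"], capture_output=True, text=True, check=False)
    return parse_driver_versions(result.stdout)
```

Calling `installed_driver_versions()` returns one pair per device, so a CPU runtime and a GPU driver installed side by side show up as separate entries.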
It seems I still get the same CL_OUT_OF_RESOURCES problem with Driver Version 22.43.24558. It works when I prune down the number of tests. Is there a way I can debug the actual reason (which resource it runs out of) somehow?
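The manual pruning can be automated to find how many kernels a module can hold before the error appears. This is a rough sketch, not a driver feature: `try_build` is a hypothetical callable you would write yourself, returning `True` when a module containing the given kernels builds and runs. It also assumes failure is monotonic, meaning that once a prefix of the kernel list fails, every longer prefix fails too. It will not say which resource is exhausted, only where the threshold sits.

```python
def first_failing_count(kernels, try_build):
    """Smallest n such that try_build(kernels[:n]) fails, or None if all build.

    Assumes an empty module builds and that failure is monotonic in n.
    """
    if try_build(kernels):
        return None
    lo, hi = 0, len(kernels)  # invariant: kernels[:lo] builds, kernels[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_build(kernels[:mid]):
            lo = mid
        else:
            hi = mid
    return hi
```

Running it on a few different orderings of the kernel list would show whether the failure tracks the total amount of code or one particular kernel.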
@pjaaskel, we are getting a similar error. Did you find out how to debug the issue?
Interestingly enough, we only get the issue on Intel GPUs (1550 Max); it runs perfectly on Intel CPUs and Nvidia GPUs.
Is there a (relatively low) size limit for the built SPIR-V modules? I'm getting CL_OUT_OF_RESOURCES when trying to build (via the CHIP-SPV runtime) a unit test in rocPRIM which has a bunch of test kernels. Omitting some of the kernels makes the test pass (I can also enable the omitted ones in turn, and they pass if I disable some of the others). This reproduces via both OpenCL and Level Zero.
In this case it's not a question of a large monolithic kernel that might fill up instruction memory, but of a dozen or so smaller kernels that are launched separately. A lazy kernel binary deployment strategy at launch time should therefore avoid an imem limit issue, if that's the case here.
The kernels use a bit of shared memory, but not much. Is there a way to make the driver dump more info about the reason for the out-of-resources error?
SPIR-Vs of the working and non-working cases: spvs.zip