intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

Abort from device_binary_format/patchtokens_decoder #641

Open abagusetty opened 1 year ago

abagusetty commented 1 year ago

I am running an HPC workload on PVC in explicit-scaling mode using the SYCL API (via the L0 plugin) and was wondering what could have triggered the following error.

It was a bit hard to debug the error from the application side, since the application is quite complex and the error message didn't reveal anything useful. Is this somehow related to excessive allocation of device memory? I just wanted to reach out for pointers before diving into manual debugging.

Version: 23.09.25812.14
Device: PVC (explicit-scaling mode)

Abort was called at 49 line in file:
/..../intel-gpu-umd/driver/intel-compute-runtime/shared/source/device_binary_format/patchtokens_decoder.cpp

Corresponds to: https://github.com/intel/compute-runtime/blob/master/shared/source/device_binary_format/patchtokens_decoder.cpp#L49

JablonskiMateusz commented 1 year ago

Hi @abagusetty, could you attach the kernel binary or repro steps? It looks like an invalid binary.

dimitryn commented 7 months ago

Hi @JablonskiMateusz, I am getting the same error on our Ubuntu 2024 machine. I am using OpenVINO 2024 and trying to run our model on the GPU. [image] We are using Neo release version 24.05.28454.6. Please advise how to resolve this.

hollste commented 5 days ago

I recently got this error as well, running Debian 12 and release 24.39.31294.12. The error only appeared when running as a non-sudo user. I solved it by deleting the neo_compiler_cache folder: rm -r /home/<user>/.cache/neo_compiler_cache/. I assume the cache folder was somehow corrupt, causing this issue. Does that make sense?
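
For anyone who wants to try the same workaround without losing the old cache contents (in case a maintainer asks for them), here is a minimal Python sketch that renames the cache folder instead of deleting it. It assumes the per-user cache lives under ~/.cache/neo_compiler_cache as in the command above; the NEO_CACHE_DIR and XDG_CACHE_HOME lookups are assumptions about how the location can be overridden and should be checked against your driver version. The helper names default_cache_dir and set_aside_cache are made up for this example.

```python
import os
import shutil
import time
from pathlib import Path


def default_cache_dir(env=os.environ):
    """Guess where the per-user compiler cache lives.

    Assumption: NEO_CACHE_DIR overrides the location, otherwise the
    cache sits in $XDG_CACHE_HOME (or ~/.cache) / neo_compiler_cache.
    """
    override = env.get("NEO_CACHE_DIR")
    if override:
        return Path(override)
    base = env.get("XDG_CACHE_HOME") or os.path.join(env.get("HOME", ""), ".cache")
    return Path(base) / "neo_compiler_cache"


def set_aside_cache(cache_dir):
    """Rename the cache folder to a timestamped backup.

    Returns the backup path, or None if there was no cache folder.
    The driver is expected to recreate the cache on the next run.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return None
    backup = cache_dir.with_name(f"{cache_dir.name}.bak-{time.time_ns()}")
    shutil.move(str(cache_dir), str(backup))
    return backup


if __name__ == "__main__":
    moved_to = set_aside_cache(default_cache_dir())
    print(f"cache moved to {moved_to}" if moved_to else "no cache folder found")
```

If the abort disappears after this, the backup folder can be attached to the issue or removed by hand.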