intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

deploy-sycl-toolchain build target missing L0 library dependencies #15986

Open EwanC opened 3 weeks ago

EwanC commented 3 weeks ago

Describe the bug

While testing the L0 loader bump PRs, I discovered that the deploy-sycl-toolchain build target does not include all the necessary L0 dependencies.

To reproduce

With the git branch from https://github.com/intel/llvm/pull/15967:

$ python3 buildbot/configure.py
$ python3 buildbot/compile.py
$ cd build
$ ./bin/sycl-ls
SYCL Exception encountered: Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
$ ninja deploy-sycl-toolchain
$ ./bin/sycl-ls
SYCL Exception encountered: Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
$ ninja libze_tracing_layer.so
$ ./bin/sycl-ls
<works correctly>

In this case, manually building libze_tracing_layer.so fixes the issue. However, I am not knowledgeable enough about DPC++ CMake or L0 to know whether this is the only missing dependency.

EDIT: On another system I also had to manually build the libze_validation_layer.so target with ninja.
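
A quick way to see which of the two layer libraries named above are absent from a build tree might look like the sketch below. The helper name `report_missing_layers` is hypothetical, and passing `lib` as the output directory is an assumption about where the build places these libraries; adjust it to your layout.

```shell
# Print the L0 layer libraries that are absent from a directory.
# report_missing_layers is a hypothetical helper, not part of the toolchain.
# The function body runs in a subshell so its variables do not leak.
report_missing_layers() (
    lib_dir="$1"
    for name in libze_tracing_layer.so libze_validation_layer.so; do
        found=no
        # The trailing * also matches versioned names such as <name>.1
        for candidate in "$lib_dir"/"$name"*; do
            if [ -e "$candidate" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then printf 'missing: %s\n' "$name"; fi
    done
)

# Assumption: run from the build directory, with libraries under ./lib
report_missing_layers lib
```

Anything reported as missing can then be built by hand with `ninja <name>`, as in the reproduction steps.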

Environment

Additional context

No response

ianayl commented 3 days ago

Unfortunately I am not familiar with L0 either, but it seems that libze_tracing_layer.so was intentionally not built by default: it is not required for L0 to run.

From my experimentation, I noticed that with older commits the UR was using an incompatible version of libze_tracing_layer.so provided via LD_LIBRARY_PATH. Are you still getting this error if you remove libze_tracing_layer.so from LD_LIBRARY_PATH? I suspect that using a more recent version of the UR, i.e. https://github.com/intel/llvm/commit/18737897b57d535608215f98a848ce420089daad, would fix the issue, but I would need to experiment further. Using the most recent version of the UR should fix this error on the latest L0 drivers, although be warned that the latest versions of the L0 drivers will most likely have issues with the UR, runtime, etc.
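
To check whether a copy of libze_tracing_layer.so is being picked up from LD_LIBRARY_PATH, something like the following sketch may help. The helper name `find_on_ld_path` is hypothetical; it only lists matching files in the directories on LD_LIBRARY_PATH and does not inspect what the loader actually opens.

```shell
# List every copy of a library found in the directories on LD_LIBRARY_PATH.
# find_on_ld_path is a hypothetical helper, not part of any toolchain.
# The function body runs in a subshell so the IFS change stays local.
find_on_ld_path() (
    lib_name="$1"
    IFS=':'
    for dir in ${LD_LIBRARY_PATH:-}; do
        # Skip empty entries such as those produced by "::"
        [ -n "$dir" ] || continue
        # The trailing * also matches versioned names such as <name>.1
        for candidate in "$dir"/"$lib_name"*; do
            if [ -e "$candidate" ]; then printf '%s\n' "$candidate"; fi
        done
    done
)

find_on_ld_path libze_tracing_layer.so
```

If this prints a path outside the build tree, removing that directory from LD_LIBRARY_PATH and re-running `./bin/sycl-ls` should show whether the stale copy is the cause.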

In summary:

Please let me know if this fixes the issue for you, and whether I'm good to close the ticket. Thanks!