Open qnixsynapse opened 3 months ago
Yeah, it's not in a usable state yet, especially on Arc...
Any update on this? I tried the 2024.0 and 2024.1 versions of oneAPI with the latest version of JAX as well as the version prescribed in the README. Neither worked.
Using oneAPI 2024.0 and JAX 0.4.25, here is the error I am getting on an Intel GPU Max system:
>>> import jax
>>> jax.local_devices()
INFO: Intel Extension for OpenXLA version: 0.3.0, commit: 9a484818
Jax plugin configuration error: Exception when calling jax_plugins.intel_extension_for_openxla.initialize()
Traceback (most recent call last):
File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 482, in discover_pjrt_plugins
plugin_module.initialize()
File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/__init__.py", line 39, in initialize
c_api = xb.register_plugin("xpu",
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 544, in register_plugin
c_api = xla_client.load_pjrt_plugin_dynamically(plugin_name, library_path) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jaxlib/xla_client.py", line 155, in load_pjrt_plugin_dynamically
return _xla.load_pjrt_plugin(plugin_name, library_path, c_api=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to open /home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so: /home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so: undefined symbol: _ZNK4sycl3_V16detail16AccessorBaseHost25isMemoryObjectUsedByGraphEv
[CpuDevice(id=0)]
>>>
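The failure above means the dynamic loader cannot resolve a SYCL runtime symbol while dlopen-ing pjrt_plugin_xpu.so, so JAX silently falls back to CPU. One way to reproduce the loader error outside of JAX is to dlopen the plugin directly with ctypes; this is only a diagnostic sketch (the probe_plugin helper and the path are illustrative, not part of JAX or the Extension):

```python
import ctypes

def probe_plugin(path):
    """Try to dlopen a shared library; return None on success,
    or the loader's error message (e.g. the undefined-symbol text)."""
    try:
        ctypes.CDLL(path)
        return None
    except OSError as e:
        return str(e)

# Path is illustrative; point it at your installed pjrt_plugin_xpu.so.
err = probe_plugin("pjrt_plugin_xpu.so")
if err:
    print("plugin failed to load:", err)
```

If the library fails to load here, the same undefined-symbol message appears, confirming the problem is in the SYCL runtime linkage rather than in JAX itself.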
Using the latest JAX, oneAPI 2024.1, and the latest OpenXLA, I get a core dump.
The undefined symbol error occurs with 2024.0. Please rebuild the Extension in your environment if you don't want to switch to 2024.1.
I am currently using 2024.1 and it stopped detecting my GPU. The output of jax.local_devices() shows [CpuDevice(id=0)].
Do you meet the undefined symbol error even with 2024.1? If so, can you share the output of pip list | egrep jax and dpkg -l | egrep level-zero-gpu?
The undefined symbol error is fixed after upgrading to 2024.1. However, it stopped detecting the PJRT plugin/GPU.
$ pip list |egrep jax
egrep: warning: egrep is obsolescent; using grep -E
jax 0.4.25
jaxlib 0.4.25
$ sudo pacman -Qs level-zero
local/intel-compute-runtime 24.13.29138.7-1
Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
local/level-zero-loader 1.16.15-1
API for accessing low level interfaces in oneAPI platform devices (loader)
@qnixsynapse JAX v0.4.25 doesn't match released Extension v0.3.0, so please try to use any of the following 2 solutions first:
More version-matching info can be found in the release notes: https://github.com/intel/intel-extension-for-openxla/releases. We will add a table to the home page later showing which versions match.
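The pairing in the release notes can be expressed as a simple lookup. This is only an illustrative sketch: the table below holds the single pairing confirmed in this thread (Extension 0.3.0 with jax/jaxlib 0.4.24), and the release notes remain the authoritative source.

```python
# Illustrative mapping: intel-extension-for-openxla release -> the
# jax/jaxlib version it matches. Only the pair confirmed in this
# thread is listed; consult the release notes for the full table.
COMPAT = {
    "0.3.0": "0.4.24",
}

def required_jax(ext_version):
    """Return the jax version matching an Extension release, or None."""
    return COMPAT.get(ext_version)

print(required_jax("0.3.0"))  # 0.4.24
```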
@Zantares Same issue with v0.4.24:
Edit: It seems to be a Python version mismatch, which is why pip is pulling an old pre-release wheel of the OpenXLA plugin. Arch Linux ships Python 3.12.3, while the new wheels are built for 3.11.
Thanks for the trial and the new info. We will check the error with Python 3.12 on Arc.
BTW, if the versions are mismatched, doesn't that mean an old version of the Extension was installed rather than v0.3.0?
Seems like it. Pip refuses to install 0.3.0 because it is looking for a wheel built for Python 3.12, so I guess the last release that ships a compatible wheel is the one getting installed.
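This selection behavior comes from wheel filename tags: a cp311 wheel is never a candidate for a CPython 3.12 interpreter, so pip falls back to the newest release that does ship a matching wheel. A minimal sketch of reading the Python tag out of a wheel filename (the example filename is hypothetical, for illustration only):

```python
def wheel_python_tag(filename):
    """Extract the Python tag from a wheel filename, which follows
    the pattern name-version-pythontag-abitag-platformtag.whl."""
    stem = filename[:-len(".whl")]
    return stem.split("-")[-3]

# Hypothetical wheel name used only to illustrate the tag position:
print(wheel_python_tag(
    "intel_extension_for_openxla-0.3.0-cp311-cp311-manylinux_2_28_x86_64.whl"))
# cp311
```

Since "cp311" does not match a 3.12 interpreter's supported tags, pip skips this file entirely rather than reporting a version conflict.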
Also, this:
That's it. We will discuss internally first whether we can release packages for more Python versions (which means more test processes). For now, you can try building it yourself if Python 3.12 is a hard requirement.
I tried to build it today but hit an error. To be honest, I am not familiar with Bazel.
$ bazel build //xla/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
'build' options: --nocheck_visibility --announce_rc --config=gpu --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --distinct_host_configuration=false
ERROR: --distinct_host_configuration=false :: Unrecognized option: --distinct_host_configuration=false
Removing (commenting out) that option gives this error:
(mldev) $ nano .bazelrc
(mldev) $ bazel build //xla/tools/pip_package:build_pip_package
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=145
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
'build' options: --nocheck_visibility --announce_rc --config=gpu --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.xla_extension_configure.bazelrc:
'build' options: --action_env PYTHON_BIN_PATH=/home/qnixsynapse/.pyenvs/mldev/bin/python --action_env PYTHON_LIB_PATH=/home/qnixsynapse/.pyenvs/mldev/lib/python3.12/site-packages --python_path=/home/qnixsynapse/.pyenvs/mldev/bin/python --action_env TF_CXX11_ABI_FLAG=1 --action_env TF_NEED_SYCL=1 --action_env SYCL_TOOLKIT_PATH=/opt/intel/oneapi/compiler/latest --action_env LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/compiler/latest/compiler/lib/intel64_lin --action_env LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/compiler/latest/compiler/lib/intel64_lin
INFO: Found applicable config definition build:gpu in file /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc: --crosstool_top=@local_config_sycl//crosstool:toolchain --define=using_sycl=true --repo_env TF_NEED_SYCL=1 --define=tensorflow_mkldnn_contraction_kernel=0 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
WARNING: --enable_bzlmod is set, but no MODULE.bazel file was found at the workspace root. Bazel will create an empty MODULE.bazel file. Please consider migrating your external dependencies from WORKSPACE to MODULE.bazel. For more details, please refer to https://github.com/bazelbuild/bazel/issues/18958.
DEBUG: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/xla/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/tsl/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'nvtx_archive' because it already exists.
ERROR: Traceback (most recent call last):
File "/home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/rule_support.bzl", line 221, column 36, in <toplevel>
deps_cfg = apple_common.multi_arch_split,
Error: 'apple_common' value has no field or method 'multi_arch_split'
ERROR: error loading package '@@com_github_grpc_grpc//': at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl:28:6: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/ios.bzl:33:5: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/ios_rules.bzl:71:5: initialization of module 'apple/internal/rule_support.bzl' failed
ERROR: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/tsl/tsl/BUILD:460:11: error loading package '@@com_github_grpc_grpc//': at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl:28:6: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/ios.bzl:33:5: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/ios_rules.bzl:71:5: initialization of module 'apple/internal/rule_support.bzl' failed and referenced by '@@tsl//tsl:grpc++'
ERROR: Analysis of target '//xla/tools/pip_package:build_pip_package' failed; build aborted: Analysis failed
INFO: Elapsed time: 92.196s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
FAILED:
Fetching repository @@bazel_tools~cc_configure_extension~local_config_cc; starting
Fetching repository @@double_conversion; starting
Fetching repository @@eigen_archive; starting
Fetching repository @@com_googlesource_code_re2; starting
Fetching repository @@snappy; starting
Fetching repository @@nsync; starting
Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/double-conversion/archive/v3.2.0.tar.gz; 252.1 KiB (3.7%)
Fetching https://storage.googleapis.com/.../eigen-aa6964bf3a34fd607837dd8123bc42465185c4f8.tar.gz; 79.9 KiB (2.9%) ... (12 fetches)
Hi @qnixsynapse, thanks for your trial, but I have never seen this error before... Have you missed any steps in the instructions (https://github.com/intel/intel-extension-for-openxla?tab=readme-ov-file#install-from-source-build), like running configure?
If not, can you list your environment info, including HW, OS, and related SW? I can connect you with a support engineer if you don't know what info is needed here.
Yes, I was following that while trying to build, and I ran ./configure beforehand.
I tried the build twice but got the exact same error.
As I mentioned before, I am not familiar with Bazel. My OS is an up-to-date Arch Linux on a 12th-gen Intel PC (CPU + Arc A750). The latest Intel oneAPI Base Toolkit is installed in /opt.
Just curious, what version of Bazel is used to build this package?
I'm using Bazel 6.1.0. The configure script will check the Bazel version (https://github.com/intel/intel-extension-for-openxla/blob/main/configure.py#L801), but I'm not sure what will happen with a very new Bazel...
Got it. The version of Bazel on my system is 7.1.1. That might be causing the problem.
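One way to avoid this class of failure is to gate the build on the Bazel version up front, similar to what configure.py does. A minimal sketch of such a version gate; the accepted range here is an assumption based on the 6.1.0 mentioned above, and the real bounds live in configure.py:

```python
def parse_version(v):
    """Turn a version string like '6.1.0' into a comparable tuple (6, 1, 0)."""
    return tuple(int(part) for part in v.split("."))

def bazel_version_ok(version, low="6.1.0", high="7.0.0"):
    """Accept Bazel versions in [low, high).

    The bounds are assumptions for illustration; the authoritative
    check is in the project's configure.py."""
    return parse_version(low) <= parse_version(version) < parse_version(high)

print(bazel_version_ok("6.1.0"))  # True
print(bazel_version_ok("7.1.1"))  # False: too new for this build
```

Failing fast with a clear message here is friendlier than the "Unrecognized option" and rules_apple errors Bazel 7 produces mid-build.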
@qnixsynapse Thanks for your reply. Do you still have the build problem?
@feng-intel Yes, with the latest release tag.
Hi @qnixsynapse, is it possible to install a lower version of Python, such as Python 3.11, in your environment? If yes, maybe we can go back to installing the package instead of building from source.
I went through the install step by step on a Linux Ubuntu 22.04 + Arc A770 machine, following the steps for the release 0.3 version: https://github.com/intel/intel-extension-for-openxla/tree/r0.3 (please note: not the main branch page).
1) Create the environment:
(base) yhu5@arc770-tce:~$ conda create -n openxla python=3.11
Retrieving notices: ...working... done
Channels:
environment location: /home/yhu5/miniconda3/envs/openxla
added / updated specs:
2) Activate it and install JAX (please note: it is 0.4.24, not 0.4.25):
(base) yhu5@arc770-tce:~$ conda activate openxla
(openxla) yhu5@arc770-tce:~$ pip install jax==0.4.24 jaxlib==0.4.24
3) pip install --upgrade intel-extension-for-openxla
4) Verify the installed packages:
(openxla) yhu5@arc770-tce:~$ pip list
Package                      Version
intel-extension-for-openxla  0.3.0
jax                          0.4.24
jaxlib                       0.4.24
ml-dtypes                    0.4.0
numpy                        1.26.4
opt-einsum                   3.3.0
pip                          24.0
scipy                        1.11.4
setuptools                   69.5.1
wheel                        0.43.0
5) I have oneAPI 2024.1 installed, so source its environment scripts directly:
source /opt/intel/oneapi/compiler/2024.1/env/vars.sh
source /opt/intel/oneapi/mkl/2024.1/env/vars.sh
6) Please verify that your Arc A750 is recognized:
(openxla) yhu5@arc770-tce:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.52.28202.51]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 OpenCL 3.0 NEO [23.52.28202.51]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28202]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28202]
7) Prepare test_jax.py as described at https://github.com/intel/intel-extension-for-openxla/tree/r0.3:
import jax
import jax.numpy as jnp

print("jax.local_devices(): ", jax.local_devices())

@jax.jit
def lax_conv():
    key = jax.random.PRNGKey(0)
    lhs = jax.random.uniform(key, (2,1,9,9), jnp.float32)
    rhs = jax.random.uniform(key, (1,1,4,4), jnp.float32)
    side = jax.random.uniform(key, (1,1,1,1), jnp.float32)
    out = jax.lax.conv_with_general_padding(lhs, rhs, (1,1), ((0,0),(0,0)), (1,1), (1,1))
    out = jax.nn.relu(out)
    out = jnp.multiply(out, side)
    return out

print(lax_conv())
8) $ python test_jax.py
On the first run I got the error 'version GLIBCXX_3.4.30 not found', so upgrade libstdc++ to the latest via conda:
conda install libstdcxx-ng==12.2.0 -c conda-forge
After that everything should be fine. Please feel free to try these steps and let us know the result.
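As a follow-up to step 6: if you want a script to assert that a Level Zero GPU is visible rather than eyeballing sycl-ls, a small helper can scan the output for a Level Zero GPU entry. This helper is an illustrative sketch, not part of the Extension; the sample line is taken from the sycl-ls output above:

```python
def has_level_zero_gpu(sycl_ls_output):
    """Return True if any sycl-ls output line reports a GPU on the
    Level Zero backend (the backend the XPU plugin needs)."""
    return any("ext_oneapi_level_zero:gpu" in line
               for line in sycl_ls_output.splitlines())

sample = ("[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, "
          "Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28202]")
print(has_level_zero_gpu(sample))  # True
```

On a machine with oneAPI sourced, you could feed it the real output of sycl-ls captured via subprocess.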
@yinghu5 Actually, I use venv rather than conda, because the latter has some pip dependency issues with some of my projects. It looks like I will have to install conda and give it a try to see if I can create an environment with Python 3.11.
Thank you for your help. But since many Linux distros have already switched (or will eventually switch) to Python 3.12, I think it would be better to support Python 3.12 as well.
This is the error I am getting: