intel / intel-extension-for-openxla

Apache License 2.0

feat: Support Python 3.12 #28

Open qnixsynapse opened 3 months ago

qnixsynapse commented 3 months ago

This is the error I am getting:

jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so: undefined symbol: _ZNK4sycl3_V16detail16AccessorBaseHost25isMemoryObjectUsedByGraphEv
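As a triage aid (not from the thread): the mangled name can be decoded with `c++filt` (part of binutils), which shows the plugin expects a SYCL runtime entry point that the installed oneAPI runtime does not export:

```shell
# Demangle the missing symbol to see which SYCL runtime API the plugin expects.
echo '_ZNK4sycl3_V16detail16AccessorBaseHost25isMemoryObjectUsedByGraphEv' | c++filt
# -> sycl::_V1::detail::AccessorBaseHost::isMemoryObjectUsedByGraph() const
```

An "undefined symbol" like this usually means the plugin was built against a newer SYCL runtime (libsycl) than the one found at load time.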
fredlarochelle commented 3 months ago

Yeah, it's not in a usable state yet, especially on Arc...

rahulunair commented 2 months ago

Any update on this? I tried the 2024.0 and 2024.1 versions of oneAPI with the latest version of JAX as well as the version prescribed in the readme. Neither worked.

Using the 2024.0 version of oneAPI and jax 0.4.25, here is the error I get on an Intel GPU Max system:

>>> import jax
>>> jax.local_devices()
INFO: Intel Extension for OpenXLA version: 0.3.0, commit: 9a484818
Jax plugin configuration error: Exception when calling jax_plugins.intel_extension_for_openxla.initialize()
Traceback (most recent call last):
  File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 482, in discover_pjrt_plugins
    plugin_module.initialize()
  File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/__init__.py", line 39, in initialize
    c_api = xb.register_plugin("xpu",
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 544, in register_plugin
    c_api = xla_client.load_pjrt_plugin_dynamically(plugin_name, library_path)  # type: ignore
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jaxlib/xla_client.py", line 155, in load_pjrt_plugin_dynamically
    return _xla.load_pjrt_plugin(plugin_name, library_path, c_api=None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to open /home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so: /home/sdp/.conda/envs/jax/lib/python3.11/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so: undefined symbol: _ZNK4sycl3_V16detail16AccessorBaseHost25isMemoryObjectUsedByGraphEv
[CpuDevice(id=0)]
>>>

Using the latest jax, 2024.1 oneAPI, and the latest openxla, I get a core dump.

Zantares commented 2 months ago

> Any update on this? I tried the 2024.0 and 2024.1 versions of oneAPI with the latest version of JAX as well as the version prescribed in the readme. Neither worked.
>
> Using the 2024.0 version of oneAPI and jax 0.4.25, here is the error I get on an Intel GPU Max system:

  1. 2024.0 is not supported if you are using the Extension from pip. It is built with 2024.1 and will show an undefined-symbol error when run against 2024.0. Please rebuild the Extension in your environment if you don't want to switch to 2024.1.
  2. You mentioned that JAX v0.4.25 is used, so you may be hitting an issue similar to https://github.com/intel/intel-extension-for-openxla/issues/18. Each Extension version works only with a specific JAX version, e.g. Extension 0.2.0 with JAX 0.4.20, Extension 0.3.0 with JAX 0.4.24... Currently you can find the matching info in the release notes: https://github.com/intel/intel-extension-for-openxla/releases .
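One way to avoid this class of mismatch is to pin all three packages together. A hypothetical requirements.txt based on the pairing above (versions per the release notes):

```
jax==0.4.24
jaxlib==0.4.24
intel-extension-for-openxla==0.3.0
```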
qnixsynapse commented 2 months ago

I am currently using 2024.1 and it stopped detecting my GPU.

The output of jax.local_devices() shows [CpuDevice(id=0)]

Zantares commented 2 months ago

> I am currently using 2024.1 and it stopped detecting my GPU.
>
> The output of jax.local_devices() shows [CpuDevice(id=0)]

Do you still get the undefined symbol error even with 2024.1? If so, can you help to

qnixsynapse commented 2 months ago

The undefined symbol error was fixed after upgrading to 2024.1. However, it stopped detecting the PJRT plugin/GPU.

$ pip list |egrep jax
egrep: warning: egrep is obsolescent; using grep -E
jax                         0.4.25
jaxlib                      0.4.25
$ sudo pacman -Qs level-zero
local/intel-compute-runtime 24.13.29138.7-1
    Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
local/level-zero-loader 1.16.15-1
    API for accessing low level interfaces in oneAPI platform devices (loader)
Zantares commented 2 months ago

@qnixsynapse JAX v0.4.25 doesn't match the released Extension v0.3.0, so please try one of the following two solutions first:

  1. Downgrade JAX to v0.4.24
  2. Rebuild the latest Extension main branch yourself; it has been upgraded to support JAX v0.4.25

More version-matching info can be found in the release notes: https://github.com/intel/intel-extension-for-openxla/releases. We will add a table to the home page later with the matching info.

qnixsynapse commented 2 months ago

@Zantares Same issue with v0.4.24:

(screenshot)

Edit: It seems to be a Python version mismatch, which is why pip is pulling an old pre-release wheel of the openxla plugin. Arch Linux has Python 3.12.3; the new wheels are built for 3.11.
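A quick way to confirm this kind of mismatch (a generic sketch, not specific to this plugin): compare the interpreter's CPython tag against the wheel tags pip will accept.

```shell
# Show the running interpreter's CPython tag (e.g. cp312 on Python 3.12)...
python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
# ...and the wheel tags this pip considers compatible. A wheel built for
# cp311 is skipped by a cp312 interpreter, so pip falls back to older releases.
python3 -m pip debug --verbose 2>/dev/null | head -n 25
```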

Zantares commented 2 months ago

> @Zantares Same issue with v0.4.24:
>
> (screenshot)
>
> Edit: It seems to be a Python version mismatch, which is why pip is pulling an old pre-release wheel of the openxla plugin. Arch Linux has Python 3.12.3; the new wheels are built for 3.11.

Thanks for the trial and the new info. We will check the error with Python 3.12 on Arc.

BTW, if the versions are mismatched, isn't it an old version that gets installed rather than Extension v0.3.0?

qnixsynapse commented 2 months ago

Seems like it. pip refuses to install 0.3.0 because it is searching for a wheel built for Python 3.12. (screenshot)

I guess the last one is the one that gets installed. (screenshot)

Also, this: (screenshot)

Zantares commented 2 months ago

That's it. We will first discuss internally whether we can release more Python packages (which means more test processes). For now, you can try to build it yourself if Python 3.12 is a hard requirement.

qnixsynapse commented 1 month ago

I tried to build it today and hit an error. TBH, I am not familiar with Bazel.

$ bazel build //xla/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
  'build' options: --nocheck_visibility --announce_rc --config=gpu --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --distinct_host_configuration=false
ERROR: --distinct_host_configuration=false :: Unrecognized option: --distinct_host_configuration=false

Removing (commenting out) that option gives this error:

$ bazel build //xla/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
  'build' options: --nocheck_visibility --announce_rc --config=gpu --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --distinct_host_configuration=false
ERROR: --distinct_host_configuration=false :: Unrecognized option: --distinct_host_configuration=false
(mldev) $ nano .bazelrc 
(mldev) $ bazel build //xla/tools/pip_package:build_pip_package
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=145
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc:
  'build' options: --nocheck_visibility --announce_rc --config=gpu --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.xla_extension_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/home/qnixsynapse/.pyenvs/mldev/bin/python --action_env PYTHON_LIB_PATH=/home/qnixsynapse/.pyenvs/mldev/lib/python3.12/site-packages --python_path=/home/qnixsynapse/.pyenvs/mldev/bin/python --action_env TF_CXX11_ABI_FLAG=1 --action_env TF_NEED_SYCL=1 --action_env SYCL_TOOLKIT_PATH=/opt/intel/oneapi/compiler/latest --action_env LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/compiler/latest/compiler/lib/intel64_lin --action_env LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/lib:/opt/intel/oneapi/compiler/latest/compiler/lib/intel64_lin
INFO: Found applicable config definition build:gpu in file /home/qnixsynapse/.cache/work/intel-extension-for-openxla/.bazelrc: --crosstool_top=@local_config_sycl//crosstool:toolchain --define=using_sycl=true --repo_env TF_NEED_SYCL=1 --define=tensorflow_mkldnn_contraction_kernel=0 --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
WARNING: --enable_bzlmod is set, but no MODULE.bazel file was found at the workspace root. Bazel will create an empty MODULE.bazel file. Please consider migrating your external dependencies from WORKSPACE to MODULE.bazel. For more details, please refer to https://github.com/bazelbuild/bazel/issues/18958.
DEBUG: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/tsl/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'nvtx_archive' because it already exists.
ERROR: Traceback (most recent call last):
    File "/home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/rule_support.bzl", line 221, column 36, in <toplevel>
        deps_cfg = apple_common.multi_arch_split,
Error: 'apple_common' value has no field or method 'multi_arch_split'
ERROR: error loading package '@@com_github_grpc_grpc//': at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl:28:6: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/ios.bzl:33:5: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/ios_rules.bzl:71:5: initialization of module 'apple/internal/rule_support.bzl' failed
ERROR: /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/tsl/tsl/BUILD:460:11: error loading package '@@com_github_grpc_grpc//': at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl:28:6: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/ios.bzl:33:5: at /home/qnixsynapse/.cache/bazel/_bazel_qnixsynapse/30b9de68561309b1e1c2b2989f0a0749/external/build_bazel_rules_apple/apple/internal/ios_rules.bzl:71:5: initialization of module 'apple/internal/rule_support.bzl' failed and referenced by '@@tsl//tsl:grpc++'
ERROR: Analysis of target '//xla/tools/pip_package:build_pip_package' failed; build aborted: Analysis failed
INFO: Elapsed time: 92.196s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
FAILED: 
    Fetching repository @@bazel_tools~cc_configure_extension~local_config_cc; starting
    Fetching repository @@double_conversion; starting
    Fetching repository @@eigen_archive; starting
    Fetching repository @@com_googlesource_code_re2; starting
    Fetching repository @@snappy; starting
    Fetching repository @@nsync; starting
    Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/double-conversion/archive/v3.2.0.tar.gz; 252.1 KiB (3.7%)
    Fetching https://storage.googleapis.com/.../eigen-aa6964bf3a34fd607837dd8123bc42465185c4f8.tar.gz; 79.9 KiB (2.9%) ... (12 fetches)
Zantares commented 1 month ago

Hi @qnixsynapse, thanks for your trial, but I have never seen this error before... Have you missed any steps in the instructions (https://github.com/intel/intel-extension-for-openxla?tab=readme-ov-file#install-from-source-build), such as running configure? If not, can you list your environment info, including HW, OS, and related SW? I can connect you with a support engineer if you don't know what info is needed here.
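For reference, a few commands that collect the basics (a generic sketch; `sycl-ls` is only available once the oneAPI environment is sourced):

```shell
uname -a                                               # kernel and architecture
python3 -V                                             # interpreter version (wheel tag depends on this)
pip list 2>/dev/null | grep -Ei 'jax|openxla' || true  # relevant installed packages
sycl-ls 2>/dev/null || true                            # visible SYCL devices, if oneAPI is set up
```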

qnixsynapse commented 1 month ago

Yes, I was following that while trying to build, and I ran ./configure beforehand.

I tried the build twice but got the exact same error.

As I mentioned before, I am not familiar with Bazel. My OS is up-to-date Arch Linux on an Intel 12th-gen PC (CPU + Arc A750). The Intel oneAPI Base Kit is the latest version and is installed in /opt.

Just curious, what is the version of bazel that is used to build this package?

Zantares commented 1 month ago

I'm using Bazel 6.1.0. The configure script checks the Bazel version (https://github.com/intel/intel-extension-for-openxla/blob/main/configure.py#L801), but I'm not sure what happens if you use a very new Bazel...

qnixsynapse commented 1 month ago

Got it. The version of Bazel on my system is 7.1.1. That might be causing the problem.
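If that is the cause, one common workaround (assuming bazelisk is installed as the `bazel` command, which is not stated in the thread) is to pin the version the maintainer uses via a .bazelversion file in the workspace root:

```shell
# bazelisk reads .bazelversion and fetches/runs exactly that Bazel release.
echo "6.1.0" > .bazelversion
cat .bazelversion
```

With plain Bazel (no bazelisk), the pinned 6.1.0 release would instead have to be installed manually.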

feng-intel commented 1 month ago

@qnixsynapse Thanks for your reply. Do you still have the build problem?

qnixsynapse commented 1 month ago

@feng-intel Yes, with the latest release tag.

yinghu5 commented 1 month ago

Hi @qnixsynapse, is it possible to install a lower version of Python, such as 3.11, in your environment? If yes, maybe we can go back to installing the package instead of building from source.

I went through the install step by step on a Linux Ubuntu 22.04 + Arc A770 machine, following the steps for the r0.3 release: https://github.com/intel/intel-extension-for-openxla/tree/r0.3 (please note: not the main branch page)

1) Create the environment:

(base) yhu5@arc770-tce:~$ conda create -n openxla python=3.11

Package Plan

  environment location: /home/yhu5/miniconda3/envs/openxla

  added / updated specs:

2) Activate it and install JAX (please note: 0.4.24, not 0.4.25):

(base) yhu5@arc770-tce:~$ conda activate openxla
(openxla) yhu5@arc770-tce:~$ pip install jax==0.4.24 jaxlib==0.4.24

3) Install the extension:

(openxla) yhu5@arc770-tce:~$ pip install --upgrade intel-extension-for-openxla

4) Check the installed packages:

(openxla) yhu5@arc770-tce:~$ pip list
Package                     Version
--------------------------- -------
intel-extension-for-openxla 0.3.0
jax                         0.4.24
jaxlib                      0.4.24
ml-dtypes                   0.4.0
numpy                       1.26.4
opt-einsum                  3.3.0
pip                         24.0
scipy                       1.11.4
setuptools                  69.5.1
wheel                       0.43.0

5) I have oneAPI 2024.1 installed, so I source it directly:

source /opt/intel/oneapi/compiler/2024.1/env/vars.sh
source /opt/intel/oneapi/mkl/2024.1/env/vars.sh

6) Please verify that your Arc A750 is recognized:

(openxla) yhu5@arc770-tce:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.52.28202.51]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 OpenCL 3.0 NEO [23.52.28202.51]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28202]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.28202]

7) Prepare test_jax.py as described at https://github.com/intel/intel-extension-for-openxla/tree/r0.3

import jax
import jax.numpy as jnp

print("jax.local_devices(): ", jax.local_devices())

@jax.jit
def lax_conv():
  key = jax.random.PRNGKey(0)
  lhs = jax.random.uniform(key, (2,1,9,9), jnp.float32)
  rhs = jax.random.uniform(key, (1,1,4,4), jnp.float32)
  side = jax.random.uniform(key, (1,1,1,1), jnp.float32)
  out = jax.lax.conv_with_general_padding(lhs, rhs, (1,1), ((0,0),(0,0)), (1,1), (1,1))
  out = jax.nn.relu(out)
  out = jnp.multiply(out, side)
  return out

print(lax_conv())

8) Run $ python test_jax.py. On the first run I got the error 'version GLIBCXX_3.4.30' not found, so upgrade libstdc++ to the latest via conda:

conda install libstdcxx-ng==12.2.0 -c conda-forge

After that, everything should be fine. Please feel free to try these steps and let us know the result.

qnixsynapse commented 1 month ago

@yinghu5 Actually, I use venv, not conda, because the latter has some issues with pip dependencies in some of my projects. It looks like I'll have to install conda and give it a try to see whether I can create an environment with Python 3.11.

Thank you for your help. But since many Linux distros will eventually switch (or have already switched) to Python 3.12, I think it would be better to support Python 3.12 as well.