intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Installation issues #973

Open narendrachaudhary51 opened 5 months ago

narendrachaudhary51 commented 5 months ago

Hi,

I have tried to install intel-xpu-backend-for-triton on several machines, but I have not been able to get it installed and working. These are the configurations I have tried.

  1. PVC on internal cluster (borealis)

I got the following error log when running scripts/compile-triton.sh: error.log

Hardware on cluster - [0]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu', support_fp64=1, total_memory=65536MB, max_compute_units=448, gpu_eu_count=448)

  2. PVC on internal cluster (tiergarten)
    • OS version - Rocky Linux 9.2
    • OneAPI version - Intel oneAPI 2024.1
    • XPU driver - Not sure
    • IPEX version - 2.1.0.post0+cxx11.abi
    • Torch version - 2.1.20+xpu
    • Source of IPEX and pytorch - https://intel.github.io/intel-extension-for-pytorch/#installation?platform=gpu&version=v2.1.20%2bxpu&os=linux%2fwsl2&package=pip
    • Installation method - Install from source into an Anaconda environment with scripts/compile-triton.sh:
      git clone https://github.com/intel/intel-xpu-backend-for-triton.git -b llvm-target
      cd intel-xpu-backend-for-triton
      scripts/compile-triton.sh
    • I am not sure if I need to install triton separately
    • I was able to finish the installation. However, I got the following error when running scripts/test-triton.sh or when running the Python examples/tutorials inside the python/examples or python/tutorials folders:
      Traceback (most recent call last):
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/examples/empty.py", line 21, in <module>
          pgm = kernel[(1, )](X, 1, 1, BLOCK=1024)
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/triton/runtime/jit.py", line 209, in <lambda>
          return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/triton/runtime/jit.py", line 471, in run
          device = driver.active.get_current_device()
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/triton/backends/intel/driver.py", line 404, in get_current_device
          return self.utils.get_current_device()
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/triton/backends/intel/driver.py", line 398, in __getattr__
          self.utils = XPUUtils()
        File "/data/nfs_home/nchaudh1/projects/intel-xpu-backend-for-triton/python/triton/backends/intel/driver.py", line 56, in __init__
          self.context = mod.init_context(self.get_sycl_queue())
      TypeError: an integer is required

Hardware on cluster -
  [0]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1100', platform_name='Intel(R) Level-Zero', dev_type='gpu', support_fp64=1, total_memory=49152MB, max_compute_units=448, gpu_eu_count=448)
  [1]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1100', platform_name='Intel(R) Level-Zero', dev_type='gpu', support_fp64=1, total_memory=49152MB, max_compute_units=448, gpu_eu_count=448)

  3. PVC in docker (same cluster as the previous one)
    • I used Docker so that everything runs on an Ubuntu 22.04 OS.
    • All other settings are the same as in the previous run.
    • I got the same error as in the previous run.

Please also add instructions to install the repo on clusters that do not have internet connections and/or sudo access.
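
For clusters without internet access or sudo, a common workaround (a sketch only, not official project guidance; the package set is an assumption, and the XPU builds of torch/IPEX may additionally need Intel's extra index URL) is to stage the wheels on a connected machine and install them offline into a user-writable environment:

  # On a machine with internet access (match the target's Python version, OS and CPU arch):
  pip download --dest ./offline-pkgs torch intel-extension-for-pytorch
  # Copy ./offline-pkgs (plus a Triton wheel built from source or taken from a nightly artifact)
  # to the cluster, then install without touching the network and without sudo:
  python -m venv ~/triton-env && source ~/triton-env/bin/activate
  pip install --no-index --find-links ./offline-pkgs torch intel-extension-for-pytorch
  pip install --no-index --find-links ./offline-pkgs triton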

ienkovich commented 5 months ago

self.context = mod.init_context(self.get_sycl_queue())
TypeError: an integer is required

This looks like a version mismatch between PyTorch and IPEX. There is scripts/compile-pytorch-ipex.sh to build/install the proper versions. You can also run scripts/test-triton.sh after building Triton; this script should install the dependencies (including PyTorch and IPEX) and run the tests to make sure everything works.
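
A minimal sketch of that flow (the exact script options may change between revisions, so check the scripts themselves for the current flags; the last line is just a sanity check that torch/IPEX import together and see the XPU):

  cd intel-xpu-backend-for-triton
  # Build Triton itself:
  scripts/compile-triton.sh
  # Build and install matching PyTorch and IPEX versions:
  scripts/compile-pytorch-ipex.sh
  # Install remaining dependencies and run the test suite:
  scripts/test-triton.sh
  # Sanity check: the versions should be a matching pair and the XPU should be visible.
  python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__, torch.xpu.is_available())"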

pbchekin commented 5 months ago

You can also install the pre-built wheels that we build nightly and attach as artifacts to this workflow: https://github.com/intel/intel-xpu-backend-for-triton/actions/workflows/nightly-wheels.yml. For example, the latest run: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/8824466261.
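
One way to fetch those artifacts from the command line is the GitHub CLI (a sketch only; downloading artifacts requires an authenticated gh, and the wheel filenames and directory layout depend on the Python version and artifact names):

  # Inspect the run and its artifacts:
  gh run view 8824466261 --repo intel/intel-xpu-backend-for-triton
  # Download the artifacts from that run into the current directory:
  gh run download 8824466261 --repo intel/intel-xpu-backend-for-triton
  # Install the wheels that match your Python version (path is a placeholder):
  pip install ./<artifact-dir>/*.whl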

narendrachaudhary51 commented 5 months ago

Hi @ienkovich, I was not able to build/compile PyTorch and IPEX; I got an error during the build.

Thanks, @pbchekin I will try with the wheels.

narendrachaudhary51 commented 5 months ago

On machine 2, I was able to build/install the latest versions of PyTorch and IPEX with scripts/test-triton.sh. I was also able to run the examples and tutorials. The wheels didn't work for me due to a GLIBC version mismatch.

I am still facing installation errors on the first machine.

pbchekin commented 5 months ago

The wheels didn't work for me due to GLIBC version mismatch.

We currently support only Ubuntu 22.04 for the nightly wheels; with other Linux distributions or other Ubuntu versions you may get a GLIBC version mismatch. What OS did you use?
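
A quick way to confirm what the target machine provides (Ubuntu 22.04 ships GLIBC 2.35, while e.g. Rocky Linux 9 ships GLIBC 2.34, which is too old for wheels built against 2.35):

  # OS release and the GLIBC version in use:
  grep PRETTY_NAME /etc/os-release
  ldd --version | head -n 1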

vlad-penkin commented 5 months ago

@narendrachaudhary51, a few comments on the Machine 1 and Machine 2 configurations:

narendrachaudhary51 commented 5 months ago

@vlad-penkin Thanks for your response.

  1. When do you think you will have support for other Linux distributions? I currently do not have the option of using Ubuntu.
  2. I also noticed that on machine 2, compilation of torch and IPEX throws an error with gcc 12.3.0. I am currently compiling with gcc 11.4.1. Is this expected behavior?
vlad-penkin commented 5 months ago

@narendrachaudhary51

  1. We have no plans to support Linux distros other than Ubuntu at the moment. Is a Docker container an option for you?
  2. PyTorch should compile with gcc 12; let's revisit this question after we switch to the top of the PyTorch main branch within a couple of weeks.
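
If gcc 12 keeps failing in the meantime, one general workaround (a sketch, assuming a gcc/g++ 11 toolchain is installed alongside gcc 12; binary names differ between distributions) is to pin the compiler through the standard CC/CXX environment variables before rebuilding:

  # Point the build at gcc/g++ 11 for this shell session:
  export CC=$(command -v gcc-11 || command -v gcc)
  export CXX=$(command -v g++-11 || command -v g++)
  # Rebuild PyTorch/IPEX and Triton with the pinned compiler:
  scripts/compile-pytorch-ipex.sh
  scripts/compile-triton.sh
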
fcharras commented 4 months ago

In #1110 I am also encountering the same TypeError: an integer is required, while running Ubuntu 22.04 with glibc 2.35. I installed the latest IPEX and the corresponding torch release using conda, and the released Triton 3.0.0b2 wheel from this repository using pip in the same conda env.
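
When torch/IPEX come from conda and Triton from a pip wheel, it is worth double-checking that one consistent set of packages is actually being imported; a quick check in the active environment (assuming all three packages are importable):

  # Installed builds in the active environment:
  pip list 2>/dev/null | grep -Ei "torch|triton|intel"
  # Where each module is actually loaded from:
  python -c "import torch, intel_extension_for_pytorch as ipex, triton; print(torch.__file__, ipex.__file__, triton.__file__, sep='\n')"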