intel / tiny-dpcpp-nn

SYCL implementation of Fused MLPs for Intel GPUs
BSD 3-Clause "New" or "Revised" License

Fails to compile pybind #11

Open olegmikul opened 2 weeks ago

olegmikul commented 2 weeks ago

Compiling from source failed with this message:

The source directory /home/username/tiny-dpcpp-nn/extern/pybind11 does not contain a CMakeLists.txt file.

If I try to check out extern/pybind11 through git, I get permission errors:

Cloning into '/home/username/tiny-dpcpp-nn/extern/pybind11'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

cbauinge commented 2 weeks ago

Hi @olegmikul, can you try cloning pybind manually into the extern directory?

cd /home/username/tiny-dpcpp-nn/extern
git clone https://github.com/pybind/pybind11.git
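If you would rather keep using the submodule, an alternative (assuming the failure comes from the submodule's SSH URL, which needs a registered public key) is to tell git to rewrite SSH GitHub URLs to HTTPS:

# rewrite SSH GitHub URLs to HTTPS so no SSH key is needed
git config --global url."https://github.com/".insteadOf "git@github.com:"

# then fetch the submodules from the repository root
cd /home/username/tiny-dpcpp-nn
git submodule update --init --recursive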

Best regards

olegmikul commented 1 week ago

Thank you.

Cloning pybind11 from the external git URL helped a bit, but the build stopped later. The error was:

Could not find a package configuration file provided by "Torch" with any of the following names:

TorchConfig.cmake
torch-config.cmake

Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set "Torch_DIR" to a directory containing one of the above files.

I pointed export Torch_DIR=/home/username/tiny_dpcpp_nn/extern/libtorch/share/cmake/Torch/ (both routes are sketched below) and after that got another error:

static library kineto_LIBRARY-NOTFOUND not found

I can't install kineto since I don't have a CUDA/NVIDIA GPU on that system. BTW, I have intel-extension-for-pytorch installed and running; it would be good if the build located that installed version instead of cloning into the extern folder of tiny_dpcpp_nn.
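For completeness, the two routes the earlier CMake message suggests look like this as shell commands (a sketch; the libtorch path is the one from my setup and may differ on other machines):

# option 1: point Torch_DIR at the directory that contains TorchConfig.cmake
export Torch_DIR=/home/username/tiny_dpcpp_nn/extern/libtorch/share/cmake/Torch

# option 2: hand CMake the libtorch prefix so it can locate TorchConfig.cmake itself
cmake -DCMAKE_PREFIX_PATH=/home/username/tiny_dpcpp_nn/extern/libtorch ..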

For reference, the installed C++ testbench version of tiny_dpcpp_nn runs on my Arc 750 (it hangs for some cases while finishing fine for others, most likely due to the limited VRAM on the Arc 750).

Please help. Thanks.

cbauinge commented 1 week ago

Hi, regarding the Arc 750 hangs: can you paste the output, the oneAPI version, and (if you know it) the driver version? I agree that it is most likely a memory issue, but there may be a hidden bug.
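On gathering that info, a quick way to dump the devices and driver versions the runtime sees (a sketch; assumes the oneAPI environment is sourced, e.g. via /opt/intel/oneapi/setvars.sh) is:

# list SYCL backends, devices, and driver versions visible to the runtime
sycl-ls --verbose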

Regarding location of the installed IPEX version: Agreed. We will have a look at it.

Regarding the torch issue: could you give us more details on how you installed it? Which versions are you using? If you run Python interactively, can you do the following without error?

import torch
import intel_extension_for_pytorch
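The same check as a shell one-liner, if that is more convenient (assumes the virtual environment with IPEX is active):

python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"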

Thanks and best regards, Christoph

olegmikul commented 1 week ago

Hi, Arc 750 output:

./benchmark-inference

Inference Running on Intel(R) Arc(TM) A750 Graphics
n_hidden_layers = 4, WIDTH = 64, batch_size = 4194304, typename = bf16, type size = 2 bytes
MPI world_size = 1
Finished benchmark.
Iterations = 1000
Time = 7.65581 s
AI (infinite $) = 159.994 flops/byte
BW (infinite $) = 140.257 GB/s
Throughput = 22440.3 Gflops/s

Inference Running on Intel(R) Arc(TM) A750 Graphics
n_hidden_layers = 4, WIDTH = 64, batch_size = 4194304, typename = sycl::half, type size = 2 bytes
MPI world_size = 1
Finished benchmark.
Iterations = 1000
Time = 7.67759 s
AI (infinite $) = 159.994 flops/byte
BW (infinite $) = 139.859 GB/s
Throughput = 22376.6 Gflops/s

Inference Running on Intel(R) Arc(TM) A750 Graphics
n_hidden_layers = 4, WIDTH = 16, batch_size = 4194304, typename = bf16, type size = 2 bytes
MPI world_size = 1
Finished benchmark.
Iterations = 1000
Time = 0.925378 s
AI (infinite $) = 39.9996 flops/byte
BW (infinite $) = 290.085 GB/s
Throughput = 11603.3 Gflops/s

Inference Running on Intel(R) Arc(TM) A750 Graphics
n_hidden_layers = 4, WIDTH = 32, batch_size = 4194304, typename = bf16, type size = 2 bytes
MPI world_size = 1
Finished benchmark.
Iterations = 1000
Time = 1.41277 s
AI (infinite $) = 79.9985 flops/byte
BW (infinite $) = 380.021 GB/s
Throughput = 30401.1 Gflops/s

Inference Running on Intel(R) Arc(TM) A750 Graphics
n_hidden_layers = 4, WIDTH = 128, batch_size = 4194304, typename = bf16, type size = 2 bytes
MPI world_size = 1
... and hangs ...

oneAPI is v2024.2. I have created a virtual environment for Python with IPEX:

intel-extension-for-pytorch 2.1.30+xpu
torch 2.1.0.post2+cxx11.abi
torchaudio 2.1.0.post2+cxx11.abi
torchvision 0.16.0.post2+cxx11.abi

import torch and import intel_extension_for_pytorch work fine. I can run code with and without intel_extension_for_pytorch, and I have observed a speedup of about 2x (vs. an i9-12900K) for some inference code when running with the extension. I followed the instructions for installing IPEX, so torch, torchaudio, and torchvision were installed according to those instructions.

My OS is Ubuntu 22.04.4 LTS with all drivers, libraries, and packages updated; the Linux kernel is 6.5.0-41-generic. I run PyTorch with the Intel extensions on the Arc, as well as ArrayFire with OpenCL on the Arc (a high-performance C/C++ library for various backends).

My output when configuring with the pybind bindings:

cmake -DBUILD_PYBIND=ON -DTARGET_DEVICE=ARC ..
-- The CXX compiler identification is IntelLLVM 2024.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/mpi/2021.13/bin/mpiicpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building for ARC
CMake Warning at CMakeLists.txt:38 (message):
  cmake build system is still WIP.

-- libtorch already exists in /home/username/tiny/extern/.
-- Found IntelSYCL: /opt/intel/oneapi/mpi/2021.13/include (found version "202001")
-- oneDPL: ONEDPL_PAR_BACKEND=tbb, disable OpenMP backend
-- Performing Test _fsycl_option
-- Performing Test _fsycl_option - Success
-- Looking for C++ include sycl/sycl.hpp
-- Looking for C++ include sycl/sycl.hpp - found
-- Adding -fsycl compiler option
-- oneDPL: ONEDPL_PAR_BACKEND=tbb, disable OpenMP backend
-- Adding -fsycl compiler option
-- Found MPI_CXX: /opt/intel/oneapi/mpi/2021.13/bin/mpiicpx (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- pybind11 v2.13.0 dev1
-- Found PythonInterp: /home/username/ai_gpu/bin/python (found suitable version "3.10.12", minimum required is "3.7")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.10.so
-- Performing Test HAS_INTEL_IPO
-- Performing Test HAS_INTEL_IPO - Success
CMake Warning at extern/libtorch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  extern/libtorch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  dpcpp_bindings/CMakeLists.txt:7 (find_package)

-- Found Torch: /home/username/tiny/extern/libtorch/lib/libtorch.so
CMake Warning at extern/libtorch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  extern/libtorch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  extern/libtorch/share/cmake/IPEX/IPEXConfig.cmake:90 (FIND_PACKAGE)
  dpcpp_bindings/CMakeLists.txt:13 (find_package)

-- Found IPEX: /home/username/tiny/extern/libtorch/lib/libintel-ext-pt-gpu.so
-- Configuring done (4.5s)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
TORCH_PYTHON_LIBRARY
    linked by target "tiny_dpcpp_nn_pybind_module" in directory /home/username/tiny/dpcpp_bindings

-- Generating done (0.0s)
CMake Generate step failed. Build files cannot be regenerated correctly.

Thank you, Oleg

Yunaik commented 3 days ago

Hi Oleg,

This is indeed weird. TORCH_PYTHON_LIBRARY should come with PyTorch. Did you install PyTorch + IPEX according to the IPEX Installation Guide?

Further, it seems /home/username/ai_gpu/bin/python is being used as the Python interpreter. Can you please ensure that this interpreter is the one with the IPEX installation? You can confirm with "which python" which Python interpreter you're currently using and thus where "pip install torch" installs IPEX to.
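If the interpreter checks out and CMake still reports TORCH_PYTHON_LIBRARY as NOTFOUND, one possible workaround (a sketch, not something we have verified here) is to point CMake directly at the libtorch_python shared library of the active torch install:

# ask the active interpreter where torch keeps its shared libraries
TORCH_LIB=$(python -c "import os, torch; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))")

# pass the Python binding library to CMake explicitly
cmake -DBUILD_PYBIND=ON -DTARGET_DEVICE=ARC \
      -DTORCH_PYTHON_LIBRARY="$TORCH_LIB/libtorch_python.so" ..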

olegmikul commented 2 days ago

Hi,

Yes, I have installed IPEX as in the installation guide, under my virtual environment, so the right Python interpreter was used. I can also confirm it with "which python".