mudler opened this issue 7 months ago
I'm not sure of the root cause, but I see that the container (3.11) and the host (3.10) have different Python versions in your environment.
Besides, I see in the Llama2 inference blog post that you need to run
source {ONEAPI_PATH}/setvars.sh
before executing the Python command:
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
BTW, are you able to run inference with the Llama2-13B model on 2 Arc A770 GPUs? Thanks~
@mudler - To start with, I would look into the drivers - specifically the UMD (user-mode driver). I point to the UMD because the Intel-published Docker container is picking up the devices well.
Since level-zero is installed,
can you run clinfo -l
and post the output here?
You may have to install 'clinfo' first.
If the output looks fine, then I concur with @BismarckDD that the oneAPI environment variables were likely not sourced properly before running the Python command - so please try that as well.
@intel-ravig what should I look for regarding the UMD? I just installed the drivers as per the Intel docs I linked, following the steps in the issue. This is a freshly installed 22.04 LTS box.
Just for reference, here are the steps:
mudler@arc:~$ source activate diffusers
(diffusers) mudler@arc:~$ conda env list
# conda environments:
#
diffusers * /home/mudler/.conda/envs/diffusers
base /opt/conda
(diffusers) mudler@arc:~$ source /opt/intel/oneapi/setvars.sh
:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
(diffusers) mudler@arc:~$ python3 -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
2.1.0a0+cxx11.abi
2.1.10+xpu
(diffusers) mudler@arc:~$ clinfo -l
Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
`-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
`-- Device #0: AMD Ryzen 7 5700G with Radeon Graphics
Platform #2: Intel(R) OpenCL Graphics
+-- Device #0: Intel(R) Arc(TM) A770 Graphics
`-- Device #1: Intel(R) Arc(TM) A770 Graphics
To reiterate: on the same box, llama.cpp works fine and I can offload correctly to the GPUs. It just looks like a problem with ipex.
@mudler - I am able to duplicate your issue on your conda environment. However, it runs fine on python venv and docker environments. I will discuss with engineering team for next course of action.
thanks @intel-ravig !
For the time being, in LocalAI I'll go with supporting it without conda - however, conda support is much wanted, as otherwise implementations become quite convoluted and harder to follow.
@mudler - I got in touch with the engineering team and got a solution. The conda environment issue is known and documented here.
Problem: Number of dpcpp devices should be greater than zero.
Cause: If you use Intel® Extension for PyTorch* in a conda environment, you might encounter this error. Conda also ships a libstdc++.so dynamic library, which may conflict with the one shipped with the OS.
Solution: Export the path of the OS's libstdc++.so to the LD_PRELOAD environment variable.
I tried these steps:
a. /sbin/ldconfig -p | grep stdc++
b. Pick the location of the 64-bit stdc++
c. export LD_PRELOAD=<location of stdc++>
d. conda activate your environment
e. Source the oneAPI variables
f. Run the same Python device command.
I was able to solve the issue with conda env.
Please check and let us know.
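The steps above can be sketched as a single snippet. This is only a sketch: the libstdc++ path varies by distro, the env name `diffusers` is taken from the transcript earlier in the thread, and `/opt/intel/oneapi` is the default oneAPI prefix.

```shell
# a/b: find the 64-bit system libstdc++ (picks the first x86-64 match from the linker cache)
STDCPP=$(/sbin/ldconfig -p | grep 'libstdc++.so.6 ' | grep 'x86-64' | awk '{print $NF}' | head -n1)

# c: preload it so it takes precedence over conda's bundled copy
export LD_PRELOAD="$STDCPP"

# d/e: activate the conda env, then source the oneAPI variables
conda activate diffusers
source /opt/intel/oneapi/setvars.sh

# f: re-run the device sanity check
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.device_count())"
```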
There is currently a similar issue with intel/compute-runtime; however, it's been observed on kernel 6.8 (6.7 apparently works normally). It might also manifest on your kernel. Could this be the same cause? Here is the fix: https://github.com/intel/compute-runtime/issues/710#issuecomment-2002646557
Hi @mudler Did this solution fix your issue?
Describe the bug
Context: I'm the author of LocalAI, and I'm trying to bring diffusers and transformers support to it ( https://github.com/mudler/LocalAI/pull/1746 ).
I started by following the documentation at https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.10%2Bxpu ; however, after successfully installing all the dependencies with conda, the "Sanity" test cannot find the devices on my system.
I have 2 Intel Arc A770 GPUs, but when running the sanity-check command, the result is just the torch and ipex version strings - no devices are listed.
Printing torch.xpu.device_count() returns 0.
My user is in the video/render group:
Running conda install is successful; indeed, it seems I have all the packages:
The system dependencies are there - indeed, I can run llama.cpp just fine, offloading everything to the GPU:
Since I'm able to run llama.cpp on this host successfully (also via containers and Kubernetes), I suspect it's somehow the Python environment that cannot detect the devices.
Any help and hint would be greatly appreciated, thanks!
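For completeness, the failure mode can be narrowed down with a small check - a sketch that assumes nothing beyond the package names already used in this thread. It reports whether the packages import at all and, if they do, how many XPU devices the backend sees:

```shell
python - <<'EOF'
import importlib.util

# 1) Are the packages importable at all?
for mod in ("torch", "intel_extension_for_pytorch"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'found' if found else 'MISSING'}")

# 2) If so, does the xpu backend see any devices?
try:
    import torch
    import intel_extension_for_pytorch  # noqa: F401 -- registers torch.xpu
    print("xpu device count:", torch.xpu.device_count())
except (ImportError, AttributeError):
    print("xpu device count: n/a (imports failed)")
EOF
```

A count of 0 with both packages found points at the driver stack or environment (UMD, oneAPI variables, or the conda libstdc++ conflict) rather than the Python install itself.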
Versions
Oddly enough, from the Docker container the devices are detected just fine: