Closed. brendondgr closed this issue 4 months ago.
Addendum - Windows 10: I wanted to add that when I follow the Windows 10 instructions and create a conda environment with all of the requirements (the proper torch, torchvision and torchaudio versions), everything seems to work fine. I also installed the versions that match the most recent release of Intel Extension for PyTorch. However, whenever I add a new library into the mix (such as torchmetrics, monai, etc.), it either downgrades, upgrades, or completely removes the previously mentioned requirements.
Here is the error I get after adding the new libraries; it specifically points to the import of torch. As I stated, everything was working fine until I decided to install matplotlib:
import warnings
warnings.filterwarnings("ignore")
# Import Torch, MONAI and other libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.transforms import Compose, Normalize, ToTensor
from torch.utils.data import DataLoader
# from torch.utils.tensorboard import SummaryWriter
from monai.transforms import Compose, ScaleIntensity, EnsureChannelFirst, Resize, EnsureChannelFirstd
from monai.networks.layers import Norm
from monai.networks.nets import UNet
from monai.losses import DiceLoss
from monai.inferers import sliding_window_inference
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Set random seed
torch.manual_seed(0)
# Import custom classes/functions
from cg_Dataset import load_data
Specifically, the error is the following:
OSError: [WinError 127] The specified procedure could not be found. Error loading "..\anaconda3\envs\dl_bu\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
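For what it's worth, one way to narrow down a DLL load failure like this without importing torch at all is to look at what actually shipped in torch\lib (a hedged sketch using only the standard library; the directory layout is assumed from the path in the error above):
import importlib.util
import os

spec = importlib.util.find_spec("torch")
if spec is None:
    print("torch is not installed in this environment")
else:
    lib_dir = os.path.join(os.path.dirname(spec.origin), "lib")
    print("torch lib directory:", lib_dir)
    if os.path.isdir(lib_dir):
        # List the DLLs shipped with this torch build. If backend_with_compiler.dll
        # is present, the failure is more likely a mismatched dependency (e.g. a
        # library replaced by a later install) than a missing file.
        for name in sorted(os.listdir(lib_dir)):
            if name.endswith(".dll"):
                print("  ", name)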
My Windows Solution (So far): I'm still working out the kinks on this, so as I come up with solutions I will share them. To get everything working on Windows, I had to run the following steps in this order:
conda create -n general intelpython3_core python=3.x
conda activate general
conda install pkg-config libuv
python -m pip install --force-reinstall Pillow
python -m pip install --force-reinstall matplotlib
So far the libraries that I haven't been able to get working are the following:
@brendondgr any luck navigating through the dependency issue? The original issue seems related to this.
It seems like their issue was also on a Windows distribution, which is where I had problems initially. The error involving backend_with_compiler.dll was caused by certain libraries (most likely torchmetrics in my case) downgrading or completely wiping out some of the Intel Python libraries.
Along with this, I noticed that I need to install all of the other libraries first and only then run the command that installs the proper PyTorch and IPEX versions; otherwise those builds end up being replaced as well.
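As a quick way to catch this kind of silent replacement, a version spot-check after each new install can help (a minimal sketch; the distribution names below are the usual pip names and are an assumption, since Intel-channel builds may register slightly differently):
from importlib.metadata import version, PackageNotFoundError

# Distribution names are assumed to be the standard pip/PyPI ones.
pinned = ["torch", "torchvision", "torchaudio", "intel-extension-for-pytorch"]

for pkg in pinned:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (possibly removed by a later install)")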
So currently: I am still having the issues from the original post, with import errors for "libmkl_sycl_blas.so.4" and "libze_loader.so.1". Once I activated oneAPI, the "libmkl" error went away, but I now get the "libze" error. This error specifically occurs on Fedora Workstation 40.
The libze_loader error might be due to issues with the level_zero driver. Can you activate oneAPI (source /opt/intel/oneapi/setvars.sh), run the command 'sycl-ls', and check whether it properly lists the 'oneapi_level_zero:gpu' device?
When running the command sycl-ls, I get the following output (after activating oneAPI with setvars.sh):
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 5 3600 6-Core Processor OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:cpu:2] Intel(R) OpenCL, AMD Ryzen 5 3600 6-Core Processor OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [24.09.28717.17]
Seems level_zero support is missing. Have you already tried installing the drivers for the A770?
Currently, for Linux distros I see that A770 drivers are validated only for Ubuntu 22.04. Is it possible to switch to Ubuntu?
If not, check whether the RHEL install path works for Fedora --> https://dgpu-docs.intel.com/driver/installation.html#red-hat-enterprise-linux-package-repository
These should be installed, given that running the command glxinfo | grep -e "OpenGL vendor" -e "OpenGL renderer" results in the following output:
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) Arc(tm) A770 Graphics (DG2)
With the drivers being "Mesa". Running sudo intel_gpu_top also works as expected. Either way, if there's no clear solution on Fedora I may have to switch back to Ubuntu, but that may take a few hours.
The OpenGL and Mesa drivers provide the media runtime, which seems to be installed fine, but IPEX needs the compute runtime through the Level Zero driver.
Sure, once on Ubuntu 22.04, sycl-ls lists the level_zero gpu device.
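As a side note, a rough way to check whether the Level Zero loader (the compute-runtime piece mentioned above) is visible to the dynamic linker at all, independent of the Mesa/OpenGL stack, is something like the following sketch; the soname stems are assumptions taken from the error messages earlier in the thread:
import ctypes.util

# "ze_loader" corresponds to libze_loader.so.* (Level Zero compute runtime),
# "OpenCL" to the OpenCL ICD loader; both stems are assumptions.
for stem in ["ze_loader", "OpenCL"]:
    found = ctypes.util.find_library(stem)
    print(f"lib{stem}:", found if found else "not found")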
So I switched over to Ubuntu 22.04 LTS and everything seems to be functioning properly. The only issue I am now having is that my Jupyter environment is not detecting the graphics card. I have confirmed several times over that the graphics drivers are installed properly, so I don't believe that is the problem.
However, I was curious whether there is a reason why it is not being detected?
Activate oneAPI, and verify sycl-ls lists the level_zero gpu device
Here is my output for sycl-ls on Ubuntu:
(base) bdgr@bdgr-MS-7B86:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.5.0.08_160000.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 5 3600 6-Core Processor OpenCL 3.0 (Build 0) [2024.17.5.0.08_160000.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.43.027642]
[opencl:cpu:3] Intel(R) OpenCL, AMD Ryzen 5 3600 6-Core Processor OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
Is this what you were looking for or is there still an issue?
Edit: I realize that level_zero is not listed in that output. I am going to look into other ways to resolve this. If you or anyone else has a suggestion, I am all ears.
Still can't see the level_zero gpu getting listed.
Are you able to successfully run the following on the CLI?
Sanity check
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
The imports should work, and the package versions and the GPU name should get printed.
Try running an IPEX sample here
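For reference, a stripped-down smoke test along the same lines would look roughly like this (a sketch only, not the official sample; it assumes the xpu backend gets registered by the IPEX import):
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the xpu backend)

if torch.xpu.is_available():
    # Run a small matmul on the Arc GPU and print the device properties,
    # mirroring the sanity-check one-liner above.
    x = torch.randn(1024, 1024, device="xpu")
    y = x @ x
    print(torch.xpu.get_device_properties(0))
    print("result device:", y.device)
else:
    print("No xpu device detected; check the driver install and the oneAPI environment")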
Sanity check: the output is the following:
2.1.0.post2+cxx11.abi 2.1.30+xpu
Which matches what I am finding in my Jupyter Notebook.
Running an example, such as the ResNet50 example, results in the following error, which is just a kernel crash:
The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info.
View Jupyter [log](command:jupyter.viewOutput) for further details.
When looking into the log output, all it says is:
17:06:17.391 [error] Disposing session as kernel process died ExitCode: undefined, Reason:
The sanity check output is not listing the GPU name.
One last try: check whether the current user is in the 'render' group (reference).
# check if a group named 'render' exists
stat -c "%G" /dev/dri/render*
# check if current user is in render group
groups ${USER}
# Add current user to render group and refresh terminal
sudo gpasswd -a ${USER} render && newgrp render
Login to a new terminal >> activate oneapi >> try sycl-ls again
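The same checks can also be scripted from Python if that is more convenient (a sketch assuming the usual /dev/dri layout; as noted above, group changes only take effect in a fresh login shell):
import grp
import os

# Groups the current process belongs to
print("current groups:", sorted(grp.getgrgid(g).gr_name for g in os.getgroups()))

# Group ownership of the render nodes
dri = "/dev/dri"
if os.path.isdir(dri):
    for entry in sorted(os.listdir(dri)):
        if entry.startswith("render"):
            gid = os.stat(os.path.join(dri, entry)).st_gid
            print(entry, "-> group:", grp.getgrgid(gid).gr_name)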
The command added my user to the render group, and I have confirmed I've been added. However, sycl-ls still looks the same, and running the "sanity check" does not print out the graphics card.
Okay, I believe I resolved it. I wasn't sure what was going on, but I had a sneaking suspicion that it had something to do with the Intel compute runtime not being installed correctly, so I ran the following commands: first I added the Intel graphics repository, then installed the DKMS kernel modules and the compute, media and display runtimes.
Graphics Repository:
sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/graphics/intel-graphics.key | sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo 'deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/graphics/ubuntu jammy arc' | sudo tee /etc/apt/sources.list.d/intel.gpu.jammy.list
Installing the Modules and Runtimes:
sudo apt-get install -y intel-platform-vsec-dkms intel-platform-cse-dkms intel-i915-dkms intel-fw-gpu
sudo apt-get install -y intel-opencl-icd intel-level-zero-gpu level-zero intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2
I then ran my Python code and it found the GPU immediately. Just in case, however, I also ran the following command to install the Intel compute runtime package by itself (though I did not check whether this alone would have resolved the issue):
sudo apt-get install -y intel-opencl-icd
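For anyone landing here later, the sanity check suggested earlier in the thread, reformatted as a notebook cell, is a quick way to confirm the GPU is now visible:
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)
# With the compute runtime installed, at least one xpu device should be listed
for i in range(torch.xpu.device_count()):
    print(f"[{i}]: {torch.xpu.get_device_properties(i)}")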
I will keep this open temporarily in case there are any further issues or any questions for me.
Oh that's great! Glad that it finally worked
I haven't done much deep learning coding since my last post, but when I have, everything seems to be working properly. I'm going to close this now since the previous steps seem to have worked so far.
Thank you @vishnumadhu365 for the assistance and ideas. :)
Describe the issue
The Issue I am having: When attempting to import this library, I get the following error:
import intel_extension_for_pytorch as ipex
Instructions I followed for download: https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.1.30%2bxpu&os=linux%2fwsl2&package=pip
Here are the versions in case you don't want to read through it:
This may be overkill, but I added the following paths to .bashrc:
Other Solutions I have Attempted: I have also activated the oneAPI environment as well as the conda environment prior to attempting the import, but this results in a different error. Here is the order of commands (note: the conda env "dl" contains all the mentioned libraries):
source /home/bdgr/intel/oneapi/setvars.sh
conda activate dl
python -c "import torch; import intel_extension_for_pytorch as ipex;"
This import (after activating oneAPI) results in the following error:
I don't know what this error is; the other one I knew had something to do with oneAPI. Any help would be appreciated! I have also tried to check for a GPU/XPU in a Jupyter notebook, but it was not detected.
Specifications:
Operating System: Fedora Workstation 40
Graphics Card: Intel Arc A770
CPU: Ryzen 5 3600
Motherboard: MSI B450-A Pro MAX (updated to the latest kernel version to support Re-Bar)