intel / intel-extension-for-pytorch

A Python package that extends the official PyTorch to easily obtain extra performance on Intel platforms
Apache License 2.0

F32 Example Training Gets Stuck after One Iteration of For Loop #253

Open tedliosu opened 2 years ago

tedliosu commented 2 years ago
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H ./build.sh
[a bunch of output here]
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ IMAGE_NAME=intel-extension-for-pytorch:gpu
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ VIDEO=$(getent group video | sed -E 's,^video:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ RENDER=$(getent group render | sed -E 's,^render:[^:]*:([^:]*):.*$,\1,')
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ test -z "$RENDER" || RENDER_GROUP="--group-add ${RENDER}"
(base) tedliosu@victus-ted:~/Documents/all_git/intel-extension-for-pytorch/docker$ sudo -H docker run --rm -v /home/tedliosu/intel_pytorch_workspace:/workspace --group-add ${VIDEO} ${RENDER_GROUP} --device=/dev/dri --ipc=host -it $IMAGE_NAME bash
[sudo] password for tedliosu: 
groups: cannot find name for group ID 109
root@8e852a62c8b4:/# cd workspace/
root@d4958d53cb7c:/workspace# python3 -m trace -t ipex_f32_example.py 2>&1 | tee ipex_f32_example_py_trace.txt | grep ipex_f32_example
 --- modulename: ipex_f32_example, funcname: <module>
ipex_f32_example.py(1): import torch
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(2): import torchvision
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(4): import intel_extension_for_pytorch as ipex
<frozen importlib._bootstrap>(186): <frozen importlib._bootstrap>(187): <frozen importlib._bootstrap>(191): <frozen importlib._bootstrap>(192): <frozen importlib._bootstrap>(194): ipex_f32_example.py(7): LR = 0.001
ipex_f32_example.py(8): DOWNLOAD = True
ipex_f32_example.py(9): DATA = 'datasets/cifar10/'
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(12):     torchvision.transforms.Resize((224, 224)),
ipex_f32_example.py(13):     torchvision.transforms.ToTensor(),
ipex_f32_example.py(14):     torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
ipex_f32_example.py(11): transform = torchvision.transforms.Compose([
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(17):         root=DATA,
ipex_f32_example.py(18):         train=True,
ipex_f32_example.py(19):         transform=transform,
ipex_f32_example.py(20):         download=DOWNLOAD,
ipex_f32_example.py(16): train_dataset = torchvision.datasets.CIFAR10(
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(23):         dataset=train_dataset,
ipex_f32_example.py(24):         batch_size=128
ipex_f32_example.py(22): train_loader = torch.utils.data.DataLoader(
ipex_f32_example.py(27): model = torchvision.models.resnet50()
ipex_f32_example.py(28): criterion = torch.nn.CrossEntropyLoss().to("xpu")
ipex_f32_example.py(29): optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
ipex_f32_example.py(30): model.train()
ipex_f32_example.py(32): model = model.to("xpu")
ipex_f32_example.py(33): model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
ipex_f32_example.py(37):     print("Begin 1 loop iteration")
ipex_f32_example.py(39):     data = data.to("xpu")
ipex_f32_example.py(40):     print("Moved data onto XPU")
ipex_f32_example.py(41):     target = target.to("xpu")
ipex_f32_example.py(42):     print("Moved target onto XPU")
ipex_f32_example.py(44):     optimizer.zero_grad()
ipex_f32_example.py(45):     print("About to apply model to data")
ipex_f32_example.py(46):     output = model(data)
ipex_f32_example.py(47):     print("Finished applying model to data")
ipex_f32_example.py(48):     loss = criterion(output, target)
ipex_f32_example.py(49):     print("About to execute loss.backward()")
ipex_f32_example.py(50):     loss.backward()
ipex_f32_example.py(51):     print("About to execute optimizer.step()")
ipex_f32_example.py(52):     optimizer.step()
ipex_f32_example.py(53):     print("Current batch id : %d" % (batch_idx))
ipex_f32_example.py(54):     data = None
ipex_f32_example.py(55):     target = None
ipex_f32_example.py(36): for batch_idx, (data, target) in enumerate(train_loader):
[I killed the process after ***90 minutes*** of being stuck here]
root@d4958d53cb7c:/workspace# tail -n35 ipex_f32_example_py_trace.txt
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
root@d4958d53cb7c:/workspace# pip list
Package                     Version            
--------------------------- -------------------
contourpy                   1.0.6              
cycler                      0.11.0             
fonttools                   4.38.0             
intel-extension-for-pytorch 1.10.200+gpu       
kiwisolver                  1.4.4              
matplotlib                  3.6.1              
numpy                       1.23.4             
packaging                   21.3               
Pillow                      9.3.0              
pip                         20.0.2             
pyparsing                   3.0.9              
python-dateutil             2.8.2              
setuptools                  45.2.0             
six                         1.16.0             
torch                       1.10.0a0+git3d5f2d4
torchvision                 0.11.3             
typing-extensions           4.4.0              
wheel                       0.34.2

Contents of ipex_f32_example.py (as you can see it's basically the Float32 example from here):

import torch
import torchvision
############# code changes ###############
import intel_extension_for_pytorch as ipex
############# code changes ###############

LR = 0.001
DOWNLOAD = True
DATA = 'datasets/cifar10/'

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
        root=DATA,
        train=True,
        transform=transform,
        download=DOWNLOAD,
)
train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=128
)

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss().to("xpu")
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()
#################################### code changes ################################
model = model.to("xpu")
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)
#################################### code changes ################################

for batch_idx, (data, target) in enumerate(train_loader):
    print("Begin 1 loop iteration")
    ########## code changes ##########
    data = data.to("xpu")
    print("Moved data onto XPU")
    target = target.to("xpu")
    print("Moved target onto XPU")
    ########## code changes ##########
    optimizer.zero_grad()
    print("About to apply model to data")
    output = model(data)
    print("Finished applying model to data")
    loss = criterion(output, target)
    print("About to execute loss.backward()")
    loss.backward()
    print("About to execute optimizer.step()")
    optimizer.step()
    print("Current batch id : %d" % (batch_idx))
    data = None
    target = None
torch.save({
     'model_state_dict': model.state_dict(),
     'optimizer_state_dict': optimizer.state_dict(),
     }, 'checkpoint.pth')

As you can see in the command-line output, the ipex_f32_example.py script froze for 90 minutes once it reached the for batch_idx, (data, target) in enumerate(train_loader): line; when I ran it without tracing, it froze at data = data.to("xpu") for over 8 hours before I had to kill the process. I have no idea whether this is a driver issue, a torchvision issue, or something else, but it is really annoying, and I'd be more than happy to provide extra info about my system to help solve this freezing problem. Also note that tail -n35 ipex_f32_example_py_trace.txt displays the last 35 lines of the trace I ran on the script to pinpoint exactly where its execution freezes.

P.S. Since I already mentioned this issue here before opening this separate one, I saw the reply here to my initial comment about it, but I have no idea how to apply that person's suggestion to solving this problem :confused:
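A quick way to tell whether the hang lives in data loading/collation (where the trace tail points) or in the XPU runtime is to iterate the DataLoader without ever touching the XPU. A minimal isolation sketch along those lines (not part of the original report; plain torch/torchvision only, same dataset and loader settings as the script below):

import torch
import torchvision

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
])
# same dataset/loader settings as the reported script, but no "xpu" involved
dataset = torchvision.datasets.CIFAR10(root='datasets/cifar10/', train=True,
                                       transform=transform, download=True)
loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=128)
for batch_idx, (data, target) in enumerate(loader):
    print("batch %d collated, shape %s" % (batch_idx, tuple(data.shape)))
    if batch_idx >= 2:  # a few batches suffice to rule the loader in or out
        break

If these batches come out promptly, the loader and collate.py are fine, and the stall is on the device side.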

jingxu10 commented 2 years ago

May I know which driver you have installed on your machine? Is the graphics card a Flex Series 170 GPU card?

tedliosu commented 2 years ago

May I know which driver you have installed on your machine? Is the graphics card a Flex Series 170 GPU card?

(base) tedliosu@victus-ted:~$ dpkg -l | grep -E "intel-opencl-icd|intel-level-zero-gpu|level-zero|intel-media-va-driver-non-free|libmfx1|libmfxgen1|libvpl2|libigc-dev|intel-igc-cm|libigdfcl-dev|libigfxcmrt-dev|level-zero-dev"
ii  intel-igc-cm                                                            1.0.160+i815~u20.04                                            amd64        Intel(R) C for Metal Compiler -- CM Frontend lib
ii  intel-level-zero-gpu                                                    1.3.24055+i815~u20.04                                          amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-media-va-driver-non-free:amd64                                    22.5.3+i815~u20.04                                             amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-opencl-icd                                                        22.35.24055+i815~u20.04                                        amd64        Intel graphics compute runtime for OpenCL
ii  level-zero                                                              1.8.5+i815~u20.04                                              amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-dev                                                          1.8.5+i815~u20.04                                              amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  libigc-dev                                                              1.0.11702.1+i815~u20.04                                        amd64        Intel graphics compiler for OpenCL -- core development files
ii  libigdfcl-dev                                                           1.0.11702.1+i815~u20.04                                        amd64        Intel graphics compiler for OpenCL -- OpenCL development files
ii  libigfxcmrt-dev:amd64                                                   22.5.3+i815~u20.04                                             amd64        Intel C for Media Runtime -- development files
ii  libmfx1:amd64                                                           22.5.3+i815~u20.04                                             amd64        Intel Media SDK -- shared library
ii  libmfxgen1:amd64                                                        22.5.3+i815~u20.04                                             amd64        Intel oneVPL GPU Runtime -- shared library
ii  libvpl2:amd64                                                           2022.2.0.0+i815~u20.04                                         amd64        Intel oneVPL Dispatcher -- shared library
(base) tedliosu@victus-ted:~$ uname -r
5.19.0-16.4-liquorix-amd64
(base) tedliosu@victus-ted:~$ modinfo i915
filename:       /lib/modules/5.19.0-16.4-liquorix-amd64/kernel/drivers/gpu/drm/i915/i915.ko.xz
license:        GPL and additional rights
description:    Intel Graphics
author:         Intel Corporation
author:         Tungsten Graphics, Inc.
import_ns:      DMA_BUF
firmware:       i915/skl_huc_2.0.0.bin
firmware:       i915/bxt_huc_2.0.0.bin
firmware:       i915/kbl_huc_4.0.0.bin
firmware:       i915/glk_huc_4.0.0.bin
firmware:       i915/kbl_huc_4.0.0.bin
firmware:       i915/kbl_huc_4.0.0.bin
firmware:       i915/cml_huc_4.0.0.bin
firmware:       i915/icl_huc_9.0.0.bin
firmware:       i915/ehl_huc_9.0.0.bin
firmware:       i915/ehl_huc_9.0.0.bin
firmware:       i915/tgl_huc_7.9.3.bin
firmware:       i915/tgl_huc_7.9.3.bin
firmware:       i915/dg1_huc_7.9.3.bin
firmware:       i915/tgl_huc_7.9.3.bin
firmware:       i915/tgl_huc_7.9.3.bin
firmware:       i915/tgl_guc_69.0.3.bin
firmware:       i915/adlp_guc_69.0.3.bin
firmware:       i915/skl_guc_70.1.1.bin
firmware:       i915/bxt_guc_70.1.1.bin
firmware:       i915/kbl_guc_70.1.1.bin
firmware:       i915/glk_guc_70.1.1.bin
firmware:       i915/kbl_guc_70.1.1.bin
firmware:       i915/kbl_guc_70.1.1.bin
firmware:       i915/cml_guc_70.1.1.bin
firmware:       i915/icl_guc_70.1.1.bin
firmware:       i915/ehl_guc_70.1.1.bin
firmware:       i915/ehl_guc_70.1.1.bin
firmware:       i915/tgl_guc_70.1.1.bin
firmware:       i915/tgl_guc_70.1.1.bin
firmware:       i915/dg1_guc_70.1.1.bin
firmware:       i915/tgl_guc_70.1.1.bin
firmware:       i915/adlp_guc_70.1.1.bin
firmware:       i915/dg2_guc_70.1.2.bin
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_27.bin
firmware:       i915/kbl_dmc_ver1_04.bin
firmware:       i915/glk_dmc_ver1_04.bin
firmware:       i915/icl_dmc_ver1_09.bin
firmware:       i915/tgl_dmc_ver2_12.bin
firmware:       i915/rkl_dmc_ver2_03.bin
firmware:       i915/dg1_dmc_ver2_02.bin
firmware:       i915/adls_dmc_ver2_01.bin
firmware:       i915/adlp_dmc_ver2_16.bin
srcversion:     903EBFAE7C6843AB544D159
alias:          pci:v00008086d000056B2sv*sd*bc03sc*i*
alias:          pci:v00008086d00005697sv*sd*bc03sc*i*
alias:          pci:v00008086d00005696sv*sd*bc03sc*i*
alias:          pci:v00008086d000056B0sv*sd*bc03sc*i*
alias:          pci:v00008086d00005695sv*sd*bc03sc*i*
alias:          pci:v00008086d00005694sv*sd*bc03sc*i*
alias:          pci:v00008086d00005693sv*sd*bc03sc*i*
alias:          pci:v00008086d00005692sv*sd*bc03sc*i*
alias:          pci:v00008086d00005691sv*sd*bc03sc*i*
alias:          pci:v00008086d00005690sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A7A9sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A7A8sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A7A1sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A7A0sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A721sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A720sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A78Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000A78Asv*sd*bc03sc*i*
alias:          pci:v00008086d0000A789sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A788sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A783sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A782sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A781sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A780sv*sd*bc03sc*i*
alias:          pci:v00008086d00004909sv*sd*bc03sc*i*
alias:          pci:v00008086d00004908sv*sd*bc03sc*i*
alias:          pci:v00008086d00004907sv*sd*bc03sc*i*
alias:          pci:v00008086d00004906sv*sd*bc03sc*i*
alias:          pci:v00008086d00004905sv*sd*bc03sc*i*
alias:          pci:v00008086d000046D2sv*sd*bc03sc*i*
alias:          pci:v00008086d000046D1sv*sd*bc03sc*i*
alias:          pci:v00008086d000046D0sv*sd*bc03sc*i*
alias:          pci:v00008086d000046C3sv*sd*bc03sc*i*
alias:          pci:v00008086d000046C2sv*sd*bc03sc*i*
alias:          pci:v00008086d000046C1sv*sd*bc03sc*i*
alias:          pci:v00008086d000046C0sv*sd*bc03sc*i*
alias:          pci:v00008086d000046B3sv*sd*bc03sc*i*
alias:          pci:v00008086d000046B2sv*sd*bc03sc*i*
alias:          pci:v00008086d000046B1sv*sd*bc03sc*i*
alias:          pci:v00008086d000046B0sv*sd*bc03sc*i*
alias:          pci:v00008086d00004628sv*sd*bc03sc*i*
alias:          pci:v00008086d00004626sv*sd*bc03sc*i*
alias:          pci:v00008086d0000462Asv*sd*bc03sc*i*
alias:          pci:v00008086d000046AAsv*sd*bc03sc*i*
alias:          pci:v00008086d000046A8sv*sd*bc03sc*i*
alias:          pci:v00008086d000046A6sv*sd*bc03sc*i*
alias:          pci:v00008086d000046A3sv*sd*bc03sc*i*
alias:          pci:v00008086d000046A2sv*sd*bc03sc*i*
alias:          pci:v00008086d000046A1sv*sd*bc03sc*i*
alias:          pci:v00008086d000046A0sv*sd*bc03sc*i*
alias:          pci:v00008086d00004693sv*sd*bc03sc*i*
alias:          pci:v00008086d00004692sv*sd*bc03sc*i*
alias:          pci:v00008086d00004690sv*sd*bc03sc*i*
alias:          pci:v00008086d0000468Asv*sd*bc03sc*i*
alias:          pci:v00008086d00004688sv*sd*bc03sc*i*
alias:          pci:v00008086d00004682sv*sd*bc03sc*i*
alias:          pci:v00008086d00004680sv*sd*bc03sc*i*
alias:          pci:v00008086d00004C9Asv*sd*bc03sc*i*
alias:          pci:v00008086d00004C90sv*sd*bc03sc*i*
alias:          pci:v00008086d00004C8Csv*sd*bc03sc*i*
alias:          pci:v00008086d00004C8Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00004C8Asv*sd*bc03sc*i*
alias:          pci:v00008086d00004C80sv*sd*bc03sc*i*
alias:          pci:v00008086d00009AF8sv*sd*bc03sc*i*
alias:          pci:v00008086d00009AD9sv*sd*bc03sc*i*
alias:          pci:v00008086d00009AC9sv*sd*bc03sc*i*
alias:          pci:v00008086d00009AC0sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A78sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A59sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A49sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A40sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A70sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A68sv*sd*bc03sc*i*
alias:          pci:v00008086d00009A60sv*sd*bc03sc*i*
alias:          pci:v00008086d00004E71sv*sd*bc03sc*i*
alias:          pci:v00008086d00004E61sv*sd*bc03sc*i*
alias:          pci:v00008086d00004E57sv*sd*bc03sc*i*
alias:          pci:v00008086d00004E55sv*sd*bc03sc*i*
alias:          pci:v00008086d00004E51sv*sd*bc03sc*i*
alias:          pci:v00008086d00004571sv*sd*bc03sc*i*
alias:          pci:v00008086d00004557sv*sd*bc03sc*i*
alias:          pci:v00008086d00004555sv*sd*bc03sc*i*
alias:          pci:v00008086d00004551sv*sd*bc03sc*i*
alias:          pci:v00008086d00004541sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A5Dsv*sd*bc03sc*i*
alias:          pci:v00008086d00008A51sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A71sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A70sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A5Csv*sd*bc03sc*i*
alias:          pci:v00008086d00008A5Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00008A5Asv*sd*bc03sc*i*
alias:          pci:v00008086d00008A59sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A58sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A57sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A56sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A54sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A53sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A52sv*sd*bc03sc*i*
alias:          pci:v00008086d00008A50sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BCCsv*sd*bc03sc*i*
alias:          pci:v00008086d00009BCAsv*sd*bc03sc*i*
alias:          pci:v00008086d00009B41sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BACsv*sd*bc03sc*i*
alias:          pci:v00008086d00009BAAsv*sd*bc03sc*i*
alias:          pci:v00008086d00009B21sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BF6sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BE6sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BC8sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BC6sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BC5sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BC4sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BC2sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BA8sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BA5sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BA4sv*sd*bc03sc*i*
alias:          pci:v00008086d00009BA2sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA2sv*sd*bc03sc*i*
alias:          pci:v00008086d000087CAsv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA3sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA0sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA4sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA1sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA8sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA7sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA6sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA5sv*sd*bc03sc*i*
alias:          pci:v00008086d00003EA9sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E9Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00003E94sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E9Csv*sd*bc03sc*i*
alias:          pci:v00008086d00003E9Asv*sd*bc03sc*i*
alias:          pci:v00008086d00003E98sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E96sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E92sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E91sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E99sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E93sv*sd*bc03sc*i*
alias:          pci:v00008086d00003E90sv*sd*bc03sc*i*
alias:          pci:v00008086d000087C0sv*sd*bc03sc*i*
alias:          pci:v00008086d0000591Csv*sd*bc03sc*i*
alias:          pci:v00008086d0000593Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00005927sv*sd*bc03sc*i*
alias:          pci:v00008086d00005923sv*sd*bc03sc*i*
alias:          pci:v00008086d00005926sv*sd*bc03sc*i*
alias:          pci:v00008086d0000591Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000591Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000591Asv*sd*bc03sc*i*
alias:          pci:v00008086d00005917sv*sd*bc03sc*i*
alias:          pci:v00008086d00005912sv*sd*bc03sc*i*
alias:          pci:v00008086d0000591Esv*sd*bc03sc*i*
alias:          pci:v00008086d00005921sv*sd*bc03sc*i*
alias:          pci:v00008086d00005916sv*sd*bc03sc*i*
alias:          pci:v00008086d0000590Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000590Asv*sd*bc03sc*i*
alias:          pci:v00008086d00005908sv*sd*bc03sc*i*
alias:          pci:v00008086d00005902sv*sd*bc03sc*i*
alias:          pci:v00008086d00005915sv*sd*bc03sc*i*
alias:          pci:v00008086d0000590Esv*sd*bc03sc*i*
alias:          pci:v00008086d00005913sv*sd*bc03sc*i*
alias:          pci:v00008086d00005906sv*sd*bc03sc*i*
alias:          pci:v00008086d00003185sv*sd*bc03sc*i*
alias:          pci:v00008086d00003184sv*sd*bc03sc*i*
alias:          pci:v00008086d00005A85sv*sd*bc03sc*i*
alias:          pci:v00008086d00005A84sv*sd*bc03sc*i*
alias:          pci:v00008086d00001A85sv*sd*bc03sc*i*
alias:          pci:v00008086d00001A84sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A84sv*sd*bc03sc*i*
alias:          pci:v00008086d0000193Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000193Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000193Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001932sv*sd*bc03sc*i*
alias:          pci:v00008086d0000192Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000192Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000192Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001927sv*sd*bc03sc*i*
alias:          pci:v00008086d00001926sv*sd*bc03sc*i*
alias:          pci:v00008086d00001923sv*sd*bc03sc*i*
alias:          pci:v00008086d0000191Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000191Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000191Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001912sv*sd*bc03sc*i*
alias:          pci:v00008086d0000191Esv*sd*bc03sc*i*
alias:          pci:v00008086d00001921sv*sd*bc03sc*i*
alias:          pci:v00008086d00001916sv*sd*bc03sc*i*
alias:          pci:v00008086d00001917sv*sd*bc03sc*i*
alias:          pci:v00008086d0000190Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000190Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001902sv*sd*bc03sc*i*
alias:          pci:v00008086d00001915sv*sd*bc03sc*i*
alias:          pci:v00008086d0000190Esv*sd*bc03sc*i*
alias:          pci:v00008086d00001913sv*sd*bc03sc*i*
alias:          pci:v00008086d00001906sv*sd*bc03sc*i*
alias:          pci:v00008086d000022B3sv*sd*bc03sc*i*
alias:          pci:v00008086d000022B2sv*sd*bc03sc*i*
alias:          pci:v00008086d000022B1sv*sd*bc03sc*i*
alias:          pci:v00008086d000022B0sv*sd*bc03sc*i*
alias:          pci:v00008086d0000163Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000163Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001632sv*sd*bc03sc*i*
alias:          pci:v00008086d0000163Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000163Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00001636sv*sd*bc03sc*i*
alias:          pci:v00008086d0000162Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000162Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001622sv*sd*bc03sc*i*
alias:          pci:v00008086d0000162Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000162Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00001626sv*sd*bc03sc*i*
alias:          pci:v00008086d0000161Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000161Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001612sv*sd*bc03sc*i*
alias:          pci:v00008086d0000161Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000161Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00001616sv*sd*bc03sc*i*
alias:          pci:v00008086d0000160Dsv*sd*bc03sc*i*
alias:          pci:v00008086d0000160Asv*sd*bc03sc*i*
alias:          pci:v00008086d00001602sv*sd*bc03sc*i*
alias:          pci:v00008086d0000160Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000160Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00001606sv*sd*bc03sc*i*
alias:          pci:v00008086d00000F33sv*sd*bc03sc*i*
alias:          pci:v00008086d00000F32sv*sd*bc03sc*i*
alias:          pci:v00008086d00000F31sv*sd*bc03sc*i*
alias:          pci:v00008086d00000F30sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D2Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000D2Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000D2Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000D26sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D22sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C2Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000C2Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000C2Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000C26sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C22sv*sd*bc03sc*i*
alias:          pci:v00008086d0000042Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000042Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000042Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000426sv*sd*bc03sc*i*
alias:          pci:v00008086d00000422sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A2Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000A2Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000A2Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000A26sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A22sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D1Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000D1Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000D1Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000D16sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D12sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C1Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000C1Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000C1Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000C16sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C12sv*sd*bc03sc*i*
alias:          pci:v00008086d0000041Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000041Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000041Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000416sv*sd*bc03sc*i*
alias:          pci:v00008086d00000412sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A1Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000A1Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000A1Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000A16sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A12sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D0Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000D0Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000D0Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000D06sv*sd*bc03sc*i*
alias:          pci:v00008086d00000D02sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C0Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000C0Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000C0Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000C06sv*sd*bc03sc*i*
alias:          pci:v00008086d00000C02sv*sd*bc03sc*i*
alias:          pci:v00008086d0000040Esv*sd*bc03sc*i*
alias:          pci:v00008086d0000040Bsv*sd*bc03sc*i*
alias:          pci:v00008086d0000040Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000406sv*sd*bc03sc*i*
alias:          pci:v00008086d00000402sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A0Esv*sd*bc03sc*i*
alias:          pci:v00008086d00000A0Bsv*sd*bc03sc*i*
alias:          pci:v00008086d00000A0Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000A06sv*sd*bc03sc*i*
alias:          pci:v00008086d00000A02sv*sd*bc03sc*i*
alias:          pci:v00008086d0000016Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000162sv*sd*bc03sc*i*
alias:          pci:v00008086d0000015Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000152sv*sd*bc03sc*i*
alias:          pci:v00008086d00000166sv*sd*bc03sc*i*
alias:          pci:v00008086d00000156sv*sd*bc03sc*i*
alias:          pci:v00008086d0000016Asv0000152Dsd00008990bc03sc*i*
alias:          pci:v00008086d00000126sv*sd*bc03sc*i*
alias:          pci:v00008086d00000116sv*sd*bc03sc*i*
alias:          pci:v00008086d00000106sv*sd*bc03sc*i*
alias:          pci:v00008086d00000122sv*sd*bc03sc*i*
alias:          pci:v00008086d00000112sv*sd*bc03sc*i*
alias:          pci:v00008086d0000010Asv*sd*bc03sc*i*
alias:          pci:v00008086d00000102sv*sd*bc03sc*i*
alias:          pci:v00008086d00000046sv*sd*bc03sc*i*
alias:          pci:v00008086d00000042sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A011sv*sd*bc03sc*i*
alias:          pci:v00008086d0000A001sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E92sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E42sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E32sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E22sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E12sv*sd*bc03sc*i*
alias:          pci:v00008086d00002E02sv*sd*bc03sc*i*
alias:          pci:v00008086d00002A42sv*sd*bc03sc*i*
alias:          pci:v00008086d00002A12sv*sd*bc03sc*i*
alias:          pci:v00008086d00002A02sv*sd*bc03sc*i*
alias:          pci:v00008086d000029D2sv*sd*bc03sc*i*
alias:          pci:v00008086d000029C2sv*sd*bc03sc*i*
alias:          pci:v00008086d000029B2sv*sd*bc03sc*i*
alias:          pci:v00008086d000029A2sv*sd*bc03sc*i*
alias:          pci:v00008086d00002992sv*sd*bc03sc*i*
alias:          pci:v00008086d00002982sv*sd*bc03sc*i*
alias:          pci:v00008086d00002972sv*sd*bc03sc*i*
alias:          pci:v00008086d000027AEsv*sd*bc03sc*i*
alias:          pci:v00008086d000027A2sv*sd*bc03sc*i*
alias:          pci:v00008086d00002772sv*sd*bc03sc*i*
alias:          pci:v00008086d00002592sv*sd*bc03sc*i*
alias:          pci:v00008086d0000258Asv*sd*bc03sc*i*
alias:          pci:v00008086d00002582sv*sd*bc03sc*i*
alias:          pci:v00008086d00002572sv*sd*bc03sc*i*
alias:          pci:v00008086d0000358Esv*sd*bc03sc*i*
alias:          pci:v00008086d00003582sv*sd*bc03sc*i*
alias:          pci:v00008086d00002562sv*sd*bc03sc*i*
alias:          pci:v00008086d00003577sv*sd*bc03sc*i*
depends:        ttm,drm,drm_display_helper,drm_kms_helper,video,cec,drm_buddy,intel-gtt,i2c-algo-bit
retpoline:      Y
intree:         Y
name:           i915
vermagic:       5.19.0-16.4-liquorix-amd64 SMP preempt mod_unload 
parm:           modeset:Use kernel modesetting [KMS] (0=disable, 1=on, -1=force vga console preference [default]) (int)
parm:           enable_dc:Enable power-saving display C-states. (-1=auto [default]; 0=disable; 1=up to DC5; 2=up to DC6; 3=up to DC5 with DC3CO; 4=up to DC6 with DC3CO) (int)
parm:           enable_fbc:Enable frame buffer compression for power savings (default: -1 (use per-chip default)) (int)
parm:           lvds_channel_mode:Specify LVDS channel mode (0=probe BIOS [default], 1=single-channel, 2=dual-channel) (int)
parm:           panel_use_ssc:Use Spread Spectrum Clock with panels [LVDS/eDP] (default: auto from VBT) (int)
parm:           vbt_sdvo_panel_type:Override/Ignore selection of SDVO panel mode in the VBT (-2=ignore, -1=auto [default], index in VBT BIOS table) (int)
parm:           reset:Attempt GPU resets (0=disabled, 1=full gpu reset, 2=engine reset [default]) (uint)
parm:           vbt_firmware:Load VBT from specified file under /lib/firmware (charp)
parm:           error_capture:Record the GPU state following a hang. This information in /sys/class/drm/card<N>/error is vital for triaging and debugging hangs. (bool)
parm:           enable_hangcheck:Periodically check GPU activity for detecting hangs. WARNING: Disabling this can cause system wide hangs. (default: true) (bool)
parm:           enable_psr:Enable PSR (0=disabled, 1=enable up to PSR1, 2=enable up to PSR2) Default: -1 (use per-chip default) (int)
parm:           psr_safest_params:Replace PSR VBT parameters by the safest and not optimal ones. This is helpful to detect if PSR issues are related to bad values set in  VBT. (0=use VBT parameters, 1=use safest parameters) (bool)
parm:           enable_psr2_sel_fetch:Enable PSR2 selective fetch (0=disabled, 1=enabled) Default: 0 (bool)
parm:           force_probe:Force probe the driver for specified devices. See CONFIG_DRM_I915_FORCE_PROBE for details. (charp)
parm:           disable_power_well:Disable display power wells when possible (-1=auto [default], 0=power wells always on, 1=power wells disabled when possible) (int)
parm:           enable_ips:Enable IPS (default: true) (int)
parm:           fastboot:Try to skip unnecessary mode sets at boot time (0=disabled, 1=enabled) Default: -1 (use per-chip default) (int)
parm:           load_detect_test:Force-enable the VGA load detect code for testing (default:false). For developers only. (bool)
parm:           force_reset_modeset_test:Force a modeset during gpu reset for testing (default:false). For developers only. (bool)
parm:           invert_brightness:Invert backlight brightness (-1 force normal, 0 machine defaults, 1 force inversion), please report PCI device ID, subsystem vendor and subsystem device ID to dri-devel@lists.freedesktop.org, if your machine needs it. It will then be included in an upcoming module version. (int)
parm:           disable_display:Disable display (default: false) (bool)
parm:           memtest:Perform a read/write test of all device memory on module load (default: off) (bool)
parm:           mmio_debug:Enable the MMIO debug code for the first N failures (default: off). This may negatively affect performance. (int)
parm:           verbose_state_checks:Enable verbose logs (ie. WARN_ON()) in case of unexpected hw state conditions. (bool)
parm:           nuclear_pageflip:Force enable atomic functionality on platforms that don't have full support yet. (bool)
parm:           edp_vswing:Ignore/Override vswing pre-emph table selection from VBT (0=use value from vbt [default], 1=low power swing(200mV),2=default swing(400mV)) (int)
parm:           enable_guc:Enable GuC load for GuC submission and/or HuC load. Required functionality can be selected using bitmask values. (-1=auto [default], 0=disable, 1=GuC submission, 2=HuC load) (int)
parm:           guc_log_level:GuC firmware logging level. Requires GuC to be loaded. (-1=auto [default], 0=disable, 1..4=enable with verbosity min..max) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)
parm:           dmc_firmware_path:DMC firmware path to use instead of the default one (charp)
parm:           enable_dp_mst:Enable multi-stream transport (MST) for new DisplayPort sinks. (default: true) (bool)
parm:           enable_dpcd_backlight:Enable support for DPCD backlight control(-1=use per-VBT LFP backlight type setting [default], 0=disabled, 1=enable, 2=force VESA interface, 3=force Intel interface) (int)
parm:           enable_gvt:Enable support for Intel GVT-g graphics virtualization host support(default:false) (bool)
parm:           request_timeout_ms:Default request/fence/batch buffer expiration timeout. (uint)
parm:           lmem_size:Set the lmem size(in MiB) for each region. (default: 0, all memory) (uint)
parm:           mitigations:Selectively enable security mitigations for all Intel® GPUs in the system.

  auto -- enables all mitigations required for the platform [default]
  off  -- disables all mitigations

Individual mitigations can be enabled by passing a comma-separated string,
e.g. mitigations=residuals to enable only clearing residuals or
mitigations=auto,noresiduals to disable only the clear residual mitigation.
Either '!' or 'no' may be used to switch from enabling the mitigation to
disabling it.

Active mitigations for Ivybridge, Baytrail, Haswell:
  residuals -- clear all thread-local registers between contexts
(base) tedliosu@victus-ted:~$ clinfo -l | grep HD
Platform #2: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) UHD Graphics [0x9a68]

My Intel GPU is the Tiger Lake integrated graphics of an 11400H; is it not officially supported? @jingxu10

sanchitintel commented 2 years ago

The current release is for discrete graphics cards. While it only mentions the Flex Series 170 GPU, it also supports the Intel Arc Alchemist series GPUs.

Intel Extension for PyTorch does not currently provide official support for integrated GPUs, though we may support them in the near future. However, if you'd like, you can build from source with these instructions, except that you'd have to set the environment variable USE_AOT_DEVLIST to xe, or modify USE_AOT_DEVLIST in CMakeLists.txt:

set(USE_AOT_DEVLIST "xe" CACHE STRING "Set device list for AOT build (for example, skl,ats,...)")
jingxu10 commented 2 years ago

There are several things involved.

  1. The compute power of the iGPU is not comparable to that of the Flex Series 170, so slow performance is expected.
  2. The prebuilt wheel files are AOT-compiled for the Flex Series 170. AOT is a technique that generates executable binaries inside the wheel file at compilation time. On non-Flex Series 170 GPUs, IPEX first generates the executable binaries at runtime and then executes them, which also contributes to the "freezing" when the script is executed.
  3. The oneDNN JIT technique, which I described in your previous thread, likewise takes time generating executable binaries at runtime.

As Sanchit mentioned in the reply above, you can try compiling IPEX from source with AOT configured for your graphics card. It should be noted that official IPEX GPU support is currently limited to the Flex Series 170.
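To separate one-time kernel generation from a genuine hang, it helps to time the same operation over a few iterations. A minimal sketch, assuming an IPEX GPU build that exposes the xpu device and a torch.xpu.synchronize() call:

import time
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the "xpu" device

x = torch.randn(1024, 1024).to("xpu")
for i in range(3):
    start = time.time()
    y = torch.relu(x @ x)
    torch.xpu.synchronize()  # block until the queued kernels actually finish
    print("iteration %d: %.2fs" % (i, time.time() - start))
# if only iteration 0 is slow, that is JIT kernel generation (AOT miss);
# if a later iteration never returns, it is a real hang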

tedliosu commented 2 years ago

There are several things involved.

1. The compute power of the iGPU is not comparable to that of the Flex Series 170, so slow performance is expected.

2. The prebuilt wheel files are AOT-compiled for the Flex Series 170. AOT is a technique that generates executable binaries inside the wheel file at compilation time. On non-Flex Series 170 GPUs, IPEX first generates the executable binaries at runtime and then executes them, which also contributes to the "freezing" when the script is executed.

3. The oneDNN JIT technique, which I described in your previous thread, likewise takes time generating executable binaries at runtime.

As Sanchit mentioned in the reply above, you can try compiling IPEX from source with AOT configured for your graphics card. It should be noted that official IPEX GPU support is currently limited to the Flex Series 170.

@jingxu10 Thank you so much for the info! I'll try building from source and testing it out within the next week or so as I'm very busy with school right now :smile:

sanchitintel commented 2 years ago

Thanks for your interest in Intel Extension for PyTorch, @tedliosu! We look forward to your response!

As @jingxu10 also mentioned, the current whls are for Flex Series 170 GPUs (which are discrete GPUs similar to the Intel Arc Alchemist series GPUs). Just FYI, AOT (or USE_AOT_DEVLIST) builds GPU kernels for the target device (in this case, the whls were generated for discrete GPUs), so they won't work with your iGPU, and you'd have to build from source for your own GPU.
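For completeness, a small sketch to confirm at the Python level which device an install actually sees, assuming the torch.xpu namespace of IPEX GPU builds (it mirrors torch.cuda):

import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device count:", torch.xpu.device_count())
    print("Device name :", torch.xpu.get_device_name(0))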

tedliosu commented 2 years ago

Thanks for your interest in Intel Extension for PyTorch, @tedliosu! We look forward to your response!

As @jingxu10 also mentioned, the current whls are for Flex Series 170 GPUs (which are discrete GPUs similar to the Intel Arc Alchemist series GPUs). Just FYI, AOT (or USE_AOT_DEVLIST) builds GPU kernels for the target device (in this case, the whls were generated for discrete GPUs), so they won't work with your iGPU, and you'd have to build from source for your own GPU.

@jingxu10 @sanchitintel btw I just took a brief look at the documentation here, and won't I need to set USE_AOT_DEVLIST to tgllp instead of xe? I just tried running icpx -fsycl -fsycl-targets=spir64_gen -Xs "-device xe" vector-add.cpp -o vector-add on an example vector-add.cpp file, and this was the output; as you can see, none of the listed architectures match the Tiger Lake iGPU that I have:

Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc. 
sanchitintel commented 2 years ago

Hi @tedliosu, thanks for checking it out! While the link you provided doesn't have the latest info (it doesn't even list discrete GPUs), you're right that you might have to use tgllp! Since we don't currently officially support iGPUs, we haven't gotten a chance to check them out.

Can you please use dpcpp instead of icpx for your example, BTW? Also, are you using the latest oneAPI Base Toolkit? If so, ocloc compile --help would show you even more device targets.

Thanks for taking this initiative; your solution here will also help others! :)

tedliosu commented 2 years ago

Hi @tedliosu, thanks for checking it out! While the link you provided doesn't have the latest info (it doesn't even list discrete GPUs), you're right that you might have to use tgllp! Since we don't currently officially support iGPUs, we haven't gotten a chance to check them out.

Can you please use dpcpp instead of icpx for your example, BTW? Also, are you using the latest oneAPI Base Toolkit? If so, ocloc compile --help would show you even more device targets.

Thanks for taking this initiative; your solution here will also help others! :)

Full outputs on my machine @sanchitintel:

(base) tedliosu@victus-ted:~/Documents/test_intel_example$ dpcpp -fsycl-targets=spir64_gen -Xs "-device xe" vector-add.cpp -o vector-add
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g10.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g11.
Compilation from IR - skipping loading of FCL
Build succeeded for : acm-g12.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc-sdv.
Compilation from IR - skipping loading of FCL
Build succeeded for : pvc.
(base) tedliosu@victus-ted:~/Documents/test_intel_example$ dpcpp -fsycl-targets=spir64_gen -Xs "-device tgllp" vector-add.cpp -o vector-add
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
Compilation from IR - skipping loading of FCL
Build succeeded.
(base) tedliosu@victus-ted:~/Documents/test_intel_example$ ./vector-add 
Device: Intel(R) UHD Graphics [0x9a68]
successout
(base) tedliosu@victus-ted:~/Documents/test_intel_example$ ocloc compile --help
Compiles input file to Intel Compute GPU device binary (*.bin).
Additionally, outputs intermediate representation (e.g. spirV).
Different input and intermediate file formats are available.

Usage: ocloc [compile] -file <filename> -device <device_type> [-output <filename>] [-out_dir <output_dir>] [-options <options>] [-32|-64] [-internal_options <options>] [-llvm_text|-llvm_input|-spirv_input] [-options_name] [-q] [-cpp_file] [-output_no_suffix] [--help]

  -file <filename>              The input file to be compiled
                                (by default input source format is
                                OpenCL C kernel language).

  -device <device_type>         Target device.
                                <device_type> can be: bdw, skl, kbl, cfl, apl, bxt, glk, whl, aml, cml, icllp, lkf, ehl, jsl, tgllp, rkl, adl-s, adl-p, adl-n, dg1, acm-g10, ats-m150, dg2-g10, acm-g11, ats-m75, dg2-g11, acm-g12, dg2-g12, pvc-sdv, pvc, gen11, gen12lp, gen8, gen9, xe, xe-hp, xe-hpc, xe-hpg, version  or hexadecimal value with 0x prefix
                                - can be single or multiple target devices.
                                The version is a representation of the
                                <major>.<minor>.<revision> value.
                                The hexadecimal value represents device ID.
                                If such value is provided, ocloc will try to
                                match it with corresponding device type.
                                For example, 0xFF20 device ID will be translated
                                to tgllp.
                                If multiple target devices are provided, ocloc
                                will compile for each of these targets and will
                                create a fatbinary archive that contains all of
                                device binaries produced this way.
                                Supported -device patterns examples:
                                -device 0x4905        ; will compile 1 target (dg1)
                                -device 12.10.0       ; will compile 1 target (dg1)
                                -device dg1           ; will compile 1 target
                                -device dg1,acm-g10   ; will compile 2 targets
                                -device dg1:acm-g10   ; will compile all targets
                                                        in range (inclusive)
                                -device dg1:          ; will compile all targets
                                                        newer/same as provided
                                -device :dg1          ; will compile all targets
                                                        older/same as provided
                                -device xe-hpg        ; will compile all targets
                                                        matching the same release
                                -device xe            ; will compile all targets
                                                        matching the same family
                                -device xe-hpg:xe-hpc ; will compile all targets
                                                        in range (inclusive)
                                -device xe-hpg:       ; will compile all targets
                                                        newer/same as provided
                                -device :xe-hpg       ; will compile all targets
                                                        older/same as provided
                                                        known to ocloc

                                Deprecated notation that is still supported:
                                <device_type> can be: xe_hp_sdv, dg2
                                - can be single target device.

  -output <filename>            Optional output file base name.
                                Default is input file's base name.
                                This base name will be used for all output
                                files. Proper sufixes (describing file formats)
                                will be added automatically.

  -out_dir <output_dir>         Optional output directory.
                                Default is current working directory.

  -options <options>            Optional OpenCL C compilation options
                                as defined by OpenCL specification.
                                Special options for Vector Compute:
                                -vc-codegen <vc options> compile from SPIRV
                                -cmc <cm-options> compile from CM sources

  -32                           Forces target architecture to 32-bit pointers.
                                Default pointer size is inherited from
                                ocloc's pointer size.
                                This option is exclusive with -64.

  -64                           Forces target architecture to 64-bit pointers.
                                Default pointer size is inherited from
                                ocloc's pointer size.
                                This option is exclusive with -32.

  -internal_options <options>   Optional compiler internal options
                                as defined by compilers used underneath.
                                Check intel-graphics-compiler (IGC) project
                                for details on available internal options.
                                You also may provide explicit --help to inquire
                                information about option, mentioned in -options

  -llvm_text                    Forces intermediate representation (IR) format
                                to human-readable LLVM IR (.ll).
                                This option affects only output files
                                and should not be used in combination with
                                '-llvm_input' option.
                                Default IR is spirV.
                                This option is exclusive with -spirv_input.
                                This option is exclusive with -llvm_input.

  -llvm_input                   Indicates that input file is an llvm binary.
                                Default is OpenCL C kernel language.
                                This option is exclusive with -spirv_input.
                                This option is exclusive with -llvm_text.

  -spirv_input                  Indicates that input file is a spirV binary.
                                Default is OpenCL C kernel language format.
                                This option is exclusive with -llvm_input.
                                This option is exclusive with -llvm_text.

  -options_name                 Will add suffix to output files.
                                This suffix will be generated based on input
                                options (useful when rebuilding with different 
                                set of options so that results won't get
                                overwritten).
                                This suffix is added always as the last part
                                of the filename (even after file's extension).
                                It does not affect '--output' parameter and can
                                be used along with it ('--output' parameter
                                defines the base name - i.e. prefix).

  -force_stos_opt               Will forcibly enable stateless to stateful optimization,
                                i.e. skip "-cl-intel-greater-than-4GB-buffer-required".

  -q                            Will silence most of output messages.

  -spv_only                     Will generate only spirV file.

  -cpp_file                     Will generate c++ file with C-array
                                containing Intel Compute device binary.

  -gen_file                     Will generate gen file.

  -output_no_suffix             Prevents ocloc from adding family name suffix.

  --help                        Print this usage message.

  -revision_id <revision_id>    Target stepping. Can be decimal or hexadecimal value.

  -exclude_ir                   Excludes IR from the output binary file.

  --format                      Enforce given binary format. The possible values are:
                                --format zebin - Enforce generating zebin binary
                                --format patchtokens - Enforce generating patchtokens (legacy) binary.

  -config                       Target hardware info config for a single device,
                                e.g 1x4x8.

Examples :
  Compile file to Intel Compute GPU device binary (out = source_file_Gen9core.bin)
    ocloc -file source_file.cl -device skl
Build failed with error code: -44
Command was: ocloc compile --help
(base) tedliosu@victus-ted:~/Documents/test_intel_example$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.2.0 (2022.2.0.20220730)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2022.2.0/linux/bin-llvm
sanchitintel commented 2 years ago

Awesome! Thanks, @tedliosu! Looks like your iGPU is indeed tgllp! :)

tedliosu commented 2 years ago

Awesome! Thanks, @tedliosu! Looks like your iGPU is indeed tgllp! :)

Help @sanchitintel, this rabbit hole of issues just won't stop :sob: - now I'm running into compilation errors

@sanchitintel @jingxu10 I managed to solve the compilation errors per here, but unfortunately I'm still running into the exact same freezing behavior after compiling IPEX with export USE_AOT_DEVLIST=tgllp. I verified that my current IPEX install has Tiger Lake AOT built in by running the following command in the Docker instance where I've installed PyTorch and IPEX:

root@d3d5b40f2ad1:/workspace# strings /opt/intel/oneapi/intelpython/python3.9/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so | grep tgllp
-options -cl-poison-unsupported-fp64-kernels -device tgllp

I even tried removing the -cl-poison-unsupported-fp64-kernels option from here to see if that would help:

root@46fad2a5ab08:/# strings /opt/intel/oneapi/intelpython/python3.9/lib/python3.9/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so | grep tgllp
-fsycl-targets=spir64_gen,spir64 -device tgllp

But unfortunately, even with that option removed from the build options, the example script I posted initially in this issue still stops making progress after one iteration of the for loop. Each time it hits the beginning of the 2nd iteration, my Intel iGPU utilization drops to zero and the script hangs indefinitely until I kill it. :confused: Since I also had to remove CUDA and cuDNN support from my IPEX-patched PyTorch build (the CUDA/cuDNN versions installed on my computer are incompatible with the version of torch currently required by IPEX), could that be causing the freezing issues, or does the root issue stem from somewhere entirely different? :eyes:

Btw, yes, I've also tried running IPEX outside of Docker and I'm running into the exact same issue there as well. :disappointed:

Please help me on this matter, thank you.
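
For reference, a minimal sanity check to confirm the XPU device is visible at all before training - a sketch assuming the torch.xpu namespace that IPEX's GPU build registers (mirroring torch.cuda):

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 - registers the "xpu" device

# If nothing is listed here, the driver/runtime setup is the problem
# rather than the training script itself.
print("xpu available:", torch.xpu.is_available())
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))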

jingxu10 commented 2 years ago

Hi, most likely the execution is just slow due to the limited compute power of the iGPU.

tedliosu commented 2 years ago

Hi, most likely the execution is just slow due to the limited compute power of the iGPU.

As I've said, the iGPU utilization would drop to zero before the script started hanging indefinitely, so I doubt it's the iGPU being slow that's causing the issue I'm facing. How long does the script have to hang at a given line before it's considered a genuine bug and not just the iGPU being slow?

@jingxu10

tedliosu commented 2 years ago

@jingxu10 @sanchitintel Unfortunately I'm unable to attach the full Python trace and strace outputs to this comment because the files are too large, but I've uploaded the two trace files here ("strace_result.txt" contains the strace results and "ipex_f32_example_py_trace.txt" the Python trace results). As you can see, when the f32 example training script does get stuck, the following traces are produced -

Excerpt from end of "ipex_f32_example_py_trace.txt":

 ...
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
 --- modulename: collate, funcname: <genexpr>
collate.py(81):         if not all(len(elem) == elem_size for elem in it):
[the same two lines repeat for the rest of the trace]
...
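
Since the Python trace above ends inside the default collate function, a quick way to rule the data pipeline in or out is to iterate the DataLoader alone, with no model and no XPU work at all - a sketch, where train_loader stands in for the loader from the f32 example:

def check_loader(train_loader, max_batches=4):
    # Iterate the DataLoader by itself; if this loop gets past the 2nd
    # batch without stalling, the hang is not in data loading/collation
    # but somewhere in the XPU compute path.
    for i, (data, target) in enumerate(train_loader):
        print("batch", i, tuple(data.shape), flush=True)
        if i + 1 >= max_batches:
            break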

Excerpt from end of "strace_result.txt":

...
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
ioctl(3, DRM_IOCTL_I915_GET_RESET_STATS, 0x7ffedb6d7d00) = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
[sched_yield() = 0 repeats hundreds of times, punctuated only by occasional DRM_IOCTL_I915_GET_RESET_STATS ioctls]
...

And I cannot stress enough that the following is the truth: every time the script gets stuck at a particular line (around the beginning of the 2nd iteration of the for loop), the iGPU utilization drops to zero while CPU utilization hovers at around 50% or above. So, like I've said, I highly doubt this is an issue of my iGPU being slow, as I've even successfully run Stable Diffusion to completion using ITEX (although SD took about 5 minutes to initialize and 15 minutes to run in total). So please, if you could help me debug what has actually gone wrong here, that'd be much appreciated :+1:

Again, just how long does the script have to hang at a given line before it's considered a genuine bug and not just the iGPU being slow?
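
One way to answer that question empirically is to force a device sync and time every iteration, so a slow kernel shows up as a long-but-finite step while a true hang never returns - a sketch, assuming torch.xpu.synchronize() from IPEX's GPU build, with step_fn standing in for one forward/backward/optimizer step:

import time
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 - registers the "xpu" device

def timed_loop(train_loader, step_fn, device="xpu"):
    for i, (data, target) in enumerate(train_loader):
        t0 = time.time()
        step_fn(data.to(device), target.to(device))
        torch.xpu.synchronize()  # block until all queued XPU work has finished
        print(f"iteration {i} took {time.time() - t0:.1f}s", flush=True)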

sanchitintel commented 2 years ago

Hi @tedliosu! Thanks again for the info! We'll investigate this issue while enabling Intel Extension for PyTorch for iGPUs. iGPUs are currently unsupported.

tedliosu commented 1 year ago

Hi @tedliosu! Thanks again for the info! We'll investigate this issue while enabling Intel Extension for PyTorch for iGPUs. iGPUs are currently unsupported.

@sanchitintel @jingxu10 I finally managed to get the script I posted initially to work! :tada: :tada: working_xpu_training I did it by following the driver-installation instructions posted here, which were meant for Arc GPUs (minus the resizable-BAR bit, of course, as my laptop's Intel GPU is an iGPU), and installed the i915 kernel driver and other associated kernel modules from the development branch of Intel's official graphics-driver repositories for Ubuntu 20.04. Given that I'm not the first person to have issues with your machine-learning software frameworks when running them on mainline kernel drivers, as seen here, I highly suspect there is something in the i915 driver built into the Linux kernel (at least for Ubuntu-based distros and their derivatives) that needs fixing before IPEX and ITEX can both work properly and to their fullest potential on such distros.

However, as soon as I tried running my own scripts on the newer i915 kernel driver, I ran into issues again due to the current software stack's inadequate support for emulated FP64 instructions, no matter whether I had FP64 emulation turned on or off, and that is BOTH with the packages I'd built myself AND the pre-built packages I grabbed from here off of this GitHub repo. :frowning: But I suppose, as that is a separate issue from the one raised here, I'll raise a separate GitHub issue for this newer set of problems :upside_down_face:.

EDIT - the separate issue I've raised for broken FP64 emulation support may be found here

@sanchitintel @jingxu10 UPDATE AGAIN - Found a workaround for the broken FP64 emulation issue :wink: here
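
For anyone hitting the same FP64 wall, a minimal smoke test makes it easy to tell whether FP64 (native or emulated) works at all on a given driver stack - a sketch using only core torch calls plus the "xpu" device that IPEX registers:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 - registers the "xpu" device

try:
    # A single double-precision op on the XPU; stacks without working
    # native or emulated FP64 typically raise (or poison the kernel) here.
    x = torch.ones(4, dtype=torch.float64, device="xpu")
    print("fp64 ok:", (x * 2.0).sum().item())
except Exception as e:
    print("fp64 broken on this stack:", e)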

tedliosu commented 1 year ago

@sanchitintel

Since Intel iGPU support for IPEX is still not even planned, should I close this issue for the time being as "not planned/won't fix", or leave it open in case Intel iGPU support for IPEX does get planned down the line?

sanchitintel commented 1 year ago

Hi @jingxu10, please advise if we should leave this issue open so that it'd be a known issue when/if we start working on iGPU support. Thanks