Open nathanodle opened 1 year ago
Which GPU did you run on?
We will look into this issue.
Sorry, I should have mentioned that. Arc 770, latest drivers on Ubuntu.
Thank you very much for looking into this, I really appreciate it!
Is there an ETA for someone to look at this? Just curious, as I have a project I'm trying to validate on ARC. Thanks!
We are looking into this issue, and will update later. Seems like there are some issues found.
similar issue while trying to run openai-whisper on A770
from . import load_model
+ import intel_extension_for_pytorch as ipex
model = load_model(model_name, device=device, download_root=model_dir)
+ model.eval()
+ model = model.to('xpu')
+ ipex.optimize(model)
whisper --model tiny --language en --task transcribe --device xpu ...
results in
intel_extension_for_pytorch/frontend.py:264: UserWarning: Conv BatchNorm folding failed during the optimize process.
intel_extension_for_pytorch/frontend.py:277: UserWarning: pending the optimization for LSTM
Whisper then fails to decode the tokens.
torch 1.10.0a0+git3d5f2d4
intel-extension-for-pytorch 1.10.200+gpu
. /opt/intel/oneapi/tbb/2021.8.0/env/vars.sh
. /opt/intel/oneapi/compiler/2022.2.0/env/vars.sh
. /opt/intel/oneapi/mkl/2022.2.0/env/vars.sh
> sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, AMD Ryzen 9 5900X 12-Core Processor 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x56a0] 3.0 [22.49.25018.23]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.3 [1.3.25018]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
> uname -a
Linux 5.17.0-1020-oem #21-Ubuntu SMP PREEMPT Fri Oct 14 09:33:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Update, I have also tried this with an Intel I9-11900K CPU and A770 with the same result. The first attempt was using an AMD Threadripper. The code does not work on either platform. Is there a timeline for this issue? Thanks so much!
This issue will be fixed in the next release soon.
Just a note, I have gotten bad results with every single model I've tried to use with XPU, it's not limited to this model. From my perspective, ARC has been unusable for almost 2 months now. I bought 6 Arc A770s for a project and this has been a waste so far.
I understand that I'm just one user and your team has their own plan. Can you give me anything to help me use these cards though? Is there a branch I can try or at least can you provide a release date so I know if I should continue trying with this hardware? Thanks very much!
This incorrect output issue has been fixed in the latest code base. The next release is still pending, but in the meantime you can try compiling from source with https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/scripts/compile_bundle.sh. You need to use oneAPI Base Toolkit 2023.1 with driver 602: https://dgpu-docs.intel.com/releases/stable_602_20230323.html
Hi, for now, please try compiling the latest code from source. Please refer to the comment above.
compilation took hours and multiple attempts, but whisper is working with the xpu-master branch and even loads the large model into the 16GB VRAM.
$ whisper --language en --model large --device xpu some.mp3
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:484: UserWarning: Split Master Weight feature is not supported on XPU for now, disabled.
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:494: UserWarning: To reduce device memory usage on XPU, optimization are done inplace, setting the inplace argument to True.
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:500: UserWarning: Weight Prepack and Sample Input are both disabled on XPU. The Onednn Layout is automatically applied.
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:506: UserWarning: For XPU, the optimize_lstm(replace lstm with ipex_lstm) is unsupported, so disable it
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:526: UserWarning: Conv BatchNorm folding failed during the optimize process.
python3.10/site-packages/intel_extension_for_pytorch/frontend.py:531: UserWarning: Linear BatchNorm folding failed during the optimize process.
speed looks ok'ish, but given the warnings probably room for improvement.
intel_gpu_top shows 52% Render, 75% Blitter, 24% unknown.
whisper patch
diff --git a/whisper/transcribe.py b/whisper/transcribe.py
index ed6d820..0d9e3c8 100644
--- a/whisper/transcribe.py
+++ b/whisper/transcribe.py
@@ -429,8 +429,13 @@ def cli():
         torch.set_num_threads(threads)
     from . import load_model
+    import intel_extension_for_pytorch as ipex
     model = load_model(model_name, device=device, download_root=model_dir)
+    model.eval()
+    model = model.to(device)
+    if device == 'xpu':
+        ipex.optimize(model)
     writer = get_writer(output_format, output_dir)
     for audio_path in args.pop("audio"):
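A variant of the patch above that only imports the extension when the XPU device is actually requested, so the same code still runs on machines without it. This is a sketch under assumptions: `maybe_optimize` is a hypothetical helper name (not part of whisper or IPEX), and the only IPEX call used is `ipex.optimize(model)` from the patch.

```python
import importlib


def maybe_optimize(model, device):
    """Return `model`, passed through ipex.optimize() only for an 'xpu' device.

    Hypothetical helper for illustration; not part of whisper or IPEX.
    """
    if not str(device).startswith("xpu"):
        # CPU/CUDA path: leave the model untouched and never import IPEX.
        return model
    # Imported lazily so that a missing extension only matters on the XPU path.
    ipex = importlib.import_module("intel_extension_for_pytorch")
    model.eval()
    model = model.to(device)
    # Use the return value rather than relying on in-place modification.
    return ipex.optimize(model)
```

In the patched `cli()` this would replace the four added lines after `load_model(...)` with `model = maybe_optimize(model, device)`.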
python modules
openai-whisper 20230314
intel-extension-for-pytorch 1.13.120+git5fdf9e6
torch 1.13.0a0+git49444c3
torchaudio 0.13.1+b90d798
torchvision 0.14.1a0+5e8e2f1
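For anyone comparing their setup against the list above, a small sketch that prints the installed versions using only the standard library. The package names are the ones listed in this comment; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names as they appear in the list above.
PACKAGES = [
    "openai-whisper",
    "intel-extension-for-pytorch",
    "torch",
    "torchaudio",
    "torchvision",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in installed_versions().items():
    print(f"{name:32} {version}")
```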
> sycl-ls
[opencl:gpu:0] Intel(R) OpenCL HD Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.05.25593.18]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.25593]
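A rough sketch for checking `sycl-ls` output like the above from a script, for example before starting a long run. The line format is inferred from the output pasted in this thread only, and treating a Level-Zero GPU entry as the thing to look for is an assumption; `parse_sycl_ls` and `has_level_zero_gpu` are hypothetical names.

```python
import re

# Matches lines such as:
# [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.25593]
_ENTRY = re.compile(
    r"^\[(?P<backend>[^:\]]+):(?P<kind>[^:\]]+):(?P<index>\d+)\]\s+(?P<description>.*)$"
)


def parse_sycl_ls(text):
    """Return one dict per device line; lines that do not match are skipped."""
    devices = []
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["index"] = int(entry["index"])
            devices.append(entry)
    return devices


def has_level_zero_gpu(text):
    """True if any listed device is a GPU exposed through the Level-Zero backend."""
    return any(
        d["backend"] == "ext_oneapi_level_zero" and d["kind"] == "gpu"
        for d in parse_sycl_ls(text)
    )
```

The text can come from `subprocess.run(["sycl-ls"], capture_output=True, text=True).stdout`.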
apt packages
intel-i915-dkms 1.23.3.19.230122.18.5.17.0.1020+i38-1
intel-dpcpp-cpp-compiler-2023.1.0 2023.1.0-46347
intel-oneapi-mkl-2023.1.0 2023.1.0-46342
intel-oneapi-mkl-devel-2023.1.0 2023.1.0-46342
kernel 5.17.0-1020-oem
The above warnings go away when ipex.optimize(model) is omitted.
found a metric to display GPU memory usage using lsgpu
normal usage
> lsgpu -p | grep ^lmem_
lmem_avail_bytes : 16260284416
lmem_total_bytes : 17079205888
openai whisper large model loaded
lmem_avail_bytes : 4605845504
lmem_total_bytes : 17079205888
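To turn those two numbers into a "memory in use" figure, a minimal sketch that parses the `lmem_*` lines. It assumes the `key : value` layout shown above; `lmem_usage` is a hypothetical helper name.

```python
def lmem_usage(lsgpu_output):
    """Return (used_bytes, total_bytes) from text containing lmem_* lines."""
    values = {}
    for line in lsgpu_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("lmem_"):
            values[key] = int(value.strip())
    total = values["lmem_total_bytes"]
    return total - values["lmem_avail_bytes"], total


# Values copied from the "large model loaded" output above.
loaded = "lmem_avail_bytes : 4605845504\nlmem_total_bytes : 17079205888"
used, total = lmem_usage(loaded)
print(f"{used / 2**30:.2f} GiB of {total / 2**30:.2f} GiB in use")
```

With the numbers above, the large model accounts for 12473360384 bytes in use, compared with 818921472 bytes in the idle case.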
took hours to build, so uploaded unofficial wheels of xpu-master here: https://github.com/leuc/intel-extension-for-pytorch/releases/tag/v1.13.120%2Bgit5fdf9e6
@leuc How much RAM does your computer have? It builds in around 20-25 min on my workstation, using slightly under 20GB of memory. However, when I tried building with a GitHub Actions workflow I made (per the GitHub docs, the VM has 7GB of memory) or on a self-hosted runner on a laptop with 8GB of RAM, I couldn't even get a build to finish.
@jingxu10 Having something akin to a nightly beta build from Intel could be really useful here.
@fredlarochelle it wasn't a resource issue, but the script doesn't build well without conda. I may work on a PR for better portability, with aim for CI/CD and containers.
@leuc Yeah, I know about conda + the GCC 11 requirement, however I had no luck with GCC 11, not consistent at all, got it working way better with GCC 9. We should probably have a look into the compiler flags used too.
What are the error messages? I would recommend doing the compilation in a Docker container.
addressed some build issues with PR https://github.com/intel/intel-extension-for-pytorch/pull/334
I'm using a tiny test network that is just one linear layer. Using the updated build I still get:
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/frontend.py:484: UserWarning: Split Master Weight feature is not supported on XPU for now, disabled.
  warnings.warn("Split Master Weight feature is not supported on XPU for now, disabled.")
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/frontend.py:500: UserWarning: Weight Prepack and Sample Input are both disabled on XPU. The Onednn Layout is automatically applied.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/frontend.py:506: UserWarning: For XPU, the optimize_lstm(replace lstm with ipex_lstm) is unsupported, so disable it
I don't know how this is possible because there's no LSTM at all!
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
import math
import os
import glob
import random
import librosa
import soundfile as sf
import numpy as np
import intel_extension_for_pytorch as ipex

default_device = torch.device("xpu")

class DummyLayer(nn.Module):
    def __init__(self):
        super(DummyLayer, self).__init__()
        self.layer = nn.Linear(1, 1)

    def forward(self, src):
        src = src.unsqueeze(-1)
        src = self.layer(src)
        src = src.squeeze(-1)
        return src

model = DummyLayer()
model.to(default_device)
criterion = nn.MSELoss()
lr_factor = 0.1
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16, inplace=True)
target_sample_rate=8000

def load_file(path):
    data, sample_rate = librosa.load(path, sr=target_sample_rate)
    data = torch.from_numpy(data)
    data = data.unsqueeze(0)
    data = torch.mean(data.to(default_device), dim=0).unsqueeze(0)
    return data

train = load_file("testrecording_8k.wav")
target = load_file("testrecording_target_8k.wav")

# Training loop
num_epochs = 150000
for epoch in range(num_epochs):
    print("running")
    # batch = batch.to(memory_format=torch.channels_last)
    # target = target.to(memory_format=torch.channels_last)
    train = train.bfloat16()
    target = target.bfloat16()
    optimizer.zero_grad()
    with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
        output = model(train)
        loss = criterion(output, target)
    print(f'Epoch: {epoch+1}/{num_epochs}, Step: {epoch+1}, Loss: {loss.item()}')
    print("output", output.cpu())
    print("target", target.cpu())
    loss.backward()
    optimizer.step()
    print(f'Epoch: {epoch+1}/{num_epochs}, Step: {epoch+1}, Loss: {loss.item()}')
    # every few steps save the output
    if (epoch+1) % 50 == 0:
        # Save the output to file
        output = torch.flatten(output, start_dim=0)
        print(output.size())
        sf.write("samples2/testrecording_8k_progress2_" + str(epoch) + ".wav", output.float().cpu().detach().numpy(), target_sample_rate)
@zejun-chen Is this a known issue we already fixed?
Hi, @turbobuilt
Thank you for using IPEX.
The warning messages are emitted by model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16, inplace=True). This interface applies most of the IPEX optimizations for a model. It has an argument named level, which defaults to O1. At O1, most optimizations are attempted even if the model has no matching layers. On XPU, some of these optimizations are disabled (on CPU they are enabled), for example split master weight (we will support it soon), weight prepack, and optimize LSTM, so the warnings appear because these optimizations are disabled for XPU.
@gujinghui This is caused by our warning messages from ipex.optimize.
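If these informational warnings are just noise for a given run, they can be silenced around the call with the standard `warnings` module. A minimal sketch, assuming the warnings are attributed to modules under `intel_extension_for_pytorch` (as the `frontend.py` paths above suggest); `quiet_module_warnings` is a hypothetical helper name. Note that this would also hide warnings that may matter, such as the BatchNorm folding failures reported earlier in the thread.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_module_warnings(module_prefix="intel_extension_for_pytorch"):
    """Temporarily ignore UserWarnings issued from modules starting with `module_prefix`."""
    with warnings.catch_warnings():
        # `module` is a regex matched against the start of the issuing module's name.
        warnings.filterwarnings("ignore", category=UserWarning, module=module_prefix)
        yield


# Intended use (needs torch and IPEX, so shown as a comment):
# with quiet_module_warnings():
#     model, optimizer = ipex.optimize(model, optimizer=optimizer,
#                                      dtype=torch.bfloat16, inplace=True)
```

The filter is only active inside the `with` block, so warnings from the rest of the script are unaffected.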
Hi, trying to run inference with a pretrained OFA (OFA-huge) model according to these instructions:
https://github.com/OFA-Sys/OFA/blob/feature/add_transformers/transformers.md
This runs fine on both CPU and CUDA, but using XPU results in gibberish. I also get several warnings, which go away when model = ipex.optimize(model) is commented out. With essentially the only change between CPU/CUDA and XPU being the .to('xpu') part, the model still outputs gibberish.
Warnings from model = ipex.optimize(model):
[' this is the ch ch chaval all the is is the word for the band that is']
^ gibberish output
With CPU/CUDA:
[' a black and white photo of a wolf walking through the woods at night.']
^ correct output
I'm running Ubuntu 22.04 with 1.13.10+xpu, code is below:
Image:
Thanks!
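For reports like this one, a numeric comparison of the raw model outputs on CPU and XPU can help narrow down where results start to diverge, before any decoding to text. A device-independent sketch using plain Python lists; the helper names and the tolerance values are assumptions, not anything IPEX provides.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """True if both sequences have the same length and all elements are close.

    The default tolerances are illustrative; reduced-precision runs may need looser ones.
    """
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Intended use with tensors (hypothetical variable names):
# outputs_match(cpu_out.flatten().tolist(), xpu_out.float().cpu().flatten().tolist())
```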