vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: WSL2/Intel Arc OpenCL Error -6 but lots of VRAM available #1474

Closed: dstults closed this issue 1 year ago

dstults commented 1 year ago

Issue Description

Fresh install of Windows 10, Intel Arc A770M drivers, WSL2, followed instructions from @Disty0 here wherever possible: https://www.technopat.net/sosyal/konu/using-stable-diffusion-webui-with-intel-arc-gpus.2593077/

The closest issue is this one but it seems different: https://github.com/vladmandic/automatic/issues/1272

Ran ./webui.sh --use-ipex and loaded the vanilla SD 1.5 model.

Web UI appears to load the model, GPU fills up about 25% of its ~16 GB.

Tried running a simple prompt with various sampling methods, with everything else at default settings.

I immediately get an error:

An OpenCL error occurred: -6
Time taken: 0.41s | GPU active 4086 MB reserved 4196 MB | System peak 4085 MB total 13005 MB

The error code -6 indicates that it's out of memory according to this: https://gist.github.com/bmount/4a7144ce801e5569a0b6, but I'm only using about 25% of the available memory.
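(For reference, the standard OpenCL headers define -6 as CL_OUT_OF_HOST_MEMORY, i.e. a host-side allocation failure rather than VRAM exhaustion; a minimal lookup of the nearby codes, for anyone hitting this later:)

# Subset of the OpenCL error codes from the standard cl.h enum:
CL_ERROR_NAMES = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",  # the code reported here
}
print(CL_ERROR_NAMES[-6])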


Possibly related: during startup, the graphics card is detected twice and is also shown with a (2) in the web UI system information, which seems odd to me:

02:38:29-196297 INFO     Torch detected GPU: Intel(R) Graphics [0x5690] VRAM 13005
02:38:29-197034 INFO     Torch detected GPU: Intel(R) Graphics [0x5690] VRAM 13005

Also possibly related, during startup, when importing intel_extension_for_pytorch this warning is displayed:

/path/to/git/automatic/venv/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")

This might be unrelated though; I've not yet found any information on it, nor a solution.


The full error in console is this:

02:49:05-793709 ERROR    gradio call: RuntimeError
╭──────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────────────────────────────╮
│ /path/to/git/automatic/modules/call_queue.py:34 in f                                                                                                                                  │
│                                                                                                                                                                                           │
│    33 │   │   │   try:                                                                                                                                                                    │
│ ❱  34 │   │   │   │   res = func(*args, **kwargs)                                                                                                                                         │
│    35 │   │   │   │   progress.record_results(id_task, res)                                                                                                                               │
│                                                                                                                                                                                           │
│ /path/to/git/automatic/modules/txt2img.py:56 in txt2img                                                                                                                               │
│                                                                                                                                                                                           │
│   55 │   if processed is None:                                                                                                                                                            │
│ ❱ 56 │   │   processed = processing.process_images(p)                                                                                                                                     │
│   57 │   p.close()                                                                                                                                                                        │
│                                                                                                                                                                                           │
│                                                                                 ... 19 frames hidden ...                                                                                  │
│                                                                                                                                                                                           │
│ /path/to/git/automatic/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py:190 in forward                                                                             │
│                                                                                                                                                                                           │
│   189 │   def forward(self, input: Tensor) -> Tensor:                                                                                                                                     │
│ ❱ 190 │   │   return F.layer_norm(                                                                                                                                                        │
│   191 │   │   │   input, self.normalized_shape, self.weight, self.bias, self.eps)                                                                                                         │
│                                                                                                                                                                                           │
│ /path/to/git/automatic/venv/lib/python3.10/site-packages/torch/nn/functional.py:2515 in layer_norm                                                                                    │
│                                                                                                                                                                                           │
│   2514 │   │   )                                                                                                                                                                          │
│ ❱ 2515 │   return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.c                                                                                           │
│   2516                                                                                                                                                                                    │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: An OpenCL error occurred: -6

Version Platform Description

x86_64 Windows 10 Education
"Serpent Canyon" NUC with A770M built-in
Driver version: 31.0.101.4499 (latest)
Linux subsystem: WSL2 / Ubuntu 22.04.2 LTS (latest)
Kernel: 5.15.90.1-microsoft-standard-WSL2 (latest)

Python 3.10.6

From the web ui:

Version
updated: 2023-06-18
hash: b340e3f4
url: https://github.com/vladmandic/automatic.git/tree/master

Torch
1.13.0a0+gitb1dde16 Autocast  half

GPU
device: Intel(R) Graphics [0x5690] (2)
ipex: 1.13.120+xpu

Memory
ram: free:27.9 used:3.24 total:31.14
gpu: free:8.71 used:3.99 total:12.7
gpu-active: current:3.99 peak:3.99
gpu-allocated: current:3.99 peak:3.99
gpu-reserved: current:4.1 peak:4.1
gpu-inactive: current:0.11 peak:0.11
events: retries:0 oom:0
utilization: 0

Libs
xformers: unavailable
accelerate: 0.20.3
transformers: 4.26.1

Repos
Stable Diffusion: [cf1d67a] 2023-03-25
Taming Transformers: [9e9981c] 2022-11-07
CodeFormer: [7a584fd] 2023-01-18
BLIP: [3a29b74] 2022-09-20
k_diffusion: [c9fe758] 2023-05-21

Device Info
active: xpu
dtype: torch.float32
vae: torch.float32
unet: torch.float32

Other possibly related info

$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.15.3.0.20_160000]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H 3.0 [2023.15.3.0.20_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x5690] 3.0 [23.13.26032.26]
[opencl:gpu:3] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x46a6] 3.0 [23.13.26032.26]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x5690] 1.3 [1.3.26032]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x46a6] 1.3 [1.3.26032]

# Checking if device "Intel(R) Graphics [0x5690]" above is correct, it seems to be:
$ cd venv
$ source bin/activate
$ python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
1.13.0a0+gitb1dde16
1.13.120+xpu
[0]: _DeviceProperties(name='Intel(R) Graphics [0x5690]', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=13004MB, max_compute_units=512)
[1]: _DeviceProperties(name='Intel(R) Graphics [0x46a6]', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=26043MB, max_compute_units=96)


Disty0 commented 1 year ago

Can you try with --device-id 0 or 1? Seems like it's trying to use your iGPU too.

dstults commented 1 year ago

Thanks for the idea (and fast response), one second...

./webui.sh --use-ipex --device-id 0
... After attempted render (same as before) ...
[WEB UI] An OpenCL error occurred: -6
[WEB UI] Time taken: 0.52s | GPU active 4086 MB reserved 4196 MB | System peak 4085 MB total 13005 MB
[System info - Memory] gpu: free:8.71 used:3.99 total:12.7  <-- these appear to be closer-to-expected numbers
[System info - GPU] device: Intel(R) Graphics [0x5690] (2)
[CONSOLE] RuntimeError: An OpenCL error occurred: -6

./webui.sh --use-ipex --device-id 1
.. During load ...
[CONSOLE] RuntimeError: The program was built for 1 devices
[CONSOLE] Build program log for 'Intel(R) Graphics [0x46a6]':
[CONSOLE] warning: module got recompiled from IR because provided native binary is incompatible with underlying device and/or driver [-Wrecompiled-from-ir]
[CONSOLE]  -999 (Unknown PI error)
... After attempted render ...
[WEB UI] Error: model not loaded
[WEB UI] Time taken: 1.33s | GPU active 7973 MB reserved 8108 MB | System peak 0 MB total 26044 MB
[System Info - Memory] -- gpu: free:25.43 used:0.0 total:25.43  <---- this appears to include shared memory (iGPU device)
[System Info - GPU] -- device: Intel(R) Graphics [0x5690] (2)
[CONSOLE] RuntimeError: DPCPP out of memory. Tried to allocate 20.00 MiB (GPU ; 25.43 GiB total capacity; 3.99 GiB already allocated; 4.10 GiB reserved in total by PyTorch)
[CONSOLE] 17:49:47-663173 WARNING  Model not loaded
... After retrying loading a model ...
[CONSOLE] RuntimeError: DPCPP out of memory. Tried to allocate 20.00 MiB (GPU ; 25.43 GiB total capacity; 7.79 GiB already allocated; 7.92 GiB reserved in total by PyTorch)

I agree I think it's likely the iGPU getting mixed up with the dGPU and I'm not sure why.

I just tried a brand new installation from scratch on a fresh WSL2 installation. The results are almost identical to what the other user reported: https://github.com/vladmandic/automatic/issues/1272

./webui.sh --use-ipex
...
[CONSOLE] Abort was called at 718 line in file:
[CONSOLE] ./shared/source/os_interface/windows/wddm_memory_manager.cpp
[CONSOLE] Aborted

After deleting config.json and the venv folder to see if that solution worked in my case, it threw yet another error:

ModuleNotFoundError: No module named 'pytorch_lightning'

I disabled the iGPU as mentioned in that other issue, then reinstalled the repo and ran it, then it worked. (Note: Rather than disabling the iGPU in BIOS I tested instead and found that turning it off in Device Manager works as well.)


Trying to investigate the "why"...

Looking at the System Info now in the web UI, it says (note 2 changed into 1):

device: Intel(R) Graphics [0x5690] (1)
ipex: 1.13.120+xpu

I think there's still a root cause to be fixed in the code, possibly around the fact that the iGPU and the dGPU are both named "Intel(R) Graphics" -- yet in that display, it previously said:

device: Intel(R) Graphics [0x5690] (2)

Maybe it should have said something like this (assuming it noticed that the two devices were not the same thing):

device: Intel(R) Graphics [0x5690] (1)
device: Intel(R) Graphics [0x46a6] (1)

Alternatively, perhaps iGPUs should be omitted automatically, as in the above case?
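For illustration, a rough sketch (not the actual SD.Next code) of what listing each detected XPU device by name could look like, using the same IPEX calls as the one-liner above:

import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers torch.xpu)

# One line per detected XPU device, so the dGPU and the iGPU stay distinguishable
# even though both carry the generic "Intel(R) Graphics" label.
for i in range(torch.xpu.device_count()):
    props = torch.xpu.get_device_properties(i)
    print(f"device {i}: {props.name}, {props.total_memory // (1024 * 1024)} MB")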

Anyway, I'm super happy that the workaround mentioned above works here (I will go back and check it on my first WSL2 installation), but I'd love to help investigate or implement a possible code change if we think one can be found.

Disty0 commented 1 year ago

I doubt this is the cause, but I removed the torch.xpu.device step from logging. It should log with direct device IDs instead of torch.xpu.device(ID) now. https://github.com/vladmandic/automatic/commit/c02ccc4f0001a8eb9b9b5ff32522f067bbe55103

System Info tab prints the current device and then prints the total available device count so (2) is expected.

Nuullll commented 1 year ago

@dstults Can you please post your intel-opencl-icd version?

sudo apt show intel-opencl-icd
...
Version: 23.17.26241.21-647~22.04

I encountered the same OpenCL error even without an iGPU. And finally resolved the issue by upgrading the intel-opencl-icd package from version 23.05.25593.18-601~22.04 to 23.17.26241.21-647~22.04.

Although I was using a different web UI variant (https://github.com/jbaboval/stable-diffusion-webui ), I think the problem is unlikely to be related to the web UI implementation :)

Nuullll commented 1 year ago

Just confirmed: the issue is gone for me with intel-opencl-icd >= 23.13.26032.26-627~22.04.

dstults commented 1 year ago

System Info tab prints the current device and then prints the total available device count so (2)

Hmm, okay then.

Can you please post your intel-opencl-icd version?

Sure thing:

sudo apt show intel-opencl-icd
...
23.13.26032.26-627~22.04

Updated:

sudo apt show intel-opencl-icd
...
23.17.26241.21-647~22.04

Testing with iGPU still disabled... ✔️
Testing with iGPU re-enabled... ❌ Same issue: Abort was called at 729 line in file: ... wddm_memory_manager.cpp

In the spirit of testing more installs and updates, I went back to the deprecated official guide (steps 4 + 5) and the updated official guides for Linux/Ubuntu, and tried installing everything mentioned under both of them:

I did each guide half at a time, rebooting between tests, and every time the results came out identical to the above, with the same error.

Testing with iGPU still enabled... ❌ Abort was called at 729 line in file: ... wddm_memory_manager.cpp (disable iGPU...)
Testing with iGPU disabled... ✔️ (re-enable before installing more things...)

Despite going through both guides and installing everything potentially related on Ubuntu (not including the i386 architecture items), the results were exactly the same as when I just did a bare-bones install of the items in Disty0's guide.

I'm curious whether you all have graphics cards separate from your motherboard. If not, I'm guessing it could be (a) an issue with driver support for the motherboard-soldered dGPU of the Serpent Canyon NUC; (b) an edge-case driver bug because it's actually the mobile ("A770M") version of the card rather than the full desktop A770; or (c) some kind of interference or edge-case setup that's being broken by the "Intel Deep Link" feature installed specifically on these machines. Still open to ideas, but I think we can maybe close this if nothing else comes to mind.

Disty0 commented 1 year ago

I'm curious whether you all have graphics cards separate from your motherboard. If not, I'm guessing it could be (a) an issue with driver support for the motherboard-soldered dGPU of the Serpent Canyon NUC; (b) an edge-case driver bug because it's actually the mobile ("A770M") version of the card rather than the full desktop A770; or (c) some kind of interference or edge-case setup that's being broken by the "Intel Deep Link" feature installed specifically on these machines. Still open to ideas, but I think we can maybe close this if nothing else comes to mind.

I don't have an iGPU or an Intel CPU so I can't test these setups.

It seems like PyTorch IPEX doesn't know what your GPU is but it should work fine since it's an A770 and a DG2. Also PyTorch IPEX doesn't support any iGPU yet.

"Intel Deep Link" uses the iGPU and the dGPU at the same time. This is probably the reason why you are getting these errors.

hdtv35 commented 1 year ago

Just wanted to toss in here that I have the A770 running on Ubuntu bare metal (no WSL) with no iGPU, and it does work, but only at 768x768 or smaller. Trying larger sizes like 1024x1024 hits the same -6 memory error. The same issue occurs with batch sizes larger than 4: it crashes immediately with "RuntimeError: An OpenCL error occurred: -6". I've been having this issue for months but just gave up, since 512x512 still works.

Here is my info from the System Info tab: Torch 1.13.0a0+gitb1dde16 Autocast half

GPU: device: Intel(R) Arc(TM) A770 Graphics (1) ipex: 1.13.120+xpu

Platform: arch: x86_64 cpu: x86_64 system: Linux release: 6.3.5-060305-generic python: 3.10.6

Memory:
ram: free:23.3 used:7.96 total:31.26
gpu: free:10.94 used:4.17 total:15.11
gpu-active: current:4.17 peak:5.41
gpu-allocated: current:4.17 peak:5.41
gpu-reserved: current:4.43 peak:5.53
gpu-inactive: current:0.26 peak:1.07
events: retries:0 oom:0
utilization: 0

Libs: xformers: unavailable accelerate: 0.18.0 transformers: 4.26.1

Repos: Stable Diffusion: [cf1d67a] 2023-03-25 Taming Transformers: [9e9981c] 2022-11-07 CodeFormer: [7a584fd] 2023-01-18 BLIP: [3a29b74] 2022-09-20 k_diffusion: [c9fe758] 2023-05-21

Device Info: active: xpu dtype: torch.float16 vae: torch.float16 unet: torch.float16

dstults commented 1 year ago

"Intel Deep Link" uses the iGPU and the dGPU at the same time. This is probably the reason why you are getting these errors.

Yeah that makes sense.

Just wanted to toss in here I have the A770 running on Ubuntu bare metal (no WSL) with no iGPU, and it does work but only with 768x768 or smaller. Trying larger sizes like 1024x1024 hits the same -6 memory error.

I'll test again on the first-install WSL2 snapshot I was running when I created the issue. I was only testing 512x512, but I'd be interested to see what happens, and whether it still happens once I disable the iGPU. Will get to this later today once I'm off work. [Edit: appended below.]

Same issue occurs with batch sizes larger than 4

I have issues with batches larger than 4 as well. It's really weird: the computer starts to freeze and possibly even crashes -- and if it does appear to finish, the pictures beyond the 4th (or sometimes the 2nd) in a batch are black (as in, the first 2 or 4 pictures are normal, but the later pictures in the batch are all black images).


Testing the original WSL2 install, it looks like it couldn't find my graphics card once I disabled the iGPU:

Error in sys.excepthook:
Traceback (most recent call last):
...
  File "/path/to/git/automatic/venv/lib/python3.10/site-packages/rich/traceback.py", line 280, in __init__
    for suppress_entity in suppress:
TypeError: 'module' object is not iterable

Original exception was:
Traceback (most recent call last):
...
  File "/path/to/git/automatic/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/__init__.py", line 169, in get_device_properties
    if device < 0 or device >= device_count():
TypeError: '<' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
...
  File "/path/to/git/automatic/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/launch.py", line 542, in launch
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd_s)
subprocess.CalledProcessError: Command 'taskset -c 0-9 /path/to/git/automatic/venv/bin/python3 -u launch.py --use-ipex' returned non-zero exit status 1.

At least that's my guess from this line: if device < 0 or device >= device_count(): failing with TypeError: '<' not supported between instances of 'NoneType' and 'int' -- i.e. the device being passed in is presumably None.
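A trivial reproduction of just that failure (outside the webui) supports the guess that a None device ID is being passed in:

# The comparison in get_device_properties() fails exactly this way when device is None:
device = None
device < 0  # TypeError: '<' not supported between instances of 'NoneType' and 'int'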

Anyway, I'm going to remove that install again; I'm pretty sure it was broken from trying to follow a variety of conflicting/out-of-date instructions (such as manually installing/building packages straight from GitHub, then trying to manually uninstall them when they didn't work) before finally landing on the good ones that rely mostly on installing via apt.

Nuullll commented 1 year ago

JFYI, I just made a docker image for Intel IPEX + automatic environment: https://github.com/Nuullll/ipex-sd-docker-for-arc-gpu Maybe this could be another option for you :-)

Disty0 commented 1 year ago

WSL is known to have issues. Update intel-opencl-icd only or update the whole Ubuntu image with: apt update && apt upgrade -y
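Concretely, upgrading just the driver package would be something along these lines (assuming the Intel graphics apt repository from the install guides is already configured):

sudo apt update
sudo apt install --only-upgrade intel-opencl-icd
apt show intel-opencl-icd   # should now report Version >= 23.13.26032.26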

I've never encountered an OpenCL issue with Linux and batch sizes above 4 have no issues.

Batch Size 32 with a forked version of @Nuullll's docker image on Linux: Note: Using Sub-quadratic instead of InvokeAI's optimizations. https://github.com/Disty0/docker-sd-webui-ipex

image

hdtv35 commented 1 year ago

WSL is known to have issues. Update intel-opencl-icd only or update the whole Ubuntu image with: apt update && apt upgrade -y

I've never encountered an OpenCL issue with Linux and batch sizes above 4 have no issues.

Batch Size 32 with a forked version of @Nuullll's docker image on Linux: Note: Using Sub-quadratic instead of InvokeAI's optimizations. https://github.com/Disty0/docker-sd-webui-ipex

image

Thank you, using the docker image did work for me!

dstults commented 1 year ago

Using Sub-quadratic instead of InvokeAI's optimizations

How do you do that?


I've never encountered an OpenCL issue with Linux and batch sizes above 4 have no issues.

If you're curious or able to identify it, this is what it looks like:

image

It happens regardless of the VAE or model I use -- but that's an entirely different issue.


a docker image for Intel IPEX + automatic environment: https://github.com/Nuullll/ipex-sd-docker-for-arc-gpu

Batch Size 32 with forked version of @Nuullll's docker image on Linux: https://github.com/Disty0/docker-sd-webui-ipex

As the docker images have not yet been tagged with latest [1] [2], I used versions v0.1 and 0.10 respectively.

Because the containers have static names and I was re-launching for every test, I added --rm to the docker run commands so each container is removed after it exits.

❗ Regarding the PowerShell commands: none of the three Linux installs (Ubuntu/Docker/Docker-data) had a /dev/dxg device, however I reconfirmed that running the vanilla automatic clone in native WSL2 was indeed using the GPU.

image
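(A trivial way to see which GPU device nodes each environment actually exposes; /dev/dxg is the WSL GPU paravirtualization node, while /dev/dri holds the DRM render nodes the containers are started with:)

ls -l /dev/dxg /dev/dri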

For the reasons mentioned above, I am testing using the following syntax:

docker run -it --rm `
  --device /dev/dri `
  -v /usr/lib/wsl:/usr/lib/wsl `
  -v C:\AI\docker\sd-webui-nuullll:/sd-webui `
  -v C:\AI\docker\docker-mount\deps-nuullll:/deps `
  -p 7860:7860 `
  --name sd-server `
  nuullll/ipex-arc-sd:v0.1

docker run -it --rm `
  --device /dev/dri `
  -v /usr/lib/wsl:/usr/lib/wsl `
  -v C:\AI\docker\sd-webui-disty0:/sd-webui `
  -v C:\AI\docker\deps-disty0:/deps `
  -p 7860:7860 `
  --name sd-server `
  disty0/sd-webui-ipex:0.10

Ironically, testing the docker containers from PowerShell, I got exactly the same errors as in my previous post on my "broken" WSL2 install, with the iGPU both enabled and disabled. The results were like this:

03:20:29-402748 INFO     Extension preload: 0.1s /sd-webui/extensions-builtin
03:20:29-410736 INFO     Extension preload: 0.0s /sd-webui/extensions
03:20:29-413018 INFO     Server arguments: ['--use-ipex', '--listen']
No module 'xformers'. Proceeding without it.
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/rich/traceback.py", line 103, in excepthook
    Traceback.from_exception(
  File "/usr/local/lib/python3.10/dist-packages/rich/traceback.py", line 346, in from_exception
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/rich/traceback.py", line 280, in __init__
    for suppress_entity in suppress:
TypeError: 'module' object is not iterable

Original exception was:
Traceback (most recent call last):
  File "/sd-webui/./launch.py", line 189, in <module>
    instance = start_server(immediate=True, server=None)
  File "/sd-webui/./launch.py", line 136, in start_server
    module_spec.loader.exec_module(server)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/sd-webui/webui.py", line 43, in <module>
    from modules.call_queue import queue_lock, wrap_queued_call, wrap_gradio_gpu_call # pylint: disable=W0611,C0411,C0412
  File "/sd-webui/modules/call_queue.py", line 8, in <module>
    from modules import shared, progress, errors
  File "/sd-webui/modules/shared.py", line 693, in <module>
    mem_mon = modules.memmon.MemUsageMonitor("MemMon", device, opts)
  File "/sd-webui/modules/memmon.py", line 26, in __init__
    self.cuda_mem_get_info()
  File "/sd-webui/modules/memmon.py", line 39, in cuda_mem_get_info
    return [(torch.xpu.get_device_properties(self.device).total_memory - torch.xpu.memory_allocated()), torch.xpu.get_device_properties(self.device).total_memory]
  File "/usr/local/lib/python3.10/dist-packages/intel_extension_for_pytorch/xpu/__init__.py", line 169, in get_device_properties
    if device < 0 or device >= device_count():
TypeError: '<' not supported between instances of 'NoneType' and 'int'

However, running docker from WSL2 got even less far; I reran it several times and got the same result.

docker run -it --rm \
  --device /dev/dri \
  -v /path/to/docker-mount/sd-webui:/sd-webui \
  -v /path/to/docker-mount/deps:/deps \
  -p 7860:7860 \
  --name sd-server \
  disty0/sd-webui-ipex:0.10
...
04:05:27-957352 INFO     Server arguments: ['--use-ipex', '--listen']
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ /sd-webui/automatic/launch.py:189 in <module>                                                                        │
│                                                                                                                      │
│   188                                                                                                                │
│ ❱ 189     instance = start_server(immediate=True, server=None)                                                       │
│   190     while True:                                                                                                │
│                                                                                                                      │
│ /sd-webui/automatic/launch.py:136 in start_server                                                                    │
│                                                                                                                      │
│   135     installer.log.info(f"Server arguments: {sys.argv[1:]}")                                                    │
│ ❱ 136     module_spec.loader.exec_module(server)                                                                     │
│   137     if args.test:                                                                                              │
│ in exec_module:883                                                                                                   │
│ in _call_with_frames_removed:241                                                                                     │
│                                                                                                                      │
│ /sd-webui/automatic/webui.py:23 in <module>                                                                          │
│                                                                                                                      │
│    22 import torchvision # pylint: disable=W0611,C0411                                                               │
│ ❱  23 import pytorch_lightning # pytorch_lightning should be imported after torch, but it re-e                       │
│    24 if ".dev" in torch.__version__ or "+git" in torch.__version__:                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'pytorch_lightning'

^^ I tried cd'ing into the venv, pip3-installing the XPU PyTorch module, and rerunning, but I still got the same result.

Finally to make sure my hardware didn't get borked while doing everything, I went back to the vanilla automatic repo and was able to successfully run again. ✔️

I tried force-reinstalling the NUC-specific graphics drivers one more time (downgrading the current ones to 31.0.101.5959). /dev/dxg still did not appear in the Linux distros (I even tried another update/upgrade). The above tests had the same results.

The lack of /dev/dxg, while the /dev/dri folder can still be used for GPU acceleration via WSL2, again makes me think this is a NUC/A770M-only edge-case problem.

I'm contemplating reinstalling raw Ubuntu and giving it a shot...but that might have to wait for me to order a new disk.

whchan05 commented 1 year ago

Using Sub-quadratic instead of InvokeAI's optimizations

How do you do that?

You can select it in Settings.

I've never encountered an OpenCL issue with Linux and batch sizes above 4 have no issues.

If you're curious or able to identify it, this is what it looks like:

image

It happens regardless of the VAE or model I use -- but that's an entirely different issue.

I have those issues too. Those belts seem to appear when the prompt is too short.

Disty0 commented 1 year ago

Using Sub-quadratic instead of InvokeAI's optimizations How do you do that?

From settings. (Needs restarting or reloading the model to be applied.) image

If you're curious or able to identify it, this is what it looks like: It happens regardless of the VAE or model I use -- but that's an entirely different issue.

I got that with UniPC now too. As I stated in my guide, UniPC can be unstable with Arc; try another sampler. Also, once you have used UniPC, this doesn't go away until you restart the webui or reload the model.

Ironically, testing the docker containers from PowerShell, I got exactly the same errors as in my previous post on my "broken" WSL2 install, with the iGPU both enabled and disabled. The results were like this:

This commit should fix this issue: https://github.com/vladmandic/automatic/commit/4f722289cab5c989dc836b8674511d5fa50d6547

As the docker images have not yet been tagged with latest [1] [2], I used versions v0.1 and 0.10 respectively.

Pushed the same image as latest too.

The lack of /dev/dxg, while the /dev/dri folder can still be used for GPU acceleration via WSL2, again makes me think this is a NUC/A770M-only edge-case problem.

PyTorch IPEX doesn't know what your GPU is, since it prints the device ID instead of the name. But I don't see any reason why it wouldn't work, since it's just another A770.

The lack of the dxg directory...

I don't use Windows, so I copied @Nuullll's steps for Windows.

dstults commented 1 year ago

Most excellent! I'm getting batches of 16 now!

So apparently in my case it wasn't only the UniPC setting -- it was either the UniPC sampler OR the InvokeAI optimizer (that's why I had kept using UniPC out of laziness: it was the default, and changing it previously didn't seem to fix anything). What's also interesting is that the "pixel belt" and the "solarized" image swapped places depending on which one was in effect (with InvokeAI taking precedence: if enabled, it put the belt on the bottom whether or not UniPC was used).

image

Thank you for everything and helping with multiple issues.

This commit should fix this issue: https://github.com/vladmandic/automatic/commit/4f722289cab5c989dc836b8674511d5fa50d6547 ... I don't use Windows, so I copied @Nuullll's steps for Windows.

I did a git fetch/pull of that recent commit and reran the docker with the iGPU both enabled and disabled again. I got a new error that is the same in both cases:

10:44:03-406265 INFO     Extension preload: 0.1s /sd-webui/automatic/extensions-builtin
10:44:03-412374 INFO     Extension preload: 0.0s /sd-webui/automatic/extensions
10:44:03-414577 INFO     Server arguments: ['--use-ipex', '--listen']
No module 'xformers'. Proceeding without it.
Error in sys.excepthook:
Traceback (most recent call last):
  File "/deps/venv/lib/python3.10/site-packages/rich/traceback.py", line 103, in excepthook
    Traceback.from_exception(
  File "/deps/venv/lib/python3.10/site-packages/rich/traceback.py", line 346, in from_exception
    return cls(
  File "/deps/venv/lib/python3.10/site-packages/rich/traceback.py", line 280, in __init__
    for suppress_entity in suppress:
TypeError: 'module' object is not iterable

Original exception was:
Traceback (most recent call last):
  File "/sd-webui/automatic/launch.py", line 189, in <module>
    instance = start_server(immediate=True, server=None)
  File "/sd-webui/automatic/launch.py", line 136, in start_server
    module_spec.loader.exec_module(server)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/sd-webui/automatic/webui.py", line 43, in <module>
    from modules.call_queue import queue_lock, wrap_queued_call, wrap_gradio_gpu_call # pylint: disable=W0611,C0411,C0412
  File "/sd-webui/automatic/modules/call_queue.py", line 8, in <module>
    from modules import shared, progress, errors
  File "/sd-webui/automatic/modules/shared.py", line 693, in <module>
    mem_mon = modules.memmon.MemUsageMonitor("MemMon", device, opts)
  File "/sd-webui/automatic/modules/memmon.py", line 26, in __init__
    self.cuda_mem_get_info()
  File "/sd-webui/automatic/modules/memmon.py", line 40, in cuda_mem_get_info
    return [(torch.xpu.get_device_properties(index).total_memory - torch.xpu.memory_allocated(index)), torch.xpu.get_device_properties(index).total_memory]
  File "/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/__init__.py", line 170, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
Traceback (most recent call last):
  File "/deps/venv/bin/ipexrun", line 8, in <module>
    sys.exit(main())
  File "/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/launch.py", line 880, in main
    launcher.launch(args)
  File "/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/launch.py", line 542, in launch
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd_s)
subprocess.CalledProcessError: Command 'taskset -c 0-9 /deps/venv/bin/python3 -u launch.py --use-ipex --listen' returned non-zero exit status 1.

I'm more than happy to keep chasing these, but also I'm super happy that I got everything working at full steam on the native WSL2 build if you'd like to wrap this up here. Thanks again in advance.

Disty0 commented 1 year ago

...OR the Invoke AI optimizer...

Bad results with InvokeAI are pretty repeatable on my end too. InvokeAI was the default on IPEX since it was the fastest one. We can change it to Sub-quad by default (and lose a little bit of performance).

I did a git fetch/pull of that recent commit and reran the docker with the iGPU both enabled and disabled again. I got a new error that is the same in both cases:

I don't think Docker has access to your GPU. Does it print the GPU, or does it print a Torch failure error?
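A quick way to check from a shell inside the container (assuming Python and IPEX are installed there, as they are in these images) is something like:

python3 -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.device_count(), torch.xpu.is_available())"

It should print a non-zero device count and True if the container can actually reach the GPU.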

Normally it should print something like this (but the Docker image doesn't have icpx in it and will fail to print the DPC++ info, so ignore the "icpx not found" error):

image

Docker:

image

Note: Warnings above the Starting SD.Next are mostly harmless.

vladmandic commented 1 year ago

We can change it to Sub-quad by default (and lose a little bit of performance).

Let me know if you decide to do so.

dstults commented 1 year ago

We can change it to Sub-quad by default (and lose a little bit of performance).

I think a safer approach for a first possible change might be to: (1) Display a console warning if --use-ipex and InvokeAI are both detected on launch; and (2) in the case of --use-ipex being used/detected, not defaulting to "UniPC" (maybe either Euler a or DPM++ 2M Karras instead).

If for any reason (2) isn't easy to implement on the server side I'd love to take a look at knocking out [Feature]: Ability to save current UI settings #909.

hdtv35 commented 1 year ago

We can change it to Sub-quad by default (and lose a little bit of performance).

I think a safer approach for a first possible change might be to: (1) Display a console warning if --use-ipex and InvokeAI are both detected on launch; and (2) in the case of --use-ipex being used/detected, not defaulting to "UniPC" (maybe either Euler a or DPM++ 2M Karras instead).

If for any reason (2) isn't easy to implement on the server side I'd love to take a look at knocking out [Feature]: Ability to save current UI settings #909.

Sorry if I'm misunderstanding, but can't those options be saved by editing the ui-config.json file? For instance, I have the line changed to: "txt2img/Sampling method/value": "Euler a",
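For example, in ui-config.json (which holds many more entries; this is just the relevant one):

{
  "txt2img/Sampling method/value": "Euler a"
}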

Disty0 commented 1 year ago

I think a safer approach for a first possible change might be to: (1) Display a console warning if --use-ipex and InvokeAI are both detected on launch;

The default optimization method is already set by the backend in use, so displaying a warning message won't be useful imo. Also, users have to use sub-quad if they want to generate anything above 1024x1024, which is why I use sub-quad almost all of the time. I think it would be better to use sub-quad by default on IPEX and lose a little bit of performance.

These are the it/s speeds at 768x768 (I am CPU bottlenecked at 512x512):
InvokeAI: 2.8 it/s
Sub-quad: 2.6 it/s

Also, ROCm (AMD) and DirectML use sub-quad as the default:

image

And (2) in the case of --use-ipex being used/detected, not defaulting to "UniPC" (maybe either Euler a or DPM++ 2M Karras instead).

I agree with this, but UniPC is much more stable now than when I first wrote the guide and said not to use UniPC. Originally, most of the stability issues came from the broken torch.linalg.solve function on IPEX: running torch.linalg.solve on the GPU was crashing the GPU, so torch.linalg.solve now runs on the CPU as a workaround.
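The workaround amounts to something like this (a simplified sketch, not the actual IPEX/SD.Next code):

import torch

def solve_on_cpu(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # Run the solve on the CPU and move the result back to the original device.
    return torch.linalg.solve(A.cpu(), B.cpu()).to(A.device)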

vladmandic commented 1 year ago

no issues with changing default cross-optimization for ipex, but i wouldn't like to have separate default sampler for ipex only.

Nuullll commented 1 year ago

The Python dependency issues, like

ModuleNotFoundError: No module named 'pytorch_lightning'

TypeError: 'module' object is not iterable

are probably caused by a "half-initialized" Python environment under the SD.Next working directory, which is common when one is testing docker containers.

See https://github.com/vladmandic/automatic/blob/4867dafada92d28fbec9478f69cb1484bb8afa50/launch.py#L169-L177

If --reinstall is not specified, SD.Next checks the timestamp of sdnext.log and performs a quick launch if possible. Removing sdnext.log before launch or specifying --reinstall would both work.
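So, in a broken environment like the ones above, either of these should force the dependency install to run again (based on the behavior described, not on any new flags):

# Option 1: force a full dependency check/install on the next launch
./webui.sh --use-ipex --reinstall

# Option 2: remove the log so the quick-launch heuristic doesn't skip the install step
rm sdnext.log
./webui.sh --use-ipex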

Disty0 commented 1 year ago

Changed the default optimizer for ipex to sub-quad @vladmandic https://github.com/vladmandic/automatic/commit/102503a3a4f48160d81cd65a57ac09be149fbcf1

Also closing this issue since OpenCL Error -6 is caused by outdated intel-opencl-icd drivers. Updating the package manually or updating the system should fix this issue.