Also, there is no setup.log, as far as I can remember

because it's called sdnext.log - that's noted in the issue template.
otherwise, i agree, this seems like it's using the cpu version of torch. i'd need to see the logs to find out why it's doing that.
@vladmandic , sdnext.log I saw; I will delete it and the venv, call webui.sh again, and then post it (and the shell output) in here... in about 8 hours :)
Until then, do you happen to know a verified way to test if the gpu is actually ready for use? I mean, I can activate kernel modules for hardware I don't even have in some cases, so that should be checked too.
Until then, do you happen to know a verified way to test if the gpu is actually ready for use?

you should have the rocm-smi utility that can show gpu utilization, power draw, etc.
one idea - do you have a cpu with an on-board gpu? it could be that rocm is used, but it's using the one on your cpu and that's slow. if yes, try the --device-id param.
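For reference, a minimal way to watch the card while a generation runs might look like this (assuming rocm-smi is on the PATH; the --device-id value is the index of the device SD.Next should use):

watch -n 1 rocm-smi                  # refresh utilization and power draw every second
./webui.sh --use-rocm --device-id 0  # pin SD.Next to a specific device

If the power draw and GPU% stay flat during generation, the work is happening on the cpu.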
@vladmandic logs added. nope, gpu and cpu are separate entities; I recently replaced an old rx580 (which was fried by KSP 2) with the rx7600 that is running now. Looking back, I should have left the AMD fan corner for nvidia, but I didn't know about rocm/cuda/openml and the like back then...
anyway. rocm-smi... I do not have. There seems to be a git repo for it... should I try to build it? https://github.com/RadeonOpenCompute/rocm_smi_lib
But there is... rocminfo? That is kind of interesting; it lists the "HSA agent" entries available on my system, of which two seem to be my CPU and one is the GPU:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen Threadripper 2920X 12-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen Threadripper 2920X 12-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3500
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16312948(0xf8ea74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16312948(0xf8ea74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16312948(0xf8ea74) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: AMD Ryzen Threadripper 2920X 12-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen Threadripper 2920X 12-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3500
BDFID: 0
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16444956(0xfaee1c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16444956(0xfaee1c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16444956(0xfaee1c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 3
*******
Name: gfx1102
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 7600
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 29824(0x7480)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2900
BDFID: 17408
Internal Node ID: 2
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1102
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Modes: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
EDiT: radeontop works; even if it does not know the card's name or chip, it is able to draw gpu and vram usage... now, the next check: can it do rocm? Let's see if I can find something to test that.
quick look at the log doesn't show any issues with torch - it seems to be installed and initialized correctly and it does detect rocm - i needed to verify that.
anyhow, i've used rocm-smi before, but if radeontop works, that's good enough.
this rocminfo output is strange - why is it listing the threadripper (two instances) before the gpu? that might well be throwing rocm off, so it uses first-available and that happens to be cpu, not gpu. you can try --use-rocm --device-id 0 and try changing 0 to 1 or 2 - those are the ids.
another two things to try are:

I looked up the 5.6 rocm page: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html - it says my gpu isn't officially supported on linux (but is on windows). And... it does look kind of official to me?
Also, the page listed some installation hints I didn't try out yet (like "amdgpu-install --usecase=rocm", which installed a lot of new libraries).
now rocm-smi works also:
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 33.0c 2.0W 226Mhz 96Mhz 0% auto 140.0W 6% 1%
================================================================================
============================= End of ROCm SMI Log ==============================
So... let's try --use-rocm --device-id 2... nope. id 1... nope. id 0... nope. I just discovered that you can see whether it worked in the console output:
Model load finished: {'ram': {'used': 9.29, 'total': 31.24}} cached=0
no consumer graphics card brings 32GB of ram with it; that is my system RAM. It was printed there every time. I guess if it took the gpu, it should show... 8GB.
I added the IF section to my webui.sh... and it still does not like the GPU. And of course, it does not like --use-directML (well, that is Windows).
I am a bit confused. Next thing to do... trying out your web ui on windows. Yes, same rig, another hd. over there, at least directml may work.
Until later
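A quick way to see which device ids torch itself enumerates is a one-liner like this (a diagnostic sketch; run it inside the venv - on rocm builds of torch, the cuda namespace maps to HIP devices):

python -c "import torch; print(torch.cuda.device_count()); [print(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]"

If this prints 0, no --device-id value can help, because torch never saw the gpu in the first place.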
btw, what does system info tab show?
interestingly, windows behaves mostly the same. If I call --use-rocm, it starts but uses the cpu. But it is at least honest enough to put that in the shell log ("using cpu backend"). directML dies there too, with mostly the same error as here on Linux.
with... --use-rocm, it looks like this
it just feels like the whole thing does not want to use any gpu at all...
It shows rocm detected correctly, but it also shows the backend as cpu, which means at some point a fallback happened.
I'll probably need to spray a few more debug messages to see where and why the fallback happens.
there is a --debug flag for webui.sh, which I did not use yet (despite it being mentioned on the main page here):
$ ./webui.sh --use-rocm --debug
Create and activate python venv
Launching launch.py...
19:32:19-978228 INFO Starting SD.Next
19:32:19-981486 INFO Python 3.10.12 on Linux
19:32:19-986127 INFO Version: 417ef540 Tue Aug 8 12:05:30 2023 -0400
19:32:20-266253 DEBUG Setting environment tuning
19:32:20-268263 DEBUG Torch overrides: cuda=False rocm=True ipex=False
diml=False
19:32:20-270057 DEBUG Torch allowed: cuda=False rocm=True ipex=False
diml=False
19:32:20-271806 INFO AMD ROCm toolkit detected
19:32:20-287477 WARNING Modified files: ['webui.sh']
19:32:20-293218 DEBUG Repository update time: Tue Aug 8 18:05:30 2023
19:32:20-294889 DEBUG Previous setup time: Wed Aug 9 16:17:51 2023
19:32:20-296419 INFO Disabled extensions: []
19:32:20-297927 INFO Enabled extensions-builtin:
['multidiffusion-upscaler-for-automatic1111',
'stable-diffusion-webui-rembg', 'LDSR', 'Lora',
'stable-diffusion-webui-images-browser',
'sd-webui-controlnet', 'ScuNET',
'sd-webui-agent-scheduler', 'sd-extension-system-info',
'sd-dynamic-thresholding', 'clip-interrogator-ext',
'SwinIR', 'a1111-sd-webui-lycoris']
19:32:20-301770 INFO Enabled extensions: []
19:32:20-303200 DEBUG Latest extensions time: Wed Aug 9 16:17:24 2023
19:32:20-304554 DEBUG Timestamps: version:1691510730 setup:1691590671
extension:1691590644
19:32:20-305403 INFO No changes detected: Quick launch active
19:32:20-306112 INFO Verifying requirements
19:32:20-320393 INFO Disabled extensions: []
19:32:20-321302 INFO Enabled extensions-builtin:
['multidiffusion-upscaler-for-automatic1111',
'stable-diffusion-webui-rembg', 'LDSR', 'Lora',
'stable-diffusion-webui-images-browser',
'sd-webui-controlnet', 'ScuNET',
'sd-webui-agent-scheduler', 'sd-extension-system-info',
'sd-dynamic-thresholding', 'clip-interrogator-ext',
'SwinIR', 'a1111-sd-webui-lycoris']
19:32:20-323184 INFO Enabled extensions: []
19:32:20-325969 INFO Extension preload: 0.0s
/opt/ai/automatic/extensions-builtin
19:32:20-327403 INFO Extension preload: 0.0s /opt/ai/automatic/extensions
19:32:20-339301 DEBUG Memory used: 0.04 total: 31.24 Collected 0
19:32:20-340488 DEBUG Starting module: <module 'webui' from
'/opt/ai/automatic/webui.py'>
19:32:20-341366 INFO Server arguments: ['--use-rocm', '--debug']
19:32:20-365802 DEBUG Loading Torch
19:32:24-103673 DEBUG Loading Gradio
19:32:24-602334 DEBUG Loading Modules
No module 'xformers'. Proceeding without it.
19:32:25-408084 DEBUG Reading: /opt/ai/automatic/config.json len=295
19:32:25-409865 INFO Pipeline: Backend.ORIGINAL
19:32:25-410877 DEBUG Loaded styles: /opt/ai/automatic/styles.csv 0
19:32:25-730555 INFO Libraries loaded
19:32:25-731842 DEBUG Entering start sequence
19:32:25-739928 DEBUG Version: {'app': 'sd.next', 'updated': '2023-08-08',
'hash': '417ef540', 'url':
'https://github.com/vladmandic/automatic/tree/master'}
19:32:25-742381 INFO Using data path: /opt/ai/automatic
19:32:25-744220 DEBUG Event loop: <_UnixSelectorEventLoop running=False
closed=False debug=False>
19:32:25-745951 DEBUG Entering initialize
19:32:25-747101 DEBUG Available samplers: ['UniPC', 'DDIM', 'PLMS', 'Euler
a', 'Euler', 'DPM++ 2S a', 'DPM++ 2S a Karras', 'DPM++
2M', 'DPM++ 2M Karras', 'DPM++ SDE', 'DPM++ SDE
Karras', 'DPM++ 2M SDE', 'DPM++ 2M SDE Karras', 'DPM
fast', 'DPM adaptive', 'DPM2', 'DPM2 Karras', 'DPM2 a',
'DPM2 a Karras', 'LMS', 'LMS Karras', 'Heun']
19:32:25-750649 INFO Available VAEs: /opt/ai/automatic/models/VAE 0
19:32:25-752687 DEBUG Reading: /opt/ai/automatic/cache.json len=1
19:32:25-754158 DEBUG Reading: /opt/ai/automatic/metadata.json len=1
19:32:25-755477 INFO Available models:
/opt/ai/automatic/models/Stable-diffusion 1
19:32:25-782980 DEBUG Loading scripts
19:32:27-490352 INFO ControlNet v1.1.234
ControlNet v1.1.234
ControlNet preprocessor location: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/annotator/downloads
19:32:27-684530 INFO ControlNet v1.1.234
ControlNet v1.1.234
19:32:28-555204 DEBUG Scripts load: ['a1111-sd-webui-lycoris:0.58s',
'clip-interrogator-ext:0.061s', 'LDSR:0.057s',
'Lora:0.332s', 'sd-dynamic-thresholding:0.056s',
'sd-extension-system-info:0.113s',
'sd-webui-agent-scheduler:0.372s',
'sd-webui-controlnet:0.325s',
'stable-diffusion-webui-images-browser:0.121s',
'stable-diffusion-webui-rembg:0.623s', 'SwinIR:0.061s',
'ScuNET:0.062s']
Scripts load: ['a1111-sd-webui-lycoris:0.58s', 'clip-interrogator-ext:0.061s', 'LDSR:0.057s', 'Lora:0.332s', 'sd-dynamic-thresholding:0.056s', 'sd-extension-system-info:0.113s', 'sd-webui-agent-scheduler:0.372s', 'sd-webui-controlnet:0.325s', 'stable-diffusion-webui-images-browser:0.121s', 'stable-diffusion-webui-rembg:0.623s', 'SwinIR:0.061s', 'ScuNET:0.062s']
19:32:28-685202 INFO Loading UI theme: name=black-orange style=Auto
19:32:28-688421 DEBUG Creating UI
19:32:28-692934 DEBUG Reading: /opt/ai/automatic/ui-config.json len=0
19:32:28-722115 DEBUG Extra networks: checkpoints items=1 subdirs=0
19:32:28-767777 DEBUG UI interface: tab=txt2img batch=False seed=False
advanced=False second_pass=False
19:32:28-877445 DEBUG UI interface: tab=img2img seed=False resize=False
batch=False denoise=True advanced=False
19:32:28-957262 DEBUG Reading: /opt/ai/automatic/ui-config.json len=0
19:32:29-661055 DEBUG Script: 0.53s ui_tabs
/opt/ai/automatic/extensions-builtin/stable-diffusion-w
ebui-images-browser/scripts/image_browser.py
19:32:29-663900 DEBUG Extensions list failed to load:
/opt/ai/automatic/html/extensions.json
19:32:29-749482 DEBUG Extension list refresh: processed=13 installed=13
enabled=13 disabled=0 visible=13 hidden=0
Running on local URL: http://127.0.0.1:7860
19:32:30-065888 INFO Local URL: http://127.0.0.1:7860/
19:32:30-067920 DEBUG Gradio registered functions: 1852
19:32:30-069280 INFO Initializing middleware
19:32:30-074981 DEBUG Creating API
19:32:30-222233 INFO [AgentScheduler] Task queue is empty
19:32:30-223304 INFO [AgentScheduler] Registering APIs
19:32:30-351779 DEBUG Scripts setup: ['Tiled Diffusion:0.023s',
'ControlNet:0.041s', 'Alternative:0.009s']
19:32:30-355006 DEBUG Scripts components: []
19:32:30-355720 DEBUG Model metadata: /opt/ai/automatic/metadata.json no
changes
19:32:30-362512 DEBUG Select checkpoint: model
v1-5-pruned-emaonly.safetensors [6ce0161689]
19:32:30-365126 DEBUG Load model weights: existing=False
target=/opt/ai/automatic/models/Stable-diffusion/v1-5-p
runed-emaonly.safetensors info=None
19:32:30-684233 DEBUG gc: collected=10213 device=cpu {'ram': {'used': 1.31,
'total': 31.24}}
Loading weights: /opt/ai/automatic/models/Stable-diffusion/v1-5-pruned-emaonly…
19:32:30-861365 DEBUG Load model:
name=/opt/ai/automatic/models/Stable-diffusion/v1-5-pru
ned-emaonly.safetensors dict=True
19:32:30-862395 DEBUG Verifying Torch settings
19:32:30-863051 INFO Torch override dtype: no-half set
19:32:30-863733 INFO Torch override VAE dtype: no-half set
19:32:30-864393 DEBUG Desired Torch parameters: dtype=FP32 no-half=True
no-half-vae=True upscast=True
19:32:30-865284 INFO Setting Torch parameters: dtype=torch.float32
vae=torch.float32 unet=torch.float32
19:32:30-866153 DEBUG Torch default device: cpu
19:32:30-867939 DEBUG Model dict loaded: {'ram': {'used': 1.34, 'total':
31.24}}
19:32:30-882125 DEBUG Model config loaded: {'ram': {'used': 1.34, 'total':
31.24}}
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
19:32:31-344983 DEBUG Model created from config:
/opt/ai/automatic/configs/v1-inference.yaml
19:32:31-347451 DEBUG Model weights loading: {'ram': {'used': 2.31, 'total':
31.24}}
19:32:32-142593 DEBUG Model weights loaded: {'ram': {'used': 9.28, 'total':
31.24}}
19:32:32-152908 DEBUG Model weights moved: {'ram': {'used': 9.28, 'total':
31.24}}
19:32:32-161406 INFO Applying Doggettx cross attention optimization
19:32:32-167383 INFO Embeddings: loaded=0 skipped=0
19:32:32-174052 INFO Model loaded in 1.5s (load=0.2s create=0.5s apply=0.8s)
19:32:32-482288 DEBUG gc: collected=24 device=cpu {'ram': {'used': 9.29,
'total': 31.24}}
19:32:32-484533 INFO Model load finished: {'ram': {'used': 9.29, 'total':
31.24}} cached=0
19:32:32-946570 DEBUG gc: collected=0 device=cpu {'ram': {'used': 5.32,
'total': 31.24}}
19:32:32-948283 INFO Startup time: 12.6s (torch=3.7s gradio=0.5s
libraries=1.1s scripts=2.8s onchange=0.1s
ui-txt2img=0.1s ui-img2img=0.1s ui-settings=0.1s
ui-extensions=0.7s ui-defaults=0.1s launch=0.2s
app-started=0.3s checkpoint=2.6s)
I cannot see any error in it; only the 32GB RAM shows up earlier, and it openly says "cpu" in some of the gc entries.
run a simple test from inside the venv:
python -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
okay, into the cloned repo, into its venv folder... this went... surprisingly not as expected (neither python nor python3.10):
python3.10 -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
EDiT: I did pip3 install torch after that. It was quite busy installing a metric ton of additional packages. Aren't they part of the webui.sh installation? Okay, dumb question in light of my first result.
Now it says... this (not much better):
python3.10 -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
<module 'torch.version' from '/home/rk/.local/lib/python3.10/site-packages/torch/version.py'> False 11.7 None
okay, into the cloned repo, into its venv folder....

you don't do cd venv, you activate the venv so it becomes the active context. something like venv/scripts/activate (check exact names, not in front of an active install right now to check).
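On Linux the activation script lives under venv/bin rather than venv/scripts, so the sequence would be roughly this (the repo path is taken from the logs above):

cd /opt/ai/automatic          # the cloned repo
source venv/bin/activate      # activate the venv created by webui.sh
python -c "import torch; print(torch.version.hip)"   # now resolves to the venv's torch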
ah... my bad; I rarely use python venvs (well, I usually let pycharm handle that)
python -c "import torch; print(torch.version, torch.cuda.is_available(), getattr(torch.version, 'cuda'), getattr(torch.version, 'hip'))"
<module 'torch.version' from '/opt/ai/automatic/venv/lib/python3.10/site-packages/torch/version.py'> False None 5.4.22803-474e8620
This is quite confusing... I am fairly sure that your software isn't to blame, but my installation is... well... messed up beyond usability. I am not brave enough to allow a dist-upgrade, because then the graphics card driver will fail to install again... not that this would make that big of a difference.
Should I go for 22.04 and see what happens?
this test clearly shows that torch-rocm is correctly installed and it detects the rocm libs, but it doesn't detect the actual gpu. which is very weird, since your rocminfo does detect the gpu.
if it were me, i'd go for ubuntu 22.04. and you might as well install torch with rocm 5.6 instead of the default 5.4, and set the correct HSA_OVERRIDE_GFX_VERSION (from my first post).
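Concretely, that would look something like this inside the activated venv (a sketch; the nightly index URL is the one iDeNoh posts below, and 11.0.0 is the override value that later turns out to be right for a gfx1102 card):

source venv/bin/activate
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6
export HSA_OVERRIDE_GFX_VERSION=11.0.0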
nope... I pip-installed rocm 5.6 inside the venv, added that gfx=1102 switch case to the webui.sh, and tried again, with the same result.
Well then, up to 22.04 I go. I bet the gpu drivers will fail to install yet again.
And... done. And guess what? The same kind of error as before. All seems fine until you are in the webui, where it still only uses my cpu.
hm, this time the amdgpu-install did not fail. I guess that is because I didn't get the 6.x kernel like when I installed 22.04 directly, but still have 5.19. Another thing I noticed... I get my rocm stuff from here:
https://repo.radeon.com/rocm/apt/5.5.3
webui.sh installs from
https://download.pytorch.org/whl/rocm5.4.2
I wondered if I should adjust that, but then I decided to check out the URL from that index-url parameter: Installing package: torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/rocm5.4.2
Turns out, https://download.pytorch.org/whl/rocm5.4.2 points to an access-denied xml? Is that supposed to happen, or does the installer.py somehow authenticate there?
EDiT: Looked up the torch+rocm section on the same page... no 5.5.3 rocm there yet. I smell new problems... which would only come into play if my stupid card got recognized at all.
as far as I'm aware you aren't able to access any of the download directories that way in your browser; it's intended for install via the command line.
regarding the missing +5.5.3 version, that won't matter. if you REALLY want, you can install the torch 2.1.0 nightly + rocm 5.6: add export TORCH_COMMAND="torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6" to your webui-user.sh. I'd also recommend adding export ROCMPATH="/opt/rocm"; that sometimes fixes issues like this. and finally, make sure you add an HSA override to it as well, as vlad mentioned. mine looks like export HSA_OVERRIDE_GFX_VERSION=10.3.0; yours will be whatever the correct version is for your card.
my full webui-user.sh:
export COMMANDLINE_ARGS="--listen --use-rocm --insecure --debug"
export ROCMPATH="/opt/rocm"
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export TORCH_COMMAND="torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.6"
Also, something to note: you can use a torch built against an earlier rocm version with a newer system version, e.g. I have 5.6 installed and use torch+5.4.2 on a few projects.
@iDeNoh I guess I have to create the webui-user.sh myself, as there is no such file in my cloned project? Then, should it call webui.sh at the end?
@ConfusedMerlin yes, create the webui-user.sh manually and add whatever you're going to pass into it and then launch with webui.sh however you want. I set up a .desktop shortcut for mine to add it to the favorites bar
and... I just bricked my installation. A python package conflict crept in over the last hours of trying. Being a bit fed up and demotivated, I used a lot of -y during apt-get operations. And this one time I hit the... anti-jackpot. Let's say I had 10G more free space afterwards (astounding how much stuff depends on pylib3.10...).
Anyway, I will be back tomorrow, with a fresh 22.04 ubuntu. In the meantime... do we have users of your automatic fork who managed to get it working with ubuntu 2x and an rx 7600?
i don't know - i'm big on privacy, so there is no callhome to report usage, and i honestly can't recall what everyone said over time - too many conversations. better to ask in discord, it's quite active.
but in your case, this has nothing to do with the fork - torch refuses to detect your gpu, that one-line python test shows it.
you can nuke the entire sdnext install, and once that oneliner reports true, then install sdnext.
If I had to guess, I'd say it's probably a busted rocm install. how did you install? I ask because the easiest method is to use the script installer directly off of amd's website; make sure you follow the instructions precisely and ensure you set up the prerequisites before you install.
Greetings @vladmandic
so, let's get this in an orderly way:
13:42:22-023561 DEBUG Loading Torch
Speicherzugriffsfehler (Speicherabzug geschrieben)
... seriously? --medvram (as in the original automatic) won't help, but anyway...
So, why am I getting a true for cuda.is_available now? "video" and "render". Just checked it: removed my user from these groups, rebooted, tried again - False. Added to the groups, rebooted, tried again - True. rocminfo does hint at that, but its error message can be misleading, as it starts with "permission to /dev/something denied", and then of course sudo rocminfo looks fine. But if you add your user to video and render (it must be both), then suddenly rocminfo works. And it says true!
Now... for the "Speicherzugriffsfehler (Speicherabzug geschrieben)" (German for "segmentation fault (core dumped)")... I'll make another post, so we can keep these apart better (also, I would guess it's a new issue)
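For anyone hitting the same wall, the group fix is the standard usermod invocation (log out and back in, or reboot, so the new groups take effect):

sudo usermod -aG video,render "$USER"
groups      # after re-login, should now list video and render
rocminfo    # should run without permission errors as a non-root user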
console:
./webui.sh
Create and activate python venv
Launching launch.py...
14:06:56-781913 INFO Starting SD.Next
14:06:56-786013 INFO Python 3.10.12 on Linux
14:06:56-802997 INFO Version: 69eaf4c6 Sat Aug 12 08:32:19 2023 +0000
14:06:57-162359 DEBUG Setting environment tuning
14:06:57-164374 DEBUG Torch overrides: cuda=False rocm=True ipex=False
diml=False
14:06:57-166294 DEBUG Torch allowed: cuda=False rocm=True ipex=False
diml=False
14:06:57-233467 DEBUG Repository update time: Sat Aug 12 10:32:19 2023
14:06:57-234919 INFO Verifying requirements
14:06:57-248009 INFO Verifying packages
14:06:57-250049 INFO Verifying repositories
14:06:57-257607 DEBUG Submodule:
/opt/ai/automatic/repositories/stable-diffusion-stabili
ty-ai / main
14:06:57-724807 DEBUG Submodule:
/opt/ai/automatic/repositories/taming-transformers /
master
14:06:58-159123 DEBUG Submodule: /opt/ai/automatic/repositories/k-diffusion /
master
14:06:58-997480 DEBUG Submodule: /opt/ai/automatic/repositories/BLIP / main
14:06:59-466906 INFO Verifying submodules
14:06:59-872804 DEBUG Submodule: extensions-builtin/a1111-sd-webui-lycoris /
main
14:06:59-885050 DEBUG Submodule: extensions-builtin/clip-interrogator-ext /
main
14:06:59-896967 DEBUG Submodule:
extensions-builtin/multidiffusion-upscaler-for-automati
c1111 / main
14:06:59-906399 DEBUG Submodule: extensions-builtin/sd-dynamic-thresholding /
master
14:06:59-914269 DEBUG Submodule: extensions-builtin/sd-extension-system-info
/ main
14:06:59-921273 DEBUG Submodule: extensions-builtin/sd-webui-agent-scheduler
/ main
14:06:59-928293 DEBUG Submodule: extensions-builtin/sd-webui-controlnet /
main
14:06:59-958614 DEBUG Submodule:
extensions-builtin/stable-diffusion-webui-images-browse
r / main
14:06:59-966978 DEBUG Submodule:
extensions-builtin/stable-diffusion-webui-rembg /
master
14:06:59-974058 DEBUG Submodule: modules/lora / main
14:06:59-981843 DEBUG Submodule: modules/lycoris / main
14:06:59-989506 DEBUG Submodule: wiki / master
14:07:00-071207 DEBUG Installed packages: 220
14:07:00-072289 DEBUG Extensions all: ['sd-webui-agent-scheduler',
'clip-interrogator-ext',
'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR',
'sd-webui-controlnet', 'sd-dynamic-thresholding',
'multidiffusion-upscaler-for-automatic1111', 'ScuNET',
'sd-extension-system-info',
'stable-diffusion-webui-images-browser', 'Lora',
'a1111-sd-webui-lycoris']
14:07:00-073594 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/sd-webui-agent-sch
eduler/install.py
14:07:00-265381 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/clip-interrogator-
ext/install.py
14:07:08-117198 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
14:07:08-392023 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/sd-webui-controlnet/install.py
14:07:08-714753 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/sd-extension-system-info/install.py
14:07:08-894843 DEBUG Running extension installer:
/opt/ai/automatic/extensions-builtin/stable-diffusion-webui-images-browser/i
nstall.py
14:07:09-180858 DEBUG Extensions all: []
14:07:09-181893 INFO Extensions enabled: ['sd-webui-agent-scheduler', 'clip-interrogator-ext',
'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet',
'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111',
'ScuNET', 'sd-extension-system-info',
'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']
14:07:09-183000 INFO Verifying packages
14:07:09-184858 DEBUG Setup complete without errors: 1691842029
14:07:09-190212 INFO Extension preload: 0.0s /opt/ai/automatic/extensions-builtin
14:07:09-190990 INFO Extension preload: 0.0s /opt/ai/automatic/extensions
14:07:09-202213 DEBUG Memory used: 0.04 total: 31.23 Collected 0
14:07:09-203258 DEBUG Starting module: <module 'webui' from '/opt/ai/automatic/webui.py'>
14:07:09-204049 INFO Server arguments: ['--listen', '--use-rocm', '--debug', '--medvram']
14:07:09-211962 DEBUG Loading Torch
Speicherzugriffsfehler (Speicherabzug geschrieben)
rk@rkai:/opt/ai/automatic$
sdnext.log
2023-08-12 14:06:56,781 | sd | INFO | launch | Starting SD.Next
2023-08-12 14:06:56,786 | sd | INFO | installer | Python 3.10.12 on Linux
2023-08-12 14:06:56,802 | sd | INFO | installer | Version: 69eaf4c6 Sat Aug 12 08:32:19 2023 +0000
2023-08-12 14:06:57,162 | sd | DEBUG | installer | Setting environment tuning
2023-08-12 14:06:57,164 | sd | DEBUG | installer | Torch overrides: cuda=False rocm=True ipex=False diml=False
2023-08-12 14:06:57,166 | sd | DEBUG | installer | Torch allowed: cuda=False rocm=True ipex=False diml=False
2023-08-12 14:06:57,233 | sd | DEBUG | installer | Repository update time: Sat Aug 12 10:32:19 2023
2023-08-12 14:06:57,234 | sd | INFO | installer | Verifying requirements
2023-08-12 14:06:57,248 | sd | INFO | installer | Verifying packages
2023-08-12 14:06:57,250 | sd | INFO | installer | Verifying repositories
2023-08-12 14:06:57,257 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/stable-diffusion-stability-ai / main
2023-08-12 14:06:57,724 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/taming-transformers / master
2023-08-12 14:06:58,159 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/k-diffusion / master
2023-08-12 14:06:58,997 | sd | DEBUG | installer | Submodule: /opt/ai/automatic/repositories/BLIP / main
2023-08-12 14:06:59,466 | sd | INFO | installer | Verifying submodules
2023-08-12 14:06:59,872 | sd | DEBUG | installer | Submodule: extensions-builtin/a1111-sd-webui-lycoris / main
2023-08-12 14:06:59,885 | sd | DEBUG | installer | Submodule: extensions-builtin/clip-interrogator-ext / main
2023-08-12 14:06:59,896 | sd | DEBUG | installer | Submodule: extensions-builtin/multidiffusion-upscaler-for-automatic1111 / main
2023-08-12 14:06:59,906 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-dynamic-thresholding / master
2023-08-12 14:06:59,914 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-extension-system-info / main
2023-08-12 14:06:59,921 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-agent-scheduler / main
2023-08-12 14:06:59,928 | sd | DEBUG | installer | Submodule: extensions-builtin/sd-webui-controlnet / main
2023-08-12 14:06:59,958 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-images-browser / main
2023-08-12 14:06:59,966 | sd | DEBUG | installer | Submodule: extensions-builtin/stable-diffusion-webui-rembg / master
2023-08-12 14:06:59,974 | sd | DEBUG | installer | Submodule: modules/lora / main
2023-08-12 14:06:59,981 | sd | DEBUG | installer | Submodule: modules/lycoris / main
2023-08-12 14:06:59,989 | sd | DEBUG | installer | Submodule: wiki / master
2023-08-12 14:07:00,071 | sd | DEBUG | installer | Installed packages: 220
2023-08-12 14:07:00,072 | sd | DEBUG | installer | Extensions all: ['sd-webui-agent-scheduler', 'clip-interrogator-ext', 'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet', 'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 'sd-extension-system-info', 'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']
2023-08-12 14:07:00,073 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-agent-scheduler/install.py
2023-08-12 14:07:00,265 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/clip-interrogator-ext/install.py
2023-08-12 14:07:08,117 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-rembg/install.py
2023-08-12 14:07:08,392 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-webui-controlnet/install.py
2023-08-12 14:07:08,714 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/sd-extension-system-info/install.py
2023-08-12 14:07:08,894 | sd | DEBUG | installer | Running extension installer: /opt/ai/automatic/extensions-builtin/stable-diffusion-webui-images-browser/install.py
2023-08-12 14:07:09,180 | sd | DEBUG | installer | Extensions all: []
2023-08-12 14:07:09,181 | sd | INFO | installer | Extensions enabled: ['sd-webui-agent-scheduler', 'clip-interrogator-ext', 'stable-diffusion-webui-rembg', 'LDSR', 'SwinIR', 'sd-webui-controlnet', 'sd-dynamic-thresholding', 'multidiffusion-upscaler-for-automatic1111', 'ScuNET', 'sd-extension-system-info', 'stable-diffusion-webui-images-browser', 'Lora', 'a1111-sd-webui-lycoris']
2023-08-12 14:07:09,182 | sd | INFO | installer | Verifying packages
2023-08-12 14:07:09,184 | sd | DEBUG | launch | Setup complete without errors: 1691842029
2023-08-12 14:07:09,190 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions-builtin
2023-08-12 14:07:09,190 | sd | INFO | installer | Extension preload: 0.0s /opt/ai/automatic/extensions
2023-08-12 14:07:09,202 | sd | DEBUG | launch | Memory used: 0.04 total: 31.23 Collected 0
2023-08-12 14:07:09,203 | sd | DEBUG | launch | Starting module: <module 'webui' from '/opt/ai/automatic/webui.py'>
2023-08-12 14:07:09,204 | sd | INFO | launch | Server arguments: ['--listen', '--use-rocm', '--debug', '--medvram']
2023-08-12 14:07:09,211 | sd | DEBUG | webui | Loading Torch
of course, no visible errors inside. If I may offer a guess: it sees the 32GB of system memory and tries the same with the 8GB of VRAM
well, found it, kind of. It's in webui.py, this line:
rnd = torch.sum(torch.randn(2, 2)).to(0)
this is reproducible, as this produces the same error:
python -c "import torch; print(torch.sum(torch.randn(2, 2)).to(0))"
let's see if I can figure out what's wrong here
@ConfusedMerlin
Greetings. Would you mind following the steps here and see if it works?
This line:
rnd = torch.sum(torch.randn(2, 2)).to(0)
is from https://github.com/vladmandic/automatic/issues/1929.
But if your RX 7600 core dumped at this line, you might want to export these two environment variables before calling ./webui.sh:
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
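Put together, a launch with both overrides would look roughly like this (values as given above; 0 selects the first HIP device, and 11.0.0 matches gfx1100-class Navi 3x parts):

export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./webui.sh --use-rocm --debug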
@evshiron , oh my dear god, it finally starts and shows the GPU in the system tab! I get about 5.5 it/s, compared to the 1.7 I get in windows. That is remarkable! (using your are-we-gfx1100-yet fork)
I read up on that memory error; it seems that this short call, "torch.sum(torch.randn(2, 2)).to(0)", tries to stuff a dozen gigabytes worth of tensors into the VRAM, which only has 8 to offer. I wonder if this would work by simply commenting out that line... Oh, and the "need to be in the video and render groups" applies to your fork too, it seems.
EDiT: So... to sum up:
Shall I close it, or is there something we can do to increase the "lessons learned" from this issue?
@ConfusedMerlin
It shouldn't. It creates a 2x2 matrix with randomness on your GPU and sums it.
This line was added to ensure torch initializes early, to avoid another core dump caused by tensorflow-rocm for our Navi 3x GPUs.
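Annotated, the early-init line reads like this (the comments are my reading of it, not from the project):

import torch

# build a tiny 2x2 random tensor and sum it, which forces torch itself to initialize;
# .to(0) then moves the result to device 0, forcing the HIP/GPU runtime to come up early
rnd = torch.sum(torch.randn(2, 2)).to(0)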
# this is required if you have multiple gpus and you want to use the first one
export HIP_VISIBLE_DEVICES=0
# this is required for apps to correctly recognize navi 3x gpus
export HSA_OVERRIDE_GFX_VERSION=11.0.0
These two lines should be used in most situations, for example:
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python3 -c "import torch; print(torch.sum(torch.randn(2, 2)).to(0))"
Will work as soon as you add them.
Would you mind going to the "System Info" tab, running the benchmark SEVERAL TIMES, and submitting it? It would be great to see the performance of various Navi 3x GPUs.
Nowadays, generating images is well optimized and should not require too much VRAM. If there are larger images that you can't generate, you can use the included Tiled Diffusion and Tiled VAE extensions and tweak the parameters to make it work on your GPU (at the cost of performance).
If you find your GPU malfunctions, check dmesg. If there are plenty of errors coming out of the AMDGPU driver, your GPU will not recover without a reboot.
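A typical way to do that check (standard util-linux dmesg; the grep just filters for the driver's messages):

sudo dmesg | grep -i amdgpu            # scan existing kernel messages
sudo dmesg --follow | grep -i amdgpu   # or watch live while reproducing the problem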
@evshiron , on it... I must ask, how do I know if the submit was successful? I mean, I can hit "Submit Results", but nothing seems to happen. The results page seems to be... already in the future?
@ConfusedMerlin
Every time you click, the WebUI will print a log line in the console about how many records have been submitted.
The benchmark data is collected with a logstash service; you can submit several times to make sure they go through. The submitted records will be automatically deduplicated.
You will be able to see the records you submit in an hour, because there is an hourly worker that updates the benchmark data from the collected records.
@evshiron , I see.
There is one thing I just noticed: switching from one checkpoint model to another takes way longer than in the Windows Automatic1111. There it was a matter of a couple of seconds (at least it seemed that way); here it needs 30 seconds and more. Is that normal?
@ConfusedMerlin
My RX 7900 XTX takes about 4-7 seconds to switch a checkpoint at the moment.
It was faster in the early days, when HSA_OVERRIDE_GFX_VERSION wasn't used. But I've gotten used to it.
30 seconds sounds pretty slow. I guess it's caused by insufficient RAM or VRAM, with swapping or offloading happening during data load. Maybe --medvram will help?
@evshiron , --medvram just... causes no checkpoint to be loaded at all, and without any error message. I select it, and nothing happens. Hit "reload" in the settings, nothing.
Well, I guess I can live with the longer switching times for now (it still is weird).
@ConfusedMerlin
According to https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations, --medvram:
Makes the Stable Diffusion model consume less VRAM by splitting it into three parts - cond (for transforming text into numerical representation), first_stage (for converting a picture into latent space and back), and unet (for actual denoising of latent space) and making it so that only one is in VRAM at all times, sending others to CPU RAM. Lowers performance, but only by a bit - except if live previews are enabled.
In case you missed it, you should Ctrl+C and run ./webui.sh again after enabling --medvram.
But if still nothing happens, you might want to check if there is something weird in dmesg.
Btw, there are "Token Merge Ratio" settings which can be set to 0.3-0.5 to gain some performance boost without affecting the outputs too much.
@evshiron I Ctrl+C'd out of it and applied --medvram. This time at least something did happen:
Progress 0.1it/s ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:03:22
loading the initial safetensor (not checkpoint, if that is of relevance) took this long. Then my prompt ("test") was spit out like immediately.
Now I switched to another checkpoint, which seems to take even longer to load than before.
dmesg shows nothing of interest; the last entries are 30 minutes old and about the memory access violation.
EDIT: And it has finished loading:
16:36:16-131017 INFO Model loaded in 176.7s (load=1.7s config=69.7s create=0.5s apply=0.8s
vae=104.0s)
@ConfusedMerlin
Hmmm. I am out of ideas now.
Model loaded in 176.7s (load=1.7s config=69.7s create=0.5s apply=0.8s vae=104.0s)
There must be something worth investigating, but it is beyond my abilities.
@evshiron
It is like the total opposite of my windows automatic1111 experience. There, the model switches in like 4 seconds, but it only gets up to 1.5 it/s if I don't do anything fancy. Also, I must run it with --medvram, otherwise it crashes the gpu driver.
Here, the model switch takes an eternity, but once that is done, it spits out pictures at 6 it/s. And while loading the new image, the system actually lags. But neither cpu nor gpu is really busy... though I observed that it takes way more system RAM and less VRAM than on windows, where it claims 95% of my VRAM and a couple of GB of system RAM.
Still, stuff does happen. For now, that is enough.
Thank you all for your help, hints and suggestions!
for me, it seems that the latest GPU driver, the correct torch version (rocm!) and - critically underrated - my user being in the correct groups (render and video) were necessary to get this to work. Calling rocminfo as a non-root user should not produce ANY error message, because you need to be in those groups to run any process that wants to do something with the gpu.
The model loading stuff is an issue of its own, which should not live in an issue about something else.
@ConfusedMerlin
Enabling Tiled VAE might help with the lag at the end of image generation. It might not be used if the image size is small, and you can reduce the Decoder Tile Size (from 64 upwards, to find a sweet spot for your GPU) to have it applied, if needed.
@ConfusedMerlin
I came up with some ideas. Could you try these when you have time?
rm sdnext.log && ./webui.sh --debug
and then post your sdnext.log (you can zip it)

@evshiron , I ran some tests; at one point the os just froze. And after that reboot, using no custom parameters brought the loading times below the 10-second mark again. For now, I assume that something really fishy got stuck in the vram (I know, that isn't even close to how this works, but a good old crash'n'reboot may have fixed it for now).
If I find a way to reproduce that, I will come back here and open a new issue, okay?
@ConfusedMerlin
That's good news.
It's OK. Remember to mention me when the new issue comes out.
where are models actually residing (and what type of filesystem)? fast load relies on memory-mapping the safetensors file, and some filesystems are really bad at that, so you may actually get better results if you switch to the stream load method (in settings)
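A rough way to compare the two strategies outside the webui is a sketch like this (not SD.Next's actual code path; the model path is the one from the logs above, and safetensors provides both a memory-mapped file loader and a plain bytes loader):

import time
from safetensors.torch import load, load_file

path = "/opt/ai/automatic/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors"

t0 = time.time()
tensors = load_file(path)      # memory-mapped load (the fast path)
print(f"mmap load: {time.time() - t0:.1f}s")

t0 = time.time()
with open(path, "rb") as f:
    tensors = load(f.read())   # read the whole file first, then parse (stream-style)
print(f"stream load: {time.time() - t0:.1f}s")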
@evshiron today i heard of this fork for the first time, and some of the stuff really makes sense - can we get in direct contact (e.g. are you on discord)?
@vladmandic
Greetings. That's the fork I was working on when we were handling https://github.com/vladmandic/automatic/issues/1929.
It's still far from complete, but at least it works out of the box for Navi 3x users running ROCm 5.5+.
I can make a simplified PR for only Navi 3x (with some if/else) if you need.
Btw, we might be able to eliminate the hack from https://github.com/vladmandic/automatic/issues/1929 if we don't set os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow-rocm') for Navi 3x.
The list of gfxXXX names and which HSA_OVERRIDE_GFX_VERSION should be used can be found here (search "GCN GFX10.3" for example), if you want to extend the strategy to Navi 2x (RDNA 2) dGPUs (my iGPU, reported as gfx1036, doesn't work). CDNA GPUs should work without it.
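The if/else evshiron describes might look roughly like this in webui.sh terms (a sketch based only on the mapping mentioned in this thread: gfx110x -> 11.0.0 for Navi 3x, gfx103x -> 10.3.0 for Navi 2x):

GFX=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n1)   # e.g. gfx1102 on the RX 7600
case "$GFX" in
  gfx110*) export HSA_OVERRIDE_GFX_VERSION=11.0.0 ;;  # Navi 3x (RDNA 3)
  gfx103*) export HSA_OVERRIDE_GFX_VERSION=10.3.0 ;;  # Navi 2x (RDNA 2)
esac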
where are models actually residing (and what type of filesystem)?
they are on an ext4 fs that also hosts about... everything else. I know, not the best setup, but after a dozen or so not-quite-working manual partition setups I just stuffed everything onto one big fs.
EDiT: wouldn't that memmap/stream difference be kind of easy to reproduce? Yesterday evening I actually failed to load anything even though it was trying. At that time, generation also slowed down to 1-2 it/s. This morning nothing remains of that... switching models takes a couple of seconds, and generation breezes through at about 7 it/s. ... I should really make a new issue for that, shouldn't I? But... in the other fork's issue section.
@evshiron
I can make a simplified PR for only Navi 3x (with some if/else) if you need.
My goal is always to have as good an out-of-the-box solution as possible, so anything you can contribute is appreciated. I don't have an AMD GPU, so i rely on the community for most of it.
Btw, we might be able to eliminate the hack from https://github.com/vladmandic/automatic/issues/1929, if we don't set os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow-rocm') for Navi 3x.
I was thinking the same - do we actually need tensorflow-rocm at all?
The list of gfxXXX names and which HSA_OVERRIDE_GFX_VERSION should be used can be found here (search "GCN GFX10.3" for example), if you want to extend the strategy to Navi 2x (RDNA 2) dGPUs (my iGPU, reported as gfx1036, doesn't work). CDNA GPUs should work without it.
Absolutely - see #1972 - I'm ready to merge as soon as the PR is ready. perhaps you can work with @Aptronymist to create it?
Issue Description
Tried to install vladmandic's automatic on Ubuntu yesterday to see if the ROCm backend performs better than automatic1111's openML on windows.
It... kind of worked. After a lot of problems with the Python 3.8/3.10 versions, it finally started. I immediately issued a 512x512 test image (happy cat sitting on a computer), but was a bit disappointed when it claimed to need 4 minutes for it. The image appeared after said time.
Which is 7 times what the windows openML counterpart needed. But then the CPU fan gave away that it was not the GPU doing the thinking, but the CPU. The system monitor agreed with that observation, showing pretty graphs with all my cpu cores above 50%. This was astounding and concerning at the same time.
Astounding, because the openML automatic1111 version estimated 40m+ for that test image on the CPU backend and clogged up my CPU with close to 100% on each core; your version had each core around 60% with a lot of fluctuation. Concerning, because I realized that the GPU was idle the whole time. Looking at the systeminfo page (thanks for including that!), I realized that the backend in use was called CPU.
I looked around the interwebs a bit; somebody here posted a similar issue some time ago (https://github.com/vladmandic/automatic/issues/816), but failed to offer the required log files. But there were some instructions inside this ticket, like "remove venv, delete setup.log". Which I did.
While I had a hiccup on one try, where it failed to find the CLIP thingy (this didn't happen the next time), this did not resolve the issue. Also, there is no setup.log, as far as I can remember.
Still, the output during startup sounds kind of promising, as it says "rocm toolkit detected" and stuff like that. But even with the --use-rocm switch, it falls back to CPU without any highly visible error message.
As far as I can tell, the GPU should be ready to use; its kernel modules are compiled and activated. But this being the first time I try to get an AMD GPU running on Linux, I may be drawing wrong conclusions. If you google "check if AMD GPU works on ubuntu", all answers are about "doing lspci" and such, which I did after the drivers claimed to be installed. But if you have a dedicated "check if it works" test at hand, I will do that one too.
Finally... I am sorry, but I cannot offer logs right now. The test system being a new one, I managed to forget my gitlab pw yesterday evening, until gitlab locked the ip... Now I am at work, where I cannot access the test system (but the pw manager knows my password). I will add them to this ticket later today.
Version Platform Description
ubuntu 20.04.5 (tried 22.04 first, but the gpu driver installation failed... very hard; not your problem), python 3.10.12 (from that unofficial repo, with fitting pip, also keeping 3.8 as an alternative for ubuntu), radeon rx 7600 with driver 23.10.3 for Ubuntu 20.04.5 HWE (see https://www.amd.com/en/support/linux-drivers), and the firefox that comes with ubuntu 20.04.5 (dunno which version that is).
The vladmandic repo was cloned fresh (yesterday evening), and webui.sh seemed to have no problems getting its stuff.
Relevant log output
the changed webui.sh now contains this line (instead of only python3, which points to python 3.8, which was declared unsupported somewhere during my first installation tries)
EDIT: Added log and console output
Acknowledgements