Closed: Soulreaver90 closed this issue 1 year ago
I ended up replacing the torch command in the setup.py file with "torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2" instead of the default CUDA install. That works, but it still appends --no-half to my arguments. Why, and how can I remove it?
EDIT: Never mind, it did not work. Tried to generate an image, got 6.6s/it. Not sure if it's because of the --no-half or if it's falling back to my CPU. I get 6-8it/s on Auto1111.
EDIT2: I realized that even though I installed torch rocm5.4.2, torch+cu118 was still installed and showed up when checking torch.version. I completely removed torch and reinstalled from scratch.
19:43:18-138823 INFO Torch 2.0.0+rocm5.4.2
19:43:20-887382 INFO Torch backend: AMD ROCm HIP 5.4.22803-474e8620
19:43:20-888597 INFO Torch detected GPU: AMD Radeon RX 6700 XT VRAM 12272
Arch (10, 3) Cores 20
19:43:20-889260 INFO Server arguments: []
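(A quick sanity check that the right torch build actually landed in the venv is to look at the local build suffix on torch.__version__. This is just a sketch; build_flavor is a made-up helper, not part of the repo.)

```python
# Sketch: classify a torch version string by its local build suffix.
# Run inside the venv, e.g. build_flavor(torch.__version__).
def build_flavor(version: str) -> str:
    """'2.0.0+rocm5.4.2' -> 'rocm', '2.0.1+cu118' -> 'cuda', plain -> 'cpu'."""
    if "+rocm" in version:
        return "rocm"
    if "+cu" in version:
        return "cuda"
    return "cpu"

print(build_flavor("2.0.0+rocm5.4.2"))  # rocm
print(build_flavor("2.0.1+cu118"))      # cuda
```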
However, when I tried to generate an image I got a lengthy error.
Progress 4.44it/s ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:02
gradio call: NotImplementedError
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/blah/automatic/modules/call_queue.py:61 in f │
│ │
│ 60 │ │ │ │ pr.enable() │
│ ❱ 61 │ │ │ res = list(func(*args, **kwargs)) │
│ 62 │ │ │ if shared.cmd_opts.profile: │
│ │
│ /home/blah/automatic/modules/call_queue.py:39 in f │
│ │
│ 38 │ │ │ try: │
│ ❱ 39 │ │ │ │ res = func(*args, **kwargs) │
│ 40 │ │ │ finally: │
│ │
│ ... 17 frames hidden ... │
│ │
│ /home/blah/automatic/venv/lib/python3.10/site-packages/xformers/ops/fmha/d │
│ ispatch.py:98 in _dispatch_fw │
│ │
│ 97 │ │ priority_list_ops.insert(0, triton.FwOp) │
│ ❱ 98 │ return _run_priority_list( │
│ 99 │ │ "memory_efficient_attention_forward", priority_list_ops, inp │
│ │
│ /home/blah/automatic/venv/lib/python3.10/site-packages/xformers/ops/fmha/d │
│ ispatch.py:73 in _run_priority_list │
│ │
│ 72 │ │ msg += "\n" + _format_not_supported_reasons(op, not_supported) │
│ ❱ 73 │ raise NotImplementedError(msg) │
│ 74 │
╰──────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: No operator found for memory_efficient_attention_forward
with inputs:
query : shape=(1, 4096, 1, 512) (torch.float16)
key : shape=(1, 4096, 1, 512) (torch.float16)
value : shape=(1, 4096, 1, 512) (torch.float16)
attn_bias : <class 'NoneType'>
p : 0.0
cutlassF
is not supported because:
xFormers wasn't build with CUDA support
flshattF
is not supported because:
xFormers wasn't build with CUDA support
max(query.shape[-1] != value.shape[-1]) > 128
tritonflashattF
is not supported because:
xFormers wasn't build with CUDA support
max(query.shape[-1] != value.shape[-1]) > 128
requires A100 GPU
smallkF
is not supported because:
xFormers wasn't build with CUDA support
dtype=torch.float16 (supported: {torch.float32})
max(query.shape[-1] != value.shape[-1]) > 32
unsupported embed per head: 512
EDIT3: Because my previous post was getting long. webui.sh did not just install the wrong torch version, it also installed xformers, which was causing the previous issue. Uninstalled xformers and I could FINALLY generate an image. Clearly this was heavily optimized with Nvidia in mind, but we need some AMD love :( lol. Anyway, I hope my pain helps with optimizing the AMD workflow. Let me know if you need me to test things out.
Update: Anytime I launch webui.sh, it reinstalls xformers and gives me an error about it. When I try to generate an image, it gives me the memory error shown in the previous post. Not a big issue; I can manually launch launch.py with my args from a separate .sh file.
Since I don't have an AMD system available, I've asked the community several times to provide best practices and steps - and I'm more than willing to integrate them into the core workflow. But I cannot do that alone.
Unfortunately I’m not a developer who can assist with debugging, but I can take a look. Right now the two issues are as follows:
- Installs cu118 even though AMD is clearly identified (I did check that code in setup.py)
- Installs xformers and defaults to using them regardless of which optimizer is chosen in settings.
I have to recheck, but I had changed all instances of the torch command, yet running the sh script still installs and defaults to the cu118 torch build, including installing xformers. Even if I completely uninstall both torch and xformers, the script will reinstall them and default to using xformers. If I manually launch "launch.py", I get the "no xformers" message and it proceeds with no issues. Otherwise, everything else works flawlessly.
i've just modified installer so you can override torch and xformers using environment variables. by default, installer will try to install:
torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118
xformers==0.0.17
but now, you can uninstall them and install whatever packages you want using:
export TORCH_COMMAND="torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118"
export XFORMERS_PACKAGE="xformers==0.0.17"
and if you set it to none or no or anything like that, it will not try to install them at all - so whatever you installed (or uninstalled) will remain as such. for example:
- uninstall xformers if you already have them: pip uninstall xformers
export TORCH_COMMAND="torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2"
export XFORMERS_PACKAGE=none
let me know if this works?
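The override behaviour described above might look roughly like this (a sketch only; resolve_package is a hypothetical helper name, not the actual installer code):

```python
import os

# Sketch: read an override from the environment, fall back to the default,
# and treat "none"/"no" as "don't install at all".
def resolve_package(env_var, default):
    value = os.environ.get(env_var, default).strip()
    if value.lower() in ("none", "no", ""):
        return None  # caller skips installation entirely
    return value
```

So with XFORMERS_PACKAGE=none exported, the installer would leave whatever you installed (or uninstalled) alone.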
OP, it didn't hang on the environment tuning portion. There's no visual indicator of what's going on, like a progress bar, and there's an issue that's causing pip downloads to take ages. You can verify that it's still functioning by using procmon and watching python go nuts with WriteFile calls. Vlad should add a visual aid for the impatient.
This isn't the same wheel but this is what you would see if the window had a visual indicator.
The timer there doing the funky chicken is a Windows 11 issue. Still figuring it out, but python sucks down the data and spits it into a tmp file as well as a pip-unpack folder; meanwhile, Windows 11 writes an entry for every single thing happening there, in little bits, into the search index db, causing I/O horseshit. There are 169,742 entries written in Windows-Gather.db from this on my end right now, lol. SystemIndex_Gthr is still loading 20 minutes later; Windows.db is now 1.1 GB.
Procmon will be an absolute blur of reads and writes, registry checks for namespaces, values being added to the database for each little part of what pip is doing, etc.
Turn off search indexing.
Vlad should add a visual aid for the impatient
I'll see what I can do. Mostly, this affects torch installation as that is the only package that is huge (2+ GB).
"This may take time please be patient, like lots of time, go outside and smell a tree."
Just throw that in there. Windows isn't supposed to be indexing this stuff, so no idea why it's happening. I just checked mine and all the areas I've got were marked excluded, but still - maybe because the venvs end up being 62,725 files or so, and those AREN'T excluded by default afaik.
Okay, so first things first. The environment variables work: when I set them and relaunch launch.py, it loads up rocm5.4.2 just fine and there are no xformers messages shown as before. If I uninstall xformers manually and try again, I do get the "no xformers found" message as expected, so it "seems" to be fine; however, I could not test image generation.
Applying scaled dot product cross attention optimization
Segmentation fault (core dumped)
It gets stuck at the applying portion for a solid minute or so before failing. Not sure if something in the recent commits broke how optimizations are applied. I will add that as much as I like the idea of env variables, it still isn't beginner friendly. It still requires running webui.sh and letting it install cu117, which is a huge waste of time and resources when it should just install rocm5.4.2 at the start. Maybe have a separate setup.py aimed at AMD while a unified version is figured out and implemented. I think it's easier to tell someone "hey, run webui-amd.sh" as opposed to "run these export commands in such and such" while they look on with a blank stare, lol.
Scaled dot product is probably non-functional with ROCm; I've actually never seen anyone mention using it. Not surprising, since it's newer than any version of ROCm.
Change cross optimization to something less aggressive, like Doggettx.
I would, but that would require being able to access the UI to change those settings, correct? Once sdp fails, it just craps out the entire thing. I can run sdp on the previous day's commit with no issue. Does it do anything? No idea, but it runs and the UI is accessible. With the latest commit, it just fails and ends the entire session.
As a workaround, you can edit config.json manually to disable SDP.
And yes, on the unified installer: I'll add AMD-specific stuff into it directly once we know exactly what combo works. Like I said, I don't have an AMD system, so I rely on people like you to tell me what packages/settings work best.
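The config.json workaround could be scripted along these lines (a sketch; the key name used here is a guess - open your own config.json to see what the cross-attention option is actually called):

```python
import json
from pathlib import Path

def set_cross_attention(config_path, method):
    """Rewrite the cross-attention setting in config.json.
    'cross_attention_optimization' is a hypothetical key name."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config["cross_attention_optimization"] = method  # hypothetical key
    path.write_text(json.dumps(config, indent=4))
```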
Okay, I'll try it out this afternoon. I'm down to test any AMD-related commits. I noticed a speed issue in my last install, so I'll try testing that out; I think I messed up, but we'll see.
I have an AMD system with a 6900XT working more or less fine with auto1111. I can post the versions of ROCm and torch for my system later.
Yeah, my card works fine with Auto, however the install process for it was complete ass. Vlad's installer gets significantly farther than Auto's, although it too fails at installing the correct torch. I'll add that I haven't tried Auto's installer in months, so not sure if it was improved. It would be nice if the installers could first detect whether the appropriate ROCm drivers are installed (via amdgpu?); a lot of the newb issues I've encountered are from users who simply fired up Linux/Ubuntu and expected it to work out of the box. I also recall an error that only occurs on 22.04 that requires a certain install; I think I have it in my notes.
there is good info in #269 - can you confirm before i start making code changes to support it out-of-the-box?
I've got an RX 6700 XT running more or less fine on here; I can provide any info needed as well. So far I'm running with --medvram and Doggettx for optimizations. I also enabled upcast sampling, because why not. I'm getting 5-6it/s at 512x512 at 20 steps, which is at least on par with, if not better than, base a1111, but with fewer command line args.
great. command line to install torch is the same?
@vladmandic The error I was having yesterday with loading the model is fixed, no sdp issues now. I was briefly having major issues an hour ago but saw you pushed new commits that fixed them.
However, a new issue and a quirk. When I ran webui.sh fresh, it installed torch+cu117. I removed both torch and xformers and installed torch+rocm5.4.2. The UI launched with no issues, however I couldn't generate an image and was presented with the "no operator found for memory_efficient..." error; basically, xformers was somehow reinstalled and was STILL setting itself as the main optimizer despite me selecting everything else in settings. I'll add that I did not try the args you introduced the other day. I uninstalled xformers again and it looks good so far; I can generate images. Which leads me to problem #2...
When I first got the UI working two days ago, I was getting speeds similar to auto1111, around 6.5it/s. But since yesterday, and now today, I can't get anything above 1.3it/s. I've tried several settings and combinations. Not sure what's going on now.
@Soulreaver90 env options to disable installing xformers should resolve that issue. for the performance, its most likely because of cuda settings being moved to ui settings as of today, so whatever command line you were using before is being ignored.
Okay, I'll check it out again. I blew out the folder and am recloning from scratch. I see it is in fact installing rocm5.2, but it still shows as torch+cu117 for whatever reason.
EDIT: Installed. Removed xformers, installed torch+rocm5.2. Ran launch.py, got the SDP error as yesterday. Reran with my "go-to" args:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so
export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:32
and the model loaded with no SDP issue. Will try again later to see if it's related to the args or some random chance. Maybe it's not applying the HSA override by default? Anyway, speeds are back at 6+ so it's perfect now. I do see the settings you mentioned; that explains why --no-half-vae was giving me an error.
fwiw this is what I'm using in my script, it seems to work perfectly for me.
export TORCH_COMMAND="torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2"
export XFORMERS_PACKAGE=none
I'm aware and had tested it yesterday with no issues. It's just not newbie friendly; this is more of a band-aid until the installer can properly install the correct packages by default. At the very least, we've confirmed everything works with AMD cards once the install is done properly.
Installed packages on Ubuntu 22.04 LTS:
rocm-core5.2.5 - version 5.2.5.50205-186
rocm-dbgapi - version 0.65.1.50205-186
rocm-gdb - version 11.2.50.200-65
rocm-hip-runtime5.2.5 - version 5.2.5.50205-186
rocm-language-runtime5.2.5 - version 5.2.5.50205-186
rocm-llvm5.2.5 - version 14.0.0.22324.50205-186
rocm-ocl-icd5.2.5 - version 2.0.0.50205-186
rocm-opencl - version 1.2.0-2018111340 (maybe upgrade to version 2)
rocm-opencl-dev - version 1.2.0-2018111340 (maybe upgrade to version 2)
rocm-opencl-runtime - version 5.2.5.50205-186
rocm-opencl5.2.5 - version 2.0.0.50205-186
rocminfo5.2.5 - version 1.0.0.50205-186
hip-runtime-amd5.2.5 - version 5.2.21153-50205-186
There were some bugs with uninstallable packages (rocm-opencl and rocm-opencl-dev). This was the correct thread, I believe: https://github.com/RadeonOpenCompute/ROCm/issues/1713
i've just added this to setup:
if shutil.which('nvidia-smi') is not None:
log.info('nVidia toolkit detected')
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu118')
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.17')
elif shutil.which('rocm-smi') is not None:
log.info('AMD toolkit detected')
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2')
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
else:
log.info('Using CPU-only Torch')
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchaudio torchvision')
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
if you can test and let me know? if it works, then we can move on to next stage - what is ideal cross-optimization for amd? i've heard different things...
I did a test last night and, strangely enough, I was getting my best performance with sdp/doggettx. I was under the impression that sdp wasn't supposed to benefit AMD users at all, though.
sdp is only available for torch 2.0, so if you have torch 1.13, even if you select it, it will not activate. you'll see in the console log on startup which cross-optimization is activated; it's also shown in the system info tab. also, sdp doesn't benefit users of low-end gpus compared to xformers due to the workload split cpu<->gpu, but if the gpu is semi-decent, sdp is not worse.
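that gate can be sketched as a simple major-version check (illustrative only; sdp_available is a made-up helper, not what the code actually does):

```python
# Sketch of the gate described above: SDP only activates on torch >= 2.0.
# Works on build strings like '2.0.0+rocm5.4.2' or '1.13.1+cu117'.
def sdp_available(torch_version: str) -> bool:
    major = int(torch_version.split(".")[0])
    return major >= 2

print(sdp_available("2.0.0+rocm5.4.2"))  # True
print(sdp_available("1.13.1+cu117"))     # False
```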
It detects AMD and sets torch to the correct path, but torch.version still shows as torch.cu117. I am also getting a ton of AssertionErrors at start and a ton of AttributeErrors after install. I couldn't even generate an image; I got a bunch of RuntimeErrors: "LayerNormKernelImpl" not implemented for 'Half'.
When I rerun launch.py manually, it reinstalls torch rocm5.2 and fixes itself. Seems like there is still something in webui.sh that is pushing cu117. This reinstall did fix the RuntimeErrors, so I could generate images. However, I am getting the slow 1it/s speed I experienced last night; not sure what causes it to go that slow when it should hit 6it/s. Baby steps.
Edit: I noticed the initial install uses --extra-index-url, while the fixed version drops the "extra".
yeah, there was leftover code in webui.sh that did that; i've removed it. any installation should be done by setup.py, not the old webui.sh
Bingo! That did the trick and installed the correct drivers! That is great progress. Still getting assertion errors at start:
AssertionError: Couldn't find Stable Diffusion in any of:
However, the terminal crashes at the very end with
Applying scaled dot product cross attention optimization
Segmentation fault (core dumped)
again. It's on and off with this.
EDIT: When you cleaned up webui.sh, did you check if setup.py or launch.py adds export HSA_OVERRIDE_GFX_VERSION=10.3.0? I ran with that set and now the sdp error above is fixed. That will be needed for AMD.
@vladmandic Following up on the above. You removed the entire GPU prereq section for AMD instead of just the torch install portion. Those prereqs are a requirement for AMD cards. I added just the following back to webui.sh and now everything installs and works out of the box, no scaled dot errors. You can still move this code over to setup.py, but it just needs to live somewhere and be set up once.
gpu_info=$(lspci 2>/dev/null | grep VGA)
case "$gpu_info" in
    *"Navi 1"*|*"Navi 2"*) export HSA_OVERRIDE_GFX_VERSION=10.3.0
    ;;
    *"Renoir"*) export HSA_OVERRIDE_GFX_VERSION=9.0.0
        printf "\n%s\n" "${delimiter}"
        printf "Experimental support for Renoir: make sure to have at least 4GB of VRAM and 10GB of RAM or enable cpu mode: --use-cpu all --no-half"
        printf "\n%s\n" "${delimiter}"
    ;;
    *)
    ;;
esac
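If this mapping were moved into setup.py as suggested, it could look roughly like this (a sketch only; hsa_override_for is a hypothetical helper, and gpu_info would come from lspci or similar):

```python
# Sketch: the webui.sh case block as a Python helper, for if/when this
# logic moves into setup.py. gpu_info is the lspci VGA line.
def hsa_override_for(gpu_info):
    if "Navi 1" in gpu_info or "Navi 2" in gpu_info:
        return "10.3.0"
    if "Renoir" in gpu_info:
        return "9.0.0"
    return None  # unknown GPU: leave HSA_OVERRIDE_GFX_VERSION unset
```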
I'm not sure if this is a problem with my system specifically or what, but with the current method of detecting the hardware, my system is defaulting to CPU only. After digging around, I found that rocm-smi doesn't seem to be a valid command on my system; changing line 179 to "elif shutil.which('rocminfo') is not None:" does work, though I'm not sure if that's the best way to do it.
After a bit of digging, I found that I can access rocm-smi if I use the full path (/opt/rocm/bin/rocm-smi), so I'm not sure what's going on.
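One way around the PATH issue would be to fall back to the default install prefix explicitly (a sketch; find_rocm_tool is a hypothetical helper, and /opt/rocm/bin is assumed to be the standard ROCm location):

```python
import shutil

# Sketch: check PATH first, then the default ROCm install prefix, since
# /opt/rocm/bin is often not on PATH even on otherwise working ROCm systems.
def find_rocm_tool(name):
    return shutil.which(name) or shutil.which(name, path="/opt/rocm/bin")
```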
ok, i've switched from rocm-smi to rocminfo - that's why i asked the community what's the best and always-present bin.
regarding setup of env variable:
gpu_info=$(lspci 2>/dev/null | grep VGA)
case "$gpu_info" in
    *"Navi 1"*|*"Navi 2"*) export HSA_OVERRIDE_GFX_VERSION=10.3.0 ;;
    *"Renoir"*) export HSA_OVERRIDE_GFX_VERSION=9.0.0 ;;
esac
running lspci is really bad; it can segfault/fail on some virtualized platforms, especially cloud ones, and it's not going to work as expected unless it's a bare-metal linux install. need to find a better way to determine which HSA_OVERRIDE_GFX_VERSION to set. since Navi is more common nowadays, perhaps set that as the default and leave Renoir as a documentation note? again, i need community help for that :)
elif shutil.which('rocminfo') is not None:
log.info('AMD toolkit detected')
os.environ.setdefault('HSA_OVERRIDE_GFX_VERSION', '10.3.0')
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2')
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
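note: since this uses os.environ.setdefault, an explicit user export still takes precedence - a quick illustration:

```python
import os

# setdefault only fills the value in when it isn't already set,
# so a user's own `export HSA_OVERRIDE_GFX_VERSION=...` still wins.
os.environ.pop("HSA_OVERRIDE_GFX_VERSION", None)
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])  # 10.3.0

os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])  # 9.0.0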
I tried rocminfo in the terminal and it shows info for both my AMD GPU and CPU; that might cause false positives for the CPU-only folks.
@Soulreaver90 the point is that rocminfo itself will not exist unless you have a ROCm system.
You are right. Had a brain fart moment lol.
Could lshw work? You could do gpu_info=$(lshw -short | grep display), however it would throw a comment about sudo every time you launch, and unfortunately it wouldn't come pre-installed on all systems afaik.
As an alternative, I've been attempting to find a way to get glxinfo to work, but I've yet to find a solution for that option.
ok, so all community suggestions on defaults for ROCm setups have been added and I haven't seen any further updates on this thread, so I'll close it. if there are any remaining issues or further tuning needed, let's start a new thread, as there is a lot of history here.
I'm trying to get this tool working after using Easy Diffusion for a while without problems, using export HSA_OVERRIDE_GFX_VERSION=10.3.0.
By itself, it says nVidia CUDA toolkit detected despite there being no nVidia card and no CUDA packages (though I used to have an nVidia card).
Everything ROCm is installed.
I tried to force it using flags; sometimes there are random errors, and xformers is sometimes removed and other times installed again, but either way, it never uses GPU acceleration.
I added rembg and xformers to requirements.txt, thinking that would help. rembg helped; there was a crash when it couldn't be found. xformers probably confused something.
/opt/sdnext (git)-[master] % ./webui.sh --experimental --reinstall --use-rocm
Create and activate python venv
Launching launch.py...
00:41:12-951991 INFO Running extension preloading
00:41:12-956514 INFO Starting SD.Next
00:41:12-957464 INFO Python 3.11.3 on Linux
00:41:12-967243 INFO Version: 5f2bdba8 Fri Jun 2 12:56:44 2023 -0400
00:41:13-231622 INFO Setting environment tuning
00:41:13-232921 INFO Forcing reinstall of all packages
00:41:13-233952 INFO AMD ROCm toolkit detected
00:41:13-234645 INFO Installing package: torch==2.0.0 torchvision==0.15.1 --index-url
https://download.pytorch.org/whl/rocm5.4.2
00:41:14-672970 ERROR Error running pip: install --upgrade torch==2.0.0 torchvision==0.15.1 --index-url
https://download.pytorch.org/whl/rocm5.4.2
00:41:15-820805 INFO Torch 2.0.1+cu118
00:41:15-914591 INFO Installing package: tensorflow==2.12.0
00:41:19-090670 INFO Verifying requirements
00:41:19-093038 INFO Installing package: addict
00:41:21-546644 INFO Installing package: aenum
00:41:23-982640 INFO Installing package: aiohttp
...
now some things are uninstalled:
% ./webui.sh --experimental --use-rocm
Create and activate python venv
Launching launch.py...
00:45:12-052053 INFO Running extension preloading
00:45:12-056847 INFO Starting SD.Next
00:45:12-057842 INFO Python 3.11.3 on Linux
00:45:12-067872 INFO Version: 5f2bdba8 Fri Jun 2 12:56:44 2023 -0400
00:45:12-342813 INFO Setting environment tuning
00:45:12-346087 INFO AMD ROCm toolkit detected
00:45:13-516102 INFO Torch 2.0.1+cu118
00:45:13-597207 WARNING Not used, uninstalling: xformers 0.0.20
00:45:13-598729 INFO Installing package: un xformers --yes --quiet
00:45:14-307632 INFO Verifying requirements
00:45:14-344259 WARNING Package wrong version: numpy 1.24.3 required 1.23.5
00:45:14-345265 INFO Installing package: numpy==1.23.5
Everything looks happy, but the GPU is not detected.
rocminfo:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 1600 Six-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 1600 Six-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3200
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32795612(0x1f46bdc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32795612(0x1f46bdc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32795612(0x1f46bdc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1030
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6400
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
L3: 16384(0x4000) KB
Chip ID: 29759(0x743f)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2320
BDFID: 2560
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4177920(0x3fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Any ideas what else to try?
@MightyPork don't post a new issue on an already closed thread (and one which deals with different issue to start with) - i cannot help here.
sorry, but I didn't want to create a new issue for what's likely the same problem: the ROCm GPU is not detected/used.
Issue Description
Decided to try this out but can't get far. I was able to run webui.sh; however, it did try to install torch 2.0+cu118 even though I have an AMD card and it should have installed the ROCm build instead. Even after all that, it got hung on "Setting environment tuning". I closed it, installed the ROCm torch packages, and reran the launcher. It hangs at "Setting environment tuning" for minutes, and then it still shows torch+cu118 and says CUDA is not available.
18:36:31-613009 INFO Python 3.10.6 on Linux
18:36:31-622754 INFO No changes detected: quick launch active
18:36:31-623268 INFO Setting environment tuning
18:40:36-283387 INFO Torch 2.0.0+cu118
18:40:36-284238 WARNING Torch reports CUDA not available
18:40:36-284768 INFO Server arguments: ['--no-half-vae', '--skip-requirements', '--skip-extensions', '--no-half']
Available models: /home/blah/automatic/models/Stable-diffusion 0
Download the default model? (y/N)
Loading theme: black-orange
Running on local URL: http://127.0.0.1:7861
Version Platform Description
Ubuntu 22.04