invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0
23.34k stars · 2.4k forks

[bug]: Can't use AMD gpu #5599

Closed genelatham closed 4 months ago

genelatham commented 8 months ago

Is there an existing issue for this problem?

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 6700 XT

GPU VRAM

12GB

Version number

3.6.2

Browser

Firefox 121.0

Python dependencies

No response

What happened

I installed and selected option 2 for AMD GPU. The GPU drivers are installed, and other Stable Diffusion front ends can use the card. I tried to reconfigure, but no option was offered for AMD or ROCm (in option 5 of the main menu). I tried the fix from #5219 with no difference, and the suggestions from #4202 with no change either.

What you expected to happen

I expected it to use the AMD GPU

How to reproduce the problem

As stated above, I just did an install.

Additional context

I am running on Mint 21.3 which is essentially Ubuntu 22.04.

Here's what is in the log:

Generate images with a browser-based interface
patchmatch.patch_match: INFO - Compiling and loading c extensions from "/home/invoke/invokeai/.venv/lib/python3.10/site-packages/patchmatch".
patchmatch.patch_match: ERROR - patchmatch failed to load or compile (Command 'make clean && make' returned non-zero exit status 2.).
patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2024-01-29 15:22:46,863]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
/home/invoke/invokeai/.venv/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be removed in 0.17. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn(
[2024-01-29 15:22:51,017]::[uvicorn.error]::INFO --> Started server process [11713]
[2024-01-29 15:22:51,017]::[uvicorn.error]::INFO --> Waiting for application startup.
[2024-01-29 15:22:51,017]::[InvokeAI]::INFO --> InvokeAI version 3.6.2
[2024-01-29 15:22:51,017]::[InvokeAI]::INFO --> Root directory = /home/invoke/invokeai
[2024-01-29 15:22:51,018]::[InvokeAI]::INFO --> Initializing database at /home/invoke/invokeai/databases/invokeai.db
[2024-01-29 15:22:51,019]::[InvokeAI]::INFO --> GPU device = cpu

Discord username

@GeneL

ebr commented 8 months ago

Could you please try the following and see if it fixes the issue?

# make sure the virtual environment is activated first
source /home/invoke/invokeai/.venv/bin/activate
# force-reinstall torch with ROCm support
pip install torch torchvision --force-reinstall --extra-index-url https://download.pytorch.org/whl/rocm5.6

Then try running Invoke again. If this still does not help, please delete the virtual environment (the /home/invoke/invokeai/.venv folder), install again, and attach the complete console output of the install process.

sysbadmin commented 8 months ago

Same here. I tried the latest release (3.6.2) and a manual install, using both PyTorch rocm5.6 and rocm5.4.2, on Linux with an RX 6650 XT.

ebr commented 8 months ago

@sysbadmin to clarify - are you experiencing the same issue after trying the steps above?

sysbadmin commented 8 months ago

@sysbadmin to clarify - are you experiencing the same issue after trying the steps above?

Yes

genelatham commented 8 months ago

I did the install as requested. To establish the venv I used the developer's console; if that's not right I will try again. As requested, I removed the .venv directory and reran the install. I had done a manual install before, so I decided to do an automatic one hoping for a better result. It ran for a very long time (my internet connection is limited). The requested log is attached: install.log

ebr commented 8 months ago

Thanks for the logs @genelatham. It's surprising to me that you're getting NVIDIA and ONNX libraries installed. On the plus side, you have a lot of the requirements cached already, so the installation shouldn't use much of your bandwidth.

Let's run through a very basic manual install and see if you get better results. Do not use the developer console, and do not activate any virtual environment prior to this:

# delete the virtual environment
rm -rf /home/invoke/invokeai/.venv/
# create a new virtual environment
python3 -m venv /home/invoke/invokeai/.venv
# activate it
source /home/invoke/invokeai/.venv/bin/activate
# install invoke
pip install "invokeai==3.6.2" --extra-index-url https://download.pytorch.org/whl/rocm5.4.2
# configure invoke (this *may* download new models, but you probably have some locally already)
invokeai-configure
# run invoke
invokeai-web

Please report back with the results. We appreciate you helping troubleshoot this!

genelatham commented 8 months ago

"It's surprising to me that you're getting nvidia and onnx libraries installed" — it kinda surprised me too. I don't know anything about how CUDA or ROCm work, so I figured I just didn't understand.

When I ran the pip install it downloaded a few modules, but only a few, maybe nightly updates. The invokeai-configure step did not offer me an option for AMD or ROCm, so I left it on auto. I made no adjustments to the configuration. I captured the log with the script command and the results are a little hard for me to read; if you have alternative suggestions, I can do it again. I think it's mostly the "operator entertainment" that is so ugly. (Actually it was not that bad using VS Code rather than cat.) Log is attached: manual.log

harm0nic commented 8 months ago

Just wanted to leave a note that I had this EXACT same issue, but fixed it by launching the dev console and reinstalling the ROCm build of torch:

pip install --force-reinstall torch==2.1.0 --index-url https://download.pytorch.org/whl/rocm5.6

ebr commented 8 months ago

Thank you @harm0nic, --force-reinstall is key here. @genelatham please give this a try.

genelatham commented 8 months ago

I have done this several times, but nonetheless I copied the command directly from the message above. Whenever I run this command I get the following message at the end:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.16.2 requires torch==2.1.2, but you have torch 2.1.0+rocm5.6 which is incompatible.
invokeai 3.6.2 requires torch==2.1.2, but you have torch 2.1.0+rocm5.6 which is incompatible.
huggingface-hub 0.20.2 requires fsspec>=2023.5.0, but you have fsspec 2023.4.0 which is incompatible.
Successfully installed MarkupSafe-2.1.3 filelock-3.9.0 fsspec-2023.4.0 jinja2-3.1.2 mpmath-1.3.0 networkx-3.2.1 pytorch-triton-rocm-2.1.0 sympy-1.12 torch-2.1.0+rocm5.6 typing-extensions-4.8.0

I don't know if this is an issue or not.

In any case, it still uses the CPU for generation.

Shal-Ziar commented 8 months ago

For full disclosure, I'm running InvokeAI on an openSUSE Leap 15 installation, so my solution may not work for you. But what helped me is doing the following:

pip install --force-reinstall torch==2.1.2+rocm5.6 --index-url https://download.pytorch.org/whl/rocm5.6
pip install --force-reinstall torchaudio==2.1.2+rocm5.6 --index-url https://download.pytorch.org/whl/rocm5.6

The only complaint I get after that is a warning, once the installation is finished, about a Hugging Face library not being the right version. But GPU generation works.

genelatham commented 8 months ago

@Shal-Ziar thanks for the suggestion. After running the suggested commands I got the following error message, and it still uses the CPU.

/home/invoke/invokeai/.venv/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libc10_cuda.so: cannot open shared object file: No such file or directory'. If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?

It appears to be looking for the CUDA libraries, but I don't know if that is important.

genelatham commented 8 months ago

I am often guilty of thinking things don't matter that do, so I thought I would add some background information.

First of all, I use the setup I will describe with Easy Diffusion every day. ED installed and found the GPU on the first try, so I thought the setup was good for Stable Diffusion; perhaps there is an issue elsewhere.

The system runs as a guest on a Proxmox server. The GPU is a Radeon RX 6700 XT, passed to the VM via PCI passthrough. The CPU is a Ryzen 5 1600 (six cores, 12 threads), passed through to the VM as native. The VM has 28 GB of memory and runs Linux Mint 21.3 Virginia (which is based on Ubuntu 22.04). I installed ROCm version 6.0.0. The VM sees the RX 6700 XT as a secondary graphics adapter.

While I don't think any of this matters, since it works with Easy Diffusion, I wanted to put a system description in the bug report.

A note: when I am in the developer's console and I run pip list | grep roc I get:

multiprocess          0.70.15
pytorch-triton-rocm   2.1.0
torch                 2.1.2+rocm5.6
torchaudio            2.1.2+rocm5.6

If anymore info is needed, let me know.

russjr08 commented 8 months ago

@genelatham Have you by chance tried the fixes mentioned over in #4211 yet? This worked for me. Specifically, I had to do the reinstall fix mentioned here (and in the linked issue; they're the same thing, it seems), make sure that opencv and python3-opencv were installed, and finally follow the last comment and make sure that export HSA_OVERRIDE_GFX_VERSION=10.3.0 was run before running invoke.sh. I'm currently on Fedora 39, also with a Radeon RX 6700 XT.

To make things easier, so I don't forget the variable export (otherwise you get a core dump/segfault when attempting to generate an image) I wrapped it in a launch.sh script:

#!/bin/bash
export HSA_OVERRIDE_GFX_VERSION=10.3.0
bash invoke.sh

After those steps, I was able to get InvokeAI to work properly on my system using GPU acceleration.

Shal-Ziar commented 8 months ago

@genelatham

I too get that error, but the website starts for me and it works. I only had issues with ControlNet, but that may be unrelated.


genelatham commented 8 months ago

@russjr08 Thanks for the suggestion. I tried it and it didn't work. I think I should start over with a fresh install and try some of the patches again.

ebr commented 8 months ago

@genelatham please let us know if you are still experiencing this issue

russjr08 commented 8 months ago

@ebr I can confirm that with the newest installer, it correctly initializes the AMD ROCm libraries and doesn't need the previous pip interventions (which I previously needed to do).

Although for my card (Radeon RX 6700 XT) I still have to run export HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching InvokeAI: while it will still detect the card as a CUDA device, without the variable set it crashes with a core dump as soon as it tries to run a generation.

genelatham commented 8 months ago

What is the latest version? Tomorrow is the soonest I could try it. I've been very busy with work and not had time to work on this.

ebr commented 8 months ago

@genelatham https://github.com/invoke-ai/InvokeAI/releases - v3.6.3 is the latest release as of this writing. No pressure to test this - please let us know anytime if you're still having an issue. We do believe this to be fixed.

ebr commented 8 months ago

Although for my card (Radeon RX 6700 XT) I still have to run export HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching InvokeAI: while it will still detect the card as a CUDA device, without the variable set it crashes with a core dump as soon as it tries to run a generation.

This is great info @russjr08. Thank you!

@Millu: please see above - this seems like a good addition to Discord FAQs and the docs for that particular GPU.
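For a docs or FAQ entry, the mapping behind that variable could be sketched as a small helper. This is a hypothetical illustration, not Invoke code: the RX 6700 XT family reports the gfx1031/gfx1032 ISA, which the ROCm runtime doesn't ship kernels for, so HSA_OVERRIDE_GFX_VERSION=10.3.0 makes it masquerade as the supported gfx1030.

```shell
# Hypothetical helper (not part of Invoke): map a reported AMD GPU ISA
# to the HSA_OVERRIDE_GFX_VERSION value needed for unsupported RDNA2
# parts. gfx1030 (e.g. RX 6800/6900 XT) works as-is; gfx1031/gfx1032
# (RX 6700 XT / 6650 XT) must masquerade as gfx1030.
hsa_override_for() {
  case "$1" in
    gfx1030|gfx1031|gfx1032) echo "10.3.0" ;;
    *) echo "" ;;   # no known override; leave the variable unset
  esac
}

# Usage before launching invoke.sh:
#   export HSA_OVERRIDE_GFX_VERSION="$(hsa_override_for gfx1031)"
hsa_override_for gfx1031   # prints: 10.3.0
```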

genelatham commented 8 months ago

Thanks Eugene. Unfortunately, I have hardware problems with the system. I expect to get them resolved late this week or next. Sorry.


genelatham commented 7 months ago

I finally got InvokeAI to use the AMD graphics card. I am somewhat embarrassed to say the problem was that I had not added the user to the render and video groups. I'm sorry, but it is something that should be pointed out to others who have this problem, so I will leave this note here.
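The missing step can be checked with a small sketch (a hypothetical helper, not Invoke code; the group names are the ones ROCm's device nodes are restricted to):

```shell
# Hypothetical helper: given a space-separated group list (e.g. the
# output of `id -nG`), print which of the groups ROCm needs ("render"
# and "video") are missing. Without them the GPU device nodes are
# inaccessible and Invoke silently falls back to CPU.
missing_rocm_groups() {
  for g in render video; do
    case " $1 " in
      *" $g "*) ;;               # already a member
      *) printf '%s\n' "$g" ;;   # missing
    esac
  done
}

# Check the current user:
missing_rocm_groups "$(id -nG)"
# If anything is printed, add the groups and log out/in (or reboot):
#   sudo usermod -aG render,video "$USER"
```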

I have other problems now. But I don't fully understand them yet.

Shal-Ziar commented 7 months ago

Ah, that happens; it's easy to forget steps. I have almost everything working with my 6800 XT, so it's possible to get it to work. Sadly, I feel the performance is still significantly lower than an equivalent Nvidia card's.

FredLaGitoune commented 7 months ago

Hello People,

Tried InvokeAI on:

Ryzen 7 5800X
32 GB DDR4-3600 RAM
GPU: RX 6900 XT Aorus
OS: Bodhi Linux (a fun distro, based on Ubuntu 22.04)

It always reported device = cpu, and I tried everything I read here and everywhere else, so I'll share my solution (the same request takes 10 minutes on CPU and 6 seconds on GPU).

I had of course installed the AMDGPU drivers with sudo, but I didn't add my user to the groups allowed to use the kernel driver.

Type groups and see whether you have render and video. If not, add them, reboot, and leave the device on "auto"; launching the Invoke server will then show the right GPU.

(PS: I also used the export HSA_OVERRIDE_GFX_VERSION line stated in this thread.)

Hope this helps!