invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: Fresh install always uses cuda for a ROCm compatible AMD GPU #4211

Open jackmillward opened 1 year ago

jackmillward commented 1 year ago

Is there an existing issue for this?

OS

Linux

GPU

amd

VRAM

16GB

What version did you experience this issue on?

v3.0.2rc1

What happened?

I used the install script from the latest release and selected AMD GPU (with ROCm). The script installs fine, but when I then launch the InvokeAI web client, I get this output:

[2023-08-09 17:25:18,419]::[uvicorn.error]::INFO --> Started server process [16866]
[2023-08-09 17:25:18,419]::[uvicorn.error]::INFO --> Waiting for application startup.
[2023-08-09 17:25:18,420]::[InvokeAI]::INFO --> InvokeAI version 3.0.1
[2023-08-09 17:25:18,420]::[InvokeAI]::INFO --> Root directory = /<myrootpath>/AI/invokeAI
[2023-08-09 17:25:18,421]::[InvokeAI]::INFO --> GPU device = cuda AMD Radeon RX 6800 XT

As you can see, it starts up with cuda AMD Radeon RX 6800 XT. This card works just fine with A1111 and ROCm. I've also edited the invokeai.yaml file, since I saw that xformers was enabled (it isn't available for AMD cards). Here's my current config:

InvokeAI:
  Web Server:
    host: 127.0.0.1
    port: 9090
    allow_origins: []
    allow_credentials: true
    allow_methods:
    - '*'
    allow_headers:
    - '*'
  Features:
    esrgan: true
    internet_available: true
    log_tokenization: false
    patchmatch: true
  Memory/Performance:
    always_use_cpu: false
    free_gpu_mem: true
    max_cache_size: 10.0
    max_vram_cache_size: 2.75
    precision: float32
    sequential_guidance: false
    xformers_enabled: false
    tiled_decode: false
  Paths:
    autoimport_dir: autoimport
    lora_dir: null
    embedding_dir: null
    controlnet_dir: null
    conf_path: configs/models.yaml
    models_dir: models
    legacy_conf_dir: configs/stable-diffusion
    db_dir: databases
    outdir: /<myrootpath>/AI/invokeAI/outputs
    use_memory_db: false
  Logging:
    log_handlers:
    - console
    log_format: color
    log_level: info

Of course, cuda doesn't work with my card and I get all-black output images.
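For reference, ROCm builds of PyTorch expose the GPU through the torch.cuda API, so the "GPU device = cuda ..." log line alone doesn't prove a CUDA wheel is installed; the wheel's version suffix does. A minimal probe (run with the InvokeAI venv active):

import torch

# A ROCm wheel reports a "+rocmX.Y" version suffix and a non-None
# torch.version.hip; a CUDA wheel reports "+cuXXX" and a non-None
# torch.version.cuda.
print("torch version :", torch.__version__)
print("HIP build     :", torch.version.hip)
print("CUDA build    :", torch.version.cuda)
print("GPU available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device name  :", torch.cuda.get_device_name(0))

The all-black images reported above are consistent with either a CUDA wheel or a ROCm wheel running on an unsupported gfx target, so this check is worth doing first.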

Screenshots

No response

Additional context

The manual install breaks the same way and also runs with cuda.

Contact Details

No response

Poisonsting commented 1 year ago

I'm having a similar problem with a 7900 XTX, even after installing Invoke with pip install InvokeAI --use-pep517 --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.6

Invoke still installs only the Nvidia stack, then launches in CPU-only mode.
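One plausible explanation (an assumption on my part, not confirmed in this thread): with --extra-index-url, pip is still free to satisfy an unpinned torch requirement from PyPI, whose Linux wheels bundle the CUDA runtime. A +rocm local-version pin, as suggested further down in this thread, can only be satisfied from the ROCm index, e.g.:

pip install "torch==2.1.0+rocm5.6" --extra-index-url https://download.pytorch.org/whl/rocm5.6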

Poisonsting commented 1 year ago

Using the zip release installer also installs Nvidia-only dependencies, then launches CPU-only:

[2023-08-14 15:33:39,131]::[InvokeAI]::INFO --> InvokeAI version 3.0.2post1
[2023-08-14 15:33:39,131]::[InvokeAI]::INFO --> Root directory = /opt/InvokeAI
[2023-08-14 15:33:39,135]::[InvokeAI]::INFO --> GPU device = cpu

adeliktas commented 11 months ago

> I'm having a similar problem with a 7900 XTX, even after installing Invoke with pip install InvokeAI --use-pep517 --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.6
>
> Invoke still installs only the Nvidia stack, then launches in CPU-only mode.

The provided pytorch+rocm5.6 package from https://download.pytorch.org/whl/rocm5.6 either does not support gfx103x or mistakenly uses CUDA.

https://gist.github.com/adeliktas/669812e64fd356afc4648ba847c61133
torch version = 2.0.1+cu117
cuda available = False
cuda version = 11.7
device count = 0

The docs recommend installing 5.4.2, and you might have to run it like: CUDA_VERSION=gfx1030 HSA_OVERRIDE_GFX_VERSION=10.3.0 invokeai-web

https://gist.github.com/adeliktas/669812e64fd356afc4648ba847c61133
torch version = 2.0.1+rocm5.4.2
cuda available = True
cuda version = None
device count = 1
cudart = <module 'torch._C._cudart'>
device = 0
capability = (10, 3)
name = AMD Radeon RX 6600 XT
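
As a side note, the override can also be applied from Python, as long as it is set before torch initializes the GPU. A minimal sketch (the 10.3.0 value is for RDNA2/gfx103x cards like the ones above; this is an illustration, not part of InvokeAI):

import os

# Must be set before the ROCm runtime initializes, i.e. before importing torch.
# 10.3.0 targets RDNA2 (gfx103x); RDNA3 cards would use 11.0.0 instead.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch

print(torch.cuda.is_available())       # True once the override is picked up
print(torch.cuda.get_device_name(0))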

Poisonsting commented 11 months ago

The 7900 XTX does not work with ROCm 5.4.2.

> CUDA_VERSION=gfx1030 HSA_OVERRIDE_GFX_VERSION=10.3.0 invokeai-web

I suspect the ENV vars are doing the heavy lifting here, though the specific values would need to change for a 7900 XTX in particular. I'll give this a try:

CUDA_VERSION=gfx1100 HSA_OVERRIDE_GFX_VERSION=11.0.0

> The provided pytorch+rocm5.6 package from https://download.pytorch.org/whl/rocm5.6 either does not support gfx103x or mistakenly uses CUDA.

This is not correct. I have installed from this index for text-gen-webui, auto1111, and comfy without any issues.
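
To find the right override value for a given card, one option is to ask torch which gfx target it sees (a sketch; gcnArchName is exposed on ROCm builds of recent PyTorch, and rocminfo reports the same information from the command line):

import torch

# On a ROCm build, gcnArchName reports the device's gfx target,
# e.g. "gfx1030" (RDNA2, override 10.3.0) or "gfx1100" (RDNA3, override 11.0.0).
props = torch.cuda.get_device_properties(0)
print(props.gcnArchName)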

nlbutts commented 11 months ago

I have tried manually modifying create_install.sh and the associated files to remove CUDA, and I have tried to force a ROCm install, but I still haven't figured it out. I wish they still had the old requirements.txt file. At this point InvokeAI is useless for AMD GPUs.

I tried to update installer.py with the following:

    # device can be one of: "cuda", "rocm", "cpu", "idk"
    device = graphical_accelerator()
    device = "rocm"  # override the selector and force ROCm unconditionally

    url = None
    optional_modules = "[onnx]"
    if OS == "Linux":
        if device == "rocm":
            url = "https://download.pytorch.org/whl/nightly/rocm5.7"
        elif device == "cpu":
            url = "https://download.pytorch.org/whl/cpu"

But it still tries to install CUDA dependencies. 
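A quick way to see what actually landed in the venv after an attempt like this (a standard-library sketch; run inside the InvokeAI environment):

from importlib.metadata import distributions

# CUDA wheels drag in nvidia-* runtime packages; ROCm wheels instead pull
# pytorch-triton-rocm. Listing both makes the installed stack obvious.
for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = (dist.metadata["Name"] or "").lower()
    if name.startswith(("torch", "nvidia-", "pytorch-triton")):
        print(name, dist.version)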

cfbauer commented 10 months ago

Same experience for me with my AMD 6750 XT on Pop!_OS. I tell it I have an AMD card, but it installs CUDA and then launches in CPU mode. This happens with both the automatic and the manual installation. InvokeAI 3.4.0post2.

Millu commented 10 months ago

@cfbauer can you open the developer console in the automatic installation (option 7) and then run pip show torch?

cfbauer commented 10 months ago

@Millu

$ pip show torch

Version: 2.1.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/chris/invokeai/.venv/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, basicsr, clip-anytorch, compel, controlnet-aux, facexlib, gfpgan, invisible-watermark, InvokeAI, pytorch-lightning, realesrgan, test-tube, timm, torchmetrics, torchsde, torchvision
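
The Requires line is the giveaway here: the nvidia-*-cu12 entries mean this is the default CUDA 12 wheel from PyPI, not a ROCm build. A ROCm wheel carries a +rocm suffix in its version and depends on pytorch-triton-rocm instead, as the install output further down shows.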

slavexe commented 10 months ago

I also have the same issue on Arch with a 6800 XT. I installed v3.4.0post2 from the zip and selected AMD GPU during install, yet it launches using the CPU. Running pip show torch does indeed indicate that CUDA-specific dependencies were installed.

Millu commented 10 months ago

@cfbauer @slavexe can you try running this from the developer console:

pip install "torch==2.1.0+rocm5.6" "torchvision==0.16.0+rocm5.6" "requests~=2.28.2" --force-reinstall --extra-index-url https://download.pytorch.org/whl/rocm5.6

slavexe commented 10 months ago

@Millu

Upon running

pip install "torch==2.1.0+rocm5.6" "torchvision==0.16.0+rocm5.6" "requests~=2.28.2" --force-reinstall --extra-index-url https://download.pytorch.org/whl/rocm5.6

it is properly using my GPU now. However, the following error showed up after running the command. Not sure if it will impact anything, since all the correct ROCm-related packages are installed now. Thanks!

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.14.7 requires fsspec[http]<=2023.10.0,>=2023.1.0, but you have fsspec 2023.12.0 which is incompatible.
Successfully installed MarkupSafe-2.1.3 certifi-2023.11.17 charset-normalizer-3.3.2 cmake-3.27.9 filelock-3.13.1 fsspec-2023.12.0 idna-3.6 jinja2-3.1.2 lit-17.0.6 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.2 pillow-10.1.0 pytorch-triton-rocm-2.1.0 requests-2.28.2 sympy-1.12 torch-2.1.0+rocm5.6 torchvision-0.16.0+rocm5.6 typing-extensions-4.8.0 urllib3-1.26.18

Millu commented 10 months ago

If that error isn't causing issues, I wouldn't worry about it!

cfbauer commented 10 months ago

I'm getting the same error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.14.7 requires fsspec[http]<=2023.10.0,>=2023.1.0, but you have fsspec 2023.12.0 which is incompatible.

Trying to run Invoke now gives me this:

./invoke.sh: line 54: 544485 Segmentation fault      (core dumped) invokeai-web $PARAMS

I also tried installing what I thought was a compatible version of fsspec, but still got the error above telling me I had an incompatible version:

pip install fsspec==2023.9.0
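
If it helps, the conflict is just datasets 2.14.7 pinning fsspec[http]>=2023.1.0,<=2023.10.0, so a requirement that matches that pin exactly should satisfy the resolver (an untested suggestion):

pip install "fsspec[http]>=2023.1.0,<=2023.10.0"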

fzzinchemical commented 9 months ago

I may have found the culprit for why AMD GPUs default to CUDA.

If I find the time, I will test it and report the results back here.

GoDJr commented 9 months ago

Any update on this?

fzzinchemical commented 8 months ago

> Any update on this?

Yup, and I have no clue what goes wrong or how it does it.

cfbauer commented 8 months ago

export HSA_OVERRIDE_GFX_VERSION=10.3.0

Running that before ./invoke.sh appears to have fixed the issue for me on my 6750 XT.

RX 7000-series AMD cards may need this instead:

export HSA_OVERRIDE_GFX_VERSION=11.0.0

Also of note, tweaking the version numbers in the command suggested above runs without errors:

pip install "torch==2.1.2+rocm5.6" "torchvision==0.16.2+rocm5.6" "fsspec==2023.10.0" "requests~=2.28.2" --force-reinstall --extra-index-url https://download.pytorch.org/whl/rocm5.6

I don't know enough to say whether those versions are a good idea, but they don't error for me on Invoke 3.6.0rc6.

disclaimer: I don't have any idea what I'm doing