[bug]: segmentation fault on AMD GPU

src-r-r commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

OS

Linux

GPU

amd

VRAM

8GB

What happened?

Program installs and starts just fine, but when I hit the "Invoke" button, I immediately get a segmentation fault.

$ invokeai --web 2>&1                            
* Initializing, be patient...                               
>> Initialization file /home/user/invokeai/invokeai.init found. Loading...
>> Internet connectivity is False
>> InvokeAI, version 2.3.1.post2
>> InvokeAI runtime directory is "/home/user/invokeai"                                                                >> GFPGAN Initialized                                                                                                   >> CodeFormer Initialized                                                                                               >> ESRGAN Initialized
>> Using device_type cuda
>> xformers not installed  
>> NSFW checker is disabled  
>> Current VRAM usage:  0.00G                                                                                           
>> Loading diffusers model from stabilityai/stable-diffusion-2-1
  | Using faster float16 precision
** An unexpected error occurred while downloading the model: stabilityai/stable-diffusion-2-1 does not appear to have a 
file named model_index.json.)
  | Default image dimensions = 768 x 768
>> Model loaded in 0.77s
>> Max VRAM used to load the model: 0.00G 
>> Current VRAM usage:0.00G
>> Loading embeddings from /home/user/invokeai/embeddings
>> Textual inversion triggers:  
>> Setting Sampler to k_lms (LMSDiscreteScheduler)
* --web was specified, starting web server...
* Initializing, be patient...
>> Initialization file /home/user/invokeai/invokeai.init found. Loading...
>> Started Invoke AI Web Server!
>> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.
>> Point your browser at http://127.0.0.1:9090
>> System config requested
>> patchmatch.patch_match: INFO - Compiling and loading c extensions from "/home/user/invokeai/.venv/lib/python3.10/site-packages/patchmatch".
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (Command 'make clean && make' returned non-zero exit status 2.).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
>> Patchmatch not loaded (nonfatal)

>> Image Generation Parameters:

{'prompt': 'banana sushi', 'iterations': 1, 'steps': 50, 'cfg_scale': 7.5, 'threshold': 0, 'perlin': 0, 'height': 512, 'width': 512, 'sampler_name': 'k_lms', 'seed': 2545060656, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'hires_fix': False, 'seamless': False, 'variation_amount': 0}

>> ESRGAN Parameters: False
>> Facetool Parameters: False
[1]    1147376 segmentation fault (core dumped)  invokeai --web 2>&1

I'm using v2.3.1-post-2

I've tried other models and get the same result. I do not have this issue with invokeai v1.3

I've been playing around with other AI libraries lately and been encountering segfaults, too. Still haven't figured out why in most cases.

Here's the output from rocminfo:

[37mROCk module is loaded[0m
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4500                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    31790420(0x1e51554) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    31790420(0x1e51554) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    31790420(0x1e51554) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Radeon RX 580 Series               
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1366                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            36                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608(0x800000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

I also have HSA_OVERRIDE_GFX_VERSION=10.0.3. If I unset HSA_OVERRIDE_GFX_VERSION and start the invokeai web I get the error "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

How would I determine where this segfault is coming from?

Screenshots

No response

Additional context

No response

Contact Details

No response

Lolagatorade commented 1 year ago

I had the same issue with nvidia actually. Segmentation errors left and right. Manjaro arch Linux. Installing using the auto installer. Found out I had to delete whole installation and reinstall using python. Sucks but actually worked. Not sure how or why.

Working now but now I have a different issue with blank black images.

lstein commented 1 year ago

@Lolagatorade I believe I got to the bottom of the black images problem earlier today and have posted a fix which will appear in 2.3.2 (coming soon).

@src-r-r I feel your pain. ROCm support is very spotty and I've had numerous difficulties with AMD GPUs. Generally the problem is with the torch library and on one system I ended up having to recompile pytorch from source code in order to get a stable system. You can try to load different versions of torch and see if one is more stable than other. To do so, enter the "developer's console" and try loading the "nightly preview" version following the "pip install" instructions at https://pytorch.org/get-started/locally/

github-actions[bot] commented 1 year ago

There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.

muhamadazmy commented 1 year ago

I have the same issue I also noticed this error message in dmesg:

[  264.032168] invokeai[1521]: segfault at 20 ip 00007f9c9a6b40a7 sp 00007ffd2bb33f00 error 4 in libamdhip64.so[7f9c9a600000+3f3000]
[  264.032178] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d

muhamadazmy commented 1 year ago

I managed to make it work by making sure i install nightly build of torch first with

given you have an active virtualenv

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.4.2

The i install invokeai from the repo with pip install -e but i make sure i remove the torch dependency from the pyproject.toml first so it doesn't override the already installed version.

puresick commented 1 year ago

It is also segfaulting for me on Arch Linux using an AMD Radeon 5500XT. Running on the CPU is fine (AMD Ryzen 7 3700X).

I tried using the auto installer as well as installing it manually with python (both with python 3.9.57 and 3.10.12), but no difference. Unfortunately @muhamadazmy suggestion above also did not work for me.

Each try, this was the log output from the time starting invokeai --web until it crashed:

amdgpu.ids: No such file or directory
amdgpu.ids: No such file or directory
* Initializing, be patient...
>> Initialization file /home/puresick/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.5.post2
>> InvokeAI runtime directory is "/home/puresick/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> CUDA device 'AMD Radeon Graphics' (GPU 0)
>> xformers not installed
>> NSFW checker is disabled
>> Current VRAM usage:  0.00G
>> Loading diffusers model from stabilityai/stable-diffusion-2-1
   | Using faster float16 precision
   | Default image dimensions = 768 x 768
>> Model loaded in 2.19s
>> Max VRAM used to load the model: 2.60G
>> Current VRAM usage:2.60G
>> Loading embeddings from /home/puresick/invokeai/embeddings
>> Textual inversion triggers:
>> Setting Sampler to k_lms (LMSDiscreteScheduler)

* --web was specified, starting web server...
* Initializing, be patient...
>> Initialization file /home/puresick/invokeai/invokeai.init found. Loading...
>> Started Invoke AI Web Server!
>> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.
>> Point your browser at http://127.0.0.1:9090
>> System config requested
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (libvtkFiltersTexture.so.1: cannot open shared object file: No such file or directory).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
>> Patchmatch not loaded (nonfatal)
>> System config requested

>> Image Generation Parameters:

{'prompt': 'banana sushi', 'iterations': 1, 'steps': 50, 'cfg_scale': 7.5, 'threshold': 0, 'perlin': 0, 'height': 512, 'width': 512, 'sampler_name': 'k_lms', 'seed': 479169790, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'hires_fix': False, 'seamless': False, 'variation_amount': 0}

>> ESRGAN Parameters: False
>> Facetool Parameters: False
Segmentation fault (core dumped)

rocminfo outputs the following information:

[37mROCk module is loaded[0m
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 3700X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 3700X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32809736(0x1f4a308) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32809736(0x1f4a308) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32809736(0x1f4a308) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1012                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5500 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 29504(0x7340)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1900                               
  BDFID:                   11520                              
  Internal Node ID:        1                                  
  Compute Unit:            22                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1012:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

puresick commented 1 year ago

What is the reason this issue has been closed @hipsterusername?

hipsterusername commented 1 year ago

We’ve released the 3.0 alpha and it’s a general reset on any issues experienced with the app. If you experience the same segfaulting, I’d advise creating a new issue

TheKarls commented 1 year ago

Someone has updates on this? Still have the problem with 3.1.1 I saw this issue that shows some workaround but there is no way for to solve this staying with the installation script?

puresick commented 1 year ago

@TheKarls As far as I know this is still somewhat unsolved.

Personally I prevented InvokeAI from segfaulting by setting the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0, but this leads to the following issue: https://github.com/invoke-ai/InvokeAI/issues/4364.

Also, what hardware configuration and OS are you using? That will be helpful for the InvokeAI team to work on this.

TheKarls commented 1 year ago

I'm on Arch Linux, with a 6700XT GPU I also tried to run it with rocm 5.2, It works but only with CPU, the gpu is not recognised.

DvdGiessen commented 1 year ago

After some puzzling I got it to work on my 7900XTX on Arch with kernel 6.5.6 and 78377469dbddd8456b12e830aa2f8cc19620f916 (latest main when I checked out, though tag 3.2.0 will probably work too), using a heavily modified Docker setup:

Dockerfile

# syntax=docker/dockerfile:1.4

# Build the Web UI
FROM node:18 AS web-builder
WORKDIR /build
COPY invokeai/frontend/web/ ./
RUN --mount=type=cache,target=/usr/lib/node_modules \
    npm install --include dev
RUN --mount=type=cache,target=/usr/lib/node_modules \
    yarn vite build

# InvokeAI runtime for AMD cards
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 AS runtime

ARG DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

RUN apt update && apt install -y --no-install-recommends \
        git \
        curl \
        vim \
        tmux \
        ncdu \
        iotop \
        bzip2 \
        gosu \
        libglib2.0-0 \
        libgl1-mesa-glx \
        python3-pip \
        build-essential \
        libopencv-dev \
        libstdc++-10-dev && \
    apt-get clean && apt-get autoclean && \
    pip install --upgrade pip

ENV INVOKEAI_SRC=/opt/invokeai
ENV INVOKEAI_ROOT=/invokeai
ENV PATH="$INVOKEAI_SRC:$PATH"

WORKDIR ${INVOKEAI_SRC}

COPY invokeai ./invokeai
COPY pyproject.toml ./
RUN --mount=type=cache,target=/root/.cache/pip pip install .[onnx-cuda]
COPY --link --from=web-builder /build/dist ${INVOKEAI_SRC}/invokeai/frontend/web/dist

# build patchmatch
RUN cd /usr/lib/$(uname -p)-linux-gnu/pkgconfig/ && ln -sf opencv4.pc opencv.pc
RUN python3 -c "from patchmatch import patch_match"

# Create unprivileged user and make the local dir
RUN userdel $(getent passwd 1000 | cut -d: -f1) && useradd --create-home --shell /bin/bash -u 1000 -G video --comment "container local user" invoke
RUN mkdir -p ${INVOKEAI_ROOT} && chown -R invoke:invoke ${INVOKEAI_ROOT}

COPY docker/docker-entrypoint.sh ./
ENTRYPOINT ["/opt/invokeai/docker-entrypoint.sh"]
CMD ["invokeai-web", "--host", "0.0.0.0"]

docker-compose.yml

version: '3.8'

services:
  invokeai:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    environment:
      HSA_OVERRIDE_GFX_VERSION: 11.0.0
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    ports:
      - 9090:9090/tcp
    volumes:
      - ./data:/invokeai
    command: ["invokeai-web", "--host", "0.0.0.0"]

Biggest difference is installing in the global env (not a venv) in the AMD ROCm Docker image. I suppose those have some extra customizations that fixed most of my issues.

Anyway, sharing here in the hope it'll prove useful for someone else.

Zlendy commented 11 months ago

After some puzzling I got it to work on my 7900XTX on Arch with kernel 6.5.6 and 7837746 (latest main when I checked out, though tag 3.2.0 will probably work too), using a heavily modified Docker setup:

Dockerfile

# syntax=docker/dockerfile:1.4

# Build the Web UI
FROM node:18 AS web-builder
WORKDIR /build
COPY invokeai/frontend/web/ ./
RUN --mount=type=cache,target=/usr/lib/node_modules \
    npm install --include dev
RUN --mount=type=cache,target=/usr/lib/node_modules \
    yarn vite build

# InvokeAI runtime for AMD cards
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 AS runtime

ARG DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

RUN apt update && apt install -y --no-install-recommends \
        git \
        curl \
        vim \
        tmux \
        ncdu \
        iotop \
        bzip2 \
        gosu \
        libglib2.0-0 \
        libgl1-mesa-glx \
        python3-pip \
        build-essential \
        libopencv-dev \
        libstdc++-10-dev && \
    apt-get clean && apt-get autoclean && \
    pip install --upgrade pip

ENV INVOKEAI_SRC=/opt/invokeai
ENV INVOKEAI_ROOT=/invokeai
ENV PATH="$INVOKEAI_SRC:$PATH"

WORKDIR ${INVOKEAI_SRC}

COPY invokeai ./invokeai
COPY pyproject.toml ./
RUN --mount=type=cache,target=/root/.cache/pip pip install .[onnx-cuda]
COPY --link --from=web-builder /build/dist ${INVOKEAI_SRC}/invokeai/frontend/web/dist

# build patchmatch
RUN cd /usr/lib/$(uname -p)-linux-gnu/pkgconfig/ && ln -sf opencv4.pc opencv.pc
RUN python3 -c "from patchmatch import patch_match"

# Create unprivileged user and make the local dir
RUN userdel $(getent passwd 1000 | cut -d: -f1) && useradd --create-home --shell /bin/bash -u 1000 -G video --comment "container local user" invoke
RUN mkdir -p ${INVOKEAI_ROOT} && chown -R invoke:invoke ${INVOKEAI_ROOT}

COPY docker/docker-entrypoint.sh ./
ENTRYPOINT ["/opt/invokeai/docker-entrypoint.sh"]
CMD ["invokeai-web", "--host", "0.0.0.0"]

docker-compose.yml

version: '3.8'

services:
  invokeai:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    environment:
      HSA_OVERRIDE_GFX_VERSION: 11.0.0
    devices:
      - /dev/dri:/dev/dri
      - /dev/kfd:/dev/kfd
    ports:
      - 9090:9090/tcp
    volumes:
      - ./data:/invokeai
    command: ["invokeai-web", "--host", "0.0.0.0"]

Biggest difference is installing in the global env (not a venv) in the AMD ROCm Docker image. I suppose those have some extra customizations that fixed most of my issues.

Anyway, sharing here in the hope it'll prove useful for someone else.

It works! Thank you so much! I really can't thank you enough

tindolt commented 3 months ago

./invoke.sh: line 37: 26519 Segmentation fault (core dumped) invokeai-web $PARAMS Still seg fault on 4.2.4

bergentroll commented 3 months ago

Still seg fault on 4.2.4

Possibly it is ROCm + unsupported GPU. I also has segfaults on a couple of ROCm 5.6 + gfx803 (RTX 570). Also people say such models anyway give no real advantages over CPU generation.

For some models nevertheless there are custom kits users can try.

invoke-ai / InvokeAI