invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: Image Generation Never Starts #4364

Open tokenwizard opened 1 year ago

tokenwizard commented 1 year ago

Is there an existing issue for this?

OS

Linux

GPU

amd

VRAM

8GB

What version did you experience this issue on?

3.0.2post1

What happened?

Using the Manual Install and the Automated Install gives the same results. The installation is successful, and the WebUI starts up and is accessible. I can see the models I downloaded as options. But when I type "banana sushi" and click Invoke, the button starts scrolling like it is working, and it stays at that point indefinitely. I have left it for over an hour and the status still shows "Generating", but it never shows even the first pass/sample of the image.

Below is the console output from the time I started the WebUI to the point where it hangs indefinitely.

Generate images with a browser-based interface
[2023-08-25 20:12:26,926]::[InvokeAI]::INFO --> Patchmatch initialized
/root/invokeai/.venv/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
[2023-08-25 20:12:28,697]::[uvicorn.error]::INFO --> Started server process [3918]
[2023-08-25 20:12:28,698]::[uvicorn.error]::INFO --> Waiting for application startup.
[2023-08-25 20:12:28,699]::[InvokeAI]::INFO --> InvokeAI version 3.0.2post1
[2023-08-25 20:12:28,699]::[InvokeAI]::INFO --> Root directory = /root/invokeai
[2023-08-25 20:12:28,706]::[InvokeAI]::INFO --> GPU device = cuda AMD Radeon RX 5700 XT
[2023-08-25 20:12:28,736]::[InvokeAI]::INFO --> Scanning /root/invokeai/models for new models
[2023-08-25 20:12:29,361]::[InvokeAI]::INFO --> Scanned 6 files and directories, imported 0 models
[2023-08-25 20:12:29,389]::[InvokeAI]::INFO --> Model manager service initialized
[2023-08-25 20:12:29,678]::[uvicorn.error]::INFO --> Application startup complete.
[2023-08-25 20:12:29,679]::[uvicorn.error]::INFO --> Uvicorn running on http://192.168.12.21:9090 (Press CTRL+C to quit)
[2023-08-25 20:12:31,246]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "GET /socket.io/?EIO=4&transport=polling&t=OekKx2P HTTP/1.1" 200
[2023-08-25 20:12:31,261]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "POST /socket.io/?EIO=4&transport=polling&t=OekKx2a&sid=a9GXgm7q-8ToYsSQAAAA HTTP/1.1" 200
[2023-08-25 20:12:31,267]::[uvicorn.error]::INFO --> ('192.168.12.167', 58760) - "WebSocket /socket.io/?EIO=4&transport=websocket&sid=a9GXgm7q-8ToYsSQAAAA" [accepted]
[2023-08-25 20:12:31,269]::[uvicorn.error]::INFO --> connection open
[2023-08-25 20:12:31,270]::[uvicorn.access]::INFO --> 192.168.12.167:58762 - "GET /socket.io/?EIO=4&transport=polling&t=OekKx2b&sid=a9GXgm7q-8ToYsSQAAAA HTTP/1.1" 200
[2023-08-25 20:12:31,275]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "GET /socket.io/?EIO=4&transport=polling&t=OekKx2w&sid=a9GXgm7q-8ToYsSQAAAA HTTP/1.1" 200
[2023-08-25 20:12:31,286]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "POST /socket.io/?EIO=4&transport=polling&t=OekKx35&sid=a9GXgm7q-8ToYsSQAAAA HTTP/1.1" 200
[2023-08-25 20:12:31,346]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "GET /socket.io/?EIO=4&transport=polling&t=OekKx3r&sid=a9GXgm7q-8ToYsSQAAAA HTTP/1.1" 200
[2023-08-25 20:12:31,359]::[uvicorn.access]::INFO --> 192.168.12.167:58762 - "GET /api/v1/models/?base_models=sd-1&base_models=sd-2&base_models=sdxl&model_type=main HTTP/1.1" 200
[2023-08-25 20:12:31,365]::[uvicorn.access]::INFO --> 192.168.12.167:58756 - "GET /api/v1/models/?model_type=vae HTTP/1.1" 200
[2023-08-25 20:12:31,367]::[uvicorn.access]::INFO --> 192.168.12.167:58762 - "GET /api/v1/app/version HTTP/1.1" 200
[2023-08-25 20:12:31,373]::[uvicorn.access]::INFO --> 192.168.12.167:58778 - "GET /api/v1/models/?model_type=lora HTTP/1.1" 200
[2023-08-25 20:12:31,381]::[uvicorn.access]::INFO --> 192.168.12.167:58784 - "GET /api/v1/models/?model_type=controlnet HTTP/1.1" 200
[2023-08-25 20:12:31,385]::[uvicorn.access]::INFO --> 192.168.12.167:58794 - "GET /api/v1/models/?model_type=embedding HTTP/1.1" 200
[2023-08-25 20:12:37,517]::[InvokeAI]::INFO --> NSFW checker initialized
[2023-08-25 20:12:37,518]::[uvicorn.access]::INFO --> 192.168.12.167:58808 - "GET /api/v1/app/config HTTP/1.1" 200
[2023-08-25 20:12:37,810]::[uvicorn.access]::INFO --> 192.168.12.167:58762 - "POST /api/v1/sessions/ HTTP/1.1" 200
[2023-08-25 20:12:38,017]::[uvicorn.access]::INFO --> 192.168.12.167:58762 - "PUT /api/v1/sessions/6bcd07f1-fbb5-4cdd-b783-50c08c038994/invoke?all=true HTTP/1.1" 202
[2023-08-25 20:12:38,416]::[InvokeAI]::INFO --> Loading model /root/invokeai/models/sd-1/main/Analog-Diffusion, type sd-1:main:tokenizer
[2023-08-25 20:12:39,295]::[InvokeAI]::INFO --> Loading model /root/invokeai/models/sd-1/main/Analog-Diffusion, type sd-1:main:text_encoder

This is where it just stays indefinitely, while the WebUI still looks like it is generating.

Screenshots

image

Additional context

No response

Contact Details

No response

Millu commented 1 year ago

Could you try selecting a different model and running? Looks like it never actually starts generating and is only loading the model.

ryuichi983 commented 1 year ago

Me too, it never starts no matter which model or data I change to. Please help.

psychedelicious commented 1 year ago

Please check the browser's JS console for errors or warnings. If there are any, please copy them here:

puresick commented 1 year ago

I am experiencing the same issue with a similar setup. The browser console (both Firefox and Chromium) does not throw any errors. The output looks like the following:

Firefox: image

Chromium: image

psychedelicious commented 1 year ago

@puresick what operating system, gpu and python version do you have?

tokenwizard commented 1 year ago

Browser Console does not show any errors. The Console just stops here with these messages. The InvokeAI interface still looks like it is "Generating" but never even gives a preview of the first step. I upgraded from 3.0.2post1 to 3.1.0 and the problem persists.

I confirmed in the Bash console that it is detecting my GPU and there are no obvious errors there either.

Here is the JS Console output: image

System Info is below: image

psychedelicious commented 1 year ago

Maybe it's nothing, but I'm suspicious of how both you @puresick and @tokenwizard are on AMD. We may be doing something wrong somehow.

Can you please try forcing generation to CPU? If you run the configure script, it should have an option to force CPU. Then try generating. It'll be slow as hell but if it works, we have a clue.

Also, can you please activate the venv (the easy way is to run the script that starts the app and choose the developer console) and run python --version?
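
For what it's worth, a quick sanity check from that same developer console might also help. This is only a sketch, and it assumes the ROCm build of torch (where torch.version.hip is set and ROCm devices are exposed through the "cuda" API):

    # Rough check of the torch install, run inside the activated InvokeAI venv.
    # Assumption: this is the ROCm wheel; on CPU-only or CUDA builds,
    # torch.version.hip is None and the version string has no +rocm suffix.
    import torch

    print(torch.__version__)          # torch build string
    print(torch.version.hip)          # HIP/ROCm version the wheel was built against
    print(torch.cuda.is_available())  # ROCm GPUs show up through the "cuda" API
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))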

tokenwizard commented 1 year ago

I will say I ran this on my AMD CPU desktop in CPU mode and it worked. Trying to run it now in a Proxmox LXC with the AMD GPU passed through.

I'll try these suggestions tomorrow.


psychedelicious commented 1 year ago

Ah, yeah I meant both of you having an AMD GPU caught my eye. If CPU works for you, and GPU doesn't, we may have some issues related to AMD GPUs.

puresick commented 1 year ago

@psychedelicious Here are my system specs:

OS: Arch Linux
Kernel: 6.4.12-arch1-1
CPU: AMD Ryzen 7 3700X
GPU: AMD ATI Radeon RX 5500 XT 8GB VRAM
Python: 3.11.5

Generating images on my CPU runs fine.

lstein commented 1 year ago

Hi!

Let’s see if it is an InvokeAI problem or an issue with the upstream diffusers library on AMD.

  1. Using a text editor, please enter the following script:

    from diffusers import DiffusionPipeline
    pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
    pipeline.to("cuda")
    image = pipeline("An image of a squirrel in Picasso style").images[0]
    image.save("image_of_squirrel_painting.png")

    (This is the “getting started” script for Diffusers from Hugging Face: https://huggingface.co/docs/diffusers/quicktour)

  2. Save the script as test_diffusers.py

  3. Activate the InvokeAI virtual environment, either by starting the launcher script and selecting the “developer’s console” option, or by giving the command source ~/invokeai/.venv/bin/activate (where ~/invokeai is the location of your InvokeAI directory).

  4. Run the script with python test_diffusers.py

  5. The script may re-download stable-diffusion-v1.5 (sorry, but it’s more foolproof) and then start generating. Generation should be fast - no more than 10s.

  6. If all goes well, it will leave you with a PNG named image_of_squirrel_painting.png.

If this works, then the bug is in InvokeAI. If not, then there is a problem with some library, such as PyTorch, ROCm, or diffusers itself.
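
As an extra data point (this variant is not part of the quicktour, just a sketch): running the same pipeline on the CPU of the same machine should tell us whether only the GPU path hangs. The reduced step count and the _cpu output filename are only for this comparison.

    # Sketch: CPU-only variant of test_diffusers.py, for comparison with the GPU run.
    # If this finishes but the "cuda" (ROCm) version hangs, the problem is in the
    # torch/ROCm GPU path rather than in diffusers or InvokeAI itself.
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", use_safetensors=True
    )
    pipeline.to("cpu")  # force the CPU path; slow, but a useful baseline
    image = pipeline(
        "An image of a squirrel in Picasso style", num_inference_steps=10
    ).images[0]
    image.save("image_of_squirrel_painting_cpu.png")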

tokenwizard commented 1 year ago

This is what I get when I try to run that script from the Developer Console: image

psychedelicious commented 1 year ago

@tokenwizard That's... weird...

Let's grab some additional diagnostic data:

Next, while still in the dev console, we will try running the test code differently:

tokenwizard commented 1 year ago

Here is the output of pip list:

(InvokeAI) root@AI-Server ~/invokeai> pip list
Package                 Version
----------------------- ----------------
absl-py                 1.4.0
accelerate              0.21.0
addict                  2.4.0
aiohttp                 3.8.5
aiosignal               1.3.1
albumentations          1.3.1
antlr4-python3-runtime  4.9.3
anyio                   3.7.1
async-timeout           4.0.3
attrs                   23.1.0
basicsr                 1.4.2
bidict                  0.22.1
boltons                 23.0.0
cachetools              5.3.1
certifi                 2023.7.22
cffi                    1.15.1
charset-normalizer      3.2.0
click                   8.1.7
clip-anytorch           2.5.2
cmake                   3.27.2
coloredlogs             15.0.1
compel                  2.0.2
contourpy               1.1.0
controlnet-aux          0.0.6
cycler                  0.11.0
datasets                2.14.4
diffusers               0.20.2
dill                    0.3.7
dnspython               2.4.2
dynamicprompts          0.29.0
easing-functions        1.0.4
einops                  0.6.1
eventlet                0.33.3
exceptiongroup          1.1.3
facexlib                0.3.0
fastapi                 0.88.0
fastapi-events          0.8.0
fastapi-socketio        0.0.10
filelock                3.12.2
filterpy                1.4.5
Flask                   2.1.3
Flask-Cors              3.0.10
Flask-SocketIO          5.3.0
flaskwebgui             1.0.3
flatbuffers             23.5.26
fonttools               4.42.1
frozenlist              1.4.0
fsspec                  2023.6.0
ftfy                    6.1.1
future                  0.18.3
gfpgan                  1.3.8
google-auth             2.22.0
google-auth-oauthlib    1.0.0
greenlet                2.0.2
grpcio                  1.57.0
h11                     0.14.0
httptools               0.6.0
huggingface-hub         0.16.4
humanfriendly           10.0
idna                    3.4
imageio                 2.31.1
importlib-metadata      6.8.0
invisible-watermark     0.2.0
InvokeAI                3.1.0
itsdangerous            2.1.2
Jinja2                  3.1.2
joblib                  1.3.2
kiwisolver              1.4.5
lazy_loader             0.3
lightning-utilities     0.9.0
lit                     16.0.6
llvmlite                0.40.1
lmdb                    1.4.1
Markdown                3.4.4
markdown-it-py          3.0.0
MarkupSafe              2.1.3
matplotlib              3.7.2
mdurl                   0.1.2
mediapipe               0.10.3
mpmath                  1.3.0
multidict               6.0.4
multiprocess            0.70.15
networkx                3.1
npyscreen               4.10.5
numba                   0.57.1
numpy                   1.24.4
oauthlib                3.2.2
omegaconf               2.3.0
onnx                    1.14.0
onnxruntime             1.15.1
opencv-contrib-python   4.8.0.76
opencv-python           4.8.0.76
opencv-python-headless  4.8.0.76
packaging               23.1
pandas                  2.0.3
picklescan              0.0.11
Pillow                  10.0.0
pip                     22.0.2
platformdirs            3.10.0
prompt-toolkit          3.0.39
protobuf                3.20.3
psutil                  5.9.4
pyarrow                 13.0.0
pyasn1                  0.5.0
pyasn1-modules          0.3.0
pycparser               2.21
pydantic                1.10.12
Pygments                2.16.1
Pympler                 1.0.1
pyparsing               3.0.9
PyPatchMatch            1.0.1
pyperclip               1.8.2
pyreadline3             3.4.1
python-dateutil         2.8.2
python-dotenv           1.0.0
python-engineio         4.6.1
python-multipart        0.0.6
python-socketio         5.8.0
pytorch-lightning       2.0.7
pytorch-triton-rocm     2.0.2
pytz                    2023.3
PyWavelets              1.4.1
PyYAML                  6.0.1
qudida                  0.0.4
realesrgan              0.3.0
regex                   2023.8.8
requests                2.28.2
requests-oauthlib       1.3.1
rich                    13.5.2
rsa                     4.9
safetensors             0.3.1
scikit-image            0.21.0
scikit-learn            1.3.0
scipy                   1.11.2
Send2Trash              1.8.2
setuptools              59.6.0
six                     1.16.0
sniffio                 1.3.0
sounddevice             0.4.6
starlette               0.22.0
sympy                   1.12
tb-nightly              2.15.0a20230825
tensorboard             2.14.0
tensorboard-data-server 0.7.1
test-tube               0.7.5
threadpoolctl           3.2.0
tifffile                2023.8.12
timm                    0.6.13
tokenizers              0.13.3
tomli                   2.0.1
torch                   2.0.1+rocm5.4.2
torchmetrics            0.11.4
torchsde                0.2.5
torchvision             0.15.2+rocm5.4.2
tqdm                    4.66.1
trampoline              0.1.2
transformers            4.31.0
typing_extensions       4.7.1
tzdata                  2023.3
urllib3                 1.26.16
uvicorn                 0.21.1
uvloop                  0.17.0
watchfiles              0.20.0
wcwidth                 0.2.6
websockets              11.0.3
Werkzeug                2.3.7
wheel                   0.37.1
xxhash                  3.3.0
yapf                    0.40.1
yarl                    1.9.2
zipp                    3.16.2

I just realized that when I copied the script content, I somehow missed the first line that imports the diffusers module. I updated the script and it seems to be running now. I assume it is downloading SD 1.5, which will take some time on my current connection.

tokenwizard commented 1 year ago

So it has been "stuck" here for the past half hour. Should I see some sort of output when it completes? image

I manually exited with CTRL-Z and I do not see the expected output file.

tokenwizard commented 1 year ago

Well, I let it sit there for over two hours and it never completed. I do see that the CPU usage on the server spikes for the entire time I leave it running, but it never generates the image.

tokenwizard commented 1 year ago

I should also add that before I transferred the AMD GPU into my server, I had it in my desktop and Automatic1111 was running on the GPU just fine.

lstein commented 1 year ago

So it has been "stuck" here for the past half hour. Should I see some sort of output when it completes? image

I manually exited with CTRL-Z and I do not see the expected output file.

It shouldn't get stuck like that.

I think this points to a problem with either torch or the ROCm driver that is installed on your machine. Could you try running the following command from your terminal and pasting a screenshot of the result?

rocm-smi

There are some additional ROCm debugging tips here: https://invoke-ai.github.io/InvokeAI/installation/030_INSTALL_CUDA_AND_ROCM
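
If rocm-smi looks sane, a bare-bones torch smoke test may help narrow things down further. This is only a sketch, independent of diffusers and InvokeAI; if even this hangs or throws a device-side assertion, the problem is in torch/ROCm itself:

    # Minimal GPU smoke test for the torch/ROCm path (no diffusers, no InvokeAI).
    # Run it inside the activated InvokeAI venv.
    import torch

    x = torch.randn(1024, 1024, device="cuda")  # ROCm devices use the "cuda" device string
    y = x @ x                                    # a simple matmul on the GPU
    torch.cuda.synchronize()                     # force the kernel to actually execute
    print(y.sum().item())                        # should print a finite number quickly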

tokenwizard commented 1 year ago

Hmmm, in Ubuntu 22.04 it seems I don't have rocm-smi. I do have the rocm-utils package installed. I have the rocminfo tool, and the output is below. I will check the link you provided for additional troubleshooting.

The Agent 3 section shows the GPU details.

root@AI-Server:~/invokeai# rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                                                       
  Uuid:                    CPU-XX                             
  Marketing Name:                                             
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3300                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32911484(0x1f6307c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32911484(0x1f6307c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32911484(0x1f6307c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                                                       
  Uuid:                    CPU-XX                             
  Marketing Name:                                             
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3300                               
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    49500392(0x2f350e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    49500392(0x2f350e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    49500392(0x2f350e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
  Chip ID:                 29471(0x731f)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   17408                              
  Internal Node ID:        2                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done *** 
puresick commented 1 year ago

@lstein I tried your Python test script above, running it inside the InvokeAI developer console. It gets stuck at the following screen: image

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 11.07it/s]
/pytorch/aten/src/ATen/native/hip/Indexing.hip:1148: iiiiiiiieeeeeeeeIndex: Device-side assertion `ssssssss        llllllllize' failed.
/pytorch/aten/src/ATen/native/hip/Indexing.hip:1148: iiiiiiiieeeeeeeeIndex: Device-side assertion `ssssssss        llllllllize' failed.
/pytorch/aten/src/ATen/native/hip/Indexing.hip:1148: iiiiiiiieeeeeeeeIndex: Device-side assertion `ssssssss        llllllllize' failed.
/pytorch/aten/src/ATen/native/hip/Indexing.hip:1148: iiiiiiiieeeeeeeeIndex: Device-side assertion `ssssssss        llllllllize' failed.
/pytorch/aten/src/ATen/native/hip/Indexing.hip:1148: iiiiiiiieeeeeeeeIndex: Device-side assertion `ssssssss        llllllllize' failed.

Downloading the model works fine, and loading the model into the GPU's VRAM at least also seems to work. With radeontop (https://github.com/clbr/radeontop) I observed the GPU usage hitting 100% and the VRAM usage increasing until it stopped at ~5 GB.

rocm-smi outputs the following:

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
0    49.0c           4.0W    0Mhz  100Mhz  0%   auto  135.0W   19%   4%
====================================================================================
=============================== End of ROCm SMI Log ================================

I should also add that I need to set the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 before running InvokeAI or the test script, otherwise they end in a SEGFAULT. As far as I know, that is needed in general to get this generation of AMD GPUs working at all with ROCm.
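
For completeness, here is a sketch of how that override can be baked into the test script itself. It assumes the ROCm runtime only reads the variable when torch first initializes HIP, which happens after the imports below; the quicktour lines themselves are unchanged:

    # Hypothetical variant of test_diffusers.py that sets the gfx override itself.
    # HSA_OVERRIDE_GFX_VERSION must be in the environment before HIP initializes,
    # so it is set before any torch/diffusers import.
    import os
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", use_safetensors=True
    )
    pipeline.to("cuda")
    image = pipeline("An image of a squirrel in Picasso style").images[0]
    image.save("image_of_squirrel_painting.png")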

puresick commented 11 months ago

Invoke 3.2.0 still has the same issue.

psychedelicious commented 11 months ago

@puresick Sorry for the late follow-up. The test script has no InvokeAI code in it - it's from the diffusers library: https://github.com/huggingface/diffusers

diffusers is the library that powers Stable Diffusion generation for InvokeAI.

I think this issue needs to be raised with them. I'm not sure if anybody on the InvokeAI team has access to an AMD GPU to test.

Would you mind raising an issue with diffusers? Please link back to this issue. Thanks.