vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

SDXL causes segmentation fault #2252

Closed: lavadrop closed this issue 1 year ago

lavadrop commented 1 year ago

### Issue Description

I followed the instructions to configure the webui for SDXL and put the Hugging Face SD-XL files in the models directory. When I restarted the server, it was stuck for a while. I pressed Enter in the terminal and it asked whether I wanted to download the base model. I pressed n, the server restarted, and finally it printed: Segmentation fault (core dumped)

### Version Platform Description

- Version: 2023-09-20
- Platform: Linux with ROCm 5.7.0.50700-45~22.04
- Browser: Firefox 117.0.1, no extensions
- GPU: AMD Radeon RX 7800 XT

```
20:56:32-039249 INFO     Starting SD.Next
20:56:32-042026 INFO     Python 3.10.12 on Linux
20:56:32-047905 INFO     Version: app=sd.next updated=2023-09-20 hash=89ba8e3c url=https://github.com/vladmandic/automatic/tree/master
20:56:32-504901 INFO     Platform: arch=x86_64 cpu=x86_64 system=Linux release=6.5.3-1-default python=3.10.12
20:56:32-506331 INFO     AMD ROCm toolkit detected
20:56:32-573223 INFO     Extensions: disabled=[]
20:56:32-574142 INFO     Extensions: enabled=['LDSR', 'Lora', 'ScuNET', 'SwinIR', 'a1111-sd-webui-lycoris', 'clip-interrogator-ext', 'multidiffusion-upscaler-for-automatic1111',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg'] extensions-builtin
20:56:32-576341 INFO     Extensions: enabled=[] extensions
20:56:32-577330 INFO     No changes detected: Quick launch active
20:56:32-578142 INFO     Verifying requirements
20:56:32-592619 INFO     Verifying packages
20:56:32-594741 INFO     Extensions: disabled=[]
20:56:32-595574 INFO     Extensions: enabled=['LDSR', 'Lora', 'ScuNET', 'SwinIR', 'a1111-sd-webui-lycoris', 'clip-interrogator-ext', 'multidiffusion-upscaler-for-automatic1111',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg'] extensions-builtin
20:56:32-597435 INFO     Extensions: enabled=[] extensions
20:56:32-601261 INFO     Extension preload: {'extensions-builtin': 0.0, 'extensions': 0.0}
20:56:32-602621 INFO     Command line args: []
20:56:36-298016 INFO     Engine: backend=Backend.DIFFUSERS compute=rocm mode=no_grad device=cuda
20:56:36-300672 INFO     Device: device=AMD Radeon Graphics n=1 hip=5.4.22803-474e8620
20:56:36-613232 INFO     Available VAEs: models/VAE items=0
20:56:36-614513 INFO     Diffusers disabling uncompatible extensions: ['sd-webui-controlnet', 'multidiffusion-upscaler-for-automatic1111', 'a1111-sd-webui-lycoris']
20:56:36-615708 INFO     Available models: models/Stable-diffusion items=0 time=0.00s
Download the default model? (y/N) n
20:56:48-041033 INFO     Extensions time: 1.29s { clip-interrogator-ext=0.37s Lora=0.13s sd-webui-agent-scheduler=0.22s stable-diffusion-webui-rembg=0.41s }
20:56:51-151787 INFO     Loading UI theme: name=black-teal style=Auto
20:56:51-322256 INFO     Themes: builtin=6 default=5 external=54
20:56:51-941169 INFO     Local URL: http://127.0.0.1:7860/
20:56:51-941964 INFO     Initializing middleware
20:56:52-067331 INFO     [AgentScheduler] Task queue is empty
20:56:52-068020 INFO     [AgentScheduler] Registering APIs
Segmentation fault (core dumped)
```

### Relevant log output

Output of `dmesg -kuT`:

```
[Mon Sep 25 20:37:11 2023] show_signal_msg: 55 callbacks suppressed
[Mon Sep 25 20:37:11 2023] python3[15122]: segfault at 20 ip 00007f44bdeb40a7 sp 00007f417aa34010 error 4 in libamdhip64.so[7f44bde00000+3f3000] likely on CPU 10 (core 2, socket 0)
[Mon Sep 25 20:37:11 2023] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d
[Mon Sep 25 20:37:46 2023] python3[15718]: segfault at 20 ip 00007fab586b40a7 sp 00007fa8136fc010 error 4 in libamdhip64.so[7fab58600000+3f3000] likely on CPU 6 (core 6, socket 0)
[Mon Sep 25 20:37:46 2023] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d
[Mon Sep 25 20:47:55 2023] python3[16707]: segfault at 20 ip 00007fe5106b40a7 sp 00007fe1cb5fb010 error 4 in libamdhip64.so[7fe510600000+3f3000] likely on CPU 6 (core 6, socket 0)
[Mon Sep 25 20:47:55 2023] Code: 8d 15 5d 6d 25 00 48 8d 3d f6 6c 25 00 be 32 00 00 00 e8 dc ed 1f 00 e8 c7 ed 1f 00 48 8b 45 b8 48 8b 50 28 4c 8b 24 da 31 c0 <41> 80 7c 24 20 00 74 11 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d
```
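The three kernel reports can be compared mechanically. The minimal Python sketch below parses each `segfault` line, subtracts the library's load base from the instruction pointer, and decodes the x86 page-fault error code. The regex and the helper names `parse_segfault` and `decode_error` are illustrative and are not part of SD.Next or any kernel tool.

For all three crashes, the faulting instruction lands at the same offset (0xb40a7) inside libamdhip64.so, so it is the same crash each time. `segfault at 20` with `error 4` (a user-mode read of an unmapped page) is consistent with reading a field at offset 0x20 through a NULL pointer. That also matches the marked `<41> 80 7c 24 20 00` bytes in the `Code:` lines, which decode to `cmp byte ptr [r12+0x20], 0`.

```python
import re
from typing import Optional

# Matches kernel segfault reports in the format shown in the dmesg output above:
#   comm[pid]: segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    mode = "user-mode" if code & 4 else "kernel-mode"
    access = "write" if code & 2 else "read"
    cause = "protection violation" if code & 1 else "page not present"
    return f"{mode} {access}, {cause}"


def parse_segfault(line: str) -> Optional[dict]:
    """Return the interesting fields of one dmesg segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "pid": int(m["pid"]),
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # ASLR moves the load base on every run; the offset inside the
        # library identifies the faulting instruction itself.
        "lib_offset": ip - base,
        "error": decode_error(int(m["error"], 16)),
    }


if __name__ == "__main__":
    lines = [
        "[Mon Sep 25 20:37:11 2023] python3[15122]: segfault at 20 ip 00007f44bdeb40a7 "
        "sp 00007f417aa34010 error 4 in libamdhip64.so[7f44bde00000+3f3000]",
        "[Mon Sep 25 20:37:46 2023] python3[15718]: segfault at 20 ip 00007fab586b40a7 "
        "sp 00007fa8136fc010 error 4 in libamdhip64.so[7fab58600000+3f3000]",
        "[Mon Sep 25 20:47:55 2023] python3[16707]: segfault at 20 ip 00007fe5106b40a7 "
        "sp 00007fe1cb5fb010 error 4 in libamdhip64.so[7fe510600000+3f3000]",
    ]
    for line in lines:
        r = parse_segfault(line)
        # Each prints libamdhip64.so+0xb40a7 fault_addr=0x20 (user-mode read, page not present)
        print(f"pid={r['pid']} {r['lib']}+{r['lib_offset']:#x} "
              f"fault_addr={r['fault_addr']:#x} ({r['error']})")
```

With debug symbols for the HIP runtime available, a library offset like this is typically what you would pass to a symbolizer (for example `addr2line -e libamdhip64.so 0xb40a7`) to find the function involved.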


### Backend

Diffusers

### Model

SD-XL

### Acknowledgements

- [X] I have read the above and searched for existing issues
- [X] I confirm that this is classified correctly and it's not an extension issue

vladmandic commented 1 year ago

This error is a crash deep inside `libamdhip64.so`, which is part of the AMD ROCm libraries; there is not much a higher-level application can do about that. I suggest trying to reinstall ROCm.

Also, make sure your GPU is on the supported list for your specific ROCm version, as different ROCm versions support different GPUs.

If needed, set the environment variable `HSA_OVERRIDE_GFX_VERSION` to the correct value for your GPU. SD.Next tries to set it, but AMD is notoriously bad at detecting the capabilities of its own GPUs from inside ROCm.
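Here is a sketch of what "the correct value" looks like. `HSA_OVERRIDE_GFX_VERSION` takes a `major.minor.stepping` string derived from the gfx target name, so gfx1101 becomes 11.0.1 and gfx1030 becomes 10.3.0. The helpers `gfx_to_hsa_version` and `apply_override` are hypothetical, not SD.Next's own code.

For RDNA3 cards whose exact target is missing from the installed libraries, a widely used community workaround is to report the card as gfx1100 with the value 11.0.0. Whether that is safe for a given card is an assumption this thread does not confirm. The variable only has an effect if it is set before the ROCm runtime initializes.

```python
import os
import re
from typing import MutableMapping, Optional


def gfx_to_hsa_version(gfx: str) -> str:
    """Convert a ROCm target name such as 'gfx1101' to the
    'major.minor.stepping' string HSA_OVERRIDE_GFX_VERSION expects.

    The last two characters are the minor and stepping digits (hex);
    everything before them is the decimal major version.
    """
    m = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", gfx.strip().lower())
    if m is None:
        raise ValueError(f"not a gfx target name: {gfx!r}")
    major, minor, stepping = m.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def apply_override(target: str, env: Optional[MutableMapping[str, str]] = None) -> str:
    """Set HSA_OVERRIDE_GFX_VERSION for `target` unless it is already set.

    Hypothetical helper: it must run before the ROCm runtime initializes,
    i.e. before torch is imported or the webui is launched.
    """
    env = os.environ if env is None else env
    # setdefault keeps a value the user already exported
    return env.setdefault("HSA_OVERRIDE_GFX_VERSION", gfx_to_hsa_version(target))


if __name__ == "__main__":
    print(gfx_to_hsa_version("gfx1101"))  # native value for the RX 7800 XT
    print(gfx_to_hsa_version("gfx1100"))  # value used by the gfx1100 workaround
```

Using `setdefault` means an explicitly exported value always wins over the computed one.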

You may get more information if you start the webui with `--debug`; it will log a line like this:

```python
log.debug(f'ROCm agent used by default: idx={idx} gpu={gpu} arch={arch}')
```
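For context, here is a minimal sketch of how a default agent and an `arch` family could be derived. It assumes the `rocm_agent_enumerator` tool that ships with ROCm, which lists one gfx target per line with `gfx000` for the CPU, plus a hand-written prefix table. The thread does not show how SD.Next itself performs this detection; `list_agents`, `family_of`, `default_agent` and `ARCH_FAMILIES` are hypothetical names.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# Hypothetical, coarse prefix table; real detection code may differ.
ARCH_FAMILIES = [("gfx11", "navi3x"), ("gfx103", "navi2x"), ("gfx101", "navi1x")]


def list_agents() -> List[str]:
    """Ask rocm_agent_enumerator (shipped with ROCm) for gfx target names."""
    exe = shutil.which("rocm_agent_enumerator")
    if exe is None:
        return []
    out = subprocess.run([exe], capture_output=True, text=True, check=False).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def family_of(gpu: str) -> str:
    """Map a gfx target to a coarse family name via the prefix table."""
    for prefix, family in ARCH_FAMILIES:
        if gpu.startswith(prefix):
            return family
    return "unknown"


def default_agent(agents: List[str]) -> Optional[Tuple[int, str, str]]:
    """Pick the first GPU agent; gfx000 denotes the CPU and is skipped."""
    gpus = [a for a in agents if a != "gfx000"]
    if not gpus:
        return None
    idx = 0
    return idx, gpus[idx], family_of(gpus[idx])


if __name__ == "__main__":
    agent = default_agent(list_agents())
    if agent is None:
        print("no ROCm GPU agent found")
    else:
        idx, gpu, arch = agent
        print(f"ROCm agent used by default: idx={idx} gpu={gpu} arch={arch}")
```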

lavadrop commented 1 year ago

This is what I got from `./webui.sh --debug`:

```
17:31:57-130638 INFO     Starting SD.Next
17:31:57-133540 INFO     Python 3.10.12 on Linux
17:31:57-140238 INFO     Version: app=sd.next updated=2023-09-20 hash=89ba8e3c url=https://github.com/vladmandic/automatic/tree/master
17:31:57-657869 INFO     Platform: arch=x86_64 cpu=x86_64 system=Linux release=6.5.3-1-default python=3.10.12
17:31:57-659062 DEBUG    Setting environment tuning
17:31:57-660042 DEBUG    Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
17:31:57-660925 DEBUG    Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
17:31:57-661885 INFO     AMD ROCm toolkit detected
17:31:57-681716 DEBUG    ROCm agents detected: ['gfx1101']
17:31:57-682488 DEBUG    ROCm agent used by default: idx=0 gpu=gfx1101 arch=navi3x
17:31:57-722330 DEBUG    ROCm version detected: 5.7
17:31:57-756009 DEBUG    Repository update time: Wed Sep 20 06:39:56 2023
17:31:57-756831 DEBUG    Previous setup time: Mon Sep 25 20:37:32 2023
17:31:57-757512 INFO     Extensions: disabled=[]
17:31:57-758096 INFO     Extensions: enabled=['LDSR', 'Lora', 'ScuNET', 'SwinIR', 'a1111-sd-webui-lycoris', 'clip-interrogator-ext', 'multidiffusion-upscaler-for-automatic1111',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg'] extensions-builtin
17:31:57-759447 INFO     Extensions: enabled=[] extensions
17:31:57-760015 DEBUG    Latest extensions time: Mon Sep 25 20:37:27 2023
17:31:57-760604 DEBUG    Timestamps: version:1695213596 setup:1695695852 extension:1695695847
17:31:57-761209 INFO     No changes detected: Quick launch active
17:31:57-761743 INFO     Verifying requirements
17:31:57-771072 INFO     Verifying packages
17:31:57-772465 INFO     Extensions: disabled=[]
17:31:57-773033 INFO     Extensions: enabled=['LDSR', 'Lora', 'ScuNET', 'SwinIR', 'a1111-sd-webui-lycoris', 'clip-interrogator-ext', 'multidiffusion-upscaler-for-automatic1111',
                         'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sd-webui-controlnet', 'stable-diffusion-webui-images-browser', 'stable-diffusion-webui-rembg'] extensions-builtin
17:31:57-774238 INFO     Extensions: enabled=[] extensions
17:31:57-777279 INFO     Extension preload: {'extensions-builtin': 0.0, 'extensions': 0.0}
17:31:57-778106 DEBUG    Starting module: <module 'webui' from '/home/user/bin/vladSD/webui.py'>
17:31:57-778912 INFO     Command line args: ['--debug'] debug=True
17:32:01-451931 DEBUG    Loaded packages: torch=2.0.1+rocm5.4.2 diffusers=0.20.2 gradio=3.43.2
17:32:01-661394 DEBUG    Reading: config.json len=14
17:32:01-662530 INFO     Engine: backend=Backend.DIFFUSERS compute=rocm mode=no_grad device=cuda
17:32:01-663483 INFO     Device: device=AMD Radeon Graphics n=1 hip=5.4.22803-474e8620
17:32:01-899500 DEBUG    Entering start sequence
17:32:01-900580 DEBUG    Initializing
17:32:01-901473 INFO     Available VAEs: models/VAE items=0
17:32:01-902162 INFO     Diffusers disabling uncompatible extensions: ['sd-webui-controlnet', 'multidiffusion-upscaler-for-automatic1111', 'a1111-sd-webui-lycoris']
17:32:01-902969 DEBUG    Scanning diffusers cache: models/Diffusers models/Diffusers items=0 time=0.00s
17:32:01-903624 INFO     Available models: models/Stable-diffusion items=0 time=0.00s
Download the default model? (y/N) n
17:32:04-382133 DEBUG    Loading extensions
17:32:05-683755 INFO     Extensions time: 1.30s { clip-interrogator-ext=0.39s Lora=0.13s sd-webui-agent-scheduler=0.22s stable-diffusion-webui-rembg=0.42s }
17:32:05-685696 DEBUG    FS walk error: [Errno 2] No such file or directory: '/home/user/bin/vladSD/models/RealESRGAN' /home/user/bin/vladSD/models/RealESRGAN
17:32:05-686765 DEBUG    Loaded upscalers: items=14
17:32:07-914723 INFO     Loading UI theme: name=black-teal style=Auto
17:32:07-916130 DEBUG    Loaded styles: folder=models/styles items=0
17:32:07-917759 DEBUG    Creating UI
17:32:07-920207 DEBUG    Reading: ui-config.json len=0
17:32:07-939492 DEBUG    Extra networks: page='model' items=0 subdirs=1 tab=txt2img dirs=['models/Stable-diffusion', 'models/Diffusers', '/home/user/bin/vladSD/models/Stable-diffusion'] time=0.0
17:32:07-941337 DEBUG    Extra networks: page='style' items=0 subdirs=0 tab=txt2img dirs=['models/styles'] time=0.0
17:32:07-942848 DEBUG    Extra networks: page='embedding' items=0 subdirs=0 tab=txt2img dirs=['models/embeddings'] time=0.0
17:32:07-944569 DEBUG    Extra networks: page='hypernetwork' items=0 subdirs=0 tab=txt2img dirs=['models/hypernetworks'] time=0.0
17:32:07-946445 DEBUG    Extra networks: page='lora' items=0 subdirs=0 tab=txt2img dirs=['models/Lora'] time=0.0
17:32:08-056282 DEBUG    Reading: ui-config.json len=0
17:32:08-080403 INFO     Themes: builtin=6 default=5 external=54
17:32:08-319416 DEBUG    Script: 0.18s ui_tabs /home/user/bin/vladSD/extensions-builtin/stable-diffusion-webui-images-browser/scripts/image_browser.py
17:32:08-321328 DEBUG    Extensions list failed to load: /home/user/bin/vladSD/html/extensions.json
17:32:08-371907 DEBUG    Extension list refresh: processed=12 installed=12 enabled=9 disabled=3 visible=12 hidden=0
17:32:08-697943 INFO     Local URL: http://127.0.0.1:7860/
17:32:08-698829 DEBUG    Gradio registered functions: 1442
17:32:08-699707 INFO     Initializing middleware
17:32:08-702038 DEBUG    Creating API
17:32:08-820313 INFO     [AgentScheduler] Task queue is empty
17:32:08-821013 INFO     [AgentScheduler] Registering APIs
17:32:08-891308 DEBUG    Scripts setup: ['X/Y/Z Grid:0.005s']
17:32:08-892017 DEBUG    Model metadata: metadata.json no changes
Segmentation fault (core dumped)
```

I installed the latest ROCm version, and it explicitly lists my GPU as compatible. From what I can tell from the log, ROCm detects my GPU as gfx1101, which is correct (OpenGL renderer string: AMD Radeon Graphics (gfx1101, LLVM 16.0.6, DRM 3.54, 6.5.3-1-default)).

vladmandic commented 1 year ago

One possible issue is that the installed ROCm drivers and the torch build installed by default are too far apart in version.
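The mismatch is visible in the `--debug` log. It shows `Loaded packages: torch=2.0.1+rocm5.4.2` against `ROCm version detected: 5.7`, and the runtime reports `hip=5.4.22803-474e8620`. A small Python check along these lines can flag it. The function names and the one-minor-release tolerance are hypothetical choices for illustration, not a documented compatibility rule. In a live environment the inputs would come from `torch.__version__` and the installed ROCm version.

```python
import re
from typing import Optional, Tuple


def torch_rocm_version(torch_version: str) -> Optional[Tuple[int, int]]:
    """(major, minor) of the ROCm release a wheel targets, from a local
    version tag such as '2.0.1+rocm5.4.2'; None for non-ROCm builds."""
    m = re.search(r"\+rocm(\d+)\.(\d+)", torch_version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def major_minor(version: str) -> Tuple[int, int]:
    """(major, minor) from strings like '5.7' or '5.7.0.50700-45~22.04'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def too_far_apart(torch_version: str, system_rocm: str, max_minor_gap: int = 1) -> bool:
    """Heuristic only: flag a wheel whose ROCm target differs from the
    installed ROCm by more than `max_minor_gap` minor releases."""
    built = torch_rocm_version(torch_version)
    if built is None:
        return True  # not a ROCm wheel at all
    system = major_minor(system_rocm)
    return built[0] != system[0] or abs(system[1] - built[1]) > max_minor_gap


if __name__ == "__main__":
    # Values taken from the --debug log: torch built for 5.4, system ROCm 5.7.
    print(too_far_apart("2.0.1+rocm5.4.2", "5.7"))  # True
```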

Try forcing the latest torch nightly built for ROCm 5.6 manually by setting the environment variable

```
TORCH_COMMAND="torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6"
```

and either delete the venv or force a reinstall using the `--reinstall` flag.
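The same steps can be scripted in Python, as a sketch. The `TORCH_COMMAND` value is copied from the suggestion in this comment. The helper `prepare_reinstall`, and the assumption that the virtualenv lives in a `venv` folder next to `webui.sh`, are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Tuple

# Index URL copied from the suggestion in the thread.
TORCH_COMMAND = (
    "torch torchvision torchaudio "
    "--index-url https://download.pytorch.org/whl/nightly/rocm5.6"
)


def prepare_reinstall(webui_dir: str = ".", delete_venv: bool = False) -> Tuple[List[str], Dict[str, str]]:
    """Return the launch command and environment for reinstalling torch.

    Either pass --reinstall, or (delete_venv=True) remove the virtualenv so
    the launcher rebuilds it; the 'venv' folder name is an assumption.
    """
    env = dict(os.environ, TORCH_COMMAND=TORCH_COMMAND)
    if delete_venv:
        shutil.rmtree(os.path.join(webui_dir, "venv"), ignore_errors=True)
        return ["./webui.sh"], env
    return ["./webui.sh", "--reinstall"], env


if __name__ == "__main__":
    webui_dir = "."  # run from the SD.Next checkout
    cmd, env = prepare_reinstall(webui_dir)
    subprocess.run(cmd, env=env, cwd=webui_dir, check=True)
```

Passing the variable through `env=` scopes it to this one launch instead of exporting it for the whole shell session.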

lavadrop commented 1 year ago

Thanks. It appears the ROCm 5.6 torch builds only added support for the Radeon PRO W7900 and RX 7900 XTX, which are Navi 31 (gfx1100). I'd better wait.
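You can check which GPU targets an installed torch wheel was compiled for directly, instead of relying on release notes. Below is a minimal sketch. It assumes that on ROCm builds `torch.cuda.get_arch_list()` reports gfx target names and returns an empty list when no GPU is available. `wheel_supports` and `installed_arch_list` are hypothetical helper names.

```python
from typing import List, Sequence


def wheel_supports(gpu: str, arch_list: Sequence[str]) -> bool:
    """True if `gpu` (e.g. 'gfx1101') is among a build's compiled targets.

    Entries may carry feature suffixes such as 'gfx90a:xnack-', so only the
    part before ':' is compared.
    """
    return any(entry.split(":")[0] == gpu for entry in arch_list)


def installed_arch_list() -> List[str]:
    """Compiled GPU targets of the installed torch, or [] if unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if getattr(torch.version, "hip", None) is None:
        return []  # CUDA or CPU build, not ROCm
    return list(torch.cuda.get_arch_list())


if __name__ == "__main__":
    archs = installed_arch_list()
    print("compiled for:", archs or "unknown")
    print("gfx1101 supported:", wheel_supports("gfx1101", archs))
```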